Re: 2.6.22-rc2-mm1

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, 23 May 2007, Andrew Morton wrote:

> > > > This is intermittently getting resume-from-RAM failures.  It is not
> > > > sufficiently repeatable to be able to bisect.
> > > > 
> > > > [ 1381.119362] PM: Preparing system for mem sleep
> > > > [ 2331.798452] Stopping tasks ... 
> > > > [ 2351.760431] Stopping kernel threads timed out after 20 seconds (2 tasks refusing to freeze):
> > > > [ 2351.762385]  ksuspend_usbd
> > > > [ 2351.764374]  khubd
> > > > [ 2351.766338] Restarting tasks ... done.
> > > 
> > > Hmm, that seems to be related to usb-fix-suspend-to-ram.patch (probably one of
> > > the threads is waiting for a completion by some other thread that has been
> > > frozen already).
> > 
> > Is it possible to get an Alt-SysRq-T stack trace during those 20 
> > seconds?  Knowing what those threads are waiting for would be a big 
> > help.

> The trace is at http://userweb.kernel.org/~akpm/tasks.txt.  Interesting
> bits are
> 
> [  144.201264] khubd         D 00400005     0   160      2 (L-TLB)
> [  144.204358]        c207fe78 00000046 90399a85 00400005 00000246 c207fe60 c25b0cc4 c206f4cc 
> [  144.204539]        00000286 00000000 769e4cea 0040000a 90399a85 00400005 c32713c0 c207fed4 
> [  144.207754]        00000001 c207fe94 c207febc c02e8e1b 00000000 00000000 00000000 00000000 
> [  144.210934] Call Trace:
> [  144.217012]  [<c02e8e1b>] wait_for_completion+0x68/0x91
> [  144.220090]  [<c011824f>] default_wake_function+0x0/0x9
> [  144.223158]  [<c0127a41>] flush_cpu_workqueue+0x4d/0x55
> [  144.226223]  [<c0127a69>] wq_barrier_func+0x0/0x8
> [  144.229269]  [<c026343d>] usb_release_dev+0x28/0x63
> [  144.232340]  [<c0233011>] device_release+0x37/0x7c
> [  144.235431]  [<c01cb6c7>] kobject_cleanup+0x3d/0x54
> [  144.238520]  [<c01cb6de>] kobject_release+0x0/0x8
> [  144.241631]  [<c01cc2a7>] kref_put+0x75/0x82
> [  144.244699]  [<c0265482>] hub_thread+0x376/0xa74
> [  144.247768]  [<c01180c2>] pick_next_task_fair+0xf2/0x12a
> [  144.250815]  [<c0116af1>] __wake_up_common+0x31/0x4f
> [  144.253864]  [<c012a259>] autoremove_wake_function+0x0/0x35
> [  144.256902]  [<c026510c>] hub_thread+0x0/0xa74
> [  144.259944]  [<c012a102>] kthread+0x36/0x5c
> [  144.262891]  [<c012a0cc>] kthread+0x0/0x5c
> [  144.265757]  [<c010464b>] kernel_thread_helper+0x7/0x10
> [  144.268716]  =======================
> 
> 
> [  144.137704] ksuspend_usbd D 00400005     0   157      2 (L-TLB)
> [  144.140830]        c2085f18 00000046 9072767a 00400005 c20626f0 c010449b c3182118 c206288c 
> [  144.141011]        c3182120 c3182120 76d728df 0040000a 9072767a 00400005 c3271200 c3182118 
> [  144.144263]        c3182120 00000246 c20626f0 c02ea1c9 00000000 00000000 00000000 00000000 
> [  144.147576] Call Trace:
> [  144.153929]  [<c010449b>] common_interrupt+0x23/0x28
> [  144.157245]  [<c02ea1c9>] __down+0xba/0xc6
> [  144.160528]  [<c011824f>] default_wake_function+0x0/0x9
> [  144.163832]  [<c02664fc>] hcd_resume_work+0x0/0x43
> [  144.167126]  [<c02e9fd3>] __down_failed+0x7/0xc
> [  144.170372]  [<c0266518>] hcd_resume_work+0x1c/0x43
> [  144.173603]  [<c01278cf>] run_workqueue+0x6d/0xdf
> [  144.176780]  [<c0127b4c>] worker_thread+0x0/0xd0
> [  144.179885]  [<c0127b4c>] worker_thread+0x0/0xd0
> [  144.182930]  [<c0127c12>] worker_thread+0xc6/0xd0
> [  144.185964]  [<c012a259>] autoremove_wake_function+0x0/0x35
> [  144.189056]  [<c012a102>] kthread+0x36/0x5c
> [  144.192118]  [<c012a0cc>] kthread+0x0/0x5c
> [  144.195153]  [<c010464b>] kernel_thread_helper+0x7/0x10


Okay, it's clear that the two threads are in deadlock.  It's not clear 
how the deadlock arose to begin with -- apparently there was a remote 
wakeup request for a root hub at the same time as a device below that 
root hub was disconnected, which doesn't make much sense.

Anyway, this looks like a good place to use cancel_work_sync().  The 
patch below is highly untested, so Andrew, you're the guinea pig.  :-)
If it seems to help, I'll submit it with a proper Changelog entry.

Alan Stern


Index: usb-2.6/drivers/usb/core/hub.c
===================================================================
--- usb-2.6.orig/drivers/usb/core/hub.c
+++ usb-2.6/drivers/usb/core/hub.c
@@ -1294,6 +1294,7 @@ void usb_disconnect(struct usb_device **
 	*pdev = NULL;
 	spin_unlock_irq(&device_state_lock);
 
+#ifdef	CONFIG_USB_SUSPEND
 	/* Synchronize with the ksuspend thread to prevent any more
 	 * autosuspend requests from being submitted, and decrement
 	 * the parent's count of unsuspended children.
@@ -1303,6 +1304,10 @@ void usb_disconnect(struct usb_device **
 		usb_autosuspend_device(udev->parent);
 	usb_pm_unlock(udev);
 
+	cancel_delayed_work(&udev->autosuspend);
+	cancel_work_sync(&udev->autosuspend.work);
+#endif
+
 	put_device(&udev->dev);
 }
 
Index: usb-2.6/drivers/usb/core/usb.c
===================================================================
--- usb-2.6.orig/drivers/usb/core/usb.c
+++ usb-2.6/drivers/usb/core/usb.c
@@ -184,10 +184,6 @@ static void usb_release_dev(struct devic
 
 	udev = to_usb_device(dev);
 
-#ifdef	CONFIG_USB_SUSPEND
-	cancel_delayed_work(&udev->autosuspend);
-	flush_workqueue(ksuspend_usb_wq);
-#endif
 	usb_destroy_configuration(udev);
 	usb_put_hcd(bus_to_hcd(udev->bus));
 	kfree(udev->product);

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux