Re: [linux-usb-devel] Re: 2.6.14-rc1 load average calculation broken?

Alan Stern wrote:
> On Fri, 16 Sep 2005, Jan Dittmer wrote:
> 
> 
>>>Can you post a stack dump for those two threads?  Normally they are idle,
>>>in an interruptible wait, so they shouldn't be in D state.  Since they
>>>are, maybe there's some sort of error recovery attempt going on.  Like
>>>hald doing its periodic checking of hotpluggable storage devices while
>>>your monitor is off.
>>
>>They don't appear in lsusb or /proc/scsi/scsi anymore, so I don't know what
>>you mean.
>>
>>[4327082.342000] usb-storage   D 000F4261     0  3308      1          4671
>>2487 (L-TLB)
>>[4327082.342000] ded95f0c c9185a70 c04ae888 000f4261 00000010 ded94000
>>00000000 cd393840
>>[4327082.342000]        000f4261 dfabcb98 dfabca70 ce5b2300 000f4261 ded94000
>>09c67100 00000000
>>[4327082.342000]        c0410b24 00000286 c0410b2c dfabca70 c03adb5d 00000001
>>dfabca70 c011a2f0
>>[4327082.342000] Call Trace:
>>[4327082.342000]  [<c03adb5d>] __down+0xdd/0x140
>>[4327082.342000]  [<c011a2f0>] default_wake_function+0x0/0x20
>>[4327082.342000]  [<c03ac35f>] __down_failed+0x7/0xc
>>[4327082.342000]  [<c02c5050>] scsi_host_dev_release+0x0/0x90
>>[4327082.342000]  [<c0134dd4>] .text.lock.kthread+0xb/0x27
>>[4327082.342000]  [<c02c5087>] scsi_host_dev_release+0x37/0x90
>>[4327082.342000]  [<c020a4be>] kobject_cleanup+0x4e/0xa0
>>[4327082.342000]  [<c020a510>] kobject_release+0x0/0x10
>>[4327082.342000]  [<c020af3f>] kref_put+0x2f/0x80
>>[4327082.342000]  [<c020a53e>] kobject_put+0x1e/0x30
>>[4327082.342000]  [<c020a510>] kobject_release+0x0/0x10
>>[4327082.342000]  [<e0869288>] usb_stor_control_thread+0x68/0x240 [usb_storage]
>>[4327082.342000]  [<c010322e>] ret_from_fork+0x6/0x14
>>[4327082.342000]  [<e0869220>] usb_stor_control_thread+0x0/0x240 [usb_storage]
>>[4327082.342000]  [<e0869220>] usb_stor_control_thread+0x0/0x240 [usb_storage]
>>[4327082.342000]  [<c01013ad>] kernel_thread_helper+0x5/0x18
> 
> 
> I recognize the problem.  This experimental patch should fix it:

It fixes the described problem but introduces some others:

[4294822.152000] ACPI: PCI Interrupt 0000:00:0d.1[A] -> GSI 17 (level, low) ->
IRQ 18
[4294822.781000] libata version 1.12 loaded.
[4294822.795000] sata_via version 1.1
[4294822.795000] ACPI: PCI Interrupt 0000:00:0f.0[B] -> GSI 20 (level, low) ->
IRQ 17
[4294822.796000] PCI: Via IRQ fixup for 0000:00:0f.0, from 10 to 1
[4294822.798000] sata_via(0000:00:0f.0): routed to hard irq line 1
[4294822.800000] Unable to handle kernel NULL pointer dereference at virtual
address 00000048
[4294822.801000]  printing eip:
[4294822.803000] e5c4d384
[4294822.805000] *pde = 00000000
[4294822.806000] Oops: 0000 [#1]
[4294822.806000] PREEMPT
[4294822.806000] Modules linked in: sata_via libata snd_bt87x pl2303 usbserial
usblp usbhid snd_via82xx snd_mpu401_uart w83627hf w83781d i2c_viapro tun vfat
fat loop via_agp intel_agp agpgart lp parport_pc parport tuner tvaudio msp3400
bttv video_buf firmware_class v4l2_common btcx_risc tveeprom videodev eeprom
asb100 hwmon_vid hwmon i2c_dev i2c_isa i2c_i801 snd_emu10k1_synth
snd_emux_synth snd_seq_virmidi snd_seq_midi_emul snd_seq_dummy snd_seq_oss
snd_seq_midi snd_seq_midi_event snd_seq snd_emu10k1 snd_rawmidi snd_seq_device
snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd_ac97_bus
snd_page_alloc snd_util_mem snd_hwdep snd soundcore usb_storage ehci_hcd
uhci_hcd usbcore button processor ac e100 rtc
[4294822.806000] CPU:    0
[4294822.806000] EIP:    0060:[<e5c4d384>]    Not tainted VLI
[4294822.806000] EFLAGS: 00010282   (2.6.14-rc1-git1-via)
[4294822.806000] EIP is at ata_scsi_error+0x14/0x30 [libata]
[4294822.806000] eax: dfabb254   ebx: dfabb000   ecx: df8de000   edx: 00000000
[4294822.806000] esi: deb82000   edi: dfabb000   ebp: c02c8310   esp: deb83fa8
[4294822.806000] ds: 007b   es: 007b   ss: 0068
[4294822.806000] Process scsi_eh_5 (pid: 6417, threadinfo=deb82000 task=df5fda70)
[4294822.806000] Stack: dfabb254 dfabb000 c02c836d dfabb000 fffffffc df8dfde0
c0134bb6 dfabb000
[4294822.806000]        deb83fd0 00000000 ffffffff ffffffff c0134b00 00000000
00000000 00000000
[4294822.806000]        c01013ad df8dfde0 00000000 00000000 00000000 00000000
[4294822.806000] Call Trace:
[4294822.806000]  [<c02c836d>] scsi_error_handler+0x5d/0xa0
[4294822.806000]  [<c0134bb6>] kthread+0xb6/0xc0
[4294822.806000]  [<c0134b00>] kthread+0x0/0xc0
[4294822.806000]  [<c01013ad>] kernel_thread_helper+0x5/0x18
[4294822.806000] Code: 83 bd 63 da 83 c4 08 31 c0 5b c3 8d b6 00 00 00 00 8d
bf 00 00 00 00 53 83 ec 04 8b 5c 24 0c 8d 83 54 02 00 00 8b 50 04 89 04 24
<ff> 52 48 8d 43 38 ff 4b 60 89 43 38 89 43 3c 83 c4 04 5b 31 c0
[4294822.806000]  <6>ata1: SATA max UDMA/133 cmd 0xEC00 ctl 0xE802 bmdma
0xDC00 irq 17
[4294822.845000] Unable to handle kernel NULL pointer dereference at virtual
address 00000048
[4294822.846000]  printing eip:
[4294822.848000] e5c4d384
[4294822.850000] *pde = 00000000
[4294822.851000] Oops: 0000 [#2]
[4294822.851000] PREEMPT
[4294822.851000] Modules linked in: sata_via libata snd_bt87x pl2303 usbserial
usblp usbhid snd_via82xx snd_mpu401_uart w83627hf w83781d i2c_viapro tun vfat
fat loop via_agp intel_agp agpgart lp parport_pc parport tuner tvaudio msp3400
bttv video_buf firmware_class v4l2_common btcx_risc tveeprom videodev eeprom
asb100 hwmon_vid hwmon i2c_dev i2c_isa i2c_i801 snd_emu10k1_synth
snd_emux_synth snd_seq_virmidi snd_seq_midi_emul snd_seq_dummy snd_seq_oss
snd_seq_midi snd_seq_midi_event snd_seq snd_emu10k1 snd_rawmidi snd_seq_device
snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd_ac97_bus
snd_page_alloc snd_util_mem snd_hwdep snd soundcore usb_storage ehci_hcd
uhci_hcd usbcore button processor ac e100 rtc
[4294822.851000] CPU:    0
[4294822.851000] EIP:    0060:[<e5c4d384>]    Not tainted VLI
[4294822.851000] EFLAGS: 00010282   (2.6.14-rc1-git1-via)
[4294822.851000] EIP is at ata_scsi_error+0x14/0x30 [libata]
[4294822.851000] eax: df1c6254   ebx: df1c6000   ecx: df8de000   edx: 00000000
[4294822.851000] esi: deb82000   edi: df1c6000   ebp: c02c8310   esp: deb83fa8
[4294822.851000] ds: 007b   es: 007b   ss: 0068
[4294822.851000] Process scsi_eh_6 (pid: 6418, threadinfo=deb82000 task=df5fda70)
[4294822.851000] Stack: df1c6254 df1c6000 c02c836d df1c6000 fffffffc df8dfde0
c0134bb6 df1c6000
[4294822.851000]        deb83fd0 00000000 ffffffff ffffffff c0134b00 00000000
00000000 00000000
[4294822.851000]        c01013ad df8dfde0 00000000 00000000 00000000 00000000
[4294822.851000] Call Trace:
[4294822.851000]  [<c02c836d>] scsi_error_handler+0x5d/0xa0
[4294822.851000]  [<c0134bb6>] kthread+0xb6/0xc0
[4294822.851000]  [<c0134b00>] kthread+0x0/0xc0
[4294822.851000]  [<c01013ad>] kernel_thread_helper+0x5/0x18
[4294822.851000] Code: 83 bd 63 da 83 c4 08 31 c0 5b c3 8d b6 00 00 00 00 8d
bf 00 00 00 00 53 83 ec 04 8b 5c 24 0c 8d 83 54 02 00 00 8b 50 04 89 04 24
<ff> 52 48 8d 43 38 ff 4b 60 89 43 38 89 43 3c 83 c4 04 5b 31 c0
[4294822.851000]  <6>ata2: SATA max UDMA/133 cmd 0xE400 ctl 0xE002 bmdma
0xDC08 irq 17
[4294823.096000] ata1: no device found (phy stat 00000000)
[4294823.098000] scsi5 : sata_via
[4294823.302000] ata2: no device found (phy stat 00000000)
[4294823.304000] scsi6 : sata_via

I attached your message for reference

-- 
Jan

James:

This patch (as561) fixes the error handler's thread-exit code.  The 
kthread_stop call won't wake the thread from a down_interruptible, so 
the patch gets rid of the semaphore and simply does

	set_current_state(TASK_INTERRUPTIBLE);

Can you please check that the

	if (shost->host_busy == shost->host_failed ...

condition for waking up is really correct?  Thanks.

Alan Stern



Signed-off-by: Alan Stern <stern@rowland.harvard.edu>

Index: usb-2.6/drivers/scsi/scsi_error.c
===================================================================
--- usb-2.6.orig/drivers/scsi/scsi_error.c
+++ usb-2.6/drivers/scsi/scsi_error.c
@@ -50,7 +50,7 @@
 void scsi_eh_wakeup(struct Scsi_Host *shost)
 {
 	if (shost->host_busy == shost->host_failed) {
-		up(shost->eh_wait);
+		wake_up_process(shost->ehandler);
 		SCSI_LOG_ERROR_RECOVERY(5,
 				printk("Waking error handler thread\n"));
 	}
@@ -69,9 +69,6 @@ int scsi_eh_scmd_add(struct scsi_cmnd *s
 	struct Scsi_Host *shost = scmd->device->host;
 	unsigned long flags;
 
-	if (shost->eh_wait == NULL)
-		return 0;
-
 	spin_lock_irqsave(shost->host_lock, flags);
 
 	scmd->eh_eflags |= eh_flag;
@@ -1582,37 +1579,33 @@ int scsi_error_handler(void *data)
 {
 	struct Scsi_Host *shost = (struct Scsi_Host *) data;
 	int rtn;
-	DECLARE_MUTEX_LOCKED(sem);
 
 	current->flags |= PF_NOFREEZE;
-	shost->eh_wait = &sem;
-
-	/*
-	 * Wake up the thread that created us.
-	 */
-	SCSI_LOG_ERROR_RECOVERY(3, printk("Wake up parent of"
-					  " scsi_eh_%d\n",shost->host_no));
 
 	while (1) {
-		/*
-		 * If we get a signal, it means we are supposed to go
-		 * away and die.  This typically happens if the user is
-		 * trying to unload a module.
-		 */
 		SCSI_LOG_ERROR_RECOVERY(1, printk("Error handler"
 						  " scsi_eh_%d"
 						  " sleeping\n",shost->host_no));
 
 		/*
-		 * Note - we always use down_interruptible with the semaphore
-		 * even if the module was loaded as part of the kernel.  The
-		 * reason is that down() will cause this thread to be counted
-		 * in the load average as a running process, and down
-		 * interruptible doesn't.  Given that we need to allow this
-		 * thread to die if the driver was loaded as a module, using
-		 * semaphores isn't unreasonable.
-		 */
-		down_interruptible(&sem);
+		 * Note - we always use TASK_INTERRUPTIBLE even if the
+		 * module was loaded as part of the kernel.  The reason
+		 * is that UNINTERRUPTIBLE would cause this thread to be
+		 * counted in the load average as a running process, and
+		 * an interruptible wait doesn't.
+		 */
+		for (;;) {
+			set_current_state(TASK_INTERRUPTIBLE);
+			if (shost->host_busy == shost->host_failed ||
+					kthread_should_stop()) {
+				set_current_state(TASK_RUNNING);
+				break;
+			}
+
+			/* Even though we are INTERRUPTIBLE, ignore signals */
+			flush_signals(current);
+			schedule();
+		}
 		if (kthread_should_stop())
 			break;
 
@@ -1648,10 +1641,6 @@ int scsi_error_handler(void *data)
 	SCSI_LOG_ERROR_RECOVERY(1, printk("Error handler scsi_eh_%d"
 					  " exiting\n",shost->host_no));
 
-	/*
-	 * Make sure that nobody tries to wake us up again.
-	 */
-	shost->eh_wait = NULL;
 	return 0;
 }
 
Index: usb-2.6/include/scsi/scsi_host.h
===================================================================
--- usb-2.6.orig/include/scsi/scsi_host.h
+++ usb-2.6/include/scsi/scsi_host.h
@@ -465,8 +465,6 @@ struct Scsi_Host {
 
 	struct list_head	eh_cmd_q;
 	struct task_struct    * ehandler;  /* Error recovery thread. */
-	struct semaphore      * eh_wait;   /* The error recovery thread waits
-					      on this. */
 	struct semaphore      * eh_action; /* Wait for specific actions on the
                                           host. */
 	unsigned int            eh_active:1; /* Indicates the eh thread is awake and active if

-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

References:
- Re: [linux-usb-devel] Re: 2.6.14-rc1 load average calculation broken?
  - From: Alan Stern <stern@rowland.harvard.edu>

Prev by Date: Re: I request inclusion of reiser4 in the mainline kernel
Next by Date: Re: best way to access device driver functions
Previous by thread: Re: [linux-usb-devel] Re: 2.6.14-rc1 load average calculation broken?
Next by thread: Re: [linux-usb-devel] Re: 2.6.14-rc1 load average calculation broken?
Index(es):
- Date
- Thread

[Index of Archives] [Kernel Newbies] [Netfilter] [Bugtraq] [Photo] [Gimp] [Yosemite News] [MIPS Linux] [ARM Linux] [Linux Security] [Linux RAID] [Video 4 Linux] [Linux for the blind]