Re: [patch] timer-irq-driven soft-watchdog, cleanups

Bartlomiej Zolnierkiewicz wrote:

>On 2/17/06, Ingo Molnar <[email protected]> wrote:
>
>>* Bartlomiej Zolnierkiewicz <[email protected]> wrote:
>>
>>>Sorry but I have enough more high priority issues to take care of and
>>>I'm not going to spend any more time on soft lockups even if they are
>>>really problems in IDE subsystem.  If this is not fixed before 2.6.16
>>>I'm submitting patch to Linus making DETECT_SOFTLOCKUP depend on
>>>"CONFIG_IDE=n"... at least users will be able to use their systems
>>>instead of seeing lockups.
>>>
>>i have lots of IDE based systems (they dont use PIO though) and i'm not
>>seeing these. I'll oppose such a patch if it's to hide genuine issues -
>>the 10 seconds tolerance is already generous i think. I'll of course fix
>>any false positives which are the fault of the softlockup-watchdog, but
>>from your mails it appears to me that the IDE warnings are indeed
>>genuine.
>>
>>If the source of the delay is hard to fix you can temporarily work it
>>around in the code by putting in the touch_softlockup_watchdog() lines -
>>that will also document the places that cause long delays - which is a
>>good thing.
>>
>>It is entirely feasible to put a touch_softlockup_watchdog() call into
>>every PIO OP - even a single-byte PIO related IN/OUT instruction takes a
>>couple of microseconds, so a touch_softlockup_watchdog() wont even show
>>up on the radar.
>
>OK, I'll just add touch_softlockup_watchdog() if needed but first lets
>wait for results of your patch.
>
>Note that I'll invest my time on this which could be invested into other
>things and I don't see it as top-priority issue if you differ in your opinion
>you should be the person fixing affected drivers.
>
>The conclusion of the rant is:  people making changes at higher layers
>should start paying maintenance costs of their changes.  Over few years
>of maintaining IDE I learned quite a lot about block layer, VFS, VM, ACPI,
>PM, IRQ routing, scheduling, sysfs etc (I'm not talking about interface
>changes but about bugs/changes which are reported by end users
>and driver maintainers are end-point).  This is all good but distracts me
>from my primary task and now it is turn for people hacking on generic
>code to learn few driver specific things... :)
>
>No wonder that nobody wants to hack drivers: less fame, more flames,
>and actually besides knowing hardware you need to know a lot about
>kernel in general to do your job right.  I hope that Andrew is reading this.
>
>End of whining.
>
>
>>>DETECT_SOFTLOCKUP should be an aim in development not a method of
>>>forcing driver maintainers to work on specific issues...
>>>
>>well, 10+ seconds delays on a running system are not really acceptable,
>>and can cause other problems. The softlockup-watchdog is optional and
>>can be easily turned off in the .config.
>
>It is "y" by default so anybody saying "y" to DEBUG_KERNEL will get it as
>added bonus and moreover DEBUG_KERNEL is "y" in x86_64 defconfig.
>
>Bartlomiej
>
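
For reference, the touch_softlockup_watchdog() suggestion quoted above can
be pictured with a small userspace sketch. This is not kernel code and not
the IDE driver's actual polling loop: the sim_* names, the simulated tick
counter and the fake status port are all assumptions made up for
illustration. Only the 10 second tolerance and the 0x80 Busy status value
are taken from this thread.

```c
/* Userspace sketch only: simulated ticks, watchdog and status port. */
#include <assert.h>
#include <stdbool.h>

#define SIM_HZ           1000UL           /* simulated ticks per second (assumption) */
#define SOFTLOCKUP_TICKS (10UL * SIM_HZ)  /* 10 second tolerance mentioned in the thread */
#define BUSY_STAT        0x80             /* status=0x80 { Busy }, as in the log */

static unsigned long sim_jiffies;   /* stand-in for the kernel tick counter */
static unsigned long last_touch;    /* tick at which the watchdog was last touched */
static bool lockup_reported;        /* set once the simulated watchdog would complain */

/* Stand-in for touch_softlockup_watchdog(): note that progress is being made. */
static void sim_touch_watchdog(void)
{
    last_touch = sim_jiffies;
}

/* Stand-in for the timer-IRQ check: advance time, complain if nobody touched us. */
static void sim_timer_tick(void)
{
    sim_jiffies++;
    if (sim_jiffies - last_touch > SOFTLOCKUP_TICKS)
        lockup_reported = true;
}

/* Hypothetical device: reports Busy for busy_ticks ticks after start, then ready. */
static unsigned char sim_read_status(unsigned long start, unsigned long busy_ticks)
{
    return (sim_jiffies - start < busy_ticks) ? BUSY_STAT : 0x00;
}

/*
 * Poll the fake status port until Busy clears or timeout_ticks elapse.
 * When touch is true the watchdog is touched on every poll, which is the
 * workaround suggested in the quoted mail. Returns true if the device
 * became ready, false on timeout.
 */
static bool sim_wait_not_busy(unsigned long busy_ticks,
                              unsigned long timeout_ticks, bool touch)
{
    unsigned long start = sim_jiffies;

    assert(timeout_ticks > 0);
    while (sim_read_status(start, busy_ticks) & BUSY_STAT) {
        if (sim_jiffies - start >= timeout_ticks)
            return false;           /* give up: the device stayed busy */
        if (touch)
            sim_touch_watchdog();
        sim_timer_tick();           /* one simulated tick per poll */
    }
    return true;
}

/* Reset the simulation between runs. */
static void sim_reset(void)
{
    sim_jiffies = 0;
    last_touch = 0;
    lockup_reported = false;
}
```

Calling sim_wait_not_busy(15 * SIM_HZ, 20 * SIM_HZ, false) lets the
simulated watchdog fire during the 15 second wait, while the same call with
touch set to true keeps it quiet. Note that in this sketch touching the
watchdog only suppresses the warning; the wait itself is exactly as long.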
Guys,
sorry for having been silent for so long. Since I first had the problem, I
upgraded to 2.6.15.6 and I experienced the problem again over the weekend.
My point is that the "BUG: soft lockup detected on CPU#0!" message appears
to be legitimate in my case, since my machine really freezes periodically
because of IDE timeouts.
What I'm trying to figure out is why the IDE timeouts are occurring.
Both my DVD burner and CD burner usually work fine (I was able to burn a
DVD a few hours before the problem occurred, and I'm able to read CDs and
DVDs fine).
However, the problem seems to occur only when trying to rip CDs. It
appears that both drives get confused (see the logs below; both are
reporting timeouts/errors).
Mar 11 17:41:47 ruault kernel: hdd: ATAPI reset complete
Mar 11 17:41:57 ruault kernel: hdc: irq timeout: status=0x80 { Busy }
Mar 11 17:41:57 ruault kernel: ide: failed opcode was: unknown
Mar 11 17:41:57 ruault kernel: hdd: status timeout: status=0x80 { Busy }
Mar 11 17:41:57 ruault kernel: ide: failed opcode was: unknown
Mar 11 17:41:57 ruault kernel: hdd: drive not ready for command
Mar 11 17:41:57 ruault kernel: hdd: ATAPI reset complete
Mar 11 17:42:17 ruault kernel: hdc: irq timeout: status=0x80 { Busy }
Mar 11 17:42:17 ruault kernel: ide: failed opcode was: unknown
Mar 11 17:42:17 ruault kernel: hdd: status timeout: status=0x80 { Busy }
Mar 11 17:42:17 ruault kernel: ide: failed opcode was: unknown
Mar 11 17:42:17 ruault kernel: hdd: drive not ready for command
Mar 11 17:42:17 ruault kernel: BUG: soft lockup detected on CPU#0!
Mar 11 17:42:17 ruault kernel:
Mar 11 17:42:17 ruault kernel: Pid: 0, comm: swapper
Mar 11 17:42:17 ruault kernel: EIP: 0060:[<c0272c55>] CPU: 0
Mar 11 17:42:17 ruault kernel: EIP is at ide_inb+0x5/0x10
Mar 11 17:42:17 ruault kernel:  EFLAGS: 00000206    Tainted: P      (2.6.15.6)
Mar 11 17:42:17 ruault kernel: EAX: 00000180 EBX: 03aba395 ECX: f2b9d721 EDX: 00000177
Mar 11 17:42:17 ruault kernel: ESI: 00000088 EDI: c0419e30 EBP: c0419ec4 DS: 007b ES: 007b
Mar 11 17:42:17 ruault kernel: CR0: 8005003b CR2: b7fe2000 CR3: 32e1b000 CR4: 000006d0

And I can confirm that when this happens the machine is totally
unresponsive; as before, I had to reboot it with a hard reset after a
while.
So my question is: what could be causing this behaviour? Is this a bug
in the IDE driver? A hardware problem with my drives (as I said, they
appear to be working fine otherwise)? Another hardware problem with my
motherboard?
What could I do to help troubleshoot the problem?
Thanks in advance.

PS: I do not subscribe to the list so please CC me when replying ....

-- 
Charles-Edouard Ruault
GPG key Id E4D2B80C

