Re: [patch] Real-Time Preemption, -RT-2.6.12-rc6-V0.7.48-00

On Mon, 27 Jun 2005, Ingo Molnar wrote:

> > I still haven't been able to get any NMI watchdog traces with the 
> > lockups induced by VLC and burnP6.  Early printk is enabled on the 
> > serial console. I have noticed, however, that scheduling performance 
> > slowly degrades during the ~1 minute before locking up.  Once I was 
> > able to get a delayed SysRq response (~30s) from the serial console 
> > after the X console became unresponsive.  Is this possibly a scheduler 
> > starvation issue that affects everything, including the NMI watchdog?  
> > Any more suggestions for catching a trace?
> 
> hm. Nothing should starve the NMI watchdog. Only a nasty, complete 
> kernel crash or a hardware failure can lock up the box in a way that not 
> even the NMI watchdog can get some message out to the console. Just in 
> case, could you try the latest patch (-50-24 or later), there were a 
> couple of bugs fixed.

OK.  Running on -50-25 now.  The burnP6 starvation doesn't seem to affect
the whole system, but comes close enough to require the reset button every
time.  I usually, but not always, lose network, X, the keyboard, mouse,
and serial console.  I'm still unable to get any sort of trace from
these lockups, since it's looking more like a bunch of processes starving
than a kernel crash or a full lockup.

Once, with VLC (viewing a 5 Mbit/s multicast UDP stream) and two burnP6
instances running, I was able to fire up top on the serial console and
found that the IRQ thread for the ns83820 NIC was using 99% of one
CPU.

Once, two burnP6 instances made the system unresponsive with VLC not
running at all.  Everything froze as soon as I fired the second one up.
This may be a good one to try to reproduce on an HT machine, since it
doesn't require VLC and a multicast video stream to make the system
unresponsive.

Once, with a normal desktop load and a yum update, this came across on the 
serial console:

cat/2100[CPU#1]: BUG in update_out_trace at kernel/latency.c:587
 [<c01041b3>] dump_stack+0x23/0x30 (20)
 [<c0122e0b>] __WARN_ON+0x6b/0x90 (52)
 [<c0140530>] update_out_trace+0x3d0/0x440 (104)
 [<c0140881>] l_start+0x2e1/0x310 (72)
 [<c0191eab>] seq_read+0x7b/0x2d0 (60)
 [<c016ecd4>] vfs_read+0xd4/0x140 (36)
 [<c016efb0>] sys_read+0x50/0x80 (44)
 [<c01031c0>] sysenter_past_esp+0x61/0x89 (-8116)
---------------------------
| preempt count: 00000002 ]
| 2-level deep critical section nesting:
----------------------------------------
.. [<c038dd9f>] .... _raw_spin_lock_irqsave+0x1f/0xb0
.....[<c0122dc7>] ..   ( <= __WARN_ON+0x27/0x90)
.. [<c014280d>] .... print_traces+0x1d/0x60
.....[<c01040b2>] ..   ( <= show_trace+0xc2/0xf0)

------------------------------
| showing all locks held by: |  (cat/2100 [deca9320, 117]):
------------------------------

#001:             [d9ac47a4] {(struct semaphore *)(&p->sem)}
... acquired at:  seq_read+0x2b/0x2d0

#002:             [c03fb264] {out_mutex.lock}
... acquired at:  l_start+0x1b/0x310

#003:             [c04d1404] {max_mutex.lock}
... acquired at:  l_start+0x2dc/0x310

CPU0: 000000abfe71626e (000000abfe71654e) ... #3 (000000abfe71654e) 000000abfe7176be
CPU1: 000000abfe6d3f0e (000000abfe6d4efe) ... #41 (000000abfe70fa46) 000000abfe71227e
CPU1 entries: 41
first stamp: 000000abfe71626e
 last stamp: 000000abfe71626e
cat/2101[CPU#1]: BUG in update_out_trace at kernel/latency.c:587
 [<c01041b3>] dump_stack+0x23/0x30 (20)
 [<c0122e0b>] __WARN_ON+0x6b/0x90 (52)
 [<c0140530>] update_out_trace+0x3d0/0x440 (104)
 [<c0140881>] l_start+0x2e1/0x310 (72)
 [<c0191eab>] seq_read+0x7b/0x2d0 (60)
 [<c016ecd4>] vfs_read+0xd4/0x140 (36)
 [<c016efb0>] sys_read+0x50/0x80 (44)
 [<c01031c0>] sysenter_past_esp+0x61/0x89 (-8116)
---------------------------
| preempt count: 00000002 ]
| 2-level deep critical section nesting:
----------------------------------------
.. [<c038dd9f>] .... _raw_spin_lock_irqsave+0x1f/0xb0
.....[<c0122dc7>] ..   ( <= __WARN_ON+0x27/0x90)
.. [<c014280d>] .... print_traces+0x1d/0x60
.....[<c01040b2>] ..   ( <= show_trace+0xc2/0xf0)

------------------------------
| showing all locks held by: |  (cat/2101 [deca9320, 116]):
------------------------------

#001:             [d9ac47a4] {(struct semaphore *)(&p->sem)}
... acquired at:  seq_read+0x2b/0x2d0

#002:             [c04d1404] {max_mutex.lock}
... acquired at:  l_start+0x2dc/0x310

CPU0: 000000abfe71626e (000000abfe71654e) ... #3 (000000abfe71654e) 000000abfe7176be
CPU1: 000000abfe6d3f0e (000000abfe6d4efe) ... #41 (000000abfe70fa46) 000000abfe71227e
CPU1 entries: 41
first stamp: 000000abfe71626e
 last stamp: 000000abfe71626e


Best Regards,
--ww
