Re: Processes stuck on D state on Dual Opteron

On Tuesday 05 April 2005 03:12, Andrew Morton wrote:
> Claudio Martins <[email protected]> wrote:
> >    While stress testing 2.6.12-rc2 on an HP DL145 I get processes stuck
> > in D state after some time.
> >    This machine is a dual Opteron 248 with 2GB (ECC) on one node (the
> > other node has no RAM modules plugged in, since this board works only
> > with pairs).
> >
> >    I was using stress (http://weather.ou.edu/~apw/projects/stress/) with
> > the following command line:
> >
> >  stress -v -c 20 -i 12 -m 10 -d 20
> >
> >    This causes a constant load average of around 70, makes the machine go
> > into swap a little, and writes up to about 20GB of random data to disk
> > while eating up all CPU. After about half an hour, random processes like
> > top, df, etc. get stuck in D state. Half of the 60 or so stress processes
> > are also in D state. The machine stays responsive for maybe 15 more
> > minutes, but then the shells just hang and sshd stops responding to
> > connections, though the machine still replies to pings (I don't have
> > console access until tomorrow).
> >
> >    The system uses ext3 on md software RAID1.
> >
> >    I'm interested in knowing whether anyone out there with dual Opterons
> > can reproduce this. I also have access to an HP DL360 dual Xeon, so I
> > will try to find out whether this is AMD64-specific as soon as possible.
> > Please let me know if you want me to run other tests or provide more
> > info to help solve this one.
>
> Can you capture the output from alt-sysrq-T?


  Hi Andrew,

  Because of other tasks, only now have I been able to repeat the tests and
capture the output from alt-sysrq-T. I booted with a serial console, put
stress to work, and when processes started to hang in D state I managed to
capture the following:

 SysRq : Show State

                                                       sibling
  task                 PC          pid father child younger older
init          D ffff81007fcfe0d8     0     1      0     2               
(NOTLB)
ffff810003253768 0000000000000082 ffff81007fd19170 0000007d00000000 
       ffff81007fd19170 ffff810003251470 000000000000271b ffff810074468e70 
       ffff810003251680 ffffffff8027a79a 
Call Trace:<ffffffff8027a79a>{__make_request+1274} 
<ffffffff8037ab68>{__down+152} 
       <ffffffff8012f4f0>{default_wake_function+0} 
<ffffffff80158de4>{mempool_alloc+164} 
       <ffffffff8037c649>{__down_failed+53} 
<ffffffff802ed53d>{.text.lock.md+155} 
       <ffffffff802d8204>{make_request+868} 
<ffffffff8015db7d>{cache_alloc_refill+413} 
       <ffffffff8027abd1>{generic_make_request+545} 
<ffffffff8014a230>{autoremove_wake_function+0} 
       <ffffffff8014a230>{autoremove_wake_function+0} 
<ffffffff8027accf>{submit_bio+223} 
       <ffffffff8015c39b>{test_set_page_writeback+203} 
<ffffffff8016e9d8>{swap_writepage+184} 
       <ffffffff80161bc6>{shrink_zone+2678} 
<ffffffff8037b3e0>{thread_return+0} 
       <ffffffff8037b438>{thread_return+88} 
<ffffffff80162187>{try_to_free_pages+311} 
       <ffffffff8014a230>{autoremove_wake_function+0} 
<ffffffff8015a685>{__alloc_pages+533} 
       <ffffffff8015a88e>{__get_free_pages+14} 
<ffffffff8018c72a>{__pollwait+74} 
       <ffffffff80185c72>{pipe_poll+66} <ffffffff8018caa5>{do_select+725} 
       <ffffffff8018c6e0>{__pollwait+0} <ffffffff8018ceef>{sys_select+735} 
       <ffffffff8010db06>{system_call+126} 
migration/0   S ffff810002c12720     0     2      1             3       
(L-TLB)
ffff81007ff0fea8 0000000000000046 ffff810074806ef0 0000007500000001 
       ffff81007ff0fe58 ffff8100032506f0 0000000000000129 ffff810075281230 
       ffff810003250900 ffff810072ffde88 
Call Trace:<ffffffff80130a24>{migration_thread+532} 
<ffffffff80130810>{migration_thread+0} 
       <ffffffff80149c09>{kthread+217} <ffffffff8010e6ef>{child_rip+8} 
       <ffffffff80149b30>{kthread+0} <ffffffff8010e6e7>{child_rip+0} 
       
ksoftirqd/0   S 0000000000000000     0     3      1             4     2 
(L-TLB)
ffff81007ff11f08 0000000000000046 ffff810072e00430 0000007d00000000 
       ffff810002c194e0 ffff810003250030 00000000000000d1 ffff810072f3a030 
       ffff810003250240 0000000000000000 
Call Trace:<ffffffff801393e1>{__do_softirq+113} 
<ffffffff801399c0>{ksoftirqd+0} 
       <ffffffff801399c0>{ksoftirqd+0} <ffffffff80139a23>{ksoftirqd+99} 
       <ffffffff801399c0>{ksoftirqd+0} <ffffffff80149c09>{kthread+217} 
       <ffffffff8010e6ef>{child_rip+8} <ffffffff80149b30>{kthread+0} 
       <ffffffff8010e6e7>{child_rip+0} 
migration/1   S ffff810002c1a720     0     4      1             5     3 
(L-TLB)
ffff81007ff15ea8 0000000000000046 ffff810072d1cff0 0000007300000001 
       ffff810079fe7e98 ffff81007ff134b0 00000000000000a3 ffff810075281230 
       ffff81007ff136c0 ffff81003381de88 
Call Trace:<ffffffff80130a24>{migration_thread+532} 
<ffffffff80130810>{migration_thread+0} 
       <ffffffff80149c09>{kthread+217} <ffffffff8010e6ef>{child_rip+8} 
       <ffffffff80149b30>{kthread+0} <ffffffff8010e6e7>{child_rip+0} 
       
ksoftirqd/1   S 0000000000000001     0     5      1             6     4 
(L-TLB)
ffff81007ff19f08 0000000000000046 ffff810075376db0 00000077802b8e7e 
       ffff810002c114e0 ffff81007ff12df0 00000000000001b4 ffff810074125130 
       ffff81007ff13000 0000000000000000 
Call Trace:<ffffffff801393e1>{__do_softirq+113} 
<ffffffff801399c0>{ksoftirqd+0} 
       <ffffffff801399c0>{ksoftirqd+0} <ffffffff80139a23>{ksoftirqd+99} 
       <ffffffff801399c0>{ksoftirqd+0} <ffffffff80149c09>{kthread+217} 
       <ffffffff8010e6ef>{child_rip+8} <ffffffff80149b30>{kthread+0} 
       <ffffffff8010e6e7>{child_rip+0} 
events/0      S 0000094f2f7a804e     0     6      1             7     5 
(L-TLB)
ffff81007ff3be58 0000000000000046 0000000000000246 ffffffff8013d00d 
       000000007ffe0c00 ffff81007ff12730 0000000000000c80 ffffffff803f40c0 
       ffff81007ff12940 0000000000000000 
Call Trace:<ffffffff8013d00d>{__mod_timer+317} 
<ffffffff8015f470>{cache_reap+0} 
       <ffffffff80145331>{worker_thread+305} 
<ffffffff8012f4f0>{default_wake_function+0} 
       <ffffffff8012f4f0>{default_wake_function+0} 
<ffffffff80145200>{worker_thread+0} 
       <ffffffff80149c09>{kthread+217} <ffffffff8010e6ef>{child_rip+8} 
       <ffffffff80149b30>{kthread+0} <ffffffff8010e6e7>{child_rip+0} 
       
events/1      S 0000094ef3e03d58     0     7      1             8     6 
(L-TLB)
ffff81007ff3de58 0000000000000046 ffff810003250db0 0000000000000246 
       0000000000000246 ffff81007ff12070 00000000000000a4 ffff810003250db0 
       ffff81007ff12280 0000000000000000 
Call Trace:<ffffffff80252610>{flush_to_ldisc+0} 
<ffffffff80145331>{worker_thread+305} 
       <ffffffff8012f4f0>{default_wake_function+0} 
<ffffffff8012f4f0>{default_wake_function+0} 
       <ffffffff80145200>{worker_thread+0} <ffffffff80149c09>{kthread+217} 
       <ffffffff8010e6ef>{child_rip+8} <ffffffff80149b30>{kthread+0} 
       <ffffffff8010e6e7>{child_rip+0} 
khelper       S ffff810074815b18     0     8      1            13     7 
(L-TLB)
ffff81007ff43e58 0000000000000046 ffff810074815bc8 0000006f00000001 
       ffff810074815bc8 ffff81007ff414f0 000000000000006c ffff810074292f70 
       ffff81007ff41700 0000000000000001 
Call Trace:<ffffffff80144d50>{__call_usermodehelper+0} 
<ffffffff80145331>{worker_thread+305} 
       <ffffffff8012f4f0>{default_wake_function+0} 
<ffffffff8012f4f0>{default_wake_function+0} 
       <ffffffff80145200>{worker_thread+0} <ffffffff80149c09>{kthread+217} 
       <ffffffff8010e6ef>{child_rip+8} 
<ffffffff8011b0b0>{flat_send_IPI_mask+0} 
       <ffffffff80149b30>{kthread+0} <ffffffff8010e6e7>{child_rip+0} 
       
kthread       S ffff81002a48bd18     0    13      1    24     169     8 
(L-TLB)
ffff81007ff55e58 0000000000000046 ffffffff8012f4f0 0000006f00000000 
       0000000000000000 ffff81007ff40e30 00000000000000ac ffff8100745941b0 
       ffff81007ff41040 0000000000000001 
Call Trace:<ffffffff8012f4f0>{default_wake_function+0} 
<ffffffff80149c50>{keventd_create_kthread+0} 
       <ffffffff80145331>{worker_thread+305} 
<ffffffff8012f4f0>{default_wake_function+0} 
       <ffffffff8012f4f0>{default_wake_function+0} 
<ffffffff80145200>{worker_thread+0} 
       <ffffffff80149c09>{kthread+217} <ffffffff8010e6ef>{child_rip+8} 
       <ffffffff80149b30>{kthread+0} <ffffffff8010e6e7>{child_rip+0} 
       
kacpid        S 000000000c378373     0    24     13           105       
(L-TLB)
ffff81000334be58 0000000000000046 0000000000000000 0000000000000000 
       ffff810002c114e0 ffff810003349530 0000000000000209 ffff810003250db0 
       ffff810003349740 0000000000000000 
Call Trace:<ffffffff80149c50>{keventd_create_kthread+0} 
<ffffffff80145200>{worker_thread+0} 
       <ffffffff80145331>{worker_thread+305} 
<ffffffff8012f4f0>{default_wake_function+0} 
       <ffffffff8012f4f0>{default_wake_function+0} 
<ffffffff80149c50>{keventd_create_kthread+0} 
       <ffffffff80145200>{worker_thread+0} 
<ffffffff80149c50>{keventd_create_kthread+0} 
       <ffffffff80149c09>{kthread+217} <ffffffff8010e6ef>{child_rip+8} 
       <ffffffff80149c50>{keventd_create_kthread+0} 
<ffffffff80149b30>{kthread+0} 
       <ffffffff8010e6e7>{child_rip+0} 
kblockd/0     S ffff81007fd19830     0   105     13           106    24 
(L-TLB)
ffff8100033a1e58 0000000000000046 0000000000000001 0000007600000000 
       ffff810019992230 ffff810003348e70 0000000000000d97 ffff810074125130 
       ffff810003349080 0000000000000001 
Call Trace:<ffffffff80278f30>{blk_unplug_work+0} 
<ffffffff80145331>{worker_thread+305} 
       <ffffffff8012f4f0>{default_wake_function+0} 
<ffffffff8012f4f0>{default_wake_function+0} 
       <ffffffff80149c50>{keventd_create_kthread+0} 
<ffffffff80145200>{worker_thread+0} 
       <ffffffff80149c50>{keventd_create_kthread+0} 
<ffffffff80149c09>{kthread+217} 
       <ffffffff8010e6ef>{child_rip+8} 
<ffffffff80149c50>{keventd_create_kthread+0} 
       <ffffffff80149b30>{kthread+0} <ffffffff8010e6e7>{child_rip+0} 
       
kblockd/1     S 000009309d720cf6     0   106     13           170   105 
(L-TLB)
ffff8100033a3e58 0000000000000046 ffff81007fcf8e00 ffffffff8027f2a6 
       ffff81007fcf6a00 ffff8100033487b0 0000000000000ae1 ffff810003250db0 
       ffff8100033489c0 0000000000000000 
Call Trace:<ffffffff8027f2a6>{as_move_to_dispatch+342} 
<ffffffff80280530>{as_work_handler+0} 
       <ffffffff80145331>{worker_thread+305} 
<ffffffff8012f4f0>{default_wake_function+0} 
       <ffffffff8012f4f0>{default_wake_function+0} 
<ffffffff80149c50>{keventd_create_kthread+0} 
       <ffffffff80145200>{worker_thread+0} 
<ffffffff80149c50>{keventd_create_kthread+0} 
       <ffffffff80149c09>{kthread+217} <ffffffff8010e6ef>{child_rip+8} 
       <ffffffff80149c50>{keventd_create_kthread+0} 
<ffffffff80149b30>{kthread+0} 
       <ffffffff8010e6e7>{child_rip+0} 
kswapd0       D ffff81007fcfe0d8     0   169      1           758    13 
(L-TLB)
ffff81007fc0d8e8 0000000000000046 ffff8100133b5900 0000007600000001 
       ffff81007fd19170 ffff81007ff400b0 0000000000003643 ffff810074193170 
       ffff81007ff402c0 ffffffff8027abd1 
Call Trace:<ffffffff8027abd1>{generic_make_request+545} 
<ffffffff8014a230>{autoremove_wake_function+0} 
       <ffffffff8037ab68>{__down+152} 
<ffffffff8012f4f0>{default_wake_function+0} 
       <ffffffff80158de4>{mempool_alloc+164} 
<ffffffff8037c649>{__down_failed+53} 
       <ffffffff802ed53d>{.text.lock.md+155} 
<ffffffff802d8204>{make_request+868} 
       <ffffffff8027abd1>{generic_make_request+545} 
<ffffffff8014a230>{autoremove_wake_function+0} 
       <ffffffff8014a230>{autoremove_wake_function+0} 
<ffffffff8027accf>{submit_bio+223} 
       <ffffffff8015c39b>{test_set_page_writeback+203} 
<ffffffff8016e9d8>{swap_writepage+184} 
       <ffffffff80161bc6>{shrink_zone+2678} 
<ffffffff8037b3e0>{thread_return+0} 
       <ffffffff8037b438>{thread_return+88} 
<ffffffff8014a230>{autoremove_wake_function+0} 
       <ffffffff801624e9>{balance_pgdat+601} <ffffffff801627a7>{kswapd+327} 
       <ffffffff8014a230>{autoremove_wake_function+0} 
<ffffffff8014a230>{autoremove_wake_function+0} 
       <ffffffff8012df70>{schedule_tail+64} <ffffffff8010e6ef>{child_rip+8} 
       <ffffffff8011b0b0>{flat_send_IPI_mask+0} <ffffffff80162660>{kswapd+0} 
       <ffffffff8010e6e7>{child_rip+0} 
aio/0         S ffff81000337d000     0   170     13           171   106 
(L-TLB)
ffff81007fc1fe58 0000000000000046 0000000000000000 0000007500000000 
       0000000000000000 ffff81007fc08eb0 000000000000011f ffff810003251470 
       ffff81007fc090c0 0000000000000000 
Call Trace:<ffffffff80149c50>{keventd_create_kthread+0} 
<ffffffff80145200>{worker_thread+0} 
       <ffffffff80145331>{worker_thread+305} 
<ffffffff8012f4f0>{default_wake_function+0} 
       <ffffffff8012f4f0>{default_wake_function+0} 
<ffffffff80149c50>{keventd_create_kthread+0} 
       <ffffffff80145200>{worker_thread+0} 
<ffffffff80149c50>{keventd_create_kthread+0} 
       <ffffffff80149c09>{kthread+217} <ffffffff8010e6ef>{child_rip+8} 
       <ffffffff80149c50>{keventd_create_kthread+0} 
<ffffffff80149b30>{kthread+0} 
       <ffffffff8010e6e7>{child_rip+0} 
aio/1         S 0000000048dcc53f     0   171     13          2425   170 
(L-TLB)
ffff81007fc21e58 0000000000000046 0000000000000000 0000000000000000 
       ffff810002c114e0 ffff81007fc087f0 000000000000011e ffff810003250db0 
       ffff81007fc08a00 0000000000000000 
Call Trace:<ffffffff80149c50>{keventd_create_kthread+0} 
<ffffffff80145200>{worker_thread+0} 
       <ffffffff80145331>{worker_thread+305} 
<ffffffff8012f4f0>{default_wake_function+0} 
       <ffffffff8012f4f0>{default_wake_function+0} 
<ffffffff80149c50>{keventd_create_kthread+0} 
       <ffffffff80145200>{worker_thread+0} 
<ffffffff80149c50>{keventd_create_kthread+0} 
       <ffffffff80149c09>{kthread+217} <ffffffff8010e6ef>{child_rip+8} 
       <ffffffff80149c50>{keventd_create_kthread+0} 
<ffffffff80149b30>{kthread+0} 
       <ffffffff8010e6e7>{child_rip+0} 
kseriod       S 00000007606f165c     0   758      1           825   169 
(L-TLB)
ffff81007fd05eb8 0000000000000046 0000000000000000 ffffffff801b1df9 
       0000000000000246 ffff81007ff40770 00000000000001f6 ffff810003250db0 
       ffff81007ff40980 0000000000000000 
Call Trace:<ffffffff801b1df9>{sysfs_make_dirent+41} 
<ffffffff8027288d>{driver_create_file+61} 
       <ffffffff80267b21>{serio_thread+689} 
<ffffffff8014a230>{autoremove_wake_function+0} 
       <ffffffff8014a230>{autoremove_wake_function+0} 
<ffffffff8012df70>{schedule_tail+64} 
       <ffffffff8010e6ef>{child_rip+8} <ffffffff80267870>{serio_thread+0} 
       <ffffffff8010e6e7>{child_rip+0} 
scsi_eh_0     S ffff81007fd59ef8     0   825      1           826   758 
(L-TLB)
ffff81007fd59df8 0000000000000046 ffffffff80145f9f 00000075801461ba 
       ffff81007fc08130 ffff81007fc08130 00000000000003b1 ffff810003251470 
       ffff81007fc08340 0000000000000202 
Call Trace:<ffffffff80145f9f>{attach_pid+47} 
<ffffffff8012d5c3>{recalc_task_prio+323} 
       <ffffffff8037acad>{__down_interruptible+205} 
<ffffffff8012f4f0>{default_wake_function+0} 
       <ffffffff8037c683>{__down_failed_interruptible+53} 
       <ffffffff802a0be4>{.text.lock.scsi_error+45} 
<ffffffff8010e6ef>{child_rip+8} 
       <ffffffff802a0150>{scsi_error_handler+0} 
<ffffffff8010e6e7>{child_rip+0} 
       
ahc_dv_0      S 000000061ef1cc1c     0   826      1           845   825 
(L-TLB)
ffff81007fd5de08 0000000000000046 ffff81000327b400 000000867ff0da40 
       0000000000000000 ffff81007fd5b5b0 000000000000029f ffff81007ff12df0 
       ffff81007fd5b7c0 ffff81007fc6ac00 
Call Trace:<ffffffff80276ab5>{elv_next_request+261} 
<ffffffff8037acad>{__down_interruptible+205} 
       <ffffffff8012f4f0>{default_wake_function+0} 
<ffffffff8037c683>{__down_failed_interruptible+53} 
       <ffffffff80212e90>{kobject_release+0} 
<ffffffff802be9eb>{.text.lock.aic7xxx_osm+85} 
       <ffffffff8010e6ef>{child_rip+8} 
<ffffffff802bd340>{ahc_linux_dv_thread+0} 
       <ffffffff8010e6e7>{child_rip+0} 
md3_raid1     S ffff81007fdb6b00     0   845      1           847   826 
(L-TLB)
ffff81007fddfeb8 0000000000000046 ffff810074534ef0 0000007d00000001 
       0000000002c114e0 ffff81007fd5a170 000000000000009f ffff810074534ef0 
       ffff81007fd5a380 0000000000000000 
Call Trace:<ffffffff802ea015>{md_thread+277} 
<ffffffff8014a230>{autoremove_wake_function+0} 
       <ffffffff8014a230>{autoremove_wake_function+0} 
<ffffffff8012df70>{schedule_tail+64} 
       <ffffffff802d8920>{raid1d+0} <ffffffff8010e6ef>{child_rip+8} 
       <ffffffff802d8920>{raid1d+0} <ffffffff802e9f00>{md_thread+0} 
       <ffffffff8010e6e7>{child_rip+0} 
md2_raid1     D ffff81007fcfe0d8     0   847      1           849   845 
(L-TLB)
ffff81007fdf1558 0000000000000046 ffff81000b4d9000 0000007d8015d9ad 
       ffff81007ffef4f8 ffff81007fd5a830 0000000000001a9e ffff810074ffa2f0 
       ffff81007fd5aa40 ffff81007ffef480 
Call Trace:<ffffffff8015db7d>{cache_alloc_refill+413} 
<ffffffff8037ab68>{__down+152} 
       <ffffffff8012f4f0>{default_wake_function+0} 
<ffffffff80158de4>{mempool_alloc+164} 
       <ffffffff8037c649>{__down_failed+53} 
<ffffffff802ed53d>{.text.lock.md+155} 
       <ffffffff802d8204>{make_request+868} 
<ffffffff8015db7d>{cache_alloc_refill+413} 
       <ffffffff8027abd1>{generic_make_request+545} 
<ffffffff8014a230>{autoremove_wake_function+0} 
       <ffffffff8014a230>{autoremove_wake_function+0} 
<ffffffff8027accf>{submit_bio+223} 
       <ffffffff8015c39b>{test_set_page_writeback+203} 
<ffffffff8016e9d8>{swap_writepage+184} 
       <ffffffff80161bc6>{shrink_zone+2678} 
<ffffffff80162187>{try_to_free_pages+311} 
       <ffffffff8014a230>{autoremove_wake_function+0} 
<ffffffff8015a685>{__alloc_pages+533} 
       <ffffffff80172633>{alloc_page_interleave+67} 
<ffffffff8015d74e>{cache_grow+270} 
       <ffffffff8015db95>{cache_alloc_refill+437} 
<ffffffff8015d636>{kmem_cache_alloc+54} 
       <ffffffff80158e1c>{mempoolNMI Watchdog detected LOCKUP on CPU1CPU 1 
Modules linked in: tg3 i2c_amd756 i2c_core ohci_hcd usbcore dm_mod
Pid: 0, comm: swapper Not tainted 2.6.12-rc2
RIP: 0010:[<ffffffff8026cfe7>] <ffffffff8026cfe7>{serial_in+87}
RSP: 0018:ffff81000325faf0  EFLAGS: 00000002
RAX: 00000000ffffff20 RBX: 0000000000000020 RCX: 0000000000000000
RDX: 00000000000003fd RSI: 0000000000000005 RDI: ffffffff804f5120
RBP: 0000000000002463 R08: 000000000000006c R09: 0000000000000002
R10: 00000000ffffffff R11: 0000000000000000 R12: ffffffff804f5120
R13: ffffffff804acc52 R14: 000000000000001a R15: 0000000000000025
FS:  00002aaaab3a34a0(0000) GS:ffffffff80510bc0(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00002aaaaadc55c0 CR3: 0000000073456000 CR4: 00000000000006e0
Process swapper (pid: 0, threadinfo ffff810003256000, task ffff810003250db0)
Stack: ffffffff8026f42d 000000050325fd35 ffffffff8041a720 0000000000008378 
       0000000000000025 ffffffffffffbeca 0000000000000025 0000000000000046 
       ffffffff801342ac 000000000000839d 
Call Trace:<IRQ> <ffffffff8026f42d>{serial8250_console_write+125} 
<ffffffff801342ac>{__call_console_drivers+76} 
       <ffffffff801345aa>{release_console_sem+330} 
<ffffffff801348d0>{vprintk+656} 
       <ffffffff8026f54f>{serial8250_console_write+415} 
<ffffffff80158e1c>{mempool_alloc+220} 
       <ffffffff8013498d>{printk+141} <ffffffff80158e1c>{mempool_alloc+220} 
       <ffffffff801348d0>{vprintk+656} <ffffffff801517d8>{kallsyms_lookup+200} 
       <ffffffff8015d636>{kmem_cache_alloc+54} 
<ffffffff80158e1c>{mempool_alloc+220} 
       <ffffffff8010ed2c>{printk_address+140} 
<ffffffff80158e1c>{mempool_alloc+220} 
       <ffffffff8010ef2a>{show_trace+410} <ffffffff8010f07e>{show_stack+270} 
       <ffffffff80130732>{show_state+498} 
<ffffffff802611b0>{__handle_sysrq+144} 
       <ffffffff8026d658>{receive_chars+360} 
<ffffffff8026d9e7>{serial8250_interrupt+119} 
       <ffffffff8015461c>{handle_IRQ_event+44} 
<ffffffff80154749>{__do_IRQ+249} 
       <ffffffff80110a52>{do_IRQ+66} <ffffffff8010e0ad>{ret_from_intr+0} 
        <EOI> <ffffffff8010e1de>{retint_kernel+38} 
<ffffffff8010bb90>{default_idle+0} 
       <ffffffff8010bbb0>{default_idle+32} <ffffffff8010be1a>{cpu_idle+74} 
       <ffffffff8052291c>{start_secondary+476} 

Code: 0f b6 c0 c3 66 66 90 66 90 0f b6 4f 22 0f b6 47 23 41 89 d0 
console shuts up ...
 <0>Kernel panic - not syncing: Aiee, killing interrupt handler!
 

------------------------------------


  Unfortunately the system oopsed in the middle of dumping the tasks. From
what I can see, init, kswapd0 and md2_raid1 are all sleeping in __down(),
called via .text.lock.md from make_request() on the swap_writepage() path,
so I'm tempted to think this might be related to the MD code. md2_raid1 is
blocked in D state and, although it is not shown in the dump, I know from
ps that md0_raid1 (the swap partition) was also in D state, along with top,
df and the stress processes that are responsible for hogging memory. Only
about 200MB had been swapped out, even though the swap partition is 1GB.

  I repeated the test to try to get more output from alt-sysrq-T, but it
oopsed again with even less output.
  By the way, I have also tested 2.6.11.6 and I get stuck processes in the
same way. With 2.6.9 I get a hard lockup with no working alt-sysrq after
about 30 to 60 minutes of stress.

  This is with preempt enabled (as well as BKL preempt). I also want to test
without preempt and without MD RAID1, but I'll have to reach the machine
and hit the power button, so that won't be possible until tomorrow :-(

 The original message in this thread, containing the details of the setup
and a .config, is at:

http://marc.theaimsgroup.com/?l=linux-kernel&m=111266784320156&w=2

  I am happy to test any patches, and I also wonder whether enabling any of
the options in the kernel debugging section could help find where the
deadlock is.

  Thanks

Claudio

