Re: what is the expected behaviour under extreme high load.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, 6 Sep 2006 16:53:55 -0700
"Om Narasimhan" <[email protected]> wrote:

> Hi,
> I am running a stress test on my SunFire 4600 (8x2core, 64G) using the
> mem_test available from
> http://carpanta.dc.fi.udc.es/~quintela/memtest. I am using SuSE
> enterprise 9 SP3.
> 
> I am wondering what is the expected behaviour of a machine under
> extreme VM stress.
> When I stress the system to the limits, it practically becomes
> unresponsive. It runs for almost half an hour and then it crashes
> because of a CPU lockup.
> 
> Any pointers from where I can start debugging this issue?

It's an old kernel.  Newer ones might be better.

> >From top,
> load average: 190.70, 132.45, 60.43
> # cat /proc/sys/vm/overcommit_memory
> 0
> 
> The final crashing message on the ser. console is,
> 
> ...
>
> Free swap:            0kB

It ran out of swapspace.  oom-killing is the correct behaviour.

> 16777208 pages of RAM
> 430115 reserved pages
> 69705619 pages shared
> 32 pages swap cached
> Out of Memory: Killed process 12726 (mtest).
> NMI Watchdog detected LOCKUP on CPU14, registers:
> CPU 14
> Pid: 12756, comm: mtest Tainted: G   U   (2.6.5-7.244-smp-dbg )
> RIP: 0010:[<ffffffff80180342>] <ffffffff80180342>{.text.lock.vmscan+18}
> RSP: 0000:0000010dfbd31a38  EFLAGS: 00000082
> RAX: 0000010603945a10 RBX: 0000000000000000 RCX: 0000010601cb7128
> RDX: 0000010dfbd31b48 RSI: 0000010600004408 RDI: 0000000000000000
> RBP: 0000010600004380 R08: 0000000000000001 R09: 0000000000000001
> R10: 0000000000000064 R11: 0000000000000080 R12: 0000010600004380
> R13: 0000010600004400 R14: 0000010dfbd31b48 R15: 0000010dfbd31b28
> FS:  0000002a95894b00(0000) GS:ffffffff80610c80(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 0000002ae2f57010 CR3: 00000001fff2d000 CR4: 00000000000006e0
> Process mtest (pid: 12756, threadinfo 0000010dfbd30000, task 0000010dfede0280)
> Stack: 0000000000000000 0000000100000000 0000010dfbd31c08 0000008000000246
>        0000000000000246 0000000000000246 0000010dfede0280 0000000000000286
>        0000010975a5ad30 ffffffff805353e0
> Call Trace:<ffffffff8017f5c2>{shrink_zone+226}
> <ffffffff8017fed8>{try_to_free_pages+280}
>        <ffffffff80140cb0>{autoremove_wake_function+0}
> <ffffffff80174a0f>{__alloc_pages+559}
>        <ffffffff80185393>{handle_mm_fault+2323}
> <ffffffff80123e80>{do_page_fault+0}
>        <ffffffff80124054>{do_page_fault+468} <ffffffff8013be17>{thread_return+0}
>        <ffffffff80111465>{error_exit+0}
> 

That isn't.  Could be that the scanner simply spent too long with
interrupts disabled.  Or some interaction with the oom-killer caused the
scanner to lock up.

But that's an awfully old kernel - it could be that whatever problem caused
this has been fixed in the intervening years.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux