Overcommit problems with 2.6.12-rc4 (on AMD64)

(Please Cc me on answers; I don't follow LKML.)

Hi,

One of our servers, a dual Opteron with 2GB of memory (32-bit userland, 64-bit
kernel), suddenly started to behave oddly:

  imapd[31528]: segfault at 00000000fff00000 rip 00000000556a1a6d rsp 00000000ffffd394 error 4
  imapd[31527]: segfault at 00000000fff00000 rip 00000000556a1a6d rsp 00000000ffffcbe4 error 4
  sh[31530]: segfault at 00000000ffff7ff4 rip 000000005555e556 rsp 00000000ffff7ff8 error 6
  sh[31531]: segfault at 00000000ffff7e5c rip 00000000555dc575 rsp 00000000ffff7e60 error 6
  Unable to load interpreter /lib/ld-linux.so.2
  Unable to load interpreter /lib/ld-linux.so.2
  (ad infinitum)
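
For reference, the "error" number in the segfault lines is the x86 page-fault
error code, which is a bit mask. A minimal sketch that decodes its three low
bits (the function name decode_pf_error is made up for illustration):

```sh
# Decode the low three bits of an x86 page-fault error code.
#   bit 0: 0 = page not present, 1 = protection violation
#   bit 1: 0 = read access,      1 = write access
#   bit 2: 0 = kernel mode,      1 = user mode
decode_pf_error() {
    code=$1
    if [ $((code & 1)) -ne 0 ]; then cause="protection violation"; else cause="page not present"; fi
    if [ $((code & 2)) -ne 0 ]; then access="write"; else access="read"; fi
    if [ $((code & 4)) -ne 0 ]; then mode="user"; else mode="kernel"; fi
    echo "$mode-mode $access, $cause"
}

decode_pf_error 4   # prints: user-mode read, page not present
decode_pf_error 6   # prints: user-mode write, page not present
```

So the imapd faults are user-mode reads and the sh faults are user-mode writes,
in both cases to a page that was not present.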

It turned out that processes were failing to allocate memory:

  Jun  2 11:56:02 cassarossa smbd[7171]: oplock_break: malloc fail for input buffer. 
  Jun  2 11:56:02 cassarossa smbd[7171]: open_mode_check: FAILED when breaking oplock (3) on file login.bat, dev = 900, inode = 110665 

This wasn't faulty RAM: the machine has ECC RAM and we received no warnings
from it. We were not short on memory either; swap was completely unused:

  cassarossa:~# free
               total       used       free     shared    buffers     cached
  Mem:       2058300    2041136      17164          0      39576    1601468
  -/+ buffers/cache:     400092    1658208
  Swap:      3903712          0    3903712

It looks as if the kernel somehow couldn't distinguish between memory used as
cache and memory that is really "used". Even swapoff failed:

  cassarossa:~# swapoff -a
  swapoff: /dev/sda5: Cannot allocate memory
  swapoff: /dev/sdf5: Cannot allocate memory

However, we run with vm.overcommit_memory=2 (strict overcommit accounting), so
we figured it was worth a shot to switch back to the default heuristic mode (0):

  cassarossa:~# echo 0 > /proc/sys/vm/overcommit_memory 
  cassarossa:~# swapoff -a
  cassarossa:~# swapon -a 
  cassarossa:~# free -m
               total       used       free     shared    buffers     cached
  Mem:          2010       1993         16          0         39       1595
  -/+ buffers/cache:        358       1651
  Swap:         3812          0       3812

Suddenly everything was back to normal (i.e. swapoff worked and programs
stopped running out of memory, although the amount of cache in use did not
change), and after a quick restart of services the machine has been fine.
So to me, it looks like vm.overcommit_memory=2 is broken, at least on AMD64.
Any ideas why this would happen?
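
One way to see how close a machine is to the strict-mode limit is to compare
Committed_AS against CommitLimit in /proc/meminfo. A minimal sketch, assuming
the running kernel exports both fields (this was not checked on the box
above); the function name commit_headroom is made up for illustration:

```sh
# Print commit accounting from a meminfo-style file (default: /proc/meminfo).
# Fields that are missing from the file are reported as 0.
commit_headroom() {
    awk '
        /^CommitLimit:/  { limit = $2 }
        /^Committed_AS:/ { used = $2 }
        END {
            printf "CommitLimit=%d kB Committed_AS=%d kB headroom=%d kB\n", limit, used, limit - used
        }
    ' "${1:-/proc/meminfo}"
}

# Only query the live system if the file is actually there.
if [ -r /proc/meminfo ]; then commit_headroom; fi
```

In mode 2 the limit is swap plus vm.overcommit_ratio percent of RAM. If the
ratio were at its default of 50 here (an assumption; the value was not
recorded), the free output above would give roughly
3903712 + 2058300/2 = 4932862 kB.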

For the record:

  cassarossa:~# uname -a
  Linux cassarossa 2.6.12-rc4 #1 SMP Fri May 13 18:49:40 CEST 2005 x86_64 unknown

No kernel patches except for a microscopic forward-port of the ELF fix from
2.6.11.9.

/* Steinar */
-- 
Homepage: http://www.sesse.net/