Re: Memory Management — Linux Kernel

Roger Heflin wrote:

I have seen RH3.0 crash on 32GB systems because it has too
much memory tied up in write cache. It required update 2 (this
was a while ago) and a change of a parameter in /proc
to prevent the crash. The cause was an overaggressive
write caching change that RH implemented in the kernel.
This crash was an OOM-related crash. To
duplicate the bug, you booted the machine and ran a dd
to create a very large file filling the disk.

We tested and determined that the issue did not appear
if you had less than 28GB of RAM. This was on an
Itanium machine, so I don't know whether it occurs on other arches,
or whether it occurs at the same memory limits on those arches.

                       Roger
-----Original Message-----
From: linux-kernel-owner@vger.kernel.org [mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of Márcio Oliveira
Sent: Friday, July 22, 2005 2:42 PM
To: Neil Horman
Cc: arjanv@redhat.com; linux-kernel@vger.kernel.org
Subject: Re: Memory Management

Neil Horman wrote:
On Fri, Jul 22, 2005 at 11:32:52AM -0300, Márcio Oliveira wrote:
Neil Horman wrote:
On Thu, Jul 21, 2005 at 10:40:54AM -0300, Márcio Oliveira wrote:
http://people.redhat.com/nhorman/papers/rhel3_vm.pdf
I wrote this with Norm a while back.  It may help you out.
Regards
Neil
Neil,

Thanks.
How can /proc virtual memory parameters like inactive_clean_percent,
overcommit_memory, overcommit_ratio and page_cache help me to solve
or reduce Out Of Memory conditions on servers with 16GB RAM and lots
of GB of swap?
I wouldn't touch memory overcommit if you are already seeing out of
memory issues. If you are using lots of pagecache, I would suggest
increasing inactive_clean_percent, reducing the pagecache.max value,
and modifying the bdflush parameters in the above document such that
bdflush runs sooner, more often, and does more work per iteration.
This will help you move data in pagecache back to disk more
aggressively so that memory will be available for other purposes,
like heap allocations. Also, if you're using a Red Hat kernel and you
have 16GB of RAM in your system, you're a good candidate for the
hugemem kernel. Rather than a straightforward out of memory
condition, you may be seeing an exhaustion of your kernel's address
space (check LowFree in /proc/meminfo). In this event the hugemem
kernel will help you in that it increases your Low Memory address
space from 1GB to 4GB, preventing some OOM conditions.
The kernel does not free cached memory (~10-12GB of the 16GB total
RAM). Is there some way to force the kernel to free cached memory?
Cached memory is freed on demand. Just because it's listed under the
cached line below doesn't mean it can't be freed and used for another
purpose. Implement the tunings above, and your situation should improve.

Regards
Neil
/proc/meminfo:

            total:        used:        free:  shared:  buffers:      cached:
Mem:   16603488256  16523333632     80154624        0  70651904  13194563584
Swap:  17174257664     11771904  17162485760
MemTotal:     16214344 kB
MemFree:         78276 kB
Buffers:         68996 kB
Cached:       12874808 kB
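
As a rough illustration of the point that most of the "used" memory here is page cache, the kB lines above can be parsed and compared. This is only a sketch: the helper name parse_meminfo is made up for this example, and the sample text is copied from the four kB lines posted above rather than read from a live system.

```python
def parse_meminfo(text):
    """Return {field: kB} for lines shaped like 'MemTotal:  16214344 kB'."""
    fields = {}
    for line in text.splitlines():
        parts = line.split()
        # Keep only 'Name: <int> kB' rows; the older byte-count table rows
        # ('Mem:' and 'Swap:') have a different shape and are skipped.
        if (len(parts) == 3 and parts[0].endswith(":")
                and parts[1].isdigit() and parts[2] == "kB"):
            fields[parts[0][:-1]] = int(parts[1])
    return fields


# Sample copied from the /proc/meminfo excerpt in this thread.
SAMPLE = """\
MemTotal:     16214344 kB
MemFree:         78276 kB
Buffers:         68996 kB
Cached:       12874808 kB
"""

info = parse_meminfo(SAMPLE)
cached_share = info["Cached"] / info["MemTotal"]
print(f"Cached is {cached_share:.0%} of MemTotal")
```

On a live machine the same function could be fed open("/proc/meminfo").read() instead of the sample text.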

Thanks to all.

Marcio.
Neil,

Thanks for the answers!

The following lines are the Out Of Memory log:
Jul 20 13:45:44 server kernel: Out of Memory: Killed process 23716 (oracle).
Jul 20 13:45:44 server kernel: Fixed up OOM kill of mm-less task
Jul 20 13:45:45 server su(pam_unix)[3848]: session closed for user root
Jul 20 13:45:48 server kernel: Mem-info:
Jul 20 13:45:48 server kernel: Zone:DMA freepages: 1884 min: 0 low: 0 high: 0
Jul 20 13:45:48 server kernel: Zone:Normal freepages: 1084 min: 1279 low: 4544 high: 6304
Jul 20 13:45:48 server kernel: Zone:HighMem freepages:386679 min: 255 low: 61952 high: 92928
Jul 20 13:45:48 server kernel: Free pages: 389647 (386679 HighMem)
Jul 20 13:45:48 server kernel: ( Active: 2259787/488777, inactive_laundry: 244282, inactive_clean: 244366, free: 389647 )
Jul 20 13:45:48 server kernel:   aa:0 ac:0 id:0 il:0 ic:0 fr:1884
Jul 20 13:45:48 server kernel: aa:1620 ac:1801 id:231 il:15 ic:0 fr:1085
Jul 20 13:45:48 server kernel: aa:1099230 ac:1157136 id:488536 il:244277 ic:244366 fr:386679
Jul 20 13:45:48 server kernel: 0*4kB 0*8kB 1*16kB 1*32kB 1*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 1*4096kB = 7536kB)
Jul 20 13:45:48 server kernel: 55*4kB 9*8kB 19*16kB 9*32kB 0*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 0*4096kB = 4340kB)
Jul 20 13:45:48 server kernel: 291229*4kB 46179*8kB 711*16kB 1*32kB 1*64kB 1*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 1546716kB)
Jul 20 13:45:48 server kernel: Swap cache: add 192990, delete 189665, find 21145/90719, race 0+0
Jul 20 13:45:48 server kernel: 139345 pages of slabcache
Jul 20 13:45:48 server kernel: 1890 pages of kernel stacks
Jul 20 13:45:48 server kernel: 0 lowmem pagetables, 274854 highmem pagetables
Jul 20 13:45:48 server kernel: Free swap:       16749720kB
Jul 20 13:45:49 server kernel: 4194304 pages of RAM
Jul 20 13:45:49 server kernel: 3899360 pages of HIGHMEM
Jul 20 13:45:49 server kernel: 140718 reserved pages
Jul 20 13:45:49 server kernel: 35350398 pages shared
Jul 20 13:45:49 server kernel: 3325 pages swap cached
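
For readers who want to check the per-zone free page counts against the min/low/high watermarks in a dump like this, here is a minimal sketch. The function name zone_status and the state labels are invented for this example; the sample lines are the three "Zone:" entries from the log, with the syslog prefix removed and line wrapping undone.

```python
import re

# Matches e.g. "Zone:Normal freepages: 1084 min: 1279 low: 4544 high: 6304"
ZONE_RE = re.compile(
    r"Zone:(\w+)\s+freepages:\s*(\d+)\s+min:\s*(\d+)\s+low:\s*(\d+)\s+high:\s*(\d+)"
)


def zone_status(log_text):
    """Map each zone name to where its free page count sits among the watermarks."""
    status = {}
    for name, free, wmin, low, high in ZONE_RE.findall(log_text):
        free, wmin, low, high = int(free), int(wmin), int(low), int(high)
        if free < wmin:
            status[name] = "below min"
        elif free < low:
            status[name] = "below low"
        elif free < high:
            status[name] = "below high"
        else:
            status[name] = "at or above high"
    return status


SAMPLE_LOG = """\
Zone:DMA freepages: 1884 min: 0 low: 0 high: 0
Zone:Normal freepages: 1084 min: 1279 low: 4544 high: 6304
Zone:HighMem freepages:386679 min: 255 low: 61952 high: 92928
"""

for zone, state in zone_status(SAMPLE_LOG).items():
    print(f"{zone}: {state}")
```

With these numbers only the Normal zone falls under a watermark; DMA and HighMem are both above their high marks.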

/proc/meminfo LowFree info:
LowFree: 17068 kB ------> Do you think this value is too low?
No, that should be plenty of LowFree, but that number can change quickly
depending on workload.
Zone:Normal freepages: 1084 min: 1279 low: 4544 high: 6304 ---->
(freepages < min) Is that normal?
Zone:HighMem freepages:386679 min: 255 low: 61952 high: 92928 ---->
(freepages < min) Is that normal?
You're beneath your low water mark in the normal (lowmem) zone for free pages,
so your kernel is likely trying to get lots of data moved to disk. Although,
given that your largest buddy list has a 2048K chunk free, I'm hard pressed to
see how you aren't able to get memory when you need it. Do you have a module
loaded in your kernel that might require such large memory allocations?
Neil
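
The buddy list remark can be checked with a little arithmetic. The sketch below sums one "count*size" line from the dump and reports the largest block size that still has a free entry. The helper name buddy_summary is made up for this example, and the sample is assumed to be the Normal zone's line: it is the second of the three buddy lines, and its 4340kB total matches fr:1085 pages of 4kB.

```python
import re


def buddy_summary(line):
    """Return (total_kB, largest_free_block_kB) for a '55*4kB 9*8kB ...' line."""
    # Each pair is (number of free blocks, block size in kB). The trailing
    # '= 4340kB)' total has no '*', so the pattern does not pick it up.
    pairs = [(int(n), int(size)) for n, size in re.findall(r"(\d+)\*(\d+)kB", line)]
    total = sum(n * size for n, size in pairs)
    largest = max((size for n, size in pairs if n > 0), default=0)
    return total, largest


NORMAL_LINE = ("55*4kB 9*8kB 19*16kB 9*32kB 0*64kB 1*128kB 1*256kB "
               "0*512kB 1*1024kB 1*2048kB 0*4096kB = 4340kB)")

total_kb, largest_kb = buddy_summary(NORMAL_LINE)
print(f"free: {total_kb} kB, largest free block: {largest_kb} kB")
```

This gives 4340 kB free with a largest free block of 2048 kB, which agrees with the total printed by the kernel and with the 2048K chunk mentioned above.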
Thanks a lot Neil!

Márcio Oliveira.
Neil,

  Thanks for the help.
I have a storage device attached to the server. Maybe the storage module requires lots of memory. Maybe the "LowFree" value is misleading (it was taken outside the OOM time), so it is possible that the "LowFree" value is too small during the OOM condition.
Is there a way to identify if the Low Memory is too small? (some program, command, daemon...)
The server has 16GB RAM and 16GB swap. When the OOM kill condition happens, the system has ~6GB RAM used, ~10GB RAM cached and 16GB free swap. Does that indicate that the server can't allocate Low Memory and starts OOM conditions? Because the High Memory is OK, right?
Thanks again!

Márcio.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Roger, thanks for the information.

I'm using Update 4 kernels (2.4.21-27.ELsmp; this kernel has some mm / oom fixes) and don't have big problems when creating large files. Also, the server is a 32-bit machine.

  Neil said that the problem could be Low Memory, and I think so too.

  I read the following message on the list:

http://marc.theaimsgroup.com/?l=linux-kernel&m=112044530919567&w=4

The problem looks like an I/O issue. Can I/O (such as storage devices) consume a large amount of low memory? Neil also said that the kernel is trying to move lots of data to disk and that a module might require such large memory allocations. Does anybody know how I can identify what is using Low Memory on my system?

  The older questions in the message are:

The server has 16GB RAM and 16GB swap. When the OOM kill condition happens, the system has ~6GB RAM used, ~10GB RAM cached and 16GB free swap. Does that indicate that the server can't allocate Low Memory and starts OOM conditions? Because the High Memory is OK, right?
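
One rough way to reason about low memory pressure from the Mem-info dump alone is to convert its page counts to megabytes. The sketch below rests on assumptions that should be verified: 4 kB pages, low memory taken as "pages of RAM" minus "pages of HIGHMEM" (which still includes reserved pages), and slab caches and kernel stacks being allocated from low memory on a 32-bit kernel. It is an estimate, not a measurement, and it does not say which slab cache is responsible.

```python
PAGE_KB = 4  # assumption: 4 kB pages, as on 32-bit x86

# Page counts copied from the Mem-info dump in this thread.
ram_pages = 4194304
highmem_pages = 3899360
slab_pages = 139345
kernel_stack_pages = 1890


def pages_to_mb(pages):
    """Convert a page count to megabytes."""
    return pages * PAGE_KB / 1024


# Assumption: whatever is not HIGHMEM counts as low memory (DMA + Normal).
lowmem_pages = ram_pages - highmem_pages
slab_share = slab_pages / lowmem_pages

print(f"low memory:    {pages_to_mb(lowmem_pages):.0f} MB")
print(f"slab cache:    {pages_to_mb(slab_pages):.0f} MB ({slab_share:.0%} of low memory)")
print(f"kernel stacks: {pages_to_mb(kernel_stack_pages):.0f} MB")
```

If those assumptions hold, slab caches alone account for a large part of low memory in this dump, so the per-cache numbers in /proc/slabinfo would be a reasonable next place to look.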

Thanks to all!

Regards,
Márcio



