[PATCH][0/8] (Targeting 2.6.17) Posix memory locking and balanced mlock-LRU semantic

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Both one of my friends(who is working on a DBMS oriented from
PostgreSQL) and i had encountered unexpected OOMs with mlock/mlockall.

After careful code-reading and tests,i found out that the reason of the
OOM is that VM's LRU algorithm treating mlocked pages as Active/Inactive,
regardless of that the mlocked pages could not be reclaimed.

Mlocking many pages will easily cause unbalance between LRU and slab:
VM tend to reclaim from Active/Inactive list,most of which are mlocked,
thus OOM may be triggered. While in fact,there are enough pages to be
reclaimed in slab.
( Setting a large "vfs_cache_pressure" may help to avoid the OOM
  under this situation, but i think it's better "do things right" than
  depending on the "vfs_cache_pressure" tunable)

We think that it's wrong semantic treating mlocked as Active/Inactive.
Mlocked pages should not be counted in page-reclaiming algorithm,
for in fact they will never be affected by page reclaims.

Following patch patch try to fix this, with some additions.

The patch brings Linux with:
1. Posix mlock/munlock/mlockall/munlockall.
   Get mlock/munlock/mlockall/munlockall to Posix definiton: transaction-like,
   just as described in the manpage(2) of mlock/munlock/mlockall/munlockall.
   Thus users of mlock system call series will always have an clear map of
   mlocked areas.
2. More consistent LRU semantics in Memory Management.
   Mlocked pages is placed on a separate LRU list: Wired List.
   The pages dont take part in LRU algorithms,for they could never be swapped,
   until munlocked.
3. Output the Wired(mlocked) pages count through /proc/meminfo.
   One line is added to /proc/meminfo: "Wired:     N kB",thus Linux system
   administrators/programmers can have a clearer map of physical memory usage.


Test of the patch:

Test envioronment:
     RHEL4.
     Totoal physical memory size: 256MB,no swap.
     One ext3 directory("/mnt/test") with about 256 thousand small
files (each size: 2kB).

Step 1. run a task mlocking 220 MB
Step 2. run: "find /mnt/test -size 100"


Case A. Standard kernel.org kernel 2.6.15

Linux soon run OOM, OOM-time memory info:

[root@Linux ~]# cat /proc/meminfo
MemTotal:       254248 kB
MemFree:          3144 kB
Buffers:           124 kB
Cached:           1584 kB
SwapCached:          0 kB
Active:         229308 kB
Inactive:          596 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:       254248 kB
LowFree:          3144 kB
SwapTotal:           0 kB
SwapFree:            0 kB
Dirty:               0 kB
Writeback:           0 kB
Mapped:         228556 kB
Slab:            20076 kB
CommitLimit:    127124 kB
Committed_AS:   238424 kB
PageTables:        584 kB
VmallocTotal:   770040 kB
VmallocUsed:       180 kB
VmallocChunk:   769844 kB
HugePages_Total:     0
HugePages_Free:      0
Hugepagesize:     4096 kB


Case B. Patched 2.6.15

No OOM happened.

[root@Linux ~]# cat /proc/meminfo
MemTotal:       254344 kB
MemFree:          3508 kB
Buffers:          6352 kB
Cached:           2684 kB
SwapCached:          0 kB
Active:           7140 kB
Inactive:         4732 kB
Wired:          225284 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:       254344 kB
LowFree:          3508 kB
SwapTotal:           0 kB
SwapFree:            0 kB
Dirty:              72 kB
Writeback:           0 kB
Mapped:         229208 kB
Slab:            12552 kB
CommitLimit:    127172 kB
Committed_AS:   238168 kB
PageTables:        572 kB
VmallocTotal:   770040 kB
VmallocUsed:       180 kB
VmallocChunk:   769844 kB
HugePages_Total:     0
HugePages_Free:      0
Hugepagesize:     4096 kB


A lot thanks to Mel Gorman for your book: <Understanding the Linux Virtual
Memory Manager>. Also, thanks to other 2 great Linux kernel books: ULK3 and
LDD3.

FreeBSD's VM implementation enlightened me,thanks to FreeBSD guys.

Attachment is the full patch,following mails are what it splits up,.

Shaoping Wang

Attachment: patch-2.6.15-memlock
Description: Binary data


[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux