Hey all-
	I was recently shown an issue in which, if the kernel's pagecache was kept full by applications constantly writing large amounts of data to disk, the box could get into a state where the VM, in __alloc_pages, would invoke the OOM killer repeatedly from within try_to_free_pages, until no candidate processes were left to kill, at which point it would panic. While this seems like the right thing to do in general, it occurred to me that if we could force some additional evictions from pagecache before trying to reclaim memory in try_to_free_pages, we would stand a good chance of avoiding the OOM killer entirely (assuming the pages freed from pagecache were physically contiguous). The following patch performs this operation and, in my testing and that of the original reporter, avoids invoking the OOM killer for workloads that would previously OOM-kill the box to the point of panic.

Thanks & Regards
Neil

Signed-off-by: Neil Horman <[email protected]>

 include/linux/writeback.h |    1 +
 mm/page-writeback.c       |   17 +++++++++++++++++
 mm/page_alloc.c           |   10 ++++++++++
 3 files changed, 28 insertions(+)

diff --git a/include/linux/writeback.h b/include/linux/writeback.h
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -86,6 +86,7 @@ static inline void wait_on_inode(struct
  * mm/page-writeback.c
  */
 int wakeup_pdflush(long nr_pages);
+void clean_pagecache(long nr_pages);
 void laptop_io_completion(void);
 void laptop_sync_completion(void);
 void throttle_vm_writeout(void);
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -350,6 +350,23 @@ static void background_writeout(unsigned
 }
 
 /*
+ * Write back nr_pages from pagecache to disk synchronously;
+ * blocks until the writeback is complete.
+ */
+void clean_pagecache(long nr_pages)
+{
+	struct writeback_control wbc = {
+		.bdi = NULL,
+		.sync_mode = WB_SYNC_ALL,
+		.older_than_this = NULL,
+		.nr_to_write = nr_pages,
+		.nonblocking = 0,
+	};
+
+	writeback_inodes(&wbc);
+}
+
+/*
  * Start writeback of `nr_pages' pages. If `nr_pages' is zero, write back
  * the whole world. Returns 0 if a pdflush thread was dispatched. Returns
  * -1 if all pdflush threads were busy.
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -949,6 +949,16 @@ rebalance:
 	reclaim_state.reclaimed_slab = 0;
 	p->reclaim_state = &reclaim_state;
 
+	/*
+	 * We're pinched for memory, so before we try to reclaim some
+	 * pages synchronously, let's try to force some more pages out
+	 * of pagecache to raise our chances of succeeding.
+	 * Specifically, write out the number of pages this allocation
+	 * is requesting, in the hope that they will be contiguous.
+	 */
+	clean_pagecache(1<<order);
+
 	did_some_progress = try_to_free_pages(zonelist->zones, gfp_mask);
 	p->reclaim_state = NULL;

-- 
/***************************************************
 *Neil Horman
 *Software Engineer
 *gpg keyid: 1024D / 0x92A74FA1 - http://pgp.mit.edu
 ***************************************************/