Hey all-
	I was recently shown an issue in which, if the kernel's pagecache was kept full by applications constantly writing large amounts of data to disk, the box could get into a state where the VM, in __alloc_pages, would invoke the OOM killer repeatedly from within try_to_free_pages, until no candidate processes were left to kill, at which point it would panic. While this seems like the right thing to do in general, it occurred to me that if we could force some additional evictions from pagecache before trying to reclaim memory in try_to_free_pages, we would stand a good chance of avoiding the OOM killer entirely (assuming the pages freed from pagecache were physically contiguous). The following patch performs this operation and, in my testing and that of the original reporter, avoids invoking the OOM killer for workloads that would previously OOM-kill the box to the point of panic.

Thanks & Regards
Neil

Signed-off-by: Neil Horman <[email protected]>

 include/linux/writeback.h |    1 +
 mm/page-writeback.c       |   17 +++++++++++++++++
 mm/page_alloc.c           |   10 ++++++++++
 3 files changed, 28 insertions(+)

diff --git a/include/linux/writeback.h b/include/linux/writeback.h
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -86,6 +86,7 @@ static inline void wait_on_inode(struct
  * mm/page-writeback.c
  */
 int wakeup_pdflush(long nr_pages);
+void clean_pagecache(long nr_pages);
 void laptop_io_completion(void);
 void laptop_sync_completion(void);
 void throttle_vm_writeout(void);
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -350,6 +350,23 @@ static void background_writeout(unsigned
 }
 
 /*
+ * Write back nr_pages from pagecache to disk synchronously;
+ * blocks until the writeback is complete.
+ */
+void clean_pagecache(long nr_pages)
+{
+	struct writeback_control wbc = {
+		.bdi = NULL,
+		.sync_mode = WB_SYNC_ALL,
+		.older_than_this = NULL,
+		.nr_to_write = nr_pages,
+		.nonblocking = 0,
+	};
+
+	writeback_inodes(&wbc);
+}
+
+/*
  * Start writeback of `nr_pages' pages. If `nr_pages' is zero, write back
  * the whole world. Returns 0 if a pdflush thread was dispatched. Returns
  * -1 if all pdflush threads were busy.
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -949,6 +949,16 @@ rebalance:
 	reclaim_state.reclaimed_slab = 0;
 	p->reclaim_state = &reclaim_state;
 
+	/*
+	 * We're pinched for memory, so before we try to reclaim some
+	 * pages synchronously, let's try to force some more pages out
+	 * of pagecache to raise our chances of succeeding.
+	 * Specifically, write out the number of pages this allocation
+	 * is requesting, in the hope that they will be contiguous.
+	 */
+	clean_pagecache(1<<order);
+
 	did_some_progress = try_to_free_pages(zonelist->zones, gfp_mask);
 	p->reclaim_state = NULL;

-- 
/***************************************************
 *Neil Horman
 *Software Engineer
 *gpg keyid: 1024D / 0x92A74FA1 - http://pgp.mit.edu
 ***************************************************/