Re: [PATCH] vm: enhance __alloc_pages to prioritize pagecache eviction when pressed for memory

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Neil Horman <[email protected]> wrote:
>
> Hey all-
>      I was recently shown this issue, wherein, if the kernel was kept full of
> pagecache via applications that were constantly writing large amounts of data to
> disk, the box could find itself in a position where the vm, in __alloc_pages
> would invoke the oom killer repetatively within try_to_free_pages, until such
> time as the box had no candidate processes left to kill, at which point it would
> panic.

That's pretty bad.  Are you able to provide a description which would permit
others to reproduce this?

>  /*
> + * Writeback nr_pages from pagecache to disk synchronously
> + * blocks until the writeback is complete
> + */
> +void clean_pagecache(long nr_pages)
> +{
> +	struct writeback_control wbc = {
> +		.bdi            = NULL,
> +		.sync_mode      = WB_SYNC_ALL,
> +		.older_than_this = NULL,
> +		.nr_to_write    = nr_pages,
> +		.nonblocking    = 0,
> +	};
> +
> +	writeback_inodes(&wbc);
> +}

Interesting.

> +/*
>   * Start writeback of `nr_pages' pages.  If `nr_pages' is zero, write back
>   * the whole world.  Returns 0 if a pdflush thread was dispatched.  Returns
>   * -1 if all pdflush threads were busy.
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -949,6 +949,16 @@ rebalance:
>  	reclaim_state.reclaimed_slab = 0;
>  	p->reclaim_state = &reclaim_state;
>  
> +	/*
> +	 * We're pinched for memory, so before we try to reclaim some 
> +	 * pages synchronously, lets try to force some more pages out
> +	 * of pagecache, to raise our chances of this succeding.
> +	 * specifically, lets write out the number of pages that this
> +	 * allocation is requesting, in the hopes that they will be
> +	 * contiguous
> +	 */
> +	clean_pagecache(1<<order);
> +
>  	did_some_progress = try_to_free_pages(zonelist->zones, gfp_mask);

I suspect that we shuld be passing more than (1<<order) into
clean_pagecache() - if we're going to do this sort of writeback then we
might as well do a decent amount.  Maybe something like (number of pages on
the eligible LRUs * proportion of dirty memory) or something.  But then,
page reclaim does writeback off the LRU, so none of this should be
needed...   Need to work out why it broke.

And we should not be calling into filesystem writeback unless the caller
specified __GFP_FS.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux