Re: [15/17] SLUB: Support virtual fallback via SLAB_VFALLBACK

On Sun, 30 Sep 2007 05:09:28 +1000 Nick Piggin <nickpiggin@yahoo.com.au> wrote:

> On Sunday 30 September 2007 05:20, Andrew Morton wrote:
> > On Sat, 29 Sep 2007 06:19:33 +1000 Nick Piggin <nickpiggin@yahoo.com.au> 
> wrote:
> > > On Saturday 29 September 2007 19:27, Andrew Morton wrote:
> > > > On Sat, 29 Sep 2007 11:14:02 +0200 Peter Zijlstra
> > > > <a.p.zijlstra@chello.nl>
> > >
> > > wrote:
> > > > > > oom-killings, or page allocation failures?  The latter, one hopes.
> > > > >
> > > > > Linux version 2.6.23-rc4-mm1-dirty (root@dyad) (gcc version 4.1.2
> > > > > (Ubuntu 4.1.2-0ubuntu4)) #27 Tue Sep 18 15:40:35 CEST 2007
> > > > >
> > > > > ...
> > > > >
> > > > >
> > > > > mm_tester invoked oom-killer: gfp_mask=0x40d0, order=2, oomkilladj=0
> > > > > Call Trace:
> > > > > 611b3878:  [<6002dd28>] printk_ratelimit+0x15/0x17
> > > > > 611b3888:  [<60052ed4>] out_of_memory+0x80/0x100
> > > > > 611b38c8:  [<60054b0c>] __alloc_pages+0x1ed/0x280
> > > > > 611b3948:  [<6006c608>] allocate_slab+0x5b/0xb0
> > > > > 611b3968:  [<6006c705>] new_slab+0x7e/0x183
> > > > > 611b39a8:  [<6006cbae>] __slab_alloc+0xc9/0x14b
> > > > > 611b39b0:  [<6011f89f>] radix_tree_preload+0x70/0xbf
> > > > > 611b39b8:  [<600980f2>] do_mpage_readpage+0x3b3/0x472
> > > > > 611b39e0:  [<6011f89f>] radix_tree_preload+0x70/0xbf
> > > > > 611b39f8:  [<6006cc81>] kmem_cache_alloc+0x51/0x98
> > > > > 611b3a38:  [<6011f89f>] radix_tree_preload+0x70/0xbf
> > > > > 611b3a58:  [<6004f8e2>] add_to_page_cache+0x22/0xf7
> > > > > 611b3a98:  [<6004f9c6>] add_to_page_cache_lru+0xf/0x24
> > > > > 611b3ab8:  [<6009821e>] mpage_readpages+0x6d/0x109
> > > > > 611b3ac0:  [<600d59f0>] ext3_get_block+0x0/0xf2
> > > > > 611b3b08:  [<6005483d>] get_page_from_freelist+0x8d/0xc1
> > > > > 611b3b88:  [<600d6937>] ext3_readpages+0x18/0x1a
> > > > > 611b3b98:  [<60056f00>] read_pages+0x37/0x9b
> > > > > 611b3bd8:  [<60057064>] __do_page_cache_readahead+0x100/0x157
> > > > > 611b3c48:  [<60057196>] do_page_cache_readahead+0x52/0x5f
> > > > > 611b3c78:  [<60050ab4>] filemap_fault+0x145/0x278
> > > > > 611b3ca8:  [<60022b61>] run_syscall_stub+0xd1/0xdd
> > > > > 611b3ce8:  [<6005eae3>] __do_fault+0x7e/0x3ca
> > > > > 611b3d68:  [<6005ee60>] do_linear_fault+0x31/0x33
> > > > > 611b3d88:  [<6005f149>] handle_mm_fault+0x14e/0x246
> > > > > 611b3da8:  [<60120a7b>] __up_read+0x73/0x7b
> > > > > 611b3de8:  [<60013177>] handle_page_fault+0x11f/0x23b
> > > > > 611b3e48:  [<60013419>] segv+0xac/0x297
> > > > > 611b3f28:  [<60013367>] segv_handler+0x68/0x6e
> > > > > 611b3f48:  [<600232ad>] get_skas_faultinfo+0x9c/0xa1
> > > > > 611b3f68:  [<60023853>] userspace+0x13a/0x19d
> > > > > 611b3fc8:  [<60010d58>] fork_handler+0x86/0x8d
> > > >
> > > > OK, that's different.  Someone broke the vm - order-2 GFP_KERNEL
> > > > allocations aren't supposed to fail.
> > > >
> > > > I'm suspecting that did_some_progress thing.
> > >
> > > The allocation didn't fail -- it invoked the OOM killer because the
> > > kernel ran out of unfragmented memory.
> >
> > We can't "run out of unfragmented memory" for an order-2 GFP_KERNEL
> > allocation in this workload.  We go and synchronously free stuff up to make
> > it work.
> >
> > How did this get broken?
> 
> Either no more order-2 pages could be freed, or the ones that were being
> freed were being used by something else (eg. other order-2 slab allocations).

No.  The current design of reclaim (for better or for worse) is that for
order 0,1,2 and 3 allocations we just keep on trying until it works.  That
got broken and I think it got broken at a design level when that
did_some_progress logic went in.  Perhaps something else we did later
worsened things.

>
> > > Probably because higher order
> > > allocations are the new vogue in -mm at the moment ;)
> >
> > That's a different bug.
> >
> > bug 1: We shouldn't be doing higher-order allocations in slub because of
> > the considerable damage this does to atomic allocations.
> >
> > bug 2: order-2 GFP_KERNEL allocations shouldn't fail like this.
> 
> I think one causes 2 as well -- it isn't just considerable damage to atomic
> allocations but to GFP_KERNEL allocations too.

Well sure, because we already broke GFP_KERNEL allocations.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Follow-Ups:
- Re: [15/17] SLUB: Support virtual fallback via SLAB_VFALLBACK
  - From: Nick Piggin <nickpiggin@yahoo.com.au>

References:
- [00/17] [RFC] Virtual Compound Page Support
  - From: Christoph Lameter <clameter@sgi.com>
- Re: [15/17] SLUB: Support virtual fallback via SLAB_VFALLBACK
  - From: Nick Piggin <nickpiggin@yahoo.com.au>
- Re: [15/17] SLUB: Support virtual fallback via SLAB_VFALLBACK
  - From: Andrew Morton <akpm@linux-foundation.org>
- Re: [15/17] SLUB: Support virtual fallback via SLAB_VFALLBACK
  - From: Nick Piggin <nickpiggin@yahoo.com.au>

Prev by Date: Re: [PATCH] cpuset and sched domains: sched_load_balance flag
Next by Date: Re: [PATCH] Version 3 (2.6.23-rc8) Smack: Simplified Mandatory Access Control Kernel
Previous by thread: Re: [15/17] SLUB: Support virtual fallback via SLAB_VFALLBACK
Next by thread: Re: [15/17] SLUB: Support virtual fallback via SLAB_VFALLBACK
Index(es):
- Date
- Thread

[Index of Archives] [Kernel Newbies] [Netfilter] [Bugtraq] [Photo] [Stuff] [Gimp] [Yosemite News] [MIPS Linux] [ARM Linux] [Linux Security] [Linux RAID] [Video 4 Linux] [Linux for the blind] [Linux Resources]