Re: [RFC 0/4] Object reclaim via the slab allocator V1


Christoph Lameter <[email protected]> wrote:
>
> We currently have a set of problems with slab reclaim:
> 
> 1. On a NUMA system there are excessive cross node accesses.
>    Locks are taken remotely which leads to a significant slowdown
>    if concurrent reclaim is performed on the dcache/icache from
>    a number of nodes.
> 
> 2. We need to free an excessive number of elements from the LRU
>    in order to reclaim enough pages from the slab.
> 
> 3. After reclaim we have a large number of sparsely used slab
>    objects.
> 
> The fundamental problem in the current reclaim approaches with the
> icache/dcache is that the reclaim is LRU and object based. Multiple
> objects can share the same slab. So removing one object from a slab
> just removes some cache information that may have been useful later
> but may not give us what we want: More free pages.
> 
> I propose that we replace the LRU based object management by using
> the slab oriented lists in the slab allocator itself for reclaim.
> 
> The slab allocator already has references to all pages used by the
> dcache and icache. It has awareness of which objects are located
> in a certain slab and therefore would be able to free specific
> objects that would make pages freeable if the slab knew
> their state.
> 
> In order to allow the slab allocator to know about an object's
> state we add another flag SLAB_RECLAIM. SLAB_RECLAIM means that
> the following is true of a slab:
> 
> 1. The object contains a reference counter of atomic_t as the
>    first element that follows the following conventions:
> 
>    Counter = 0	-> Object is being freed or is free.
>    Counter = 1  -> Object is not used and contains cache information.
>    		   The object may be freed.
> 
>    Counter > 1	-> Object is in use and cannot be freed.
> 
> 2. A destructor was provided during kmem_cache_create().
>    If SLAB_DTOR_FREE is passed in the flags of the destructor
>    then a best effort attempt will be made to free that object.
> 

It would be better to make the higher-level code register callbacks for
this sort of thing.  That code knows how to determine if an object is
freeable, can manage aging info, etc.
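
A minimal userspace sketch of what such a callback interface might look
like.  The names struct reclaim_ops, cache_register_reclaim() and
cache_reclaim_object() are hypothetical; they are not part of the patchset
or of any kernel API.  The point is only that the allocator asks the cache
owner whether an object may go, instead of interpreting a refcount itself.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callbacks supplied by the cache owner (e.g. dcache code). */
struct reclaim_ops {
	bool (*is_freeable)(void *obj);	/* owner decides, may use aging info */
	void (*release)(void *obj);	/* owner tears the object down */
};

/* Stand-in for a slab cache; only the part needed for this sketch. */
struct cache {
	const struct reclaim_ops *ops;
};

static void cache_register_reclaim(struct cache *c,
				   const struct reclaim_ops *ops)
{
	c->ops = ops;
}

/* Returns true if the object was released through the owner's callbacks. */
static bool cache_reclaim_object(struct cache *c, void *obj)
{
	if (!c->ops || !c->ops->is_freeable(obj))
		return false;
	c->ops->release(obj);
	return true;
}

/* Example owner: a toy object with a use count and an aging bit. */
struct toy_object {
	int users;
	bool recently_used;
	bool released;
};

static bool toy_is_freeable(void *obj)
{
	struct toy_object *o = obj;

	return o->users == 0 && !o->recently_used;
}

static void toy_release(void *obj)
{
	struct toy_object *o = obj;

	o->released = true;
}

static const struct reclaim_ops toy_ops = {
	.is_freeable	= toy_is_freeable,
	.release	= toy_release,
};
```

With this shape the aging policy (here the recently_used bit) stays in the
owner's code, and a cache with no registered callbacks is simply never
reclaimed from.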

> Memory can then be reclaimed from a slab by calling
> 
> kmem_cache_reclaim(kmem_cache_t *cachep, unsigned long page)
> 
> kmem_cache_reclaim returns an error code or the number of pages reclaimed.
> 
> 
> The reclaim works by walking through the lists of full and partially
> allocated slabs. We begin at the end of the fully allocated slabs because
> these slabs have been around for a long time (This basically preserves the LRU
> lists to some extent).
> 
> For each slab we check all the objects in the slab. If all objects have
> a refcount of one then we free all the objects and return the pages of the
> object to the page allocator.

That seems like quite a drawback.  A single refcount=2 object on the page
means that nothing gets freed from that page at all.  It'd be easy
(especially with dcache) to do tons of work without achieving anything.
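
To make that concrete, here is a small userspace sketch of the
all-or-nothing rule.  The toy struct slab_page, OBJS_PER_SLAB and
slab_page_try_reclaim() are illustrative names, and plain ints stand in for
the atomic_t counters of the proposal.  Treating already-free slots
(counter 0) as not blocking the page is an assumption of the sketch.

```c
#define OBJS_PER_SLAB 8	/* illustrative; real slabs vary with object size */

/*
 * Toy slab: one counter per object, following the quoted convention:
 * 0 = free or being freed, 1 = cached and freeable, >1 = in use.
 */
struct slab_page {
	int refcount[OBJS_PER_SLAB];
};

/*
 * All-or-nothing rule: free every cached object only if no object on the
 * page is in use.  Returns the number of objects freed, or -1 if a single
 * in-use object pins the page and nothing was freed.
 */
static int slab_page_try_reclaim(struct slab_page *s)
{
	int freed = 0;

	for (int i = 0; i < OBJS_PER_SLAB; i++)
		if (s->refcount[i] > 1)
			return -1;	/* whole scan of this page was wasted */

	for (int i = 0; i < OBJS_PER_SLAB; i++) {
		if (s->refcount[i] == 1) {
			s->refcount[i] = 0;
			freed++;
		}
	}
	return freed;
}
```

A page holding seven cached objects and one in-use object returns -1 here,
which is the wasted work described above.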

So it might be better to drop the freeable objects from the page even if
the page has non-freeable objects.  If only because this might make
directory dentries on _other_ pages reclaimable.
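
A rough sketch of that effect, assuming a toy model in which every cached
child holds one reference on its parent.  struct toy_dentry,
toy_dentry_drop() and page_drop_freeable() are made-up names for
illustration, not dcache functions, and there is no locking.

```c
#include <stddef.h>

struct toy_dentry {
	int refcount;			/* 0 free, 1 cached, >1 in use */
	struct toy_dentry *parent;	/* NULL for the root */
};

/* Drop one object if it is merely cached.  Returns 1 if freed, else 0. */
static int toy_dentry_drop(struct toy_dentry *d)
{
	if (d->refcount != 1)
		return 0;
	d->refcount = 0;
	if (d->parent)
		d->parent->refcount--;	/* child no longer pins its parent */
	return 1;
}

/* Free whatever is freeable on a page, skipping in-use objects. */
static int page_drop_freeable(struct toy_dentry **objs, size_t n)
{
	int freed = 0;

	for (size_t i = 0; i < n; i++)
		freed += toy_dentry_drop(objs[i]);
	return freed;
}
```

If a directory on some other page has a counter of 2 only because of one
cached child, dropping that child brings the directory down to 1, so a later
pass over the directory's page can free it.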

But that won't really help much, because the basic problem remains
unsolved: internal fragmentation.  To solve that we'd need to either:

a) compact dentries by copying them around or, perhaps,

b) make dentry reclaim be guided by the dcache tree: do a bottom-up
   reclaim, or a top-down reclaim when we hit a directory, etc.  Something
   which understands the graph rather than the plain global LRU.
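
As a sketch of the bottom-up variant of (b), assuming a toy tree with a
fixed fan-out: struct tree_dentry, MAX_CHILDREN and reclaim_subtree() are
hypothetical, and the real dcache would also need locking, aging and a way
to bound the walk.

```c
#include <stddef.h>

#define MAX_CHILDREN 4	/* illustrative fan-out limit for the toy tree */

struct tree_dentry {
	int users;	/* external users (open files, cwd, ...) */
	int live;	/* 1 while the dentry is still cached */
	struct tree_dentry *child[MAX_CHILDREN];
};

/*
 * Post-order walk: try the children first, then the directory itself.
 * Returns the number of dentries reclaimed in this subtree.
 */
static int reclaim_subtree(struct tree_dentry *d)
{
	int freed = 0, live_children = 0;

	if (!d || !d->live)
		return 0;

	for (int i = 0; i < MAX_CHILDREN; i++) {
		struct tree_dentry *c = d->child[i];

		if (!c)
			continue;
		freed += reclaim_subtree(c);
		if (c->live)
			live_children++;
	}

	/* A directory can go only once nothing below or outside pins it. */
	if (d->users == 0 && live_children == 0) {
		d->live = 0;
		freed++;
	}
	return freed;
}
```

Because leaves go first, a directory whose children are all unused is freed
in the same pass, while a single in-use leaf keeps only its own ancestors
alive.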



I expect this patchset's approach will help, but I also expect there will
be pathological dcache internal fragmentation patterns (which can occur
fairly easily) in which it won't help much at all.

