Re: NUMA aware slab allocator V3

On Mon, 2005-05-16 at 14:10 -0700, Jesse Barnes wrote:
> On Monday, May 16, 2005 11:08 am, Martin J. Bligh wrote:
> > > I have never seen such a machine. A SMP machine with multiple
> > > "nodes"? So essentially one NUMA node has multiple discontig
> > > "nodes"?
> >
> > I believe you (SGI) make one ;-) Anywhere where you have large gaps
> > in the physical address range within a node, this is what you really
> > need. Except ia64 has this weird virtual mem_map thing that can go
> > away once we have sparsemem.
> 
> Right, the SGI boxes have discontiguous memory within a node, but it's 
> not represented by pgdats (like you said, one 'virtual memmap' spans 
> the whole address space of a node).  Sparse can help simplify this 
> across platforms, but has the potential to be more expensive for 
> systems with dynamically sized holes, due to the additional calculation 
> and potential cache miss associated with indexing into the correct 
> memmap (Dave can probably correct me here, it's been a while).  With a 
> virtual memmap, you only occasionally take a TLB miss on the struct 
> page access after indexing into the array.

The sparsemem calculation costs are quite low.  One of the main costs is
bringing the actual 'struct page' into the cache so you can use the
hints in page->flags.  In reality, after almost every pfn_to_page(), you
go ahead and touch the 'struct page' anyway.  So, this cost is
effectively zero.  In fact, it's kinda like doing a prefetch, so it may
even speed some things up.

After you have the section index from page->flags (which costs just a
shift and a mask), you access into a static array, and do a single
subtraction.  Here's the i386 disassembly of this function with
SPARSEMEM=y:

        unsigned long page_to_pfn_stub(struct page *page)
        {
                return page_to_pfn(page);
        }

    1c30:       8b 54 24 04             mov    0x4(%esp),%edx
    1c34:       8b 02                   mov    (%edx),%eax
    1c36:       c1 e8 1a                shr    $0x1a,%eax
    1c39:       8b 04 85 00 00 00 00    mov    0x0(,%eax,4),%eax
    1c40:       24 fc                   and    $0xfc,%al
    1c42:       29 c2                   sub    %eax,%edx
    1c44:       c1 fa 05                sar    $0x5,%edx
    1c47:       89 d0                   mov    %edx,%eax
    1c49:       c3                      ret
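
For reference, here's roughly the C that compiles down to that.  This is
just an illustrative sketch: the identifiers and constants (SECTION_SHIFT,
SECTION_MAP_MASK, the mem_section[] array name) are stand-ins to show the
shift/mask, one table load, and one subtraction, not necessarily the exact
names used in the patch.

        struct mem_section {
                /* encoded pointer to this section's mem_map; the low
                 * bits carry flags and are masked off below */
                unsigned long section_mem_map;
        };

        extern struct mem_section mem_section[];

        static inline unsigned long sketch_page_to_pfn(struct page *page)
        {
                /* a shift (and mask) pulls the section index out of
                 * page->flags -- the 'shr' in the disassembly */
                unsigned long section = page->flags >> SECTION_SHIFT;

                /* one load from the static array, one mask to decode
                 * the section's mem_map base -- the second 'mov'/'and' */
                struct page *base = (struct page *)
                        (mem_section[section].section_mem_map &
                         SECTION_MAP_MASK);

                /* one subtraction; the compiler scales the pointer
                 * difference by sizeof(struct page) (the 'sar' above) */
                return page - base;
        }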

Other than popping the arguments off the stack, I think there are only
two loads in there: the page->flags load, and the mem_section[]
dereference.  So, in the end, the only advantage of the vmem_map[]
approach is saving that _one_ load.  The worst-case-scenario for this
load in the sparsemem case is a full cache miss.  The worst case in the
vmem_map[] case is a TLB miss, which is probably hundreds of times
slower than even a full cache miss.
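
For comparison, the vmem_map[] flavour really is just pointer arithmetic
against a single virtually contiguous array, so it skips that one load --
but the array itself sits behind virtual mappings that can TLB-miss when
the resulting 'struct page' is touched.  Again only a sketch, assuming the
ia64-style 'vmem_map' name:

        extern struct page *vmem_map;   /* virtually mapped mem_map array */

        static inline unsigned long vmemmap_page_to_pfn(struct page *page)
        {
                /* single subtraction, no table load... */
                return page - vmem_map;
        }

        static inline struct page *vmemmap_pfn_to_page(unsigned long pfn)
        {
                /* ...but dereferencing the result can take a TLB miss
                 * on the virtually mapped array */
                return vmem_map + pfn;
        }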

BTW, the object footprint of sparsemem is lower than discontigmem, too:

		SPARSEMEM 	DISCONTIGMEM
pfn_to_page:	      25b	         41b
page_to_pfn:	      25b		 33b

So, that helps out things like icache footprint.

-- Dave

