Re: [PATCH] VM: add vm.free_node_memory sysctl

On Wednesday 03 August 2005 15:08, Andi Kleen wrote:

> >
> > Hmmm.... What happens if there are already mapped pages (e. g. mapped in
> > the sense that pages are mapped into an address space) on the node and
> > you want to allocate some more, but can't because the node is full of
> > clean page cache pages?   Then one would have to set the memhog argument
> > to the right thing to
>

> If you have a bind policy in the memory grabbing program then the standard
> try_to_free_pages should DTRT. That is because we generated a custom zone
> list only containing nodes in that zone and the zone reclaim only looks
> into those.
>

It may depend on what your definition of DTRT is here.  :-)

As I understand things, if we have a node with some mapped memory already 
allocated, and one starts up a "numactl -bind node memhog nodesize-slop" to 
clear clean page cache pages from that node, then unless the "slop" is sized 
in proportion to the amount of mapped memory in use on the node, the existing 
mapped memory will be swapped out to satisfy the new request, in addition to 
the clean page-cache pages being discarded.  What Martin and I would prefer 
to see is an interface that lets one get rid of just the clean page cache (or 
at least enough of it) so that additional mapped page allocations occur 
locally on the node without causing swapping.
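
For concreteness, here is a minimal, untested sketch of such a bound memory 
hog using mbind(2) (compile with -lnuma; the node number and the 512 MB 
"nodesize - slop" figure are purely illustrative):

	#define _GNU_SOURCE
	#include <numaif.h>	/* mbind, MPOL_BIND */
	#include <sys/mman.h>
	#include <stdio.h>
	#include <unistd.h>

	int main(void)
	{
		int node = 0;			/* illustrative target node */
		size_t len = 512UL << 20;	/* "nodesize - slop" */
		unsigned long nodemask = 1UL << node;
		long pagesz = sysconf(_SC_PAGESIZE);
		size_t off;
		char *p;

		p = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (p == MAP_FAILED) {
			perror("mmap");
			return 1;
		}
		/* Bind the region to the one node so faults cannot spill. */
		if (mbind(p, len, MPOL_BIND, &nodemask,
			  sizeof(nodemask) * 8, 0)) {
			perror("mbind");
			return 1;
		}
		/* Touch every page; the kernel must reclaim the node's
		   clean page cache (or swap) to satisfy these faults. */
		for (off = 0; off < len; off += pagesz)
			p[off] = 1;

		munmap(p, len);
		return 0;
	}

Note that this is exactly where the sizing problem above bites: pick len too 
large and the faults push out the node's existing mapped memory, not just its 
clean page cache.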

AFAIK, the number of mapped pages on a node is not exported to user space 
(by, for example, /sys), so there is no good way to size the "slop" to allow 
for an existing allocation.  If it were, then using a bound memory hog would 
likely be a reasonable replacement for Martin's syscall to release all free 
page cache, at least for small to medium sized systems.

> With preferred or other policies it's different though; in those cases
> t_t_f_p will also look into other nodes because the policy is not binding.
>
> That said, it would probably be possible to make non-bind policies more
> aggressive about freeing in the current node before looking into other nodes.
> I think the zone balancing has been mostly tuned on non NUMA systems, so
> some improvements might be possible here.
>
> Most people don't use BIND and changing the default policies like this
> might give NUMA systems a better "out of the box" experience.  However this
> memory balance is very subtle code and easy to break, so this would need
> some care.
>

Of course!

> I don't think sysctls or new syscalls are the way to go here though.
>

The reason we ended up with a sysctl/syscall (to control the aggressiveness 
with which __alloc_pages will try to free page cache before spilling) is that 
deciding whether or not to spend the effort to free up page cache pages on 
the local node before spilling is a workload-dependent optimization.  For an 
HPC application it is typically worth the effort to try to free local node 
page cache before spilling off node, because the program runs long enough 
that the gain from getting local storage dominates the extra cost of the page 
allocation.  For file server workloads, by contrast, it is typically 
important to minimize the time to do the page allocation; if the page turns 
out to be on a remote node, it really doesn't matter that much.  So it seems 
to me that we need some way for the application to tell the system which 
approach it prefers, based on the type of workload it runs -- hence the 
sysctl or syscall approach.
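
To make the intent concrete, usage could be as simple as the sketch below, 
assuming the patch exposes the knob as /proc/sys/vm/free_node_memory (the 
name is from the subject line; the value written and its exact semantics are 
assumptions here, not taken from the patch):

	#include <stdio.h>

	int main(void)
	{
		FILE *f = fopen("/proc/sys/vm/free_node_memory", "w");

		if (!f) {
			perror("fopen");
			return 1;
		}
		/* Assumed semantics: nonzero asks __alloc_pages to try
		   freeing local clean page cache before spilling to a
		   remote node. */
		fputs("1\n", f);
		return fclose(f) ? 1 : 0;
	}

An HPC batch launcher would flip it on before starting the job; a file 
server would leave it off and take the fast, possibly off-node, allocation.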

> -Andi

-- 
Ray Bryant
AMD Performance Labs                   Austin, Tx
512-602-0038 (o)                 512-507-7807 (c)

