Re: [PATCH 1/5] cpuset memory spread basic implementation

> I'm wondering whether it would be 
> enough to simply extend madvise and fadvise to 'task' scope as well, and 
> change the pagecache allocation pattern to 'spread out' pages on NUMA, 
> if POSIX_FADV_RANDOM / MADV_RANDOM is specified.

This would not seem to work, at least for the needs I am aware of.

The tasks we are talking about do -not- want a default RANDOM
policy.  They want node-local allocation for per-thread data
(data and stack for example), and at the same time spread
allocation for kernel space (page and slab cache).

For another thing, memory spreading is not the same as RANDOM
policies.  RANDOM policies apply to page read-ahead and retention
strategies, not to page placement policies (which node pages are
faulted into).

For a third thing, madvise takes a virtual address range, which
is irrelevant for specifying kernel address space page cache and
slab cache pages.
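
To make the scoping concrete, here is a minimal userspace sketch (not
code from the patch) of the two existing calls.  The helper name
advise_self_random is hypothetical; madvise, posix_fadvise, MADV_RANDOM
and POSIX_FADV_RANDOM are the standard interfaces.  Note that madvise
takes a user virtual address range and posix_fadvise takes a file
descriptor plus offset and length, both belonging to the caller;
neither has an argument that could name kernel slab cache pages.

```c
/* Expose madvise/MADV_RANDOM and posix_fadvise under strict -std= modes. */
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1
#endif

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical demo helper (not from the patch): apply the RANDOM
 * hints to the caller's own mapping and the caller's own open file.
 * Returns 0 on success, -1 on failure.
 */
int advise_self_random(void)
{
    long page = sysconf(_SC_PAGESIZE);
    size_t len;
    void *buf;
    FILE *tmp;
    int rc = -1;

    if (page <= 0)
        return -1;
    len = (size_t)page;

    /* One anonymous page in this process's address space. */
    buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;

    /* A scratch file owned by this process. */
    tmp = tmpfile();
    if (tmp == NULL)
        goto out_unmap;

    /* madvise: the target is a virtual address range of the caller. */
    if (madvise(buf, len, MADV_RANDOM) != 0)
        goto out_close;

    /*
     * posix_fadvise: the target is a range of an open file
     * (offset 0, len 0 means the whole file).  It returns an
     * error number directly rather than setting errno.
     */
    if (posix_fadvise(fileno(tmp), 0, 0, POSIX_FADV_RANDOM) != 0)
        goto out_close;

    rc = 0;
out_close:
    fclose(tmp);
out_unmap:
    munmap(buf, len);
    return rc;
}
```

Both hints describe the expected access pattern; nothing in either
signature selects a node for placement.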

But the biggest difficulty, from my perspective, would be that
this strategy is normally selected per-cpuset, not per-task.

We are managing memory placement across the cpuset, on a per-job
basis.  It's the system administrator or batch scheduler, not the
individual application coder, who will likely want to enforce this
alternative memory placement strategy.

The madvise and posix_fadvise calls have no provision for one task
to affect another.  They just apply to the current task.  As Andrew
noted, it doesn't make much sense for different tasks in the same
cpuset to disagree on this placement policy choice.  We need a
cpuset-wide policy choice, and a cpuset-wide mechanism for making
the choice, not a per-task, task-internal mechanism.
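
By way of contrast, a cpuset-wide choice could be made from outside
the job, for example by a batch scheduler writing a flag into a
per-cpuset control file.  The sketch below is only an illustration
under that assumption: the helper name set_cpuset_flag is hypothetical,
and the control file's path and name are left to the caller because
this message does not specify them.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: write "1" or "0" to a per-cpuset control file.
 * The caller supplies the full path; no particular file name is
 * assumed here.  Returns 0 on success, -1 on failure.
 */
int set_cpuset_flag(const char *flag_path, int enabled)
{
    const char *val = enabled ? "1" : "0";
    ssize_t n;
    int fd;

    /* Control files already exist, so do not create one. */
    fd = open(flag_path, O_WRONLY);
    if (fd < 0)
        return -1;

    n = write(fd, val, 1);
    if (close(fd) != 0)
        return -1;

    return n == 1 ? 0 : -1;
}
```

The point of the design is that the writer need not be a member of
the cpuset at all; one write would set the policy for every task in
it, which is what the per-task calls cannot do.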

Or at the very least, an attribute that is inherited across fork
and exec, so that a job grandfather (founding father) task can
set the policy for all descendants.

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <[email protected]> 1.925.600.0401