* Andrew Morton <[email protected]> wrote:
> We're all on the same page here. I'm questioning whether slab and
> pagecache should be inextricably lumped together though.
>
> Is it possible to integrate the slab and pagecache allocation policies
> more cleanly into a process's mempolicy? Right now, MPOL_* are
> disjoint.
>
> (Why is the spreading policy part of cpusets at all? Shouldn't it be
> part of the mempolicy layer?)
the whole mempolicy design seems too coarse: it is fundamentally a
per-node thing, while workloads often share nodes. So the approach Paul
took - making things more fine-grained via cpusets - seems sensible to
me, as cpusets are the preferred method of isolating workloads anyway.
Cpusets are a limited form of virtualization / resource allocation:
they allow a workload to be partitioned to a set of CPUs and its memory
allocations to a set of nodes.
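for reference, this partitioning is done through the cpuset filesystem;
a minimal sketch (the mount point, cpuset name and CPU/node numbers are
illustrative, not from the patch under discussion):

```shell
# mount the cpuset filesystem (one-time step; /dev/cpuset is the
# conventional mount point)
mount -t cpuset cpuset /dev/cpuset

# create a cpuset for the workload, give it CPUs 0-3 and memory node 0
mkdir /dev/cpuset/workload
echo 0-3 > /dev/cpuset/workload/cpus
echo 0   > /dev/cpuset/workload/mems

# move the current shell (and thus its future children) into the cpuset
echo $$ > /dev/cpuset/workload/tasks
```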
in that sense, if we accept cpusets as the main abstraction for
workload isolation on NUMA systems, then attaching an access-pattern
hint to the cpuset - the broadest container of the workload - is a
natural and minimal extension. Mempolicies are pretty orthogonal to
this: they do not allow two workloads living in two different cpusets
to be handled separately.
once we accept cpusets as the main abstraction, i don't think there is
any fundamentally cleaner solution than the one presented by Paul. The
advantage of a 'global, per-cpuset' hint is obvious: the administrator
can set it without having to change applications. Since it is global to
the "virtual machine" (that is represented by the cpuset), the natural
controls are limited to kernel entities: slab caches, pagecache,
anonymous allocations.
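concretely, such a per-cpuset hint would be flipped by the
administrator through per-cpuset flag files; a sketch along the lines
of Paul's memory-spread interface (file names assumed from that
proposal, cpuset filesystem assumed mounted at /dev/cpuset):

```shell
# spread this cpuset's pagecache pages evenly over its memory nodes,
# instead of preferring the faulting task's local node
echo 1 > /dev/cpuset/workload/memory_spread_page

# likewise for kernel slab caches (dentries, inodes, ...)
echo 1 > /dev/cpuset/workload/memory_spread_slab
```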
what feels hacky is the knowledge about kernel-internal caches, but i
think there is nothing else to control. Making the hint fine-grained
down to the object level would make it impractical to use within the
cpuset abstraction.
if we do not accept cpusets as the main abstraction, then per-task and
per-object hints seem to be the right controls - but those would have
to be set by the application itself.
the cpuset solution is also certainly simpler to implement: the cpuset
is already available to the memory allocator, so extending it is a
small step. Object-level flags would have to be passed down to the
allocators - and we don't have those right now, as allocations are
mostly anonymous.
also, application- or object-level hints are perhaps _too_
fine-grained: if a cpuset is used as the container for a 'project',
then attaching an allocation policy to it is easy and straightforward.
Modifying hundreds of apps, some of which might be legacy, seems
impractical - and the access pattern might very much depend on the
project the app is used in.
so to me the cpuset level seems to be the most natural point to control
this: it is the level where resources are partitioned, so whoever
configures them should have a good idea of the expected access patterns
of the project the cpuset belongs to. The application writer has little
idea about the circumstances in which the app will be used.
if we want to reduce complexity, i'd suggest consolidating the MPOL_*
mechanism into cpusets and phasing out the mempolicy syscalls. (The
filesystem interface to cpusets is much cleaner anyway.)
Ingo