Re: [PATCH 1/5] cpuset memory spread basic implementation

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



* Paul Jackson <[email protected]> wrote:

> First it might be most useful to explain a detail of your proposal 
> that I don't get, which is blocking me from considering it seriously.
> 
> I understand mount options, but I don't know what mechanisms (at the 
> kernel-user API) you have in mind to manage per-directory and per-file 
> options.

well, i thought of nothing overly complex: it would have to be a 
persistent flag attached to the physical inode. Lets assume XFS added 
this - e.g. as an extended attribute. That raw inode attribute/flag gets 
inherited by dentries, and propagates down into child dentries. (There 
is a global default that the root dentry starts with, and mountpoints 
may override the flag too.) If any directory down in the hierarchy has a 
different flag, it overrides the current one. No flag means "inherit 
parent's flag". So there would be 3 possible states for every inode:

 - default: the vast majority of inodes would have no flag set

 - some would have a 'cache:local' flag

 - some would have a 'cache:global' flag

which would result in every inode getting flagged as either 'local' or 
'global'. When the pagecache (and inode/dentry cache) gets populated, 
the kernel will always know what the current allocation strategy is for 
any given object:

- if an inode ends up being flagged as 'global', then all its pagecache 
  allocations will be roundrobined across nodes.

- if an inode is flagged 'local', it will be allocated to the node/cpu 
  that makes use of it.

workloads may share the same object and may want to use it in different 
ways. E.g. there's one big central database file, and one job uses it in 
a 'local' way, another one uses it in a 'global' way. Each job would 
have to set the attribute to the right value. Setting the flag for the 
inode results in all existing pages for that inode to be flushed. The 
jobs need to serialize their access to the object, as the kernel can 
only allocate according to one policy.

	Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux