Re: [PATCH 1/5] cpuset memory spread basic implementation

* Andi Kleen <[email protected]> wrote:

> On Tuesday 07 February 2006 01:19, Ingo Molnar wrote:
> > 
> > * Paul Jackson <[email protected]> wrote:
> > 
> > > First it might be most useful to explain a detail of your proposal 
> > > that I don't get, which is blocking me from considering it seriously.
> > > 
> > > I understand mount options, but I don't know what mechanisms (at the 
> > > kernel-user API) you have in mind to manage per-directory and per-file 
> > > options.
> > 
> > well, i thought of nothing overly complex: it would have to be a 
> > persistent flag attached to the physical inode. Let's assume XFS added 
> > this - e.g. as an extended attribute.
> 
> There used to be a patch floating around to do policy for file caches 
> (or rather arbitrary address spaces). It used special ELF headers to 
> set the policy. I thought about these policy EAs long ago. The main 
> reason I never liked them much is that on some EA implementations you 
> have to fetch a separate block to get at the EA. And this policy EA 
> would need to be read all the time, thus adding lots of additional 
> seeks. That didn't seem worth it.

EAs would be fine - and they don't have to be propagated down into the 
hierarchy. There's no reason to propagate them if the lack of an EA 
means 'inherit from parent'. Btw., not all filesystems need an extra 
block seek - e.g. ext3 embeds them nicely into the raw inode.

> >  - default: the vast majority of inodes would have no flag set
> > 
> >  - some would have a 'cache:local' flag
> > 
> >  - some would have a 'cache:global' flag
> 
> If you do policy you might as well do the full set of policy states 
> from mempolicy.c. Both cache:local and cache:global can be expressed 
> in it.

sure.
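
cache:local is basically MPOL_DEFAULT (which normally means local-node 
allocation) and cache:global maps to MPOL_INTERLEAVE over the allowed 
nodes - roughly like this (sketch only, the CACHE_EA_* names are 
invented):

	switch (ea_value) {
	case CACHE_EA_LOCAL:
		policy = MPOL_DEFAULT;		/* allocate on the local node */
		break;
	case CACHE_EA_GLOBAL:
		policy = MPOL_INTERLEAVE;	/* spread across the allowed nodes */
		break;
	}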

> > which would result in every inode getting flagged as either 'local' or 
> > 'global'. When the pagecache (and inode/dentry cache) gets populated, 
> > the kernel will always know what the current allocation strategy is for 
> > any given object:
> 
> In practice it will probably only be set for a small minority of 
> objects, if at all. I could imagine administering this policy could be 
> a PITA too.

It's so much cleaner and more flexible. I bet it's even a noticeable 
speedup for users of the current cpuset-based approach: the cpuset 
method forces _all_ 'large' objects to be allocated in a spread-out 
form, while with the EA method you can pinpoint the few files (or 
directories) that contain the data that needs to be spread out. E.g. 
/tmp would default to 'local' (unless the app creates a big /tmp file, 
in which case it should set the spread-out attribute on it).
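
and setting the attribute from the app would be trivial - e.g. 
something like this (the "user.cache_policy" name is just an example, 
not an existing interface):

	/*
	 * Example only: mark one big /tmp file for spread-out ('global')
	 * caching via an EA.  The attribute name is made up.
	 */
	#include <string.h>
	#include <sys/xattr.h>

	int main(void)
	{
		return setxattr("/tmp/bigfile", "user.cache_policy",
				"global", strlen("global"), 0);
	}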

another thing: on NUMA, are the pagecache portions of read-only files 
(such as /usr binaries, etc.) duplicated across nodes in current 
kernels, or is it still random which node gets them? This too could be 
an EA caching attribute: whether to create per-node caches for file 
content.

This kind of stuff is precisely what EAs were invented for.

	Ingo