Re: [patch 3/3] cpusets: add memory_spread_user option

On Thu, 25 Oct 2007, Paul Jackson wrote:

> I'm figuring that when someone looks at the cpuset flag:
> 
> 	memory_spread_user
> 
> they will expect that turning it on will cause user space memory to be
> spread over the nodes of the cpuset.  Sure makes sense that it would
> mean that.
> 
> But, for the most part, it doesn't.  Only tasks that have
> previously called set_mempolicy(MPOL_INTERLEAVE), and only
> after the 'mems' of the cpuset are subsequently changed,
> will have their user memory forced to be spread across the
> cpuset as a result of this flag setting.
> 

It also affects tasks with preexisting MPOL_INTERLEAVE memory policies 
that are attached to a cpuset with memory_spread_user enabled.  
cpuset_attach() rebinds memory policies; if memory_spread_user is set, the 
interleaved nodemask for the task's mempolicy will be set to the cpuset's 
mems_allowed.
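
To make the attach case concrete, here is a rough userspace toy model of 
it.  This is not the patch and not kernel code: every name (toy_task, 
toy_cpuset, toy_attach) is made up for illustration, nodemasks are plain 
bitmasks, and the non-interleave branch is simplified to an intersection, 
whereas the real rebind is more involved.

```c
#include <stdbool.h>

/* Toy userspace model, not kernel code.  All names are hypothetical. */
enum toy_mode { TOY_DEFAULT, TOY_BIND, TOY_INTERLEAVE };

struct toy_task {
	enum toy_mode mode;	/* models the task's mempolicy mode */
	unsigned long nodes;	/* models the policy nodemask, one bit per node */
};

struct toy_cpuset {
	unsigned long mems_allowed;	/* models the cpuset's 'mems' */
	bool spread_user;		/* models the memory_spread_user flag */
};

/* Models the policy rebind that happens when a task is attached. */
static void toy_attach(struct toy_task *t, const struct toy_cpuset *cs)
{
	if (t->mode == TOY_INTERLEAVE && cs->spread_user) {
		/* preexisting interleave policy: widen it to the whole cpuset */
		t->nodes = cs->mems_allowed;
		return;
	}
	/* simplification: just keep the policy inside the new cpuset */
	t->nodes &= cs->mems_allowed;
}
```

In this model a task interleaving over nodes 0-1 that is attached to a 
spread cpuset with nodes 4-7 ends up interleaving over 4-7, while a 
MPOL_BIND-style task is only clamped.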

> Any chance, David, that you could have this flag mean:
> 
>   	Spread user memory allocations over the cpuset,
> 	period, anytime it is set, regardless of what
> 	mempolicy calls the task has made and regardless
> 	of whether or not or when the cpusets 'mems' were
> 	last changed.
> 

This would override the custom mempolicies of all tasks attached to the 
cpuset.  All tasks would have MPOL_INTERLEAVE memory policies with a 
nodemask of the cpuset's mems_allowed, and those policies could not be 
changed.

With my current proposal, a task receives the full interleaved nodemask of 
the cpuset's mems_allowed when it has a preexisting MPOL_INTERLEAVE memory 
policy and:

 - the 'mems' change in its cpuset, which has memory_spread_user enabled,

 - it is attached to a cpuset with memory_spread_user enabled, or

 - memory_spread_user is enabled for its cpuset.

We respect the changes that tasks make to their mempolicies with 
MPOL_INTERLEAVE after any of those three scenarios.
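
The three triggers and the "respect later changes" rule can be sketched 
as a toy userspace model.  Again, this is only an illustration under my 
own assumptions: the toy_* names are hypothetical, TOY_INTERLEAVE is a 
stand-in constant, a single task stands for the cpuset's members, and 
rejecting nodes outside 'mems' with -1 is a placeholder rather than the 
real error path.

```c
#include <stdbool.h>

#define TOY_INTERLEAVE 3	/* stand-in for MPOL_INTERLEAVE in this model */

struct toy_task { int mode; unsigned long nodes; };
struct toy_cpuset { unsigned long mems; bool spread_user; };

/* Shared by all three triggers: widen a preexisting interleave policy. */
static void toy_spread(struct toy_task *t, const struct toy_cpuset *cs)
{
	if (cs->spread_user && t->mode == TOY_INTERLEAVE)
		t->nodes = cs->mems;
}

/* Trigger 1: the cpuset's 'mems' change. */
static void toy_set_mems(struct toy_cpuset *cs, struct toy_task *t,
			 unsigned long mems)
{
	cs->mems = mems;
	toy_spread(t, cs);
}

/* Trigger 2: the task is attached to the cpuset (clamping omitted). */
static void toy_attach(struct toy_task *t, const struct toy_cpuset *cs)
{
	toy_spread(t, cs);
}

/* Trigger 3: the flag itself is written. */
static void toy_set_flag(struct toy_cpuset *cs, struct toy_task *t, bool on)
{
	cs->spread_user = on;
	toy_spread(t, cs);
}

/* Models a later set_mempolicy(MPOL_INTERLEAVE) call by the task. */
static int toy_set_interleave(struct toy_task *t,
			      const struct toy_cpuset *cs,
			      unsigned long nodes)
{
	if (nodes == 0 || (nodes & ~cs->mems))
		return -1;	/* placeholder error, not a real errno */
	t->mode = TOY_INTERLEAVE;
	t->nodes = nodes;	/* honoured until the next trigger fires */
	return 0;
}
```

The point of the model is the ordering: a narrower nodemask chosen by the 
task sticks only until the next one of the three triggers widens it again.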

Your change would force all tasks in a memory_spread_user cpuset to have 
MPOL_INTERLEAVE with a nodemask of mems_allowed.  That's very easy to 
define but is going to require additional cpusets to be created with 
duplicate settings (excluding memory_spread_user) if you want different 
behavior for its tasks.  I won't argue against that.
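
For comparison, the stricter semantics might look like the toy model 
below.  It is a sketch under assumptions, not a proposed implementation: 
the toy_strict_* names are hypothetical, and refusing a policy change 
with -1 is just one way "could not be changed" might be modelled.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_INTERLEAVE 3	/* stand-in for MPOL_INTERLEAVE in this model */

struct toy_task { int mode; unsigned long nodes; };
struct toy_cpuset { unsigned long mems; bool spread_user; };

/* Strict variant: every task in the cpuset interleaves over 'mems'. */
static void toy_strict_enforce(const struct toy_cpuset *cs,
			       struct toy_task *tasks, size_t n)
{
	size_t i;

	if (!cs->spread_user)
		return;
	for (i = 0; i < n; i++) {
		tasks[i].mode = TOY_INTERLEAVE;
		tasks[i].nodes = cs->mems;
	}
}

/* Strict variant: a task cannot pick its own policy while the flag is set. */
static int toy_strict_set_policy(const struct toy_cpuset *cs,
				 struct toy_task *t, int mode,
				 unsigned long nodes)
{
	if (cs->spread_user)
		return -1;	/* placeholder error, not a real errno */
	t->mode = mode;
	t->nodes = nodes;
	return 0;
}
```

Here the only way to give a task different behavior is to move it to 
another cpuset that does not have the flag set.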

> Most power, or excessive confusion?  Straightforward consistency and
> simple predictability are far more important in almost all cases.  The
> usual exception is when you have a serious use case requiring
> something that can only be done in a more obscure fashion.
> 

I don't think cpuset files such as memory_spread_{page,slab,user} or 
memory_pressure, etc, are completely and accurately descriptive of what 
they do anyway.  That's why anybody who is going to use cpusets is going 
to refer to Documentation/cpusets.txt where the semantics are explicitly 
written.

> There is always a price paid for supporting such complexities in an API
> however, the price being increased confusion, frustration, errors and
> bugs on the part of most users of the API.
> 

We can certainly code it the way you suggested: memory_spread_user 
requires all tasks to be MPOL_INTERLEAVE with a static nodemask of 
mems_allowed.  If this use-case is so narrow that any sane implementation 
would never want several different tasks with different contextualized 
interleaved memory policies in a dynamically changing cpuset, that's the 
perfect way to code it.

> ... Now most likely you will claim you have such a use case, and when
> I ask for it, I will be frustrated at the lack of compelling detail of
> what is going on in user space - what sorts of users, apps and systems
> involved.  Ok, no biggie.  If this goes down that path, then perhaps
> at least I need to reconsider the name:
> 
> 	memory_spread_user
> 

I don't actually have any use-cases where I want two different 
MPOL_INTERLEAVE tasks sharing the same cpuset and only one of them 
adjusted on a mems change and the other to remain static unless I 
explicitly call set_mempolicy().

It all comes down to the decision of whether we want to permit 
set_mempolicy() calls for tasks in a memory_spread_user cpuset and respect 
the nodemask they pass.  If so, we must do it my way, where the 
set_mempolicy() occurs after attachment or the setting of the flag.  
There's just no other time where you can allow them to differ from the 
memory_spread_user behavior that the cpuset is configured with.

So if you don't have any issue with a hard and fast rule of requiring 
tasks in memory_spread_user cpusets to have MPOL_INTERLEAVE policies with 
the nodemask of mems_allowed and not giving them the option of changing 
it, I have no objection.  I simply coded it so that you could work around 
the cpuset flag through the mempolicy interface.  I don't have any express 
need for it.

		David