Re: [RFC] cpuset: add interface to isolated cpus

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]


On Sun, Oct 22, 2006 at 10:51:08PM -0700, Paul Jackson wrote:
> Nick wrote:
> > A cool option would be to determine the partitions according to the
> > disjoint set of unions of cpus_allowed masks of all tasks. I see this
> > getting computationally expensive though, probably O(tasks*CPUs)... I
> > guess that isn't too bad.
> Yeah - if that would work, from the practical perspective of providing
> us with a useful partitioning (get those humongous sched domains carved
> down to a reasonable size) then that would be cool.
> I'm guessing that in practice, it would be annoying to use.  One would
> end up with stray tasks that happened to be sitting in one of the bigger
> cpusets and that did not have their cpus_allowed narrowed, stopping us
> from getting a useful partitioning.  Perhaps anilliary tasks associated
> with the batch scheduler, or some paused tasks in an inactive job that
> were it active would need load balancing across a big swath of cpus.
> These would be tasks that we really didn't need to load balance, but they
> would appear as if they needed it because of their fat cpus_allowed.
> Users (admins) would have to hunt down these tasks that were getting in
> the way of a nice partitioning and whack their cpus_allowed down to
> size.
> So essentially, one would end up with another userspace API, backdoor
> again.  Like those magic doors in the libraries of wealthy protagonists
> in mystery novels, where you have to open a particular book and pull
> the lamp cord to get the door to appear and open.

Also we need to be careful with malicious users partitioning the systems wrongly
based on the cpus_allowed for their tasks.

> Automatic chokes and transmissions are great - if they work.  If not,
> give me a knob and a stick.
> ===
> Another idea for a cpuset-based API to this ...
> From our internal perspective, it's all about getting the sched domain
> partitions cut down to a reasonable size, for performance reasons.

we need consider resource partitioning too..  and this is where
google interests are probably coming from?

> But from the users perspective, the deal we are asking them to
> consider is to trade in fully automatic, all tasks across all cpus,
> load balancing, in turn for better performance.
> Big system admins would often be quite happy to mark the top cpuset
> as "no need to load balance tasks in this cpuset."  They would
> take responsibility for moving any non-trivial, unpinned tasks into
> lower cpusets (or not be upset if something left behind wasn't load
> balancing.)
> And the batch scheduler would be quite happy to mark its top cpuset as
> "no need to load balance".  It could mark any cpusets holding inactive
> jobs the same way.
> This "no need to load balance" flag would be advisory.  The kernel
> might load balance anyway.  For example if the batch scheduler were
> running under a top cpuset that was -not- so marked, we'd still have
> to load balance everyone.  The batch scheduler wouldn't care.  It would
> have done its duty, to mark which of its cpusets didn't need balancing.
> All we need from them is the ok to not load balance certain cpusets,
> and the rest is easy enough.  If they give us such ok on enough of the
> big cpusets, we give back a nice performance improvement.
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at
Please read the FAQ at

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux