On Sun, Oct 22, 2006 at 10:51:08PM -0700, Paul Jackson wrote:
> Nick wrote:
> > A cool option would be to determine the partitions according to the
> > disjoint set of unions of cpus_allowed masks of all tasks. I see this
> > getting computationally expensive though, probably O(tasks*CPUs)... I
> > guess that isn't too bad.
> Yeah - if that would work, from the practical perspective of providing
> us with a useful partitioning (get those humongous sched domains carved
> down to a reasonable size) then that would be cool.
> I'm guessing that in practice, it would be annoying to use. One would
> end up with stray tasks that happened to be sitting in one of the bigger
> cpusets and that did not have their cpus_allowed narrowed, stopping us
> from getting a useful partitioning. Perhaps anilliary tasks associated
> with the batch scheduler, or some paused tasks in an inactive job that
> were it active would need load balancing across a big swath of cpus.
> These would be tasks that we really didn't need to load balance, but they
> would appear as if they needed it because of their fat cpus_allowed.
> Users (admins) would have to hunt down these tasks that were getting in
> the way of a nice partitioning and whack their cpus_allowed down to
> So essentially, one would end up with another userspace API, backdoor
> again. Like those magic doors in the libraries of wealthy protagonists
> in mystery novels, where you have to open a particular book and pull
> the lamp cord to get the door to appear and open.
Also we need to be careful with malicious users partitioning the systems wrongly
based on the cpus_allowed for their tasks.
> Automatic chokes and transmissions are great - if they work. If not,
> give me a knob and a stick.
> Another idea for a cpuset-based API to this ...
> From our internal perspective, it's all about getting the sched domain
> partitions cut down to a reasonable size, for performance reasons.
we need consider resource partitioning too.. and this is where
google interests are probably coming from?
> But from the users perspective, the deal we are asking them to
> consider is to trade in fully automatic, all tasks across all cpus,
> load balancing, in turn for better performance.
> Big system admins would often be quite happy to mark the top cpuset
> as "no need to load balance tasks in this cpuset." They would
> take responsibility for moving any non-trivial, unpinned tasks into
> lower cpusets (or not be upset if something left behind wasn't load
> And the batch scheduler would be quite happy to mark its top cpuset as
> "no need to load balance". It could mark any cpusets holding inactive
> jobs the same way.
> This "no need to load balance" flag would be advisory. The kernel
> might load balance anyway. For example if the batch scheduler were
> running under a top cpuset that was -not- so marked, we'd still have
> to load balance everyone. The batch scheduler wouldn't care. It would
> have done its duty, to mark which of its cpusets didn't need balancing.
> All we need from them is the ok to not load balance certain cpusets,
> and the rest is easy enough. If they give us such ok on enough of the
> big cpusets, we give back a nice performance improvement.
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[Index of Archives]
[Video 4 Linux]
[Linux for the blind]