On Monday 01 October 2007 04:07, Paul Jackson wrote:
> Nick wrote:
> > The user should just be able to specify exactly the partitioning of
> > tasks required, and cpusets should ask the scheduler to do the best
> > job of load balancing possible.
>
> If the cpusets which have 'sched_load_balance' enabled are disjoint
> (their 'cpus' cpus_allowed masks don't overlap) then you get exactly
> what you're asking for. In that case there is exactly one sched domain
> for the 'cpus' allowed by each cpuset that has sched_load_balance
> enabled.
But you could get exactly that just by making the current cpuset scheme
able to properly partition the system. You can't (easily) do this now
because there are so many tasks in the root cpuset that it is impossible
to know whether or not you actually want to load balance them.
You would do this by creating partitioning cpusets which carve up the
root cpuset (basically -- have multiple roots).
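Concretely, I'm picturing something like the rough sketch below (the
partA/partB names, the 8-CPU single-node layout and /dev/cpuset as the
mount point are all just illustration on my part):

/*
 * Rough sketch only: carve the root cpuset up into two disjoint
 * "partition" cpusets on a hypothetical 8-CPU, single-node box,
 * assuming the usual cpuset fs mounted at /dev/cpuset.  Moving the
 * root's tasks into the children and most error handling are
 * omitted to keep it short.
 */
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

static void write_str(const char *path, const char *val)
{
        FILE *f = fopen(path, "w");

        if (!f) {
                perror(path);
                return;
        }
        fputs(val, f);
        fclose(f);
}

static void make_partition(const char *name, const char *cpus)
{
        char path[128];

        snprintf(path, sizeof(path), "/dev/cpuset/%s", name);
        mkdir(path, 0755);

        snprintf(path, sizeof(path), "/dev/cpuset/%s/cpus", name);
        write_str(path, cpus);

        snprintf(path, sizeof(path), "/dev/cpuset/%s/mems", name);
        write_str(path, "0");
}

int main(void)
{
        make_partition("partA", "0-3");
        make_partition("partB", "4-7");
        return 0;
}

With the children's 'cpus' masks disjoint and the root's tasks moved
into them (write each pid from the root 'tasks' file into a child's
'tasks'), the kernel has an unambiguous partition to hand to the
scheduler -- no extra flag needed.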
> But there is another case in which one does not want what you ask for.
>
> That case involves the situation where one is running a third party
> batch scheduler on part of one's big system, and doing other stuff
> (perhaps Ingo's realtime stuff) on another part of the system.
In this case the admin would simply not partition the system (they
would retain a single root cpuset).
Neither approach is really fundamentally more or less powerful than
the other, but what I object to in yours is adding flags that don't let
the admin specify what they want, only how they want it done.
Moreover, sched_load_balance doesn't really sound like a good name
for asking for a partition. It's more like you're just asking to have better
load balancing over that set, which you could equally achieve by adding
a second set of sched domains (and the global domains could keep
globally balancing).
Basically: the admin doesn't know best when it comes to how the
scheduler should work; the admin knows best about how they intend
the system to be used.
> In that case, the system admin will be advised to turn off
> sched_load_balance on the top cpuset. But in that case the system
> admin will -not- know from moment to moment what jobs the batch
> scheduler is running on the cpus assigned to its control. Only the
> batch scheduler knows that.
>
> The batch scheduler is code that was written by someone else, in
> some other company, some other time. That code does not get to
> control the overall sched domain partitioning of the entire system.
> The batch scheduler gets to say, in effect:
>
> Here's where I need load balancing to occur, in the normal fashion,
> and here's where I don't need it.
Rather than requiring the admin to know the intricate details of how
and why scheduler load balancing gets broken up, and when they might or
might not need to use this flag, let them just specify what they want
done and have the kernel choose the optimal strategy.
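For comparison, my reading of the workflow your flag implies is roughly
the sketch below (assuming it shows up as a sched_load_balance file in
each cpuset directory; the batch/job42 cpuset is made up):

/*
 * Sketch of the flag-based workflow as I read the proposal.  The
 * cpuset layout is hypothetical; the point is who has to touch the
 * flag, and when.
 */
#include <stdio.h>

static void set_load_balance(const char *cpuset_dir, int enable)
{
        char path[128];
        FILE *f;

        snprintf(path, sizeof(path), "%s/sched_load_balance", cpuset_dir);
        f = fopen(path, "w");
        if (!f) {
                perror(path);
                return;
        }
        fputs(enable ? "1" : "0", f);
        fclose(f);
}

int main(void)
{
        /* Admin: stop balancing across the whole machine ... */
        set_load_balance("/dev/cpuset", 0);

        /*
         * ... and the batch scheduler must re-enable it for each job
         * cpuset it creates, which means it needs to understand how
         * sched domains get rebuilt, not just which CPUs it owns.
         */
        set_load_balance("/dev/cpuset/batch/job42", 1);
        return 0;
}

i.e. both the admin and the third party batch scheduler need to know
when and where to poke the flag, which is exactly the scheduler
knowledge I'd rather they didn't need to have.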
> In short, you're insisting that only a single administrative point of
> control determines the system's sched domains. Sometimes that fits
> the way the system is managed, and my patch lets you do that. But
> sometimes this is a shared responsibility, between a piece of third
> party software and the system admin, and my patch allows for that
> case as well.
>
> This is a typical sort of situation that arises from having hierarchical
> cpuset definitions, and highlights the reason (and the use case,
> involving third party batch schedulers) that I went with a hierarchical
> cpuset architecture in the first place.
No, I'm insisting that *no* single administrative point of control
determines the sched domains. Not directly. The kernel should.
The cpusets API should be rich enough that the kernel can derive this
information from what the admin intends.