Balbir Singh wrote:
Peter Williams wrote:
Balbir Singh wrote:
Peter Williams wrote:
Andrew Morton wrote:
On Sun, 18 Jun 2006 18:26:38 +1000
Peter Williams <[email protected]> wrote:
People are going to want to extend this to capping a *group* of
tasks, with
some yet-to-be-determined means of tying those tasks together. How
well
suited is this code to that extension?
Quite good. It can be used from outside the scheduler to impose
caps on arbitrary groups of tasks. Were the PAGG interface
available I could knock up a module to demonstrate this. When/if
the "task watchers" patch is included I will try to implement a
higher level mechanism using that. The general technique is to get
an estimate of the "effective number" of tasks in the group (similar
to load) and give each task in the group a cap which is the group's
cap divided by the effective number of tasks (or the group cap,
whichever is smaller -- i.e. the effective number of tasks could be
less than one).
There is one possible issue with this approach. Let's assume that we
desire a cap of 10% for a group of two tasks. As discussed earlier,
each task would get a limit of 5% if they are equally busy.
Let's call the group G1 and the tasks T1 and T2.
Now suppose we have another group, G2, with tasks T3, T4 and T5 and a
soft cap of 90%. Each of T3, T4 and T5 would then get a soft cap of
30% (assuming that they are equally busy). Now if T5 stops using its
limit for a while -- let's say its CPU utilization drops to 10% -- how
do we divide the saved 20% between T1, T2, T3 and T4?
In a group scenario, the spare 20% should be shared between T3 and T4.
You're mixing up the method described above with the other one we
discussed, where the group's cap is divided among its tasks in
proportion to their demand. With the model I describe above, reduced
demand by any task in a group would be reflected in a reduced value
for the "effective number of tasks" in the group, with a consequent
increase in the cap applied to all group members.
Thanks for clarifying. How frequently is the reduction in effective number
of tasks calculated and how frequently is the cap updated?
I'll answer that when I've done an implementation.
Does it require
setting the cap values of all the tasks in the group again (O(N),
where N is the number of tasks in the group)?
Probably but it's not on a fast path.
Is it possible that the effective number of tasks
is greater than the limit of the group?
Yes.
How do we handle this scenario?
You've got the problem back to front. If the number of effective tasks
is less than the group limit then you have the situation that needs
special handling (not the other way around). I.e. if the number of
effective tasks is less than the group limit then (strictly speaking)
there's no need to do any capping at all as the demand is less than the
limit. However, in the case where the group limit is less than one CPU
(i.e. less than 1000) the recommended thing to do would be to set the
limit of each task in the group to the group limit.
Obviously, group limits can be greater than one CPU (i.e. 1000).
The number of CPUs on the system also needs to be taken into account for
group capping: if the group cap is greater than the total capacity of
the system's CPUs there's no way it can be exceeded, and the tasks in
such a group need no cap processing.
I think both methods will work and the main difference would be in
their complexity.
An implementation or prototype, when available, will be interesting to
experiment with. I think it will help clarify whether the per-task
mechanism will indeed work for groups, or expose some limitations of
the mechanism.
I'm going to start looking at the "task tracking" patches with a view to
using them for an implementation. The "executive overview" for these
patches indicates that they're sufficiently similar to PAGG for this
purpose.
Also, mathematically, a group is a superset of a task.
It is hard to implement things for a task and make them work for groups,
I disagree. If the low level control is there at the task level or
(if we were managing memory) the address space level then it is
relatively simple (even if boring) to do arbitrary resource control
for groups from the outside.
One of the key advantages of doing it from the outside is that any
locking that is required at the group level is unlikely to get tangled
up with the existing locking mechanisms such as the run queue lock.
This is not true if group management is done on the inside e.g. in the
scheduling code.
The f-series controller from ckrm does so without changing or getting
tangled with the existing locking system.
I didn't say it was impossible just in need of care.
but if we had something for groups, we could easily adapt it to tasks
by making each group equal to a task.
You seem to have a flair for adding unnecessary overhead for those who
won't use this functionality. :-)
Doing it inside the scheduler is also doable but would have some
locking issues. The run queue lock could no longer be used to
protect the data as there's no guarantee that all the tasks in the
group are associated with the same queue.
I should have elaborated here that (conceptually) modifying this code
to apply caps to groups of tasks instead of individual tasks is
simple. It mainly involves moving most of the data (statistics plus
cap values) to a group structure, modifying the code to update the
statistics for the group instead of the task, and then making the
decisions about whether a task should have a cap enforced (i.e. be
moved to one of the soft cap priorities or sin binned) based on the
group statistics.
However, maintaining and accessing the group statistics will require
additional locking as the run queue lock will no longer be able to
protect the data as not all tasks in the group will be associated with
the same CPU. Care will be needed to ensure that this new locking
doesn't lead to deadlocks with the run queue locks.
In addition to the extra overhead caused by these locking
requirements, the code for gathering the statistics will need to be
more complex, also adding to the overhead. There is also the issue of
increased serialization of task scheduling (there is already some due
to load balancing) to be considered although, to be fair, this
increased serialization will be confined to within groups.
The f-series CPU controller from ckrm does all of what you say in 403
lines (including comments and copyright). I think the biggest advantage
of maintaining the group statistics in the kernel is that certain
scheduling decisions can be made based on group statistics rather than
task statistics, which makes the mechanism independent of the number of
tasks in the group (isolating the group from changes in the number of
tasks).
Yes, that's one of its advantages. Both methods have advantages and
disadvantages.
Peter
--
Peter Williams [email protected]
"Learning, n. The kind of ignorance distinguishing the studious."
-- Ambrose Bierce