Re: [ckrm-tech] [PATCH 0/4] sched: Add CPU rate caps

Peter Williams wrote:

Balbir Singh wrote:
Peter Williams wrote:
Andrew Morton wrote:
On Sun, 18 Jun 2006 18:26:38 +1000
Peter Williams wrote:
People are going to want to extend this to capping a *group* of tasks, with
some yet-to-be-determined means of tying those tasks together.  How well
suited is this code to that extension?
Quite good. It can be used from outside the scheduler to impose caps on
arbitrary groups of tasks. Were the PAGG interface available I could
knock up a module to demonstrate this. When/if the "task watchers"
patch is included I will try and implement a higher level mechanism
using that. The general technique is to get an estimate of the
"effective number" of tasks in the group (similar to load) and give each
task in the group a cap which is the group's cap divided by the
effective number of tasks (or the group cap whichever is smaller -- i.e.
the effective number of tasks could be less than one).
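
As a rough illustration of the division described above, here is a minimal
sketch. The function name, the parts-per-thousand unit and the fixed-point
representation of the effective number of tasks are all assumptions made
for this example; none of them come from the patches.

```c
/*
 * Sketch only: names and units are assumptions, not taken from the patches.
 * Caps are in parts per thousand of one CPU. The effective number of tasks
 * is fixed point, with 1000 meaning one fully busy task.
 */
unsigned int per_task_cap(unsigned int group_cap, unsigned int eff_tasks_milli)
{
	/* At or below one effective task, each task may use the whole group cap. */
	if (eff_tasks_milli <= 1000)
		return group_cap;

	/* Otherwise share the group cap across the effective number of tasks. */
	return (unsigned int)((unsigned long long)group_cap * 1000 / eff_tasks_milli);
}
```

With a group cap of 100 (10%) and two equally busy tasks (an effective
number of 2000), each task would get a cap of 50, i.e. 5%.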
There is one possible issue with this approach. Let's assume that we desire
a cap of 10% for a set of two tasks. As discussed earlier, each task
would get a limit of 5% if they are equally busy.

Let's call the group G1 and the tasks T1 and T2.

Suppose we have another group called G2 with tasks T3, T4 and T5 and a soft
cap of 90%. Then each of T3, T4 and T5 would get a soft cap of
30% (assuming that they are equally busy). Now if T5 stops using its limit
for a while, let's say its CPU utilization is 10%, how do we divide the saved
20% between T1, T2, T3 and T4?

In a group scenario, the balance 20% should be shared between T3 and T4.
You're mixing up the method described above with the other one we
discussed where the group's cap is divided among its tasks in proportion
to their demand. With the model I describe above reduced demand by any
tasks in a group would be reflected in a reduced value for the
"effective number of tasks" in the group with a consequent increase in
the cap applied to all group members.
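
For contrast, the other method mentioned here, dividing the group's cap
among its tasks in proportion to their demand, could be sketched as below.
The function name, the units (parts per thousand of one CPU) and the
equal-split fallback for a group with no demand are assumptions for
illustration only.

```c
#include <stddef.h>

/*
 * Sketch only: split group_cap among n tasks in proportion to demand[].
 * All names are hypothetical. If no task has any demand, fall back to an
 * equal split (an assumption; the thread does not cover this case).
 */
void split_cap_by_demand(unsigned int group_cap, const unsigned int *demand,
			 unsigned int *cap_out, size_t n)
{
	unsigned long long total = 0;
	size_t i;

	for (i = 0; i < n; i++)
		total += demand[i];

	for (i = 0; i < n; i++) {
		if (total == 0)
			cap_out[i] = group_cap / (unsigned int)n;
		else
			cap_out[i] = (unsigned int)
				((unsigned long long)group_cap * demand[i] / total);
	}
}
```

Taking G2 with a cap of 900 and made-up demands of 400, 400 and 100 for
T3, T4 and T5, this yields caps of 400, 400 and 100, so the 20% that T5
leaves unused ends up with T3 and T4.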


Thanks for clarifying. How frequently is the reduction in effective number
of tasks calculated and how frequently is the cap updated? Does it require
setting the cap values of all the tasks in the group again (O(N), N is the
number of tasks in the group)? Is it possible that the effective number of
tasks is greater than the limit of the group? How do we handle this scenario?

I think both methods will work and the main difference would be in their
complexity.


An implementation or prototype when available will be interesting to play
around and experiment with. I think it will help clarify if the task mechanism
will indeed work for groups or may expose some limitations of the mechanism.

Also mathematically

A group is a superset of task

It is hard to implement things for a task and make it work for groups,
I disagree. If the low level control is there at the task level or (if
we were managing memory) the address space level then it is relatively
simple (even if boring) to do arbitrary resource control for groups from
the outside.
One of the key advantages of doing it from the outside is that any
locking that is required at the group level is unlikely to get tangled
up with the existing locking mechanisms such as the run queue lock.
This is not true if group management is done on the inside e.g. in the
scheduling code.


The f-series controller from ckrm does so without changing or getting
tangled with the existing locking system.

but if we had something for groups, we could easily adapt it to tasks
by making each group equal to a task
You seem to have a flair for adding unnecessary overhead for those who
won't use this functionality. :-)
Doing it inside the scheduler is also doable but would have some locking
issues. The run queue lock could no longer be used to protect the data
as there's no guarantee that all the tasks in the group are associated
with the same queue.
I should have elaborated here that (conceptually) modifying this code to
apply caps to groups of tasks instead of individual tasks is simple. It
mainly involves moving most of the data (statistics plus cap values) to a
group structure and then modifying the code to update statistics for the
group instead of the task and then make the decisions about whether a
task should have a cap enforced (i.e. moved to one of the soft cap
priorities or sin binned) based on the group statistics.
However, maintaining and accessing the group statistics will require
additional locking as the run queue lock will no longer be able to
protect the data as not all tasks in the group will be associated with
the same CPU. Care will be needed to ensure that this new locking
doesn't lead to deadlocks with the run queue locks.
In addition to the extra overhead caused by these locking requirements,
the code for gathering the statistics will need to be more complex, also
adding to the overhead. There is also the issue of increased
serialization (there is already some due to load balancing) of task
scheduling to be considered although, to be fair, this increased
serialization will be within groups.
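
To make the locking point concrete, here is a user-space sketch of group
statistics guarded by their own lock rather than by a run queue lock. It
uses a pthread mutex as a stand-in for a kernel lock, and the structure,
field and function names are all hypothetical. It does not show how such
a lock would be ordered against the run queue locks.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical group-level bookkeeping; not taken from the patches. */
struct cap_group {
	pthread_mutex_t lock;		/* stand-in for a kernel lock */
	unsigned long long used_ns;	/* CPU time used by all members this window */
	unsigned long long window_ns;	/* length of the accounting window */
	unsigned int cap_permille;	/* group cap, parts per thousand of one CPU */
};

/* Charge delta_ns of CPU time to the group; members may run on any CPU. */
void cap_group_charge(struct cap_group *g, unsigned long long delta_ns)
{
	pthread_mutex_lock(&g->lock);
	g->used_ns += delta_ns;
	pthread_mutex_unlock(&g->lock);
}

/* True if the group as a whole has used more than its cap in this window. */
bool cap_group_exceeded(struct cap_group *g)
{
	bool over;

	pthread_mutex_lock(&g->lock);
	over = g->used_ns * 1000 >
	       (unsigned long long)g->cap_permille * g->window_ns;
	pthread_mutex_unlock(&g->lock);
	return over;
}
```

Every charge from every CPU goes through the one group lock, which is the
extra serialization within groups mentioned above.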


The f-series CPU controller does all of what you say in 403 lines (including
comments and copyright). I think the biggest advantage of maintaining the
group statistics in the kernel is that certain scheduling decisions can be
made based on group statistics rather than task statistics, which makes the
mechanism independent of the number of tasks in the group (isolates the
groups from changes in number of tasks).

If we can achieve something similar with low overhead in user space, I would
certainly love to see it.

If the task can exceed its cap without impacting any other tasks (i.e. there
is spare idle capacity), what happens?
That's the difference between soft and hard caps. If it's a soft cap
then the task is allowed to exceed it if there's spare capacity. If
it's a hard cap it's not.
By how much is the task allowed to exceed if there is spare capacity?
Up to the amount of spare capacity.
Will the spare capacity allocation require resetting of caps to implement
the new caps?
No.  It's part of the soft cap mechanism.
I trust that spare capacity gets
used?  (Is this termed "work conserving"?)
Soft caps, yes.  Hard caps, no.
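
The soft/hard distinction in the answers above can be summarized as a
small piece of arithmetic. This is only a model of the outcome, with a
hypothetical function name and arbitrary units; the patches enforce caps
through priorities and sin binning, not by computing a limit like this.

```c
#include <stdbool.h>

/*
 * Sketch only: how much CPU a task ends up with, given its demand, its
 * cap and the spare capacity on the system (all in the same units).
 * A hard cap is never exceeded; a soft cap may be exceeded by up to the
 * amount of spare capacity.
 */
unsigned int allowed_usage(unsigned int demand, unsigned int cap,
			   unsigned int spare, bool hard)
{
	unsigned int limit = hard ? cap : cap + spare;

	return demand < limit ? demand : limit;
}
```

For a task that wants 800 with a cap of 300 and 200 spare, a hard cap
gives 300 and leaves the spare capacity idle, while a soft cap gives 500.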
In summary, these patches are a good basis for doing capping for groups
of tasks by at least two means:
1. modification to do group capping in the scheduler, or
2. implementing group capping from outside the scheduler.
I intend spending more effort looking at the second of these options
than looking at the first.
Peter



--
	Double the cheers,
	Balbir Singh,
	Linux Technology Center,
	IBM Software Labs