Re: [RFC] [PATCH 0/3] Add group fairness to CFS

On Wed, May 23, 2007 at 11:03:16AM -0700, William Lee Irwin III wrote:
> Well, SMP load balancing is what makes all this hard.

Agreed. I am optimistic that we can achieve good degree of SMP
fairness using similar mechanism as smpnice ..

> On Wed, May 23, 2007 at 10:18:59PM +0530, Srivatsa Vaddagiri wrote:
> > Salient points which needs discussion:
> > 1. This patch reuses CFS core to achieve fairness at group level also.
> >    To make this possible, CFS core has been abstracted to deal with generic 
> >    schedulable "entities" (tasks, users etc).
> 
> The ability to handle deeper hierarchies would be useful for those
> who want such semantics.

sure, although the more levels of hierarchy scheduler recoginizes, more
the (accounting/scheduling) cost is!

> On Wed, May 23, 2007 at 10:18:59PM +0530, Srivatsa Vaddagiri wrote:
> > 2. The per-cpu rb-tree has been split to be per-group per-cpu.
> >    schedule() now becomes two step on every cpu : pick a group first (from
> >    group rb-tree) and a task within that group next (from that group's task
> >    rb-tree)
> 
> That assumes per-user scheduling groups; other configurations would
> make it one step for each level of hierarchy. It may be possible to
> reduce those steps to only state transitions that change weightings
> and incremental updates of task weightings. By and large, one needs
> the groups to determine task weightings as opposed to hierarchically
> scheduling, so there are alternative ways of going about this, ones
> that would even make load balancing easier.

Yeah I agree that providing hierarchical group-fairness at the cost of single 
(or fewer) scheduling levels would be a nice thing to target for,
although I don't know of any good way to do it. Do you have any ideas
here? Doing group fairness in a single level, using a common rb-tree for tasks 
from all groups is very difficult IMHO. We need atleast two levels.

One possibility is that we recognize deeper hierarchies only in user-space,
but flatten this view from kernel perspective i.e some user space tool
will have to distributed the weights accordingly in this flattened view
to the kernel.

> On Wed, May 23, 2007 at 10:18:59PM +0530, Srivatsa Vaddagiri wrote:
> > 3. Grouping mechanism - I have used 'uid' as the basis of grouping for
> >    timebeing (since that grouping concept is already in mainline today).
> >    The patch can be adapted to a more generic process grouping mechanism
> >    (like http://lkml.org/lkml/2007/4/27/146) later.
> 
> I'd like to see how desirable the semantics achieved by reflecting
> more of the process hierarchy structure in the scheduler groupings are.
> Users, sessions, pgrps, and thread_groups would be the levels of
> hierarchy there, where some handling of orphan pgrps is needed.

Good point. Essentially all users should get fair cpu first, then all
sessions/pgrps under a user should get fair share, followed by
process-groups under a session, followed by processes in a
process-group, followed by threads in a process (phew) .. ?

The container patches by Paul Menage at http://lkml.org/lkml/2007/4/27/146
provide a generic enough mechanism to group tasks in a hierarchical
manner for each resource controller. For ex: for the cpu controller, if the 
desired fairness is as per the above scheme (user/session/pgrp/threads etc), 
then it is possible to write a script which creates such a tree under cpu
controller filesystem:

	# mkdir /dev/cpuctl
	# mount -t container -o cpuctl none /dev/cpuctl

/dev/cpuctl is the cpu controller filesystem which can look like this:

	/dev/cpuctl
		|----uid root
		|       |-- sid 10
		|	|      |------ pgrp 20
		| 	|      | 	  |-- process 100
		|	|      |          |-- process 101
		|	|      |          |
		|       |
		|	|-- sid 11
		|
		|--- uid guest

(If the cpu controller really supports those many levels that is!)

user scripts can be written to modify this filesystem tree upon every 
login/session/user creation (if that is possible to trap on). Essentially it 
lets this semantics (what you ask) be dynamic/tunable by user.

> Kernel compiles are markedly poor benchmarks. Try lat_ctx from lmbench,
> VolanoMark, AIM7, OAST, SDET, and so on.

Thanks for this list of tests. I intend to run all of them if possible
for my next version.

-- 
Regards,
vatsa
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Follow-Ups:
- Re: [RFC] [PATCH 0/3] Add group fairness to CFS
  - From: "Li, Tong N" <[email protected]>

References:
- [RFC] [PATCH 0/3] Add group fairness to CFS
  - From: Srivatsa Vaddagiri <[email protected]>

Prev by Date: Re: BUG in 2.6.22-rc2-mm1: NIC module b44.c broken (Broadcom 4400)
Next by Date: Re: [PATCH 1/2] msi: Invert the sense of the MSI enables.
Previous by thread: Re: [RFC] [PATCH 0/3] Add group fairness to CFS
Next by thread: Re: [RFC] [PATCH 0/3] Add group fairness to CFS
Index(es):
- Date
- Thread

[Index of Archives] [Kernel Newbies] [Netfilter] [Bugtraq] [Photo] [Stuff] [Gimp] [Yosemite News] [MIPS Linux] [ARM Linux] [Linux Security] [Linux RAID] [Video 4 Linux] [Linux for the blind] [Linux Resources]