Davide Libenzi wrote:
> On Wed, 18 Apr 2007, Ingo Molnar wrote:
> > That's one reason why i dont think it's necessarily a good idea to
> > group-schedule threads, we dont really want to do a per thread group
> > percpu_alloc().
>
> I still do not have clear how much overhead this will bring into the
> table, but I think (like Linus was pointing out) the hierarchy should look
> like:
...
> The "run_queue" concept (and data) that now is bound to a CPU, need to be
> replicated in:
>
> ROOT <- VCPUs add themselves here
> VCPU <- USERs add themselves here
> USER <- PROCs add themselves here
> PROC <- THREADs add themselves here
> THREAD (ultimate fine grained scheduling unit)
>
> So ROOT, VCPU, USER and PROC will have their own "run_queue".
...
I can't comment on the internals of run queues, overhead and so on, but
this discussion leads me to the idea of a dynamic *tree* of scheduler
queues.
By dynamic I mean that they are configured from user-space - be it with
something like CLONE_NEW_SCHEDULER_CLASS, or possibly better some other
interface that allows an *arbitrary* tree not coupled to the
user/process/thread boundaries. New threads and processes are by default
created in their parent's queue, just like now.
So user-space could build a tree like this (e.g. with a PAM module):
Default queue - init
+- kernel-thread queue (to avoid having kernel threads blocked by
|                       user-space)
+- cron, atd, sshd, ... unless they change their "class"
+- user1
|  +- X
|  \- kde
|     +- konsole
|     +- kmail
|     |  +- mail fetch thread
|     |  +- mail filter thread
|     |  \- GUI thread
|     \- mplayer
\- user2
   +- ...
Whether the queues are handled with some staircase behaviour, or CFS, or
simply get CPU time distributed by nice level, is another question - but
they only have to be "fair" locally.
Of course, that simply moves the problem into user-space to some extent - but
I think (and have read often enough) that the needs vary so much that a
single, hardcoded system won't suffice. And we can try to get the "right"
behaviour in each queue, just like now.
Walking the tree might make the scheduler not fully O(1) - but by default
only one queue is defined (or possibly two, one for kernel threads),
and everything else can be done by user-space.
The mentioned case of a web server that spawns gzip could be handled by
putting each httpd in a queue just below init and everything else in
another - or by nicing the web server, since it is defined as "important".
(I believe that's called "moving policy into user-space" :-)
Regards,
Phil