Re: [PATCH 2/6] sched: make sched_slice() group scheduling savvy

On Thu, 2007-11-01 at 22:01 +0530, Srivatsa Vaddagiri wrote:
> On Thu, Nov 01, 2007 at 01:20:08PM +0100, Peter Zijlstra wrote:
> > On Thu, 2007-11-01 at 13:03 +0100, Peter Zijlstra wrote:
> > > On Thu, 2007-11-01 at 12:58 +0100, Peter Zijlstra wrote:
> > > 
> > > > > sched_slice() is about lantecy, its intended purpose is to ensure each
> > > > > task is ran exactly once during sched_period() - which is
> > > > > sysctl_sched_latency when nr_running <= sysctl_sched_nr_latency, and
> > > > > otherwise linearly scales latency.
> > > 
> > > The thing that got my brain in a twist is what to do about the non-leaf
> > > nodes, for those it seems I'm not doing the right thing - I think.
> > 
> > Ok, suppose a tree like so:
> > 
> > 
> > level 2                   cfs_rq
> >                        A           B
> > 
> > level 1             cfs_rqA     cfs_rqB
> >                      A0        B0 - B99
> > 
> > 
> > So for sake of determinism, we want to impose a period in which all
> > level 1 tasks will have ran (at least) once.
> 
> Peter,
> 	I fail to see why this requirement to "determine a period in
> which all level 1 tasks will have ran (at least) once" is essential.

Because it gives a steady feel to things. For humans its most essential
that things run in a predicable fashion. So no only does it matter how
much time a task gets, it also very much matters when it gets that.

It contributes to the feeling of gradual slow down. 

> I am visualizing each of the groups to be similar to Xen-like partitions
> which are given fair timeslices by the hypervisor (Linux kernel in this
> case). How each partition (group in this case) manages the allocated
> timeslice(s) to provide fairness to tasks within that partition/group should not
> (IMHO) depend on other groups and esp. how many tasks other groups has.

Agreed, I've realised this since my last mail, one group should not
influence another in such a fashion, in this respect you don't want to
flatten it like I did.

> For ex: before this patch, fair time would be allocated to group and
> their tasks as below:
> 
>     A0       B0-B9     A0    B10-B19     A0     B20-B29
>  |--------|--------|--------|--------|--------|--------|-----//--| 
>  0       10ms	   20ms	   30ms     40ms     50ms     60ms
> 
> i.e during the first 10ms allocated to group B, B0-B9 run,
>     during the next  10ms allocated to group B, B10-B19 run etc
> 
> What's wrong with this scheme?

What made me start tinkering here is that the nested level is again
distributing wall-time without being aware of the fraction it gets from
the upper levels.

So if we have two groups, A and B, and B is selected for 1/2 of period,
then Bn will again divide period, even though in reality it will only
have p/2.

So I guess, I need a top down traversal, not a bottom up traversal to
get this fixed up... I'll ponder this.

> By letting __sched_period() be determined for each group independently,
> we are building stronger isolation between them, which is good IMO
> (imagine a rogue container that does a fork bomb).

Agreed.

> > Index: linux-2.6/kernel/sched_fair.c
> > ===================================================================
> > --- linux-2.6.orig/kernel/sched_fair.c
> > +++ linux-2.6/kernel/sched_fair.c
> > @@ -341,7 +341,7 @@ static u64 sched_slice(struct cfs_rq *cf
> >  		do_div(slice, cfs_rq->load.weight);
> >  	}
> > 
> > -	return slice;
> > +	return min_t(u64, sysctl_sched_latency, slice);

> which seems to be more or less giving what we already have w/o the
> patch?

Well, its basically giving up on overload, admittedly not very nice.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

References:
- [PATCH 0/6] various scheduler patches
  - From: Peter Zijlstra <a.p.zijlstra@chello.nl>
- [PATCH 2/6] sched: make sched_slice() group scheduling savvy
  - From: Peter Zijlstra <a.p.zijlstra@chello.nl>
- Re: [PATCH 2/6] sched: make sched_slice() group scheduling savvy
  - From: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
- Re: [PATCH 2/6] sched: make sched_slice() group scheduling savvy
  - From: Peter Zijlstra <a.p.zijlstra@chello.nl>
- Re: [PATCH 2/6] sched: make sched_slice() group scheduling savvy
  - From: Peter Zijlstra <peterz@infradead.org>
- Re: [PATCH 2/6] sched: make sched_slice() group scheduling savvy
  - From: Peter Zijlstra <peterz@infradead.org>
- Re: [PATCH 2/6] sched: make sched_slice() group scheduling savvy
  - From: Peter Zijlstra <peterz@infradead.org>
- Re: [PATCH 2/6] sched: make sched_slice() group scheduling savvy
  - From: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>

Prev by Date: Re: per-bdi-throttling: synchronous writepage doesn't work correctly
Next by Date: Re: [PATCH 3/16] read/write_crX, clts and wbinvd for 64-bit paravirt
Previous by thread: Re: [PATCH 2/6] sched: make sched_slice() group scheduling savvy
Next by thread: [PATCH 5/6] sched: SCHED_FIFO/SCHED_RR watchdog timer
Index(es):
- Date
- Thread

[Index of Archives] [Kernel Newbies] [Netfilter] [Bugtraq] [Photo] [Stuff] [Gimp] [Yosemite News] [MIPS Linux] [ARM Linux] [Linux Security] [Linux RAID] [Video 4 Linux] [Linux for the blind] [Linux Resources]