Re: [patch] CFS (Completely Fair Scheduler), v2

Hi Gene,

On Tue, Apr 17, 2007 at 12:53:56AM -0400, Gene Heskett wrote:
> On Monday 16 April 2007, Ingo Molnar wrote:
> >this is the second release of the CFS (Completely Fair Scheduler)
> >patchset, against v2.6.21-rc7:
> >
> >   http://redhat.com/~mingo/cfs-scheduler/sched-cfs-v2.patch
> >
> >i'd like to thank everyone for the tremendous amount of feedback and
> >testing the v1 patch got - i could hardly keep up with just reading the
> >mails! Some of the stuff people addressed i couldnt implement yet, i
> >mostly concentrated on bugs, regressions and debuggability.
> >
> >there's a fair amount of churn:
> >
> >   15 files changed, 456 insertions(+), 241 deletions(-)
> >
> >But it's an encouraging sign that there was no crash bug found in v1,
> >all the bugs were related to scheduling-behavior details. The code was
> >tested on 3 architectures so far: i686, x86_64 and ia64. Most of the
> >code size increase in -v2 is due to debugging helpers, they'll be
> >removed later. (The new /proc/sched_debug file can be used to see the
> >fine details of CFS scheduling.)
> >
> >Changes since -v1:
> >
> > - make nice levels less starvable. (reported by Willy Tarreau)
> >
> > - fixed child-runs first. A /proc/sys/kernel/sched_child_runs_first
> >   flag can be used to turn it on/off. (This might fix the Kaffeine bug
> >   reported by S.Çağlar Onur)
> >
> > - changed SCHED_FAIR back to SCHED_NORMAL (suggested by Con Kolivas)
> >
> > - UP build fix. (reported by Gabriel C)
> >
> > - timer tick micro-optimization (Dmitry Adamushko)
> >
> > - preemption fix: sched_class->check_preempt_curr method to decide
> >   whether to preempt after a wakeup (or at a timer tick). (Found via a
> >   fairness-test-utility written for CFS by Mike Galbraith)
> >
> > - start forked children with neutral statistics instead of trying to
> >   inherit them from the parent: Willy Tarreau reported that this
> >   results in better behavior on extreme workloads, and it also
> >   simplifies the code quite nicely. Removed sched_exit() and the
> >   ->task_exit() methods.
> >
> > - make nice levels independent of the sched_granularity value
> >
> > - new /proc/sched_debug file listing runqueue details and the rbtree
> >
> > - new SCH-* fields in /proc/<NR>/status to see scheduling details
> >
> > - new cpu-hog feature (off by default) and sysctl tunable to set it:
> >   /proc/sys/kernel/sched_max_hog_history_ns tunable defaults to
> >   0 (off). Positive values are meant the maximum 'memory' that the
> >   scheduler has of CPU hogs.
> >
> > - various code cleanups
> >
> > - added more statistics temporarily: sum_exec_runtime,
> >   sum_wait_runtime.
> >
> > - added -CFS-v2 to EXTRAVERSION
> >
> >as usual, any sort of feedback, bugreports, fixes and suggestions are
> >more than welcome,
> >
> >	Ingo
> 
> This one (v2-rc2) is not a keeper I'm sorry to say, Ingo.  v2-rc0 was much 
> better.  Watching amanda run with htop, kmails composer is being subjected to 
> 5 to 10 second pauses, and htop says that gzip -best isn't getting more that 
> 15% of the cpu, and the /amandatapes drive is being written to in a regular 
> pattern that seems to be the cause of the pauses  according to gkrellm, which 
> also seems to track the size of the writes, and can show anything from 4.3k 
> to 54 megs as being written in one cycle of its screen update.

Have you tried the previous version with the fair-fork patch? Your workload
may be sensitive to the fork()'s child getting a lot of CPU upon startup.

Ingo, maybe I'm saying something stupid, but in my userland scheduler, when
new tasks are "forked", they are queued at the end of the run queue with a
fixed priority. In our case, this would translate into assigning them the
same prio and timeslice as their parent, but queuing them at the end so that
they don't make existing tasks starve during huge fork() loads.
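
To make the idea concrete, here is a rough sketch of that policy as a plain
round-robin queue. It is not code from my scheduler nor from CFS; the names
Task, RunQueue, fork and pick_next are made up for illustration, and the
round-robin rotation is an assumption to keep the example short.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    prio: int        # fixed priority, copied from the parent on fork
    timeslice: int   # ticks per turn, copied from the parent on fork


class RunQueue:
    """Hypothetical round-robin run queue (illustration only)."""

    def __init__(self):
        self.queue = deque()

    def add(self, task):
        self.queue.append(task)

    def fork(self, parent, child_name):
        # The child gets the parent's prio and timeslice, but is queued
        # at the tail so tasks already waiting run before it does.
        child = Task(child_name, parent.prio, parent.timeslice)
        self.queue.append(child)
        return child

    def pick_next(self):
        # Take the head task and rotate it to the tail for its next turn.
        task = self.queue.popleft()
        self.queue.append(task)
        return task
```

With tasks A, B and C queued, if A runs and forks A1, the next picks are
B, C, A and only then A1, so a burst of fork() calls cannot push waiting
tasks further back.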

I don't know how that would be possible (nor whether it would help at all),
but I found it was a good compromise over sharing the timeslice with the
parent. Perhaps we should have some absolute timeslice and some relative
timeslice (e.g. X percent of total time divided by the number of tasks)?
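
As a worked example of the arithmetic, a small sketch follows. The function
names, the nanosecond units and the choice to take the smaller of the two
values are all assumptions of mine; nothing here says how the two
timeslices should actually be combined.

```python
def relative_timeslice(period_ns, percent, nr_tasks):
    """X percent of the total period, divided by the number of tasks."""
    if nr_tasks < 1:
        raise ValueError("nr_tasks must be at least 1")
    return period_ns * percent // (100 * nr_tasks)


def child_timeslice(absolute_ns, period_ns, percent, nr_tasks):
    # Assumption: cap the absolute timeslice by the relative one, so the
    # slice shrinks once many tasks are runnable.
    return min(absolute_ns, relative_timeslice(period_ns, percent, nr_tasks))
```

For a 100 ms period with X = 50 and 10 tasks, the relative share is 5 ms, so
a 2 ms absolute timeslice is used unchanged; with 100 tasks the relative
share drops to 0.5 ms and becomes the limit.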

Regards,
Willy
