Ingo Molnar wrote:
> * Peter Williams <[email protected]> wrote:
>>> - bugfix: use constant offset factor for nice levels instead of
>>>   sched_granularity_ns. Thus nice levels work even if someone sets
>>>   sched_granularity_ns to 0. NOTE: nice support is still naive, i'll
>>>   address the many nice level related suggestions in -v4.
>> I have a suggestion I'd like to make that addresses both nice and
>> fairness at the same time. As I understand it, the basic principle
>> behind this scheduler is to work out a time by which a task should
>> make it onto the CPU and then place it into an ordered list (based on
>> this value) of tasks waiting for the CPU. I think that this is a
>> great idea [...]
> yes, that's exactly the main idea behind CFS, and thanks for the
> compliment :)
>
> Under this concept the scheduler never really has to guess: every
> scheduler decision derives straight from the relatively simple
> one-sentence (!) scheduling concept outlined above. Everything that
> tasks 'get' is something they 'earned' before and all the scheduler
> does are micro-decisions based on math with the nanosec-granularity
> values. Both the rbtree and nanosec accounting are a straight
> consequence of this too: they are the tools that allow the
> implementation of this concept in the highest-quality way. It's
> certainly a very exciting experiment to me and the feedback 'from the
> field' is very promising so far.
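Just to be sure we mean the same thing, here's that one-sentence
concept as I'd model it in a few lines of user-space C (wait_runtime
is the name from your patch; everything else is invented for
illustration). Each task earns credit while it waits and spends it
while it runs, and the scheduler always runs the task that is owed
the most; the rbtree is just the efficient way of finding that task:

    #include <stdio.h>

    struct task {
        const char *name;
        long long wait_runtime;    /* nsecs of CPU this task is owed */
    };

    /* CFS keeps tasks in an rbtree sorted on this key; a linear scan
     * does the same job in this toy */
    static struct task *pick_next(struct task *t, int n)
    {
        struct task *best = &t[0];
        for (int i = 1; i < n; i++)
            if (t[i].wait_runtime > best->wait_runtime)
                best = &t[i];
        return best;
    }

    int main(void)
    {
        struct task t[] = { { "A", 0 }, { "B", 0 }, { "C", 0 } };
        int n = 3;

        for (int tick = 0; tick < 6; tick++) {
            struct task *cur = pick_next(t, n);
            long long slice = 1000000;    /* run the winner for 1ms */

            cur->wait_runtime -= slice;   /* the runner spends credit */
            for (int i = 0; i < n; i++)   /* the waiters earn theirs */
                if (&t[i] != cur)
                    t[i].wait_runtime += slice / (n - 1);

            printf("tick %d: ran %s\n", tick, cur->name);
        }
        return 0;
    }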
>> [...] and my suggestion is with regard to a method for working out
>> this time that takes into account both fairness and nice.
>>
>> First suppose we have the following metrics available in addition to
>> what's already provided.
>>
>> rq->avg_weight_load   /* a running average of the weighted load on
>>                          the CPU */
>> p->avg_cpu_per_cycle  /* the average time in nsecs that p spends on
>>                          the CPU each scheduling cycle */
> yes. rq->nr_running is really just a first-level approximation of
> rq->raw_weighted_load. I concentrated on the 'nice 0' case initially.
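To make the arithmetic behind those metrics concrete, here's a sketch
(the field names follow my description above; the formula is what I
intend, not code from any patch). A task entitled to the fraction
load_weight / avg_weight_load of the CPU, and using avg_cpu_per_cycle
nsecs per cycle, should see a full scheduling cycle of roughly
avg_cpu_per_cycle * avg_weight_load / load_weight nsecs; everything
beyond its own slice is its expected wait:

    #include <stdio.h>

    struct rq_stats { double avg_weight_load; };

    struct task_stats {
        double avg_cpu_per_cycle; /* nsecs on the CPU per cycle */
        double load_weight;       /* weight derived from nice */
        double last_off_cpu;      /* timestamp it last left the CPU */
    };

    /* when the task should next be on the CPU: its fair cycle
     * length, less its own slice, after it last got off */
    static double expected_on_cpu(const struct task_stats *p,
                                  const struct rq_stats *rq)
    {
        double cycle = p->avg_cpu_per_cycle *
                       rq->avg_weight_load / p->load_weight;
        return p->last_off_cpu + (cycle - p->avg_cpu_per_cycle);
    }

    int main(void)
    {
        struct rq_stats rq = { 2048 };           /* two nice-0 tasks */
        struct task_stats a = { 5e6, 1024, 0 };  /* 5ms bursts */
        struct task_stats b = { 10e6, 1024, 0 }; /* 10ms bursts */

        /* both deserve half the CPU, so the 10ms task must wait
         * twice as long between its (twice as large) slices */
        printf("A next due at %.0f ns\n", expected_on_cpu(&a, &rq));
        printf("B next due at %.0f ns\n", expected_on_cpu(&b, &rq));
        return 0;
    }

Note how the 10ms task is told to come back twice as far in the
future as the 5ms task, which is exactly what evens out their
bandwidth in the scenario I describe below.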
>> I appreciate that the notion of basing the expected wait on the
>> task's average cpu use per scheduling cycle is counterintuitive but
>> I believe that (if you think about it) you'll see that it actually
>> makes sense.
> hm. So far i tried to not do any statistical approach anywhere: the
> p->wait_runtime metric (which drives the task ordering) is in essence
> an absolutely precise 'integral' of the 'expected runtimes' that the
> task observes and hence is a precise "load-average as observed by the
> task"
To me this is statistics :-)
> in itself. Every time we base some metric on an average value we
> introduce noise into the system.
> i definitely agree with your suggestion that CFS should use a
> nice-scaled metric for 'load' instead of the current rq->nr_running,
> but regarding the basic calculations i'd rather lean towards using
> rq->raw_weighted_load. Hm?
This can result in jerkiness (in my experience) but using the smoothed
version is certainly something that can be tried later rather than
sooner. Perhaps just something to bear in mind as a solution to
"jerkiness" if it manifests.
> your suggestion concentrates on the following scenario: if a task
> happens to schedule in an 'unlucky' way and happens to hit a busy
> period while there are many idle periods. Unless i misunderstood your
> suggestion, that is the main intention behind it, correct?
You misunderstand (that's one of my other schedulers :-)). This one's
based on the premise that, if everything happens as the task expects,
it will get the amount of CPU bandwidth (over this short period) that
it's entitled to. In reality, sometimes it will get more and sometimes
less, but on average it should get what it deserves. E.g. if you had
two tasks with equal nice and both had demands of 90% of a CPU you'd
expect them each to get about half of the CPU bandwidth. Now suppose
that one of them uses 5ms of CPU each time it gets onto the CPU and
the other uses 10ms. If these two tasks just round robin with each
other, the likely outcome is that the one with the 10ms bursts will
get twice as much CPU as the other, but my proposed method should
prevent that and cause them to get roughly the same amount of CPU. (I
believe this was a scenario that caused problems with O(1) and
required a fix at some stage?)
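A toy simulation makes the difference visible (pure illustration, not
patch code): strict alternation over 600ms would split the CPU
200ms:400ms in favour of the 10ms task, whereas ordering by a
per-task "due back" time, where each run pushes the task forward by
its fair cycle, gives each roughly 300ms:

    #include <stdio.h>

    struct task {
        double burst;  /* ms it runs each time it gets the CPU */
        double due;    /* when it should next be on the CPU */
        double got;    /* total CPU received so far */
    };

    int main(void)
    {
        struct task t[] = { { 5.0, 0.0, 0.0 }, { 10.0, 0.0, 0.0 } };
        int n = 2;
        double now = 0.0;

        while (now < 600.0) {
            /* run whichever task is due back first */
            struct task *next = &t[0];
            for (int i = 1; i < n; i++)
                if (t[i].due < next->due)
                    next = &t[i];

            next->got += next->burst;
            now += next->burst;
            /* push it forward by its fair cycle: with two equally
             * entitled tasks, twice its own burst */
            next->due += next->burst * n;
        }

        printf("5ms-burst task got %.0fms, 10ms task got %.0fms\n",
               t[0].got, t[1].got);
        return 0;
    }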
BTW this has the advantage that the decay rate used in calculating the
task's statistics can be used to control how quickly the scheduler
reacts to changes in the task's behaviour.
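Concretely, I'd maintain the statistics as ordinary exponentially
decayed averages, along these lines (the names and the fixed-point
shift are illustrative): a large decay factor tracks behaviour
changes quickly but is noisy, a small one is smooth but slow to
react.

    #include <stdio.h>

    #define AVG_SHIFT 3    /* decay factor 1/8 per cycle */

    /* fold in one new sample: avg += (sample - avg) / 8 */
    static long long decay_avg(long long avg, long long sample)
    {
        return avg + ((sample - avg) >> AVG_SHIFT);
    }

    int main(void)
    {
        long long avg_cpu_per_cycle = 5000000; /* history: 5ms bursts */

        /* the task switches to 10ms bursts; AVG_SHIFT alone decides
         * how fast the scheduler's picture of it catches up */
        for (int cycle = 0; cycle < 8; cycle++) {
            avg_cpu_per_cycle = decay_avg(avg_cpu_per_cycle, 10000000);
            printf("cycle %d: avg = %lld ns\n", cycle, avg_cpu_per_cycle);
        }
        return 0;
    }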
I think that, with this model, if the current task hasn't surrendered
the CPU by the time the "on CPU" time of the next task on the queue
arrives, the current task should be pre-empted in favour of that task.
I'm not sure what would be the best way to implement this.
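One simple-minded possibility (hypothetical names, not from any
patch): check on each scheduler tick whether the clock has passed the
due time of the task at the head of the queue, and resched the
current task if so:

    #include <stdbool.h>
    #include <stdio.h>

    struct queued_task { unsigned long long due_ns; };

    /* true when the head of the queue is overdue and the current
     * task should be pre-empted in its favour */
    static bool should_preempt(unsigned long long now_ns,
                               const struct queued_task *head)
    {
        return head && now_ns >= head->due_ns;
    }

    int main(void)
    {
        struct queued_task head = { 5000000 };    /* due at 5ms */

        printf("at 4ms: %d\n", should_preempt(4000000, &head)); /* 0 */
        printf("at 5ms: %d\n", should_preempt(5000000, &head)); /* 1 */
        return 0;
    }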
Peter
--
Peter Williams [email protected]
"Learning, n. The kind of ignorance distinguishing the studious."
-- Ambrose Bierce