Linus Torvalds wrote:
On Wed, 18 Apr 2007, Matt Mackall wrote:
On Wed, Apr 18, 2007 at 07:48:21AM -0700, Linus Torvalds wrote:
And "fairness by euid" is probably a hell of a lot easier to do than
trying to figure out the wakeup matrix.
For the record, you actually don't need to track a whole NxN matrix
(or do the implied O(n**3) matrix inversion!) to get to the same
result.
I'm sure you can do things differently, but the reason I think "fairness
by euid" is actually worth looking at is that it's pretty much the
*identical* issue that we'll have with "fairness by virtual machine" and a
number of other "container" issues.
The fact is:
- "fairness" is *not* about giving everybody the same amount of CPU time
(scaled by some niceness level or not). Anybody who thinks that is
"fair" is just being silly and hasn't thought it through.
- "fairness" is multi-level. You want to be fair to threads within a
thread group (where "process" may be one good approximation of what a
"thread group" is, but not necessarily the only one).
But you *also* want to be fair in between those "thread groups", and
then you want to be fair across "containers" (where "user" may be one
such container).
So I claim that anything that cannot be fair by user ID is actually really
REALLY unfair. I think it's absolutely humongously STUPID to call
something the "Completely Fair Scheduler", and then just be fair on a
thread level. That's not fair AT ALL! It's the antithesis of being fair!
So if you have 2 users on a machine running CPU hogs, you should *first*
try to be fair among users. If one user then runs 5 programs, and the
other one runs just 1, then the *one* program should get 50% of the CPU
time (the user's fair share), and the five programs should get 10% of CPU
time each. And if one of them uses two threads, each thread should get 5%.
So you should see one thread get 50% CPU (single thread of one user), 4
threads get 10% CPU (their fair share of that user's time), and 2 threads
get 5% CPU (the fair share within that thread group!).
Any scheduling argument that just considers the above to be "7 threads
total" and gives each thread 14% of CPU time "fairly" is *anything* but
fair. It's a joke if that kind of scheduler then calls itself CFS!
And yes, that's largely what the current scheduler will do, but at least
the current scheduler doesn't claim to be fair! So the current scheduler
is a lot *better* if only in the sense that it doesn't make ridiculous
claims that aren't true!
Linus
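The arithmetic in Linus's example is just equal subdivision applied recursively down the hierarchy. A minimal sketch (hypothetical names, not code from any real scheduler):

```python
# Hypothetical sketch: recursively split CPU share equally among the
# children at each level of the hierarchy (users -> programs -> threads),
# as in Linus's example.
def split_shares(node, share=100.0):
    """Return {thread_name: percent} by equal subdivision at each level."""
    if isinstance(node, dict):              # interior node: subdivide equally
        per_child = share / len(node)
        out = {}
        for child in node.values():
            out.update(split_shares(child, per_child))
        return out
    return {node: share}                    # leaf: a runnable thread

# Two users running CPU hogs: one runs a single program, the other
# runs five, one of which has two threads.
machine = {
    "user1": {"prog1": "t1"},
    "user2": {
        "prog2": "t2", "prog3": "t3", "prog4": "t4", "prog5": "t5",
        "prog6": {"a": "t6a", "b": "t6b"},
    },
}
shares = split_shares(machine)
# t1 -> 50%, t2..t5 -> 10% each, t6a and t6b -> 5% each
```

A "7 threads total" scheduler would instead hand every thread about 14%, which is the outcome Linus objects to.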
Sounds a lot like the PLFS (process level fair sharing) scheduler in
Aurema's ARMTech (for whom I used to work). The "fair" in the name is
a bit misleading, as it's really about deliberately unfair scheduling in
order to meet specific policies. But it's based on the principle that if
you can allocate CPU bandwidth "fairly" (which really means in proportion
to the entitlement each process is allocated) then you can allocate CPU
bandwidth "fairly" between higher level entities such as process groups,
user groups and so on by subdividing the entitlements downwards.
The tricky part of implementing this was that not all entities at the
various levels have sufficient demand for CPU bandwidth to use their
entitlements, which in turn means that the entities above them will have
difficulty using their own entitlements even if others of their
subordinates have sufficient demand (because those subordinates'
entitlements will be too small). The trick is to have a measure of each
entity's demand for CPU bandwidth and use that to modify the way
entitlement is divided among subordinates.
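One way to picture that subdivision is a water-filling pass: offer each child its weighted share of the parent's entitlement, cap the offer at the child's measured demand, and recycle any surplus among children that still have unmet demand. This is a hypothetical sketch of the idea, not ARMTech's actual algorithm:

```python
# Hypothetical sketch: divide a parent's CPU entitlement among children
# in proportion to their nominal weights, but never grant a child more
# than its measured demand; the surplus is redistributed in further
# passes to children that can still use it.
def divide_entitlement(parent, children):
    """children: {name: (weight, demand)} -> {name: grant}"""
    grants = {name: 0.0 for name in children}
    active = dict(children)                 # children that may take more
    remaining = parent
    while remaining > 1e-9 and active:
        total_w = sum(w for w, _ in active.values())
        next_active = {}
        handed_out = 0.0
        for name, (w, demand) in active.items():
            offer = remaining * w / total_w     # weighted share of surplus
            take = min(offer, demand - grants[name])
            grants[name] += take
            handed_out += take
            if grants[name] < demand - 1e-9:    # still has unmet demand
                next_active[name] = (w, demand)
        remaining -= handed_out
        if handed_out < 1e-9:
            break
        active = next_active
    return grants

# Equal weights, 60 units to share: A only wants 10, so its unused 10
# units flow to B and C on the second pass.
grants = divide_entitlement(60.0, {"A": (1, 10.0),
                                   "B": (1, 50.0),
                                   "C": (1, 40.0)})
# A -> 10, B -> 25, C -> 25
```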
As a first guess, an entity's CPU bandwidth usage is an indicator of
demand, but it doesn't take into account unmet demand from tasks sitting
on a run queue waiting for access to the CPU. On the other hand, usage
plus time spent waiting on the queue isn't a good measure of demand
either (although it's probably a good upper bound), as it's unlikely that
the task would have used as much CPU as its waiting time if it had gone
straight to the CPU.
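Those two observations bracket the true demand, which suggests estimating it somewhere between the bounds. A hypothetical sketch (the blending factor `alpha` is an invented tuning knob, not anything from PLFS):

```python
# Hypothetical sketch: bracket a task's true CPU demand between the two
# bounds described above -- observed usage as a lower bound, and usage
# plus run-queue wait time as an upper bound -- and pick a point between.
def estimate_demand(usage, wait, alpha=0.5):
    """Estimate CPU demand over one scheduling interval.

    usage -- CPU time actually consumed in the interval
    wait  -- time spent runnable but waiting on the run queue
    alpha -- fraction of wait time counted as unmet demand (0..1)
    """
    assert 0.0 <= alpha <= 1.0
    return usage + alpha * wait
```

With `alpha = 0` the estimate degenerates to the lower bound (pure usage); with `alpha = 1`, to the upper bound; the observation above is that the truth lies somewhere in between.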
But my main point is that it is possible to build schedulers that can
achieve higher level scheduling policies. Versions of PLFS work on
Windows from user space by twiddling process priorities. Part of my
more recent work at Aurema involved patching Linux's scheduler so that
nice worked more predictably, which would let us release a user space
version of PLFS for Linux. The other part was to add hard CPU bandwidth
caps for processes, so that ARMTech could enforce hard CPU bandwidth
caps on higher level entities (as this can't be done without the kernel
being able to do it at that level).
Peter
--
Peter Williams [email protected]
"Learning, n. The kind of ignorance distinguishing the studious."
-- Ambrose Bierce