Re: [RFC] scheduler: improve SMP fairness in CFS

Chris Snook wrote:

Tong Li wrote:
This patch extends CFS to achieve better fairness for SMPs. For example, with 10 tasks (same priority) on 8 CPUs, it enables each task to receive equal CPU time (80%). The code works on top of CFS and provides SMP fairness at a coarser time granularity; locally on each CPU, it relies on CFS to provide fine-grained fairness and good interactivity.
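The 80% figure is just 8 CPUs divided among 10 equal tasks. As a rough illustration of the gap the patch is aiming at, the sketch below compares that ideal share with what each task would get if tasks were placed round-robin and never migrated. The static placement is an assumption made for contrast only (the real load balancer does migrate tasks), and both function names are made up for this example.

```python
def fair_share(ntasks, ncpus):
    """Ideal fraction of one CPU that each equal-weight task should receive."""
    return min(1.0, ncpus / ntasks)


def static_shares(ntasks, ncpus):
    """Per-task share if tasks are placed round-robin and never migrated.

    Hypothetical baseline: each CPU splits its time evenly among its tasks.
    """
    per_cpu = [0] * ncpus
    for i in range(ntasks):
        per_cpu[i % ncpus] += 1
    return [1.0 / per_cpu[i % ncpus] for i in range(ntasks)]


print(fair_share(10, 8))      # ideal share per task
print(static_shares(10, 8))   # shares under the static-placement assumption
```

Under that assumption four tasks share two CPUs at 50% each while the other six get a full CPU, against an ideal of 80% for everyone.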
The code is based on the distributed weighted round-robin (DWRR) algorithm. It keeps two RB trees on each CPU: one is the original cfs_rq, referred to as active, and one is a new cfs_rq, called round-expired. Each CPU keeps a round number, initially zero. The scheduler works exactly the same way as in CFS, but only runs tasks from the active tree. Each task is assigned a round slice, equal to its weight times a system constant (e.g., 100ms), controlled by sysctl_base_round_slice. When a task uses up its round slice, it moves to the round-expired tree on the same CPU and stops running. Thus, at any time on each CPU, the active tree contains all tasks that are running in the current round, while tasks in round-expired have all finished the current round and are waiting to start the next round. When an active tree becomes empty, the CPU calls idle_balance() to grab tasks of the same round from other CPUs. If none can be moved over, it switches its active and round-expired trees, thus unleashing round-expired tasks and advancing the local round number by one. An invariant the algorithm maintains is that the round numbers of any two CPUs in the system differ by at most one. This property ensures fairness across CPUs. The variable sysctl_base_round_slice controls fairness-performance tradeoffs: a smaller value leads to better cross-CPU fairness at the potential cost of performance; on the other hand, the larger the value is, the closer the system behavior is to the default CFS without the patch.
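The round mechanics described above can be sketched as a toy tick-based simulation. This is not the patch: the names Task, Cpu, refill, tick and simulate are invented here, plain lists stand in for the RB trees, "least time used this round" stands in for the CFS pick, refill() plays the role of idle_balance(), and only tasks in another CPU's active list are stolen. The toy also does nothing special to enforce the round-number invariant. It uses 3 equal tasks on 2 CPUs rather than 10 on 8 so the behavior is easy to follow by hand.

```python
from dataclasses import dataclass, field

BASE_ROUND_SLICE = 100  # ticks per unit weight; stands in for sysctl_base_round_slice


@dataclass
class Task:
    name: str
    weight: int = 1
    used: int = 0    # ticks consumed in the current round
    total: int = 0   # ticks consumed overall


@dataclass
class Cpu:
    active: list = field(default_factory=list)   # tasks still running this round
    expired: list = field(default_factory=list)  # tasks that finished this round
    round: int = 0


def refill(cpu, cpus):
    """Active list is empty: steal a same-round task, else start the next round."""
    for other in cpus:
        # Leave the other CPU at least one task (assumed to be the running one).
        if other is not cpu and other.round == cpu.round and len(other.active) > 1:
            cpu.active.append(other.active.pop())
            return
    if cpu.expired:
        # Nothing to steal: swap the two lists and advance the local round.
        cpu.active, cpu.expired = cpu.expired, []
        cpu.round += 1


def tick(cpus):
    """One time unit: balance empty CPUs first, then run one task per CPU."""
    for cpu in cpus:
        if not cpu.active:
            refill(cpu, cpus)
    for cpu in cpus:
        if not cpu.active:
            continue  # idle this tick
        task = min(cpu.active, key=lambda t: t.used)  # crude stand-in for CFS
        task.used += 1
        task.total += 1
        if task.used >= task.weight * BASE_ROUND_SLICE:
            # Round slice used up: park the task until the next round.
            task.used = 0
            cpu.active.remove(task)
            cpu.expired.append(task)


def simulate(ticks):
    """Run tasks a and c on CPU 0 and task b on CPU 1 for the given ticks."""
    a, b, c = Task("a"), Task("b"), Task("c")
    cpus = [Cpu(active=[a, c]), Cpu(active=[b])]
    for _ in range(ticks):
        tick(cpus)
    return [t.total for t in (a, b, c)], [cpu.round for cpu in cpus]


totals, rounds = simulate(600)
print(totals, rounds)
```

In this setup the extra task bounces between the two CPUs once per round, so over 600 ticks each task ends up with 400 ticks, i.e. two thirds of a CPU, instead of one task getting a whole CPU while the other two split one.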
Any comments and suggestions would be highly appreciated.
This patch is massive overkill. Maybe you're not seeing the overhead on your 8-way box, but I bet we'd see it on a 4096-way NUMA box with a partially-RT workload. Do you have any data justifying the need for this patch?
Doing anything globally is expensive, and should be avoided at all costs. The scheduler already rebalances when a CPU is idle, so you're really just rebalancing the overload here. On a server workload, we don't necessarily want to do that, since the overload may be multiple threads spawned to service a single request, and could be sharing a lot of data.
Instead of an explicit system-wide fairness invariant (which will get very hard to enforce when you throw SCHED_FIFO processes into the mix and the scheduler isn't running on some CPUs), try a simpler invariant. If we guarantee that the load on CPU X does not differ from the load on CPU (X+1)%N by more than some small constant, then we know that the system is fairly balanced. We can achieve global fairness with local balancing, and avoid all this overhead. This has the added advantage of keeping most of the migrations core/socket/node-local on SMT/multicore/NUMA systems.
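The neighbor invariant suggested above can be illustrated with a small sketch. It assumes load is simply a task count per CPU and that "some small constant" is 1; ring_balance and max_diff are names invented for this example, and nothing here models locking, RT tasks, or cache and NUMA effects.

```python
def ring_balance(loads, max_diff=1):
    """Move one task at a time between CPU x and CPU (x+1)%n until every
    adjacent pair differs by at most max_diff. Returns a new list."""
    loads = list(loads)
    n = len(loads)
    moved = True
    while moved:
        moved = False
        for x in range(n):
            y = (x + 1) % n
            if loads[x] - loads[y] > max_diff:
                loads[x] -= 1   # x is heavier: push one task to its neighbor
                loads[y] += 1
                moved = True
            elif loads[y] - loads[x] > max_diff:
                loads[y] -= 1   # neighbor is heavier: pull one task from it
                loads[x] += 1
                moved = True
    return loads


print(ring_balance([4, 0, 0, 0]))
```

On this input the loop settles at [2, 1, 0, 1]: every adjacent pair is within one task of each other, although CPUs 0 and 2, which are not neighbors, still differ by two.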
    -- Chris

To clarify, I'm not suggesting that the "balance with cpu (x+1)%n only" algorithm is the only way to do this. Rather, I'm pointing out that even an extremely simple algorithm can give you fair loading when you already have CFS managing the runqueues. There are countless more sophisticated ways we could do this without using global locking, or possibly without any locking at all, other than the locking we already use during migration.


	-- Chris
