Re: [Announce] [patch] Modular Scheduler Core and Completely Fair

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Al Boldi wrote:
Peter Williams wrote:
Al Boldi wrote:
Peter Williams wrote:
William Lee Irwin III wrote:
On Mon, Apr 16, 2007 at 11:06:56AM +1000, Peter Williams wrote:
PS I no longer read LKML (due to time constraints) and would
appreciate it if I could be CC'd on any e-mails suggesting scheduler
changes. PPS I'm just happy to see that Ingo has finally accepted
that the vanilla scheduler was badly in need of fixing and don't
really care who fixes it.
PPS Different schedulers for different aims (i.e. server or work
station) do make a difference.  E.g. the spa_svr scheduler in
plugsched does about 1% better on kernbench than the next best
scheduler in the bunch. PPPS Con, fairness isn't always best as
humans aren't very altruistic and we need to give unfair preference
to interactive tasks in order to stop the users flinging their PCs
out the window.  But the current scheduler doesn't do this very well
and is also not very good at fairness so needs to change.  But the
changes need to address interactive response and fairness not just
fairness.
Kernel compiles not so useful a benchmark. SDET, OAST, AIM7, etc. are
better ones. I'd not bother citing kernel compile results.
spa_svr actually does its best work when the system isn't fully loaded
as the type of improvement it strives to achieve (minimizing on queue
wait time) hasn't got much room to manoeuvre when the system is fully
loaded.  Therefore, the fact that it's 1% better even in these
circumstances is a good result and also indicates that the overhead for
keeping the scheduling statistics it uses for its decision making is
well spent.  Especially, when you consider that the total available
room for improvement on this benchmark is less than 3%.

To elaborate, the motivation for this scheduler was acquired from the
observation of scheduling statistics (in particular, on queue wait
time) on systems running at about 30% to 50% load.  Theoretically, at
these load levels there should be no such waiting but the statistics
show that there is considerable waiting (sometimes as high as 30% to
50%).  I put this down to "lack of serendipity" e.g.  everyone sleeping
at the same time and then trying to run at the same time would be
complete lack of serendipity.  On the other hand, if everyone is synced
then there would be total serendipity.

Obviously, from the POV of a client, time the server task spends
waiting on the queue adds to the response time for any request that has
been made so reduction of this time on a server is a good thing(tm). Equally obviously, trying to achieve this synchronization by asking the
tasks to cooperate with each other is not a feasible solution and some
external influence needs to be exerted and this is what spa_svr does --
it nudges the scheduling order of the tasks in a way that makes them
become well synced.

Unfortunately, this is not a good scheduler for an interactive system
as it minimizes the response times for ALL tasks (and the system as a
whole) and this can result in increased response time for some
interactive tasks (clunkiness) which annoys interactive users.  When
you start fiddling with this scheduler to bring back "interactive
unfairness" you kill a lot of its superior low overall wait time
performance.
spa_svr is my favorite, but as you mentioned doesn't work well with ia. So I started instrumenting its behaviour with chew.c (attached). What I
found is that prio-levels are way to coarse.  Setting max_tpt_bonus = 3
bounds this somewhat, but it was still not enough.  Looking at
spa_svr_reassess_bonus and changing it to simply adjust prio based on
avg_sleep did the trick like this:

static void spa_svr_reassess_bonus(struct task_struct *p)
{
	if (p->sdu.spa.avg_sleep_per_cycle >> 10) {
		incr_throughput_bonus(p, 1);
	} else
		decr_throughput_bonus(p);
}
I suspect that this would kill some of the good server performance as it
removes the mechanism that minimises wait time.  It is effectively just
a simplification of what the vanilla O(1) scheduler tries to do i.e.
assume tasks that sleep a lot are interactive and give them a boost.

spa_ws tries to do this as well only in a bit more complicated fashion.
  So maybe an spa_svr modified in this way and renamed could make a good
interactive scheduler.

Great!

Reducing the prio-level granularity may also be helpful;

Because of some of the bit operations code makes it a bad idea to have more than 160 priority levels, you're more or less limited to 60 priority levels for SCHED_OTHER tasks (as 100 are used for real time) and you need 40 of these to pay some attention to niceness leaving you about 20 priority levels to use for fiddling. Is that enough?

With spa_ebs (now that CPU rate caps have been removed), you have all 60 priorities available for fiddling with as niceness is taken care of when calculating each task's entitlement.

or maybe even make it adjustable? Any easy way to introduce something like this?


As long as sysfs remains available it should be fairly easy. Although replacing constants with variables (which may be necessary) has space/time overhead implications but they would probably be insignificant.

Peter
-
Peter Williams                                   [email protected]

"Learning, n. The kind of ignorance distinguishing the studious."
 -- Ambrose Bierce
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux