Re: -mm seems significanty slower than mainline on kernbench

Peter Williams wrote:

Peter Williams wrote:
Martin Bligh wrote:
But I was thinking more about the code that (in the original) handled the case where the number of tasks to be moved was less than 1 but more than 0 (i.e. the cases where "imbalance" would have been reduced to zero when divided by SCHED_LOAD_SCALE). I think that I got that part wrong and you can end up with a bias load to be moved which is less than any of the bias_prio values for any queued tasks (in circumstances where the original code would have rounded up to 1 and caused a move). I think that the way to handle this problem is to replace 1 with "average bias prio" within that logic. This would guarantee at least one task with a bias_prio small enough to be moved.
I think that this analysis is a strong argument for my original patch being the cause of the problem so I'll go ahead and generate a fix. I'll try to have a patch available later this morning.
Attached is a patch that addresses this problem. Unlike the description above it does not use "average bias prio" as that solution would be very complicated. Instead it makes the assumption that NICE_TO_BIAS_PRIO(0) is "good enough" for this purpose as this is highly likely to be the median bias prio and the median is probably better for this purpose than the average.
Signed-off-by: Peter Williams <pwil3058@bigpond.com.au>
Doesn't fix the perf issue.
OK, thanks. I think there's a few more places where SCHED_LOAD_SCALE needs to be multiplied by NICE_TO_BIAS_PRIO(0). Basically, anywhere that it's added to, subtracted from or compared to a load. In those cases it's being used as a scaled version of 1 and we need a scaled
This would have been better said as "the load generated by 1 task" rather than just "a scaled version of 1". Numerically, they're the same but one is clearer than the other and makes it more obvious why we need NICE_TO_BIAS_PRIO(0) * SCHED_LOAD_SCALE and where we need it.
version of NICE_TO_BIAS_PRIO(0).  I'll have another patch later today.
I'm just testing this at the moment.

Attached is a new patch to fix the excessive idle problem. This patch takes a new approach to the problem as it was becoming obvious that trying to alter the load balancing code to cope with biased load was harder than it seemed.

This approach reverts to the old load values but weights them according to tasks' bias_prio values. This means that any assumptions by the load balancing code that the load generated by a single task is SCHED_LOAD_SCALE will still hold. Then, in find_busiest_group(), the imbalance is scaled back up to bias_prio scale so that move_tasks() can move biased load rather than tasks.
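
The weighting described above can be sketched in user-space C as follows. This is only an illustration, not the kernel code: it assumes SCHED_LOAD_SCALE is 128 and that NICE_TO_BIAS_PRIO(nice) is 20 - nice, and the helper names weighted_load() and imbalance_to_bias() are hypothetical.

```c
/* Assumed values for illustration; the real definitions live in the kernel. */
#define SCHED_LOAD_SCALE 128UL
#define NICE_TO_BIAS_PRIO(nice) (20 - (nice))

/*
 * Hypothetical helper: a run queue's load, weighted so that a single
 * nice-0 task contributes exactly SCHED_LOAD_SCALE.
 */
static unsigned long weighted_load(unsigned long prio_bias)
{
	return (prio_bias * SCHED_LOAD_SCALE) / NICE_TO_BIAS_PRIO(0);
}

/*
 * Hypothetical helper: convert a scaled imbalance back to bias_prio
 * units, rounding down, so that biased load rather than a task count
 * can be moved.
 */
static unsigned long imbalance_to_bias(unsigned long imbalance)
{
	return (imbalance * NICE_TO_BIAS_PRIO(0)) / SCHED_LOAD_SCALE;
}
```

Under these assumptions, three nice-0 tasks (prio_bias of 60) give a load of 3 * SCHED_LOAD_SCALE, the same value the unbiased code would see, and an imbalance of 2 * SCHED_LOAD_SCALE converts to 40 units of bias, i.e. two nice-0 tasks.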

One advantage of this is that when there are no non-zero niced tasks the processing will be mathematically the same as the original code.

Kernbench results from a 2 CPU Celeron 550MHz system are:

Average Optimal -j 8 Load Run:
Elapsed Time 1056.16 (0.831102)
User Time 1906.54 (1.38447)
System Time 182.086 (0.973386)
Percent CPU 197 (0)
Context Switches 48727.2 (249.351)
Sleeps 27623.4 (413.913)

This indicates that, on average, 98.9% of the total available CPU was used by the build.
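
One way to arrive at that figure, assuming it is user plus system time divided by the elapsed time across both CPUs, is sketched below. The helper name cpu_used_percent() is hypothetical and the derivation is an assumption about how the number was obtained.

```c
/*
 * Hypothetical helper: percentage of the available CPU time that was
 * consumed, given times in seconds and the number of CPUs.
 */
static double cpu_used_percent(double user, double sys,
			       double elapsed, int ncpus)
{
	return 100.0 * (user + sys) / (elapsed * ncpus);
}
```

With the averages above, cpu_used_percent(1906.54, 182.086, 1056.16, 2) evaluates to roughly 98.9.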

Signed-off-by: Peter Williams <pwil3058@bigpond.com.au>

BTW I think that we need to think about a slightly more complex nice to bias mapping function. The current one gives a nice==19 task 1/20 of the bias of a nice==0 task but only gives nice==-20 tasks twice the bias of a nice==0 task. I don't think this is a big problem as the majority of non nice==0 tasks will have positive nice but it should be looked at for a future enhancement.
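
The asymmetry can be seen with a small sketch. It assumes the mapping is NICE_TO_BIAS_PRIO(nice) = 20 - nice, which matches the ratios quoted above but is not shown in the patch; bias_ratio_to_nice0() is a hypothetical helper.

```c
/* Assumed definition, consistent with the 1/20 and 2x ratios above. */
#define NICE_TO_BIAS_PRIO(nice) (20 - (nice))

/* Hypothetical helper: a task's bias relative to that of a nice-0 task. */
static double bias_ratio_to_nice0(int nice)
{
	return (double)NICE_TO_BIAS_PRIO(nice) / NICE_TO_BIAS_PRIO(0);
}
```

Under that assumption the ratio spans a factor of 20 below nice 0 (0.05 at nice 19) but only a factor of 2 above it (2.0 at nice -20).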

Peter
--
Peter Williams                                   pwil3058@bigpond.net.au

"Learning, n. The kind of ignorance distinguishing the studious."
 -- Ambrose Bierce

Index: MM-2.6.X/kernel/sched.c
===================================================================
--- MM-2.6.X.orig/kernel/sched.c	2006-01-13 14:53:34.000000000 +1100
+++ MM-2.6.X/kernel/sched.c	2006-01-13 15:11:19.000000000 +1100
@@ -1042,7 +1042,8 @@ void kick_process(task_t *p)
 static unsigned long source_load(int cpu, int type)
 {
 	runqueue_t *rq = cpu_rq(cpu);
-	unsigned long load_now = rq->prio_bias * SCHED_LOAD_SCALE;
+	unsigned long load_now = (rq->prio_bias * SCHED_LOAD_SCALE) /
+		NICE_TO_BIAS_PRIO(0);
 
 	if (type == 0)
 		return load_now;
@@ -1056,7 +1057,8 @@ static unsigned long source_load(int cpu
 static inline unsigned long target_load(int cpu, int type)
 {
 	runqueue_t *rq = cpu_rq(cpu);
-	unsigned long load_now = rq->prio_bias * SCHED_LOAD_SCALE;
+	unsigned long load_now = (rq->prio_bias * SCHED_LOAD_SCALE) /
+		NICE_TO_BIAS_PRIO(0);
 
 	if (type == 0)
 		return load_now;
@@ -1322,7 +1324,8 @@ static int try_to_wake_up(task_t *p, uns
 			 * of the current CPU:
 			 */
 			if (sync)
-				tl -= p->bias_prio * SCHED_LOAD_SCALE;
+				tl -= (p->bias_prio * SCHED_LOAD_SCALE) /
+					NICE_TO_BIAS_PRIO(0);
 
 			if ((tl <= load &&
 				tl + target_load(cpu, idx) <= SCHED_LOAD_SCALE) ||
@@ -2159,7 +2162,7 @@ find_busiest_group(struct sched_domain *
 	}
 
 	/* Get rid of the scaling factor, rounding down as we divide */
-	*imbalance = *imbalance / SCHED_LOAD_SCALE;
+	*imbalance = (*imbalance * NICE_TO_BIAS_PRIO(0)) / SCHED_LOAD_SCALE;
 	return busiest;
 
 out_balanced:
@@ -2472,7 +2475,8 @@ static void rebalance_tick(int this_cpu,
 	struct sched_domain *sd;
 	int i;
 
-	this_load = this_rq->prio_bias * SCHED_LOAD_SCALE;
+	this_load = (this_rq->prio_bias * SCHED_LOAD_SCALE) /
+		NICE_TO_BIAS_PRIO(0);
 	/* Update our load */
 	for (i = 0; i < 3; i++) {
 		unsigned long new_load = this_load;
