2.6.22 must-have item - perhaps suitable for -stable too, because it was
reproduced on 2.6.21.5 too.
---------------------->
From: Christoph Lameter <[email protected]>
Subject: [patch] sched: fix next_interval determination in idle_balance()
Fix massive SMP imbalance on NUMA nodes observed on 2.6.21.5 with CFS.
(and later on reproduced without CFS as well).
The intervals of domains that do not have SD_BALANCE_NEWIDLE must be
considered for the calculation of the time of the next balance.
Otherwise we may defer rebalancing forever and nodes might stay idle for
very long times.
Siddha also spotted that the conversion of the balance interval to
jiffies is missing. Fix that to.
From: Srivatsa Vaddagiri <[email protected]>
also continue the loop if !(sd->flags & SD_LOAD_BALANCE).
Tested-by: Paul E. McKenney <[email protected]>
It did in fact trigger under all three of mainline, CFS, and -rt
including CFS -- see below for a couple of emails from last Friday
giving results for these three on the AMD box (where it happened) and on
a single-quad NUMA-Q system (where it did not, at least not with such
severity).
Signed-off-by: Christoph Lameter <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
kernel/sched.c | 22 +++++++++++++---------
1 file changed, 13 insertions(+), 9 deletions(-)
Index: v/kernel/sched.c
===================================================================
--- v.orig/kernel/sched.c
+++ v/kernel/sched.c
@@ -2938,17 +2938,21 @@ static void idle_balance(int this_cpu, s
unsigned long next_balance = jiffies + 60 * HZ;
for_each_domain(this_cpu, sd) {
- if (sd->flags & SD_BALANCE_NEWIDLE) {
+ unsigned long interval;
+
+ if (!(sd->flags & SD_LOAD_BALANCE))
+ continue;
+
+ if (sd->flags & SD_BALANCE_NEWIDLE)
/* If we've pulled tasks over stop searching: */
pulled_task = load_balance_newidle(this_cpu,
- this_rq, sd);
- if (time_after(next_balance,
- sd->last_balance + sd->balance_interval))
- next_balance = sd->last_balance
- + sd->balance_interval;
- if (pulled_task)
- break;
- }
+ this_rq, sd);
+
+ interval = msecs_to_jiffies(sd->balance_interval);
+ if (time_after(next_balance, sd->last_balance + interval))
+ next_balance = sd->last_balance + interval;
+ if (pulled_task)
+ break;
}
if (!pulled_task)
/*
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[Index of Archives]
[Kernel Newbies]
[Netfilter]
[Bugtraq]
[Photo]
[Stuff]
[Gimp]
[Yosemite News]
[MIPS Linux]
[ARM Linux]
[Linux Security]
[Linux RAID]
[Video 4 Linux]
[Linux for the blind]
[Linux Resources]