Jack Steiner wrote:
Nick -
The latest 2.6.12-rc5-mm2 tree fails to boot on some of the 64p
SGI systems. The system hangs immediately after printing:
...
Inode-cache hash table entries: 8388608 (order: 12, 67108864 bytes)
Mount-cache hash table entries: 1024
Boot processor id 0x0/0x0
Brought up 64 CPUs
Total of 64 processors activated (118415.36 BogoMIPS).
I have isolated the failure to cpu 0 hanging in sched_balance_self() during
a fork (or clone). The "while" loop at the end of function never
terminates, ie. sd is never NULL.
Is this a problem that you have seen before. If not, I'll do some
more digging & isolate the problem.
Hi Jack,
I haven't completely got to the bottom of this yet, but I was able
to reproduce on a 64-way Altix, and something like the attached patch
seems to 'fix' the problem.
I didn't have time to find what's gone wrong tonight, but I'll get
to that tomorrow.
--
SUSE Labs, Novell Inc.
Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c 2005-06-08 00:01:53.000000000 +1000
+++ linux-2.6/kernel/sched.c 2005-06-08 00:02:47.000000000 +1000
@@ -1113,6 +1113,7 @@ static int sched_balance_self(int cpu, i
cpumask_t span;
struct sched_group *group;
int new_cpu;
+ int weight;
span = sd->span;
group = find_idlest_group(sd, t, cpu);
@@ -1127,8 +1128,9 @@ static int sched_balance_self(int cpu, i
cpu = new_cpu;
nextlevel:
sd = NULL;
+ weight = cpus_weight(span);
for_each_domain(cpu, tmp) {
- if (cpus_subset(span, tmp->span))
+ if (weight <= cpus_weight(tmp->span))
break;
if (tmp->flags & flag)
sd = tmp;
[Index of Archives]
[Kernel Newbies]
[Netfilter]
[Bugtraq]
[Photo]
[Stuff]
[Gimp]
[Yosemite News]
[MIPS Linux]
[ARM Linux]
[Linux Security]
[Linux RAID]
[Video 4 Linux]
[Linux for the blind]
[Linux Resources]