Bad hotplug/scheduler interaction?

I'm concerned that we don't have adequate protection for the scheduler
during cpu hotplug events, but I'm willing to believe I simply don't
understand the mechanism well enough.  We had a crash in (comparatively
ancient) 2.6.16.* but I think the relevant code is basically unchanged
since then.

First we introduced some cpu-intensive workloads.  Then we added two cpus.
System quickly crashed.  The crash was in find_busiest_group(), when
the kernel tried to access "this", which was NULL.  If we don't find a
localgroup, we won't set this, and when we try to calculate *imbalance,
we'll dereference a NULL "this" and crash.

As I looked over the code, though, I couldn't tell if the fault was with
find_busiest_group() for not covering this case, or if the problem was
that the method the hotplug code is using to reconstruct the sched_domains
really doesn't protect find_busiest_group (and find_idlest_group) at all.
Can anybody explain how synchronize_sched() is really syncing?  It looks
like a half-implemented RCU setup.  I fear we really don't have any way
to protect the two functions above from hotplug's desire to twiddle
with the sched_domains.

Do we?

Rick
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Prev by Date: Re: DTR/DSR Patch
Next by Date: Re: [announce] CFS-devel, performance improvements
Previous by thread: sata_nv issues with MCP51 SATA controller
Next by thread: [PATCH] Blackfin arch: add some missing syscall
Index(es):
- Date
- Thread

[Index of Archives] [Kernel Newbies] [Netfilter] [Bugtraq] [Photo] [Stuff] [Gimp] [Yosemite News] [MIPS Linux] [ARM Linux] [Linux Security] [Linux RAID] [Video 4 Linux] [Linux for the blind] [Linux Resources]