Re: [PATCH 0/10] bulk cpu removal support

Ashok Raj wrote:
> On Thu, May 11, 2006 at 01:42:47PM -0700, Martin Bligh wrote:
> > Ashok Raj wrote:
> > > 
> > > 
> > > It depends on what's running at the time... with some light load, I measured
> > > wall clock time. I remember seeing 2 secs at times, but it's been a long time
> > > since I did that, so take that with a pinch of salt :-)
> > > 
> > > I will try to get those idle and load times worked out again... the best
> > > I have is a 16-way; if I get help from big-system OEMs I will send the
> > > numbers out.
> > 
> > Why is taking 30s to offline CPUs a problem?
> > 
> 
> Well, the real problem is that for each cpu offline we schedule an RT thread
> (kstopmachine) on every cpu and then turn off interrupts until that one cpu
> has been removed. stop_machine_run() is a big enough sledgehammer during cpu
> offline even once, and we do this repeatedly... say on a 4-socket system,
> where each socket = 16 logical cpus.
> 
> The system would tend to get hiccups 64 times, since we run the stopmachine
> thread once for each logical cpu. When we want to replace a node for
> reliability reasons, it's not clear these hiccups are a good thing.
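
Just so we're talking about the same thing, my mental picture of the
current path is roughly the sketch below (not the actual code; the
node_cpus mask and the offline_node_cpus() helper are just stand-ins
for whatever your removal path actually does):

#include <linux/cpu.h>		/* cpu_down() */
#include <linux/cpumask.h>	/* cpumask_t, for_each_cpu_mask() */

/*
 * Sketch only: offlining a node's worth of cpus today means N
 * independent cpu_down() calls, and each one ends up doing its
 * own stop_machine_run() with interrupts off on every cpu.
 */
static int offline_node_cpus(cpumask_t node_cpus)
{
	int cpu, error = 0;

	for_each_cpu_mask(cpu, node_cpus) {
		error = cpu_down(cpu);	/* one stop_machine per cpu */
		if (error)
			break;
	}
	return error;
}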

Can you provide more detail on what you mean by hiccups?


> Doing kstopmachine() on a single system is in itself noticeable; what we
> heard from some OEMs is that this would have other app level impact as well.

What "other app level impact"?


> With the bulk removal, we do stop_machine just once, and all the 16 cpus
> get removed at once, hence there is just one hiccup instead of 64.
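
And I take it the series makes that one stop_machine over the whole mask
instead, i.e. something morally equivalent to the sketch below (the names
here are made up; I'm going from the cover letter rather than the patches
themselves):

/*
 * Hypothetical bulk path: the machine is stopped once, and every cpu
 * in the mask is disabled inside that single stop_machine window.
 * __cpu_disable_one() is invented for the sketch; the real
 * __cpu_disable() only operates on the calling cpu.
 */
static int take_node_cpus_down(void *data)
{
	cpumask_t *mask = data;
	int cpu;

	for_each_cpu_mask(cpu, *mask)
		__cpu_disable_one(cpu);
	return 0;
}

	error = stop_machine_run(take_node_cpus_down, &node_cpus, NR_CPUS);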

Have you done any profiling or other instrumentation that identifies
stopmachine as the real culprit here?  I mean, it's a reasonable
assumption to make, but are you sure there's not something else
causing the hiccups?  Perhaps contention on the cpu hotplug lock, or
something wrong in the architecture cpu_disable code?

Module unload also uses stop_machine_run, iirc.  Do you see hiccups
with that, too?


> Less time to offline, and avoiding processes and interrupts bouncing on and
> off a cpu which is just about to be offlined, are more or less extra fringe
> benefits you get with the bulk removal approach.

Ok, so that's not the primary motivation for these patches?
