Re: [PATCH 0/10] bulk cpu removal support

Ashok Raj wrote:
> On Thu, May 11, 2006 at 01:42:47PM -0700, Martin Bligh wrote:
> > Ashok Raj wrote:
> > > 
> > > 
> > > It depends on what's running at the time... with some light load, I measured
> > > wall clock time. I remember seeing 2 secs at times, but it's been a long time
> > > since I did that, so take that with a pinch of salt :-)
> > > 
> > > I will try to get those idle and load times worked out again... the best
> > > I have is a 16-way; if I get help from big-system OEMs I will send the
> > > numbers out.
> > 
> > Why is taking 30s to offline CPUs a problem?
> > 
> 
> Well, the real problem is that for each cpu offline we schedule an RT thread
> (kstopmachine) on every cpu and then turn off interrupts until that one cpu
> has been removed. stop_machine_run() is a big enough sledgehammer during cpu
> offline even once, and we do this repeatedly... say on a 4-socket system,
> where each socket = 16 logical cpus.
> 
> The system would tend to get hiccups 64 times, since we run the stopmachine
> thread once for each logical cpu. When we want to replace a node for
> reliability reasons, it's not clear these hiccups are a good thing.
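
Just so we're talking about the same thing, my mental picture of the
current path is roughly the sketch below (not the actual code; the
node_cpus mask and the offline_node_cpus() helper are just stand-ins
for whatever your removal path actually does):

#include <linux/cpu.h>		/* cpu_down() */
#include <linux/cpumask.h>	/* cpumask_t, for_each_cpu_mask() */

/*
 * Sketch only: offlining a node's worth of cpus today means N
 * independent cpu_down() calls, and each one ends up doing its
 * own stop_machine_run() with interrupts off on every cpu.
 */
static int offline_node_cpus(cpumask_t node_cpus)
{
	int cpu, error = 0;

	for_each_cpu_mask(cpu, node_cpus) {
		error = cpu_down(cpu);	/* one stop_machine per cpu */
		if (error)
			break;
	}
	return error;
}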

Can you provide more detail on what you mean by hiccups?


> Doing kstopmachine() on a single system is in itself noticeable; what we
> heard from some OEMs is that this would have other app level impact as well.

What "other app level impact"?


> With the bulk removal, we do stop_machine just once, and all the 16 cpus
> get removed at once, hence there is just one hiccup instead of 64.
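
And I take it the series makes that one stop_machine over the whole mask
instead, i.e. something morally equivalent to the sketch below (the names
here are made up; I'm going from the cover letter rather than the patches
themselves):

/*
 * Hypothetical bulk path: the machine is stopped once, and every cpu
 * in the mask is disabled inside that single stop_machine window.
 * __cpu_disable_one() is invented for the sketch; the real
 * __cpu_disable() only operates on the calling cpu.
 */
static int take_node_cpus_down(void *data)
{
	cpumask_t *mask = data;
	int cpu;

	for_each_cpu_mask(cpu, *mask)
		__cpu_disable_one(cpu);
	return 0;
}

	error = stop_machine_run(take_node_cpus_down, &node_cpus, NR_CPUS);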

Have you done any profiling or other instrumentation that identifies
stopmachine as the real culprit here?  I mean, it's a reasonable
assumption to make, but are you sure there's not something else
causing the hiccups?  Perhaps contention on the cpu hotplug lock, or
something wrong in the architecture cpu_disable code?

Module unload also uses stop_machine_run, iirc.  Do you see hiccups
with that, too?


> Less time to offline, and avoiding processes and interrupts bouncing on and
> off a cpu which is just about to be offlined, are more or less extra fringe
> benefits you get with the bulk removal approach.

Ok, so that's not the primary motivation for these patches?
