Re: [PATCH] x86: Document the hotplug code is incompatible with x86 irq handling

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tuesday, 12 June 2007 20:19, Eric W. Biederman wrote:
> Pavel Machek <[email protected]> writes:
> 
> > Hi!
> >
> >> I just realized that except for doing the code review and noticing
> >> that the current cpu hotplug code is fundamentally incompatible
> >> with x86 I haven't done anything about it.  So here is my patch
> >> to document what is wrong.
> >> 
> >> The current cpu hotplug code requires irqs to be migrated from a cpu
> >> outside of irq context.  On x86 ioapics simply do not support this,
> >> making the code unfixable without major redesign of the generic cpu
> >> hotplug code.
> >> 
> >> So this patch makes CPU_HOTPLUG on x86 depend on CONFIG_BROKEN
> >> and adds a WARN_ON so people that do enable it are not in doubt about
> >> which part of the code is broken, even if it does work for them.
> >
> >
> >> --- a/arch/i386/kernel/irq.c
> >> +++ b/arch/i386/kernel/irq.c
> >> @@ -312,6 +312,19 @@ void fixup_irqs(cpumask_t map)
> >>  	unsigned int irq;
> >>  	static int warned;
> >>  
> >> +	/* 
> >> +	 * Function is so wrong at so many levels.
> >> +	 * - We migrate irqs that are directed at the cpu we are
> >> +	 *   removing.
> >
> > Is this about irq pinning?
> 
> Sorry. That should have been: We migrate irqs that are not directed at the
> cpu we are removing.  (We are migrating irqs when it is unnecessary).
> 
> >> +	 * - We cannot safely migrate ioapic irqs on x86 except in
> >> +	 *   side of irq context.
> >
> > 'inside'?
> >
> > Can you be more specific for this one?
> 
> Yes inside. 
> 
> An irq migration currently requires two instances of the irq firing to
> complete.  Once on the source cpu once on the target cpu.
> 
> Migrating irqs while the irq is alive is a royal pain.
> 
> >> +	 * Since someone probably finds this useful just warn very
> >> +	 * loudly until cpu hotplug is redesigned.
> >> +	 */
> >> +	WARN_ON(1);
> >
> > Ugh, no, this does not warn anyone. This will just make people ask me
> > why they see stack trace while suspending... and we are not interested
> > in the stack trace, anyway.
> >
> > printk(KERN_WARNING)?
> 
> 
> Because you are calling unfixably broken code.  That should be a decent
> incentive to do something else won't it?

Can you please tell me _what_ else can be done?

> IOAPICs do not support what the code is doing here.  There is lots of
> practical evidence including bad experiences and practical tests that
> support this.

Well, AFAICS, Suresh has tried to debug one failing case recently without
any consistent conclusions.  I don't know of any other test cases (links,
please?).
 
> I suspect the only reason you don't have problems is that the irqs are
> already shut down at the source before we get to this code path.

You are probably right, but OTOH I've tried the CPU hotplug via the sysfs
interface for _many_ times and it has _never_ failed for me.

> >> index 5ce9443..a61c4f2 100644
> >> --- a/arch/x86_64/Kconfig
> >> +++ b/arch/x86_64/Kconfig
> >> @@ -429,7 +429,7 @@ config NR_CPUS
> >>  
> >>  config HOTPLUG_CPU
> >>  	bool "Support for suspend on SMP and hot-pluggable CPUs (EXPERIMENTAL)"
> >> -	depends on SMP && HOTPLUG && EXPERIMENTAL
> >> +	depends on SMP && HOTPLUG && EXPERIMENTAL && BROKEN
> >>  	help
> >
> > Great, this will force everyone and their dog to enable broken, making
> > broken useless. Please don't.
> 
> CONFIG_BROKEN is quite likely excessive but the code is totally and
> unfixably broken.  So it still doesn't feel wrong to me.
> 
> Perhaps we should just disable swap suspend on SMP until we get a
> design that can be implemented correctly on existing hardware.

All kinds of suspend, not only hibernation.

> I am not happy with people telling me that we must keep broken code because
> people with brand new SMP laptops will scream otherwise.

Sorry, but that's how it goes.

We've been using that code for more then a year now and I'd expect someone
to tell us that it's wrong a bit earlier.

If you are telling us to drop it now, then _please_ advise what we can use
instead, because we obviously need the functionality.

> Since the fundamental problems in this code path don't appear to bite people
> very frequently I don't mind waiting while an alternative solution is
> debugged and tested, but there is no way it makes sense to keep this
> code in service more for more than a kernel release or two.

Greetings,
Rafael


-- 
"Premature optimization is the root of all evil." - Donald Knuth
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux