Re: [2.6.17-rc5-mm2] crash when doing second suspend: BUG in arch/i386/kernel/nmi.c:174

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, 6 Jun 2006 19:05:04 -0400
Don Zickus <[email protected]> wrote:

> On Tue, Jun 06, 2006 at 03:15:07PM -0700, Andrew Morton wrote:
> > On Tue, 6 Jun 2006 17:45:53 -0400
> > Don Zickus <[email protected]> wrote:
> > 
> > > On Tue, Jun 06, 2006 at 04:18:15PM +0200, Andi Kleen wrote:
> > > > 
> > > > > Because he is using a i386 machine, the nmi watchdog is disabled by
> > > > > default. 
> > > > 
> > > > I changed that - it's now on by default on i386 too.
> > > > 
> > > > -Andi
> > > 
> > > I am trying to create a patch for this problem and it just dawned on me,
> > > how does one store the previous state in a suspend/resume path if the code
> > > hotplugs all the cpus first?  CPU0 is easy because an explicit
> > > suspend/resume path is called, but it seems to be called last after all
> > > the other cpus have been removed.  How do I save the state?
> > 
> > I'm really struggling to understand this question.  If you're referring to
> > some per-cpu state then a CPU hotplug handler would be appropriate?
> 
> Sorry.  I got ahead of myself.  My concern is how the suspend/resume code
> works with device drivers on an SMP system.  My initial impression was
> that the subsystem registers with the suspend/resume layer and upon such
> actions those registered functions are called.  
> 
> Inside those functions I saved the previous state of the watchdog timer.
> However, I learned today that my understanding was incorrect.  Instead
> first the _hotplug_ code is called for every cpu _except_ cpu0.  The
> _suspend/resume_ functions are only called in the context of _cpu0_.  
> 
> This breaks the design I have because upon resuming the watchdog timers
> automatically start on all cpus (except cpu0 because I saved the previous
> state through the handlers), regardless of what the previous state was.  
> 
> So my question is/was what is the proper way to handle processor level
> subsystems during the suspend/resume path on an SMP system.  I really
> don't understand the hotplug path nor the suspend/resume path very well.  
> 
> I didn't want to register a hotplug handler because a hotplug event is
> really different than a suspend event (I want to _save_ info during a
> suspend event).  The documentation I was reading seemed to suggest that
> hotplug/suspend/smp was a work-in-progress. 
> 
> Is the typical approach to just hack in an extra parameter to the
> start/stop functions of the nmi_watchdog letting the function know it is
> coming through the suspend/resume path? 
> 
> Any tips, code, other docs would be helpful.
> 

OK...  My understanding of how it works is that the cpu hotplug handlers
are called early in the suspend process to take the CPUs down.  Once all
the APs are shut down, CPU0 will then proceed to handle the devices.

So if you want to save and restore per-cpu NMI state then doing it in the
CPU hot-add and hot-remove handlers is appropriate.  It will affect the
behaviour of _real_ CPU hot-add and hot-remove as well.  But in what
appears to be a correct fashion.

All the above applies to suspend-to-disk.  I don't know if suspend-to-RAM
shuts down the APs.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux