Re: nmi_watchdog=2 regression in 2.6.21

On Fri, 2007-08-31 at 09:21 -0700, Stephane Eranian wrote:
> Daniel,
> 
> On Fri, Aug 31, 2007 at 07:43:20AM -0700, Daniel Walker wrote:
> > On Thu, 2007-08-30 at 14:05 -0700, Stephane Eranian wrote:
> > > Daniel,
> > 
> > > Yes, I realized I missed a small detail in the switch statement.
> > > Could you try the new version?
> > 
> > This patch still has the stuck NMI .. Essentially the same thing that
> > happened without the patch..
> > 
> Ok, looks like defaulting to P6 does not quite work.
> 
> Here is a new version. This time I used a different approach.
> I must admit I am a bit puzzled by the duplication of information
> between the wd_ops and the nmi_watchdog_ctlblk structure. My understanding
> is that the latter is used as a cache for the info that needs to be per-cpu.
> 
> The wd_ops provides the MSR to use for the counter, yet all the setup_*()
> routines hardcode the MSR. Not sure why?

Yeah, that's bad .. For instance, if those had all been centralized
Bjorn wouldn't have needed to fix those up later..

> In this patch, the setup_*() routines now extract the MSRs from the wd_ops
> and copy them into the nmi_watchdog_ctlblk. This is not done for P4 because
> of the special and ugly case of HT.
> 
> With this approach, we can now create a custom wd_ops for CoreDuo that is
> a clone of the intel_arch_wd_ops, except for the MSR.
> 
> Could you try this one instead?

So I tested your patch unchanged and the system boots, and the
check_nmi_watchdog() passes .. However, the nmi stops ticking right
after bootup,

From my /proc/interrupts below,

           CPU0       CPU1       CPU2       CPU3       
  0:        108          0          0          0   IO-APIC-edge      timer
  1:          0          0          0          8   IO-APIC-edge      i8042
  4:       3427          0          0          1   IO-APIC-edge      serial
  8:          1          0          0          1   IO-APIC-edge      rtc
 12:          0          0          0        113   IO-APIC-edge      i8042
 14:       1128          0          0         10   IO-APIC-edge      ide0
 16:       1664          0          0          1   IO-APIC-fasteoi   uhci_hcd:usb2, eth0
 18:          0          0          0          0   IO-APIC-fasteoi   ehci_hcd:usb1
 19:          0          0          0          0   IO-APIC-fasteoi   uhci_hcd:usb3
 20:          0          0          0          1   IO-APIC-fasteoi   acpi
NMI:       1670       1453       1097        967 
LOC:      48001      48002      48000      48006 
ERR:          0
MIS:          0


The NMI field never changes ..
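One way to confirm a stuck counter without eyeballing the table is to sample
the NMI line twice and compare the per-CPU counts. The following is only a
user-space sketch; parse_nmi_line() and nmi_stuck() are hypothetical helper
names, not anything from the kernel tree, and the caller is assumed to have
already read the line out of /proc/interrupts.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: parse the per-CPU counts from an "NMI:" line of
 * /proc/interrupts. Returns the number of counts stored in counts[],
 * or -1 if the line does not start with "NMI:".
 */
static int parse_nmi_line(const char *line, unsigned long *counts, int max_cpus)
{
	const char *p = line;
	int n = 0;

	while (isspace((unsigned char)*p))
		p++;
	if (strncmp(p, "NMI:", 4) != 0)
		return -1;
	p += 4;

	while (n < max_cpus) {
		char *end;

		while (isspace((unsigned char)*p))
			p++;
		/* Stop at end of line or at any trailing description text. */
		if (!isdigit((unsigned char)*p))
			break;
		counts[n++] = strtoul(p, &end, 10);
		p = end;
	}
	return n;
}

/*
 * Hypothetical helper: returns 1 if no CPU's NMI count advanced between
 * two samples, 0 if at least one CPU is still taking NMIs.
 */
static int nmi_stuck(const unsigned long *before, const unsigned long *after,
		     int ncpus)
{
	int i;

	for (i = 0; i < ncpus; i++)
		if (after[i] != before[i])
			return 0;
	return 1;
}
```

Taking the two samples a few seconds apart should be enough, since the
watchdog is expected to tick at least about once per second on each CPU.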

So I added another change which looked appropriate,

@@ -674,6 +688,7 @@ unsigned lapic_adjust_nmi_hz(unsigned hz
 {
        struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
        if (wd->perfctr_msr == MSR_P6_PERFCTR0 ||
+           wd->perfctr_msr == MSR_ARCH_PERFMON_PERFCTR0 ||
            wd->perfctr_msr == MSR_ARCH_PERFMON_PERFCTR1)
                hz = adjust_for_32bit_ctr(hz);
        return hz;


Unfortunately that didn't fix anything, but I have a feeling it has
something to do with the nmi hertz adjustment that happens after
check_nmi_watchdog() ..
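For reference, the adjustment in question exists because only the low 32 bits
of these counters are writable and the value is sign-extended from bit 31, so
the programmed period has to fit in 31 bits. Below is a user-space sketch of
that arithmetic, assuming the kernel's adjust_for_32bit_ctr() works roughly
this way. The function name and the explicit cpu_khz parameter are
illustrative only (the kernel helper reads a global), and hz is assumed to be
nonzero.

```c
#include <stdint.h>

/*
 * Sketch only: raise hz until the number of cycles per watchdog period
 * fits in 31 bits. cpu_khz is passed in explicitly here; hz must be > 0.
 */
static unsigned int adjust_hz_for_32bit_ctr(unsigned int hz,
					    unsigned int cpu_khz)
{
	uint64_t cycles_per_sec = (uint64_t)cpu_khz * 1000;
	uint64_t period = cycles_per_sec / hz;

	/* Period too long for a 31-bit count: pick the smallest hz that fits. */
	if (period > 0x7fffffffULL)
		hz = (unsigned int)(cycles_per_sec / 0x7fffffffULL) + 1;

	return hz;
}
```

On a 3 GHz part a 1 Hz watchdog needs 3,000,000,000 cycles per period, which
exceeds 0x7fffffff, so the rate has to go up. If the check on perfctr_msr
does not match the MSR actually in use, this step is skipped and the counter
can be programmed with a value it cannot hold.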

Daniel

