RE: nmi_watchdog fix for x86_64 to be more like i386

On Thu, 4 Oct 2007, Pallipadi, Venkatesh wrote:
> >-----Original Message-----
> >From: [email protected] 
> >[mailto:[email protected]] On Behalf Of 
> >Thomas Gleixner
> >Sent: Monday, October 01, 2007 11:19 PM
> >To: Andi Kleen
> >Cc: Arjan van de Ven; David Bahi; LKML; 
> >[email protected]; Andrew Morton; Ingo Molnar; 
> >Gregory Haskins
> >Subject: Re: nmi_watchdog fix for x86_64 to be more like i386
> >
> >>
> >> The only workaround for chipsets ignoring IRQ affinity would be to keep
> >> track on which CPU irq 0 happens and then restart APIC timer interrupts
> >> on the others (or send IPIs) as needed. But that would be fairly ugly.
> >
> >The clock events code does handle this already. The broadcast interrupt
> >can come in on any cpu. It's just the nmi watchdog which would be affected
> >by that.
> >
> 
> Probably we can workaround this by keeping track of IRQ0 count at percpu
> level and use local apic timer + this percpu counter in NMI. Or just
> increment local apic timer count in IRQ0 with nohz enabled.

No, I tried that. It's ugly.

The per cpu accounting is the correct way to go if we want to take
care of those systems which ignore the CPU0 binding of irq0.

See patch against the x86 tree below.

	tglx

-------------------->
commit 093976c7ad206a008bd5de4619f40f6bca4a79c3
Author: Thomas Gleixner <[email protected]>
Date:   Fri Oct 5 22:19:18 2007 +0200

    x86: Fix irq0 / local apic timer accounting
    
    The clock events merge introduced a change to the nmi watchdog code to
    handle the no longer increasing local apic timer count in the
    broadcast mode. This is fine for UP, but on SMP it papers over a
    stuck CPU which is not handling the broadcast interrupt due to the
    unconditional sum up of local apic timer count and irq0 count.
    
    To cover all cases we need to keep track of the CPU on which irq0 is
    handled. In theory this is CPU#0 due to the explicit disabling of irq
    balancing for irq0, but there are systems which ignore this on the
    hardware level. The per cpu irq0 accounting allows us to remove the
    irq0 to CPU0 binding as well.
    
    Add a per cpu counter for irq0 and evaluate this instead of the global
    irq0 count in the nmi watchdog code.
    
    Signed-off-by: Thomas Gleixner <[email protected]>

diff --git a/arch/x86/kernel/nmi_32.c b/arch/x86/kernel/nmi_32.c
index c7227e2..95d3fc2 100644
--- a/arch/x86/kernel/nmi_32.c
+++ b/arch/x86/kernel/nmi_32.c
@@ -353,7 +353,8 @@ __kprobes int nmi_watchdog_tick(struct pt_regs * regs, unsigned reason)
 	 * Take the local apic timer and PIT/HPET into account. We don't
 	 * know which one is active, when we have highres/dyntick on
 	 */
-	sum = per_cpu(irq_stat, cpu).apic_timer_irqs + kstat_cpu(cpu).irqs[0];
+	sum = per_cpu(irq_stat, cpu).apic_timer_irqs +
+		per_cpu(irq_stat, cpu).irq0_irqs;
 
 	/* if the none of the timers isn't firing, this cpu isn't doing much */
 	if (!touched && last_irq_sums[cpu] == sum) {
diff --git a/arch/x86/kernel/time_32.c b/arch/x86/kernel/time_32.c
index 19a6c67..3571d0a 100644
--- a/arch/x86/kernel/time_32.c
+++ b/arch/x86/kernel/time_32.c
@@ -157,6 +157,9 @@ EXPORT_SYMBOL(profile_pc);
  */
 irqreturn_t timer_interrupt(int irq, void *dev_id)
 {
+	/* Keep nmi watchdog up to date */
+	per_cpu(irq_stat, smp_processor_id()).irq0_irqs++;
+
 #ifdef CONFIG_X86_IO_APIC
 	if (timer_ack) {
 		/*
diff --git a/include/asm-x86/hardirq_32.h b/include/asm-x86/hardirq_32.h
index ed7cf97..9188635 100644
--- a/include/asm-x86/hardirq_32.h
+++ b/include/asm-x86/hardirq_32.h
@@ -9,6 +9,7 @@ typedef struct {
 	unsigned long idle_timestamp;
 	unsigned int __nmi_count;	/* arch dependent */
 	unsigned int apic_timer_irqs;	/* arch dependent */
+	unsigned int irq0_irqs;
 } ____cacheline_aligned irq_cpustat_t;
 
 DECLARE_PER_CPU(irq_cpustat_t, irq_stat);
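
For readers following the thread, here is a minimal userspace sketch of
why the per cpu sum matters. It is not kernel code. The commit message
calls the old irq0 count "global", so the sketch models it as one shared
counter; that is a simplification for illustration, not a statement
about how kstat_cpu() works. NR_CPUS = 2, struct cpu_stat and all sim_*
names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 2	/* hypothetical size, illustration only */

/* Simplified stand-in for irq_cpustat_t: only the two timer counters. */
struct cpu_stat {
	unsigned int apic_timer_irqs;
	unsigned int irq0_irqs;
};

static struct cpu_stat irq_stat[NR_CPUS];	/* zero-initialized */
static unsigned int global_irq0;		/* assumed shared irq0 count */
static unsigned int last_sum_global[NR_CPUS];
static unsigned int last_sum_percpu[NR_CPUS];

/* irq0 arrives on 'cpu': bump the per cpu and the shared counter. */
static void sim_timer_interrupt(unsigned int cpu)
{
	assert(cpu < NR_CPUS);
	irq_stat[cpu].irq0_irqs++;
	global_irq0++;
}

/* Check using the shared irq0 count. Returns true if 'cpu' looks stuck. */
static bool sim_check_global(unsigned int cpu)
{
	unsigned int sum;
	bool stuck;

	assert(cpu < NR_CPUS);
	sum = irq_stat[cpu].apic_timer_irqs + global_irq0;
	stuck = (sum == last_sum_global[cpu]);
	last_sum_global[cpu] = sum;
	return stuck;
}

/* Check using only this cpu's counters, as the patch does. */
static bool sim_check_percpu(unsigned int cpu)
{
	unsigned int sum;
	bool stuck;

	assert(cpu < NR_CPUS);
	sum = irq_stat[cpu].apic_timer_irqs + irq_stat[cpu].irq0_irqs;
	stuck = (sum == last_sum_percpu[cpu]);
	last_sum_percpu[cpu] = sum;
	return stuck;
}
```

If sim_timer_interrupt(0) runs a few times while CPU1 handles nothing,
sim_check_global(1) reports CPU1 as alive because the shared count
moved, while sim_check_percpu(1) reports it as stuck.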
