[patch] x86_64: do not enable the NMI watchdog by default

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



* Len Brown <[email protected]> wrote:

> Personally I have never been a big fan of having the NMI watchdog 
> running by default on all systems -- but Andi insists that it helps 
> him debug failures, so tick it does...

enabling it by default was IMO a really bad decision and it needs to be 
undone via the patch attached further below.

If Andi wants to debug stuff via the NMI wachdog, he should use the 
nmi_watchdog=2 boot option: next to the tty=ttyS0 serial console 
options, initcall_debug, apic=debug, earlyprintk and myriads of other 
kernel-hackers-only boot options that we all use to 'help debug 
failures' ...

also, lock debugging facilities catch lockup possibilities (and actual 
lockups) alot more efficiently, i cannot remember the last time the NMI 
watchdog caught anything on my boxes without some other facility not 
triggering first. (and i have it enabled everywhere)

I have run a quick analysis over all locking related crashes i triggered 
on one particular testbox over the past 2 weeks. Out of 26 separate lock 
related incidents spread out equally over that timeframe (out of 497 
bootups on this box), this was the distribution of debugging facilities 
that caught the bugs:

      1  BUG: spinlock lockup on
      1  [ INFO: inconsistent lock state ]
      2  BUG: scheduling while atomic
      2  [ BUG: bad unlock balance detected! ]
      2  BUG: sleeping function called from invalid context
      6  BUG: scheduling with irqs disabled
      5  [ INFO: hard-safe -> hard-unsafe lock order detected ]
      7  BUG: using smp_processor_id() in preemptible [] code

8 were caught by lockdep, 8 by atomicity checks in the scheduler, 7 by 
DEBUG_PREEMPT and 1 by DEBUG_SPINLOCK.

Note: zero were caught by the NMI watchdog, and i run the NMI watchdog 
enabled by default on all architectures, and i have serial logging of 
everything.

but even for the typical distro kernel and for the typical user, the NMI 
watchdog is normally useless, because NMI lockups rarely make it into 
the syslog and X just locks up without showing anything on the screen.

	Ingo

----------------------->
Subject: [patch] x86_64: do not enable the NMI watchdog by default
From: Ingo Molnar <[email protected]>

do not enable the NMI watchdog by default. Now that we have lockdep i 
cannot remember the last time it caught a real bug, but the NMI watchdog 
can /cause/ problems. Furthermore, to the typical user, an NMI watchdog 
assert results in a total lockup anyway (if under X). In that sense, all 
that the NMI watchdog does is that it makes the system /less/ stable and 
/less/ debuggable.

people can still enable it either after bootup via:

   echo 1 > /proc/sys/kernel/nmi

or via the nmi_watchdog=1 or nmi_watchdog=2 boot options.

build and boot tested on an Athlon64 box.

Signed-off-by: Ingo Molnar <[email protected]>
---
 arch/x86_64/kernel/apic.c    |    1 -
 arch/x86_64/kernel/io_apic.c |    5 +----
 arch/x86_64/kernel/nmi.c     |    2 +-
 arch/x86_64/kernel/smpboot.c |    1 -
 include/asm-x86_64/nmi.h     |    1 -
 5 files changed, 2 insertions(+), 8 deletions(-)

Index: linux/arch/x86_64/kernel/apic.c
===================================================================
--- linux.orig/arch/x86_64/kernel/apic.c
+++ linux/arch/x86_64/kernel/apic.c
@@ -443,7 +443,6 @@ void __cpuinit setup_local_APIC (void)
 			oldvalue, value);
 	}
 
-	nmi_watchdog_default();
 	setup_apic_nmi_watchdog(NULL);
 	apic_pm_activate();
 }
Index: linux/arch/x86_64/kernel/io_apic.c
===================================================================
--- linux.orig/arch/x86_64/kernel/io_apic.c
+++ linux/arch/x86_64/kernel/io_apic.c
@@ -1604,7 +1604,6 @@ static inline void check_timer(void)
 		 */
 		unmask_IO_APIC_irq(0);
 		if (!no_timer_check && timer_irq_works()) {
-			nmi_watchdog_default();
 			if (nmi_watchdog == NMI_IO_APIC) {
 				disable_8259A_irq(0);
 				setup_nmi();
@@ -1630,10 +1629,8 @@ static inline void check_timer(void)
 		setup_ExtINT_IRQ0_pin(apic2, pin2, vector);
 		if (timer_irq_works()) {
 			apic_printk(APIC_VERBOSE," works.\n");
-			nmi_watchdog_default();
-			if (nmi_watchdog == NMI_IO_APIC) {
+			if (nmi_watchdog == NMI_IO_APIC)
 				setup_nmi();
-			}
 			return;
 		}
 		/*
Index: linux/arch/x86_64/kernel/nmi.c
===================================================================
--- linux.orig/arch/x86_64/kernel/nmi.c
+++ linux/arch/x86_64/kernel/nmi.c
@@ -181,7 +181,7 @@ static __cpuinit inline int nmi_known_cp
 }
 
 /* Run after command line and cpu_init init, but before all other checks */
-void nmi_watchdog_default(void)
+static inline void nmi_watchdog_default(void)
 {
 	if (nmi_watchdog != NMI_DEFAULT)
 		return;
Index: linux/arch/x86_64/kernel/smpboot.c
===================================================================
--- linux.orig/arch/x86_64/kernel/smpboot.c
+++ linux/arch/x86_64/kernel/smpboot.c
@@ -866,7 +866,6 @@ static int __init smp_sanity_check(unsig
  */
 void __init smp_prepare_cpus(unsigned int max_cpus)
 {
-	nmi_watchdog_default();
 	current_cpu_data = boot_cpu_data;
 	current_thread_info()->cpu = 0;  /* needed? */
 	set_cpu_sibling_map(0);
Index: linux/include/asm-x86_64/nmi.h
===================================================================
--- linux.orig/include/asm-x86_64/nmi.h
+++ linux/include/asm-x86_64/nmi.h
@@ -59,7 +59,6 @@ extern void disable_timer_nmi_watchdog(v
 extern void enable_timer_nmi_watchdog(void);
 extern int nmi_watchdog_tick (struct pt_regs * regs, unsigned reason);
 
-extern void nmi_watchdog_default(void);
 extern int setup_nmi_watchdog(char *);
 
 extern atomic_t nmi_active;
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux