[PATCH] x86_64: mce poll at IDLE_START and printk fix

From: Tim Hockin <[email protected]>

Background:
 The MCE code already registers an idle notifier callback which checks for
 the TIF_MCE_NOTIFY flag.  Given that the system is idle at that point, we
 can get even better granularity of MCE logging by polling for MCEs
 whenever we enter the idle loop.  This exposes a small imperfection in the
 printk() rate limiting whereby the last "Machine check events logged"
 message might not get printed if no more MCEs arrive.

Description:
 This patch extends the MCE idle notifier callback to poll for MCEs on the
 current CPU at IDLE_START time.  It also adds one new static variable,
 do_printk, which records that events have been logged since the last
 printk(), so that the message is printed at the next opportunity the rate
 limit allows.
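
 The deferred message described above can be modelled outside the kernel.
 The following is only a userspace sketch, not the patch itself: the
 struct notify_state type, the notify_model() function and the tick
 counter are hypothetical stand-ins for do_printk, last_print, jiffies and
 check_interval*HZ.

```c
#include <stdbool.h>

/* Hypothetical userspace model of the deferred, rate-limited message. */
struct notify_state {
	bool pending;             /* stands in for the patch's do_printk */
	unsigned long last_print; /* tick at which the last message was emitted */
	unsigned long interval;   /* minimum number of ticks between messages */
	unsigned int printed;     /* count of messages emitted so far */
};

/*
 * Model one call to the notifier.  Returns true if new events were
 * reported on this call, mirroring the retval in the patch.
 */
static bool notify_model(struct notify_state *s, bool new_events,
			 unsigned long now)
{
	/* New events always arm the pending flag, even if rate limited. */
	if (new_events)
		s->pending = true;

	/* Emit the message only when one is owed and the interval has passed. */
	if (s->pending && now - s->last_print >= s->interval) {
		s->last_print = now;
		s->printed++;
		s->pending = false;
	}

	return new_events;
}
```

 In this model an event that arrives inside the rate-limit window leaves
 the flag set, so a later call with no new events still emits the message
 once the interval has passed.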

Result:
 MCEs are found and logged sooner on systems with bad memory, because an
 idle CPU now polls on idle entry instead of waiting for the next periodic
 check.

Alternatives:
 None.

Testing:
 I used software to inject correctable and uncorrectable errors.  An
 application poll()ing /dev/mcelog gets woken up very quickly after error
 injection.
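
 For reference, a waiter of the kind used in this test could look roughly
 like the sketch below.  It is not the actual test program; the helper
 name wait_for_readable() is hypothetical, and reading and decoding the
 logged records is left out.

```c
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/*
 * Wait until the file at `path` becomes readable.
 * Returns 1 if readable, 0 on timeout, -1 on error.
 * A negative timeout_ms blocks indefinitely.
 */
static int wait_for_readable(const char *path, int timeout_ms)
{
	struct pollfd pfd;
	int fd, ret;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	pfd.fd = fd;
	pfd.events = POLLIN;
	pfd.revents = 0;

	/* Sleep until the driver reports data or the timeout expires. */
	ret = poll(&pfd, 1, timeout_ms);
	close(fd);

	if (ret < 0)
		return -1;
	if (ret == 0)
		return 0;
	return (pfd.revents & POLLIN) ? 1 : -1;
}
```

 Calling wait_for_readable("/dev/mcelog", -1) would block until an event
 is logged; opening the device normally requires sufficient privileges.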

Patch:
 This patch is against 2.6.21-mm.

Signed-off-by: Tim Hockin <[email protected]>

---

This is the first version of this patch.


diff -pruN linux-2.6.21+04_tolerant_cleanup/arch/x86_64/kernel/mce.c linux-2.6.21+05/arch/x86_64/kernel/mce.c
--- linux-2.6.21+04_tolerant_cleanup/arch/x86_64/kernel/mce.c	2007-05-11 21:02:12.000000000 -0700
+++ linux-2.6.21+05/arch/x86_64/kernel/mce.c	2007-05-17 15:29:00.000000000 -0700
@@ -308,10 +308,10 @@ void do_machine_check(struct pt_regs * r
 		}
 	}
 
+ out:
 	/* notify userspace ASAP */
 	set_thread_flag(TIF_MCE_NOTIFY);
 
- out:
 	/* the last thing we do is clear state */
 	for (i = 0; i < banks; i++)
 		wrmsrl(MSR_IA32_MC0_STATUS+4*i, 0);
@@ -389,29 +389,43 @@ static void mcheck_timer(struct work_str
  */
 int mce_notify_user(void)
 {
+	static int do_printk;
+	int retval = 0;
+
 	clear_thread_flag(TIF_MCE_NOTIFY);
-	if (test_and_clear_bit(0, &notify_user)) {
-		static unsigned long last_print;
-		unsigned long now = jiffies;
 
+	/* notify userspace apps as soon as possible */
+	if (test_and_clear_bit(0, &notify_user)) {
 		wake_up_interruptible(&mce_wait);
 		if (trigger[0])
 			call_usermodehelper(trigger, trigger_argv, NULL, -1);
+		do_printk = 1;
+		retval = 1;
+	}
+
+	/* only log a message periodically */
+	if (do_printk) {
+		static unsigned long last_print;
+		unsigned long now = jiffies;
 
 		if (time_after_eq(now, last_print + (check_interval*HZ))) {
 			last_print = now;
 			printk(KERN_INFO "Machine check events logged\n");
+			do_printk = 0;
 		}
-
-		return 1;
 	}
-	return 0;
+
+	return retval;
 }
 
-/* see if the idle task needs to notify userspace */
+/* take advantage of idle time to manage MCEs */
 static int
 mce_idle_callback(struct notifier_block *nfb, unsigned long action, void *junk)
 {
+	/* poll for new MCEs on this CPU */
+	if (action == IDLE_START)
+		mcheck_check_cpu(NULL);
+
 	/* IDLE_END should be safe - interrupts are back on */
 	if (action == IDLE_END && test_thread_flag(TIF_MCE_NOTIFY))
 		mce_notify_user();
-