Re: 2.6.23-rc7-mm1

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, 24 Sep 2007 22:38:03 +0530 Kamalesh Babulal
<[email protected]> wrote:

> Peter Zijlstra wrote:
> > On Mon, 24 Sep 2007 09:44:48 -0700 Andrew Morton
> > <[email protected]> wrote:
> > 
> >> On Mon, 24 Sep 2007 18:43:33 +0530 Kamalesh Babulal <[email protected]> wrote:
> >>
> >>> Hi Andrew,
> >>>
> >>> Kernel BUG over x86_64 (AMD Opteron(tm) Processor 844).
> >>>
> >>> Similar kernel Bug was reported for 2.6.23-rc2-mm1
> >>> at http://lkml.org/lkml/2007/8/10/20 and the 
> >>> mm-dirty-balancing-for-tasks.patch was dropped from 2.6.23-rc2-mm2.
> >>> And the same patch is in this -mm version, suspect whether is it the
> >>> same patch triggering this Bug.
> >>>
> >>> BUG: soft lockup - CPU#0 stuck for 11s! [events/0:15]
> >>> CPU 0:
> >>> Modules linked in:
> >>> Pid: 15, comm: events/0 Tainted: G      D 2.6.23-rc7-mm1-autokern1 #1
> >>> RIP: 0010:[<ffffffff8021be46>]  [<ffffffff8021be46>] __smp_call_function_mask+0x9a/0xc4
> >>> RSP: 0000:ffff8100017add80  EFLAGS: 00000297
> >>> RAX: 00000000000000fc RBX: ffff8100017adde0 RCX: 0000000000000001
> >>> RDX: 00000000000008fc RSI: 00000000000000fc RDI: 000000000000000e
> >>> RBP: ffffc20002d11000 R08: ffff8100017ac000 R09: ffffffff80675e38
> >>> R10: 0000000000000000 R11: 0000000000000000 R12: 000000000000000f
> >>> R13: ffffffff8021bcfe R14: 0000000000000000 R15: 0000000000000001
> >>> FS:  0000000000000000(0000) GS:ffffffff8065a000(0000) knlGS:00000000556aa2a0
> >>> CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> >>> CR2: ffffc20002d11008 CR3: 0000000000201000 CR4: 00000000000006e0
> >>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >>> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> >>>
> >>> Call Trace:
> >>> Inexact backtrace:
> >>>  [<ffffffff802157a4>] mcheck_check_cpu+0x0/0x31
> >>>  [<ffffffff802157a4>] mcheck_check_cpu+0x0/0x31
> >>>  [<ffffffff8021becf>] smp_call_function_mask+0x5f/0x72
> >>>  [<ffffffff802157a4>] mcheck_check_cpu+0x0/0x31
> >>>  [<ffffffff8021bf82>] smp_call_function+0x19/0x1b
> >>>  [<ffffffff8023a773>] on_each_cpu+0x16/0x2b
> >>>  [<ffffffff802158a2>] mcheck_timer+0x0/0x7c
> >>>  [<ffffffff802158c0>] mcheck_timer+0x1e/0x7c
> >>>  [<ffffffff802444b9>] run_workqueue+0x88/0x109
> >>>  [<ffffffff8024453a>] worker_thread+0x0/0xf4
> >>>  [<ffffffff80244623>] worker_thread+0xe9/0xf4
> >>>  [<ffffffff8024841d>] autoremove_wake_function+0x0/0x37
> >>>  [<ffffffff8024841d>] autoremove_wake_function+0x0/0x37
> >>>  [<ffffffff80247e5c>] kthread+0x44/0x6d
> >>>  [<ffffffff8020c5a8>] child_rip+0xa/0x12
> >>>  [<ffffffff80247e18>] kthread+0x0/0x6d
> >>>  [<ffffffff8020c59e>] child_rip+0x0/0x12
> >> hm, I thought we'd fixed the problems in that patchset.  Peter, were
> >> you aware of this one?
> > 
> > Nope, and the stacktrace is utterly puzzling.
> > 
> > /me goes read the lkml.org link
> > 
> > Kamalesh Babulal: do you still get:
> >   BUG: spinlock bad magic on
> > 
> > msgs?
> > 
> > Because those I could reproduce using fsx, and I fixed all that.
> Hi Peter,
> 
> I do not get BUG: spinlock bad magic messages any more, but the softlock message is
> thrown more than 30 time, while running the ltp runall.

It would be good to know what function on_each_cpu is executing, could
you try something like:

---
 kernel/softirq.c    |    5 +++++
 kernel/softlockup.c |    7 +++++++
 2 files changed, 12 insertions(+)

Index: linux-2.6/kernel/softirq.c
===================================================================
--- linux-2.6.orig/kernel/softirq.c
+++ linux-2.6/kernel/softirq.c
@@ -645,6 +645,8 @@ __init int spawn_ksoftirqd(void)
 }
 
 #ifdef CONFIG_SMP
+
+DEFINE_PER_CPU(void (*)(void *info), last_on_each_cpu);
 /*
  * Call a function on all processors
  */
@@ -653,6 +655,9 @@ int on_each_cpu(void (*func) (void *info
 	int ret = 0;
 
 	preempt_disable();
+
+	per_cpu(last_on_each_cpu, smp_processor_id()) = func;
+
 	ret = smp_call_function(func, info, retry, wait);
 	local_irq_disable();
 	func(info);
Index: linux-2.6/kernel/softlockup.c
===================================================================
--- linux-2.6.orig/kernel/softlockup.c
+++ linux-2.6/kernel/softlockup.c
@@ -15,6 +15,8 @@
 #include <linux/notifier.h>
 #include <linux/module.h>
 #include <linux/kgdb.h>
+#include <linux/percpu.h>
+#include <linux/kallsyms.h>
 
 #include <asm/irq_regs.h>
 
@@ -71,6 +73,8 @@ void touch_all_softlockup_watchdogs(void
 }
 EXPORT_SYMBOL(touch_all_softlockup_watchdogs);
 
+DECLARE_PER_CPU(void (*)(void *), last_on_each_cpu);
+
 /*
  * This callback runs from the timer interrupt, and checks
  * whether the watchdog thread has hung or not:
@@ -122,6 +126,9 @@ void softlockup_tick(void)
 	printk(KERN_ERR "BUG: soft lockup - CPU#%d stuck for %lus! [%s:%d]\n",
 			this_cpu, now - touch_timestamp,
 			current->comm, task_pid_nr(current));
+	printk(KERN_ERR " last_on_each_cpu: [<%p>] ",
+			per_cpu(last_on_each_cpu, this_cpu));
+	print_symbol("%s\n", (unsigned long)per_cpu(last_on_each_cpu, this_cpu));
 	if (regs)
 		show_regs(regs);
 	else
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux