Fix for OpenSUSE kernel bug (was Re: [Opps] Invalid opcode)

S.Çağlar Onur wrote:
On Sunday, 05 Nov 2006 at 18:40, Andi Kleen wrote:
How do you know this?

Just guessing; if I'm not wrong, the panics occur after the SMP alternatives switching code has done its job.

And does it still happen in 2.6.19-rc4?

Will try

in VMware and Microsoft Virtual
PC, and in order to confirm this bug is not specific to our distro I
downloaded and tried the latest OpenSUSE as well. [1] and [2] are screens
captured by VMware, but the exact same panic occurs in Virtual PC, as reported
to us in [3].
Always the same BUG()?

Yes, same bug

There is just some rolling Turkish text there.

Ah, I'm sorry, here are the correct links :(

[1] http://cekirdek.pardus.org.tr/~caglar/2.6.18/panic_on_opensuse.png
[2] http://cekirdek.pardus.org.tr/~caglar/2.6.18/panic_on_pardus.png

Cheers

I'm proposing this as a fix for your bug. Having tasklets scheduled before softirqd gets to run might be somewhat backwards, but there is nothing I can find wrong about it from a correctness point of view. Better to boot the kernel even when compiled with bug checking on, I think.

This bug started becoming apparent in 2.6.18 because of some rework with the CPU hotplug code, but in theory, it exists at least all the way back to 2.6.10, which is as far as I looked backwards in time.

Zach
It is possible to have tasklets get scheduled before softirqd has had
a chance to spawn on all CPUs.  This is totally harmless; after success
during action CPU_UP_PREPARE, action CPU_ONLINE will be called, which
immediately wakes softirqd on the appropriate CPU to process the already
pending tasklets.  So there is no danger of having a missed wakeup for
any tasklets that were already pending.

In particular, i386 is affected by this during startup, and is visible when
using a very large initrd; during the time it takes for the initrd to be
decompressed, a timer IRQ can come in and schedule RCU callbacks.  It is also
possible that resending of a hardware IRQ via a softirq triggers the same bug.

Because of different timing conditions, this shows up in all emulators
and virtual machines tested, including Xen, VMware, Virtual PC, and Qemu.
It is also possible to trigger on native hardware with a large enough initrd,
although I don't have a reliable case demonstrating that.

Signed-off-by: Zachary Amsden <[email protected]>

Index: linux-2.6.18/kernel/softirq.c
===================================================================
--- linux-2.6.18.orig/kernel/softirq.c	2006-11-10 14:44:39.000000000 -0800
+++ linux-2.6.18/kernel/softirq.c	2006-11-29 22:19:36.000000000 -0800
@@ -574,8 +574,6 @@ static int __cpuinit cpu_callback(struct
 
 	switch (action) {
 	case CPU_UP_PREPARE:
-		BUG_ON(per_cpu(tasklet_vec, hotcpu).list);
-		BUG_ON(per_cpu(tasklet_hi_vec, hotcpu).list);
 		p = kthread_create(ksoftirqd, hcpu, "ksoftirqd/%d", hotcpu);
 		if (IS_ERR(p)) {
 			printk("ksoftirqd for %i failed\n", hotcpu);
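
The sequence described in the changelog can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not kernel code: every `toy_*` name, the `cpus` array and `NR_TOY_CPUS` are hypothetical stand-ins invented for illustration, the "thread" is just a flag, and ordering, locking and interrupt context are not modeled. It shows a tasklet being queued before the per-CPU thread exists, the prepare step succeeding without asserting that the list is empty (the behaviour after the patch), and the online step draining whatever was already pending.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_TOY_CPUS 2   /* hypothetical; only sizes the toy array */

/* Toy stand-in for a tasklet; not the kernel's struct tasklet_struct. */
struct toy_tasklet {
    struct toy_tasklet *next;
};

/* Toy stand-in for the per-CPU state touched by the hotplug callback. */
struct toy_cpu {
    struct toy_tasklet *pending;  /* models per_cpu(tasklet_vec, cpu).list */
    bool thread_created;          /* models "ksoftirqd has been created" */
    int handled;                  /* number of tasklets run on this CPU */
};

static struct toy_cpu cpus[NR_TOY_CPUS];

/* Queue a tasklet. This is allowed even before the CPU's thread exists. */
static void toy_tasklet_schedule(int cpu, struct toy_tasklet *t)
{
    t->next = cpus[cpu].pending;
    cpus[cpu].pending = t;
}

/*
 * Models the CPU_UP_PREPARE step after the patch: the thread is created
 * and there is no assertion that the pending list is empty.
 */
static int toy_cpu_up_prepare(int cpu)
{
    cpus[cpu].thread_created = true;
    return 0;
}

/*
 * Models the CPU_ONLINE step: the thread is woken and processes every
 * tasklet that was already pending. Returns how many were drained.
 */
static int toy_cpu_online(int cpu)
{
    int drained = 0;

    assert(cpus[cpu].thread_created);  /* online only follows a successful prepare */
    while (cpus[cpu].pending != NULL) {
        struct toy_tasklet *t = cpus[cpu].pending;

        cpus[cpu].pending = t->next;
        t->next = NULL;
        cpus[cpu].handled++;
        drained++;
    }
    return drained;
}

/* Queue n_early tasklets, then bring the CPU up; returns the number drained. */
static int toy_boot_cpu(int cpu, struct toy_tasklet *early, int n_early)
{
    for (int i = 0; i < n_early; i++)
        toy_tasklet_schedule(cpu, &early[i]);
    if (toy_cpu_up_prepare(cpu) != 0)
        return -1;
    return toy_cpu_online(cpu);
}
```

Calling `toy_boot_cpu(1, tasklets, 3)` with three zero-initialised `struct toy_tasklet` entries queues all three before the prepare step and drains all three in the online step, which is the "no missed wakeup" argument in miniature. Restoring the removed check would correspond to an `assert(cpus[cpu].pending == NULL)` in `toy_cpu_up_prepare()`, which would fire in exactly this early-scheduling case.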
