Re: mm/rmap.c bug(?) in the 2.6.12 kernel

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 29 Sep 2005, Balazs Csak wrote:
> 
> I've got an oops on a dual Opteron machine that's running on 2.6.12.3
> kernel. Under heavy load the system hangs and we get this oops on the
> screen:
> ...
> <ffffffff8013477d>{printk+141} <ffffffff8012db83>{try_to_wake_up+915}
> <ffffffff8010e445>{error_exit+0} <ffffffff80170ec6>{page_remove_rmap+38}
> <ffffffff80168fde>{unmap_vmas+1342} <ffffffff8016ea63>{exit_mmap+163}
> <ffffffff801317a1>{mmput+49} <ffffffff80136432>{do_exit+338}
> <ffffffff80136f5f>{do_group_exit+239}
> <ffffffff8010da46>{system_call+126}
> 
> This problem has been described here, earlier:
> http://seclists.org/lists/linux-kernel/2005/May/6369.html
> 
> Our setup is: 2x Opteron 246, Tyan Thunder K8S MB, 3ware 9500-8,
> 4 G RAM, Fedora Core 3 x86_64, 2.6.12.3 kernel.
> 
> Memtest shows nothing special...
> Does anyone know something useful for this problem?

Please try Linus' patch at the bottom: our best guess now is that
yours is a different manifestation of the same underlying issue.  
Here's what Linus said on 20 Sep:

On Tue, 20 Sep 2005, Charles McCreary wrote:
>
> Another datapoint for this thread. The box spewing the bad pmds messages is a 
> dual opteron 246 on a TYAN S2885 Thunder K8W motherboard. Kernel is 
> 2.6.11.4-20a-smp.

This is quite possibly the result of an Opteron errata (tlb flush
filtering is broken on SMP) that we worked around as of 2.6.14-rc4.

So either just try 2.6.14-rc2, or try the appended patch (it has since 
been confirmed by many more people).

		Linus

---
diff-tree bc5e8fdfc622b03acf5ac974a1b8b26da6511c99 (from 61ffcafafb3d985e1ab8463be0187b421614775c)
Author: Linus Torvalds <[email protected]>
Date:   Sat Sep 17 15:41:04 2005 -0700

    x86-64/smp: fix random SIGSEGV issues
    
    They seem to have been due to AMD errata 63/122; the fix is to disable
    TLB flush filtering in SMP configurations.
    
    Confirmed to fix the problem by Andrew Walrond <[email protected]>
    
    [ Let's see if we'll have a better fix eventually, this is the Q&D
      "let's get this fixed and out there" version ]
    
    Signed-off-by: Linus Torvalds <[email protected]>

diff --git a/arch/x86_64/kernel/setup.c b/arch/x86_64/kernel/setup.c
--- a/arch/x86_64/kernel/setup.c
+++ b/arch/x86_64/kernel/setup.c
@@ -831,11 +831,26 @@ static void __init amd_detect_cmp(struct
 #endif
 }
 
+#define HWCR 0xc0010015
+
 static int __init init_amd(struct cpuinfo_x86 *c)
 {
 	int r;
 	int level;
 
+#ifdef CONFIG_SMP
+	unsigned long value;
+
+	// Disable TLB flush filter by setting HWCR.FFDIS:
+	// bit 6 of msr C001_0015
+	//
+	// Errata 63 for SH-B3 steppings
+	// Errata 122 for all(?) steppings
+	rdmsrl(HWCR, value);
+	value |= 1 << 6;
+	wrmsrl(HWCR, value);
+#endif
+
 	/* Bit 31 in normal CPUID used for nonstandard 3DNow ID;
 	   3DNow is IDd by bit 31 in extended CPUID (1*32+31) anyway */
 	clear_bit(0*32+31, &c->x86_capability);
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]
  Powered by Linux