Summary:
* Patch is relevant to driver "char/ecc.c", for the AMD76x Athlon chipset.
* Write 1 bits, not 0 bits, to clear the ECC error flag status.
* Without this patch, Linux will detect ONLY the first single and multibit
error. All subsequent errors are ignored by the hardware until the
register is properly cleared.
* The patch also fast-paths the common polled operation, and simplifies
code.
* Includes reference to AMD 761 spec sheet documenting the ECC register
values.
* Note: this module is often not installed by default: do "modprobe ecc"
then "cat /proc/ram" to check your ECC memory for detected soft errors.
* Patch is against Linux-2.6.13, the last kernel I could find with ecc.c
* Tabs suck.
I am an infrequent contributor, and did not find a matching entry in the
./MAINTAINERS file. Please help me to understand the proper procedure
for submitting this patch. I understand that perhaps ecc.c is changing
soon. So maybe it's not the most important patch, but it does fix a
real bug, and is quite simple.
----------------------------------------------------------------------------------------
linux:/usr/src # diff -u -b orig/drivers/char/ecc.c
linux-2.6.13-15/drivers/char/ecc.c > /tmp/ecc.diff
linux:/usr/src # cat /tmp/ecc.diff
--- orig/drivers/char/ecc.c 2005-09-13 08:52:29.000000000 -0700
+++ linux-2.6.13-15/drivers/char/ecc.c 2005-12-05 09:36:26.000000000 -0800
@@ -10,6 +10,7 @@
*/
#define DEBUG 0
+
#include <linux/config.h>
#include <linux/version.h>
#include <linux/module.h>
@@ -22,7 +23,7 @@
#include <asm/io.h>
#include <linux/proc_fs.h>
-#define ECC_VER "0.14 (Oct 10 2001)"
+#define ECC_VER "0.15 (Dec 1 2005)"
#define KERN_ECC KERN_ALERT
static struct timer_list ecctimer;
@@ -1102,15 +1103,20 @@
}
}
+
+// Spec source: AMD 761 System Controller/BIOS Guide, 24081D-February 2002
+//
http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24081.pdf
void check_amd76x(void)
{
- unsigned long eccstat = pci_dword(0x48);
+ u32 eccstat;
+ pci_read_config_dword(bridge, 0x48, &eccstat);
+ if(eccstat & 0x30)
+ {
if(eccstat & 0x10)
{
/* bits 7-4 of eccstat indicate the row the MBE occurred. */
int row = (eccstat >> 4) & 0xf;
printk("<1>ECC: MBE Detected in DRAM row %d\n", row);
- scrub_needed |= 2;
bank[row].mbecount++;
}
if(eccstat & 0x20)
@@ -1118,22 +1124,9 @@
/* bits 3-0 of eccstat indicate the row the SBE occurred. */
int row = eccstat & 0xf;
printk("<1>ECC: SBE Detected in DRAM row %d\n", row);
- scrub_needed |= 1;
bank[row].sbecount++;
}
- if (scrub_needed)
- {
- /*
- * clear error flag bits that were set by writing 0 to them
- * we hope the error was a fluke or something :)
- */
- unsigned long value = eccstat;
- if (scrub_needed & 1)
- value &= 0xFFFFFDFF;
- if (scrub_needed & 2)
- value &= 0xFFFFFEFF;
- pci_write_config_dword(bridge, 0x48, value);
- scrub_needed = 0;
+ pci_write_config_dword(bridge, 0x48, eccstat); // clear
by writing a 1
}
}
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[Index of Archives]
[Kernel Newbies]
[Netfilter]
[Bugtraq]
[Photo]
[Stuff]
[Gimp]
[Yosemite News]
[MIPS Linux]
[ARM Linux]
[Linux Security]
[Linux RAID]
[Video 4 Linux]
[Linux for the blind]
[Linux Resources]