Re: [RCF] Linux memory error handling

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2005.06.15 09:30:13 +0000, Russ Anderson wrote:
> 		[RCF] Linux memory error handling.
> 
> Summary: One of the most common hardware failures in a computer 
> 	is a memory failure.   There has been efforts in various
> 	architectures to support recover from memory errors.  This
> 	is an attempt to define a common support infrastructure
> 	in Linux to support memory error handling.
> 
> Background:  There has been considerable work on recovering from
> 	Machine Check Aborts (MCAs) in arch/ia64.  One result is
> 	that many memory errors encountered by user applications
> 	not longer cause a kernel panic.  The application is 
> 	terminated, but linux and other applications keep running.
> 	Additional improvements are becoming dependent on mainline
> 	linux support.  That requires involvement of lkml, not
> 	just linux-ia64.

Good RFC! Actually on x86 arch, 'bluesmoke' - http://bluesmoke.sf.net - is out 
there for some simple mem ECC error handling already. It's inspired by the old linux-ecc 
project. Current capability is limited to detect, report, configuable for polling and UE
panic. 

Bluesmoke contains a driver core which is used to host infos for each mem 
controller, like dimm info, and currently only polling method is taken for registered 
controller. Others are all the specific chipset drivers, which is mostly platform depend, 
e.g e7520, 82875P, etc. Those platforms have also been tested, bluesmoke's webpage
contains some test method if you really want to try. 

nmi handling is still under work, Dave and Corey's patch is on sourceforge page, and

    http://lkml.org/lkml/2004/8/19/140
    http://lkml.org/lkml/2005/3/21/11

Those nmi callbacks have not been added to chipset driver yet, but some initial 
testing failed, still don't know why...

thanks
-zhen
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux