Re: Is ECC memory any use?

On Sun, 2007-12-09 at 04:45 -0500, Chris Snook wrote:
> Timothy Murphy wrote:
> > I'm getting memory for a very old (P2B-LS) Asus motherboard,
> > and I see I can get ECC memory for some 20% more.
> > 
> > Is there any point in getting this?
> > I see there is quite a lot of work
> > in getting ECC testing incorporated into the Linux kernel.
> > But even if it were there, would it be very valuable?
> > 
> > I have a feeling that disk errors are far more likely
> > than RAM errors.
> > Is that right?
> > 
> > 
> 
> Depends who's buying.  Few people do anything on "personal" systems that really 
> justifies ECC RAM, though I'm sure the exceptions are probably on this list.  If 
> you're doing any kind of business work where uptime is important, or any kind of 
> technical work where bit flips could cause nasty side effects, it's probably 
> worth buying the ECC, unless you're doing high-end graphics where a stray pixel 
> won't make a difference and most of your power budget is going to the GPUs.
> 
> 	-- Chris
> 
You are assuming that only data resides in memory, which is not the
case.  Program code lives there too, and a program will act quite
strangely if one of its instruction bits flips.  Some manufacturers
have dropped ECC because leaving it out is cheaper.  Some other kinds
of systems use an alternative scheme, but I prefer ECC.  That said, my
current system doesn't have it because I misread the spec.

Bits do not typically "flip occasionally".  As memory ages, a "feature"
of CMOS is that bridges can form inside the silicon, and these in turn
cause failures.  If you are lucky, the value the failed bit is stuck at
will match what is written to it, in which case no failure shows up.
On the other hand, if you write a 0 to a bit that bridges to a 1, a
program can run for a while, then FLIP! and it doesn't work.  You will
get an error message that probably says nothing about a bit flipping
and, worse, the failure will not be repeatable, because programs are
dynamically loaded, data is paged in and out, and memory is
occasionally moved around (depending on how the OS optimizes memory
access and how the language or OS does garbage collection).

With ECC memory your programs keep working, because the system can
recover from these "single bit errors".  Note that very few ECC systems
will correct multibit errors.
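
To make the single-bit versus multibit distinction concrete, here is a
toy sketch of the kind of code ECC relies on: a Hamming(7,4) code plus
one overall parity bit (SECDED, "single error correct, double error
detect").  It protects only 4 data bits; real ECC modules commonly use a
wider code over 64 data bits, and nothing here is taken from any
particular chipset.  The function names and the stuck-bit model in
store() are my own assumptions for illustration.

```python
def encode(nibble):
    """Encode 4 data bits into an 8-bit SECDED codeword (list of 0/1)."""
    d = [(nibble >> i) & 1 for i in range(4)]  # d[0] is the low bit
    code = [0] * 8                             # index 0: overall parity
    # Classic Hamming(7,4) layout: parity at 1, 2, 4; data at 3, 5, 6, 7.
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    code[0] = sum(code[1:]) % 2                # make total parity even
    return code


def decode(code):
    """Return (nibble, status): status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos                    # points at a single bad bit
    odd_parity = sum(code) % 2
    if syndrome == 0 and odd_parity == 0:
        status = "ok"
    elif odd_parity == 1:
        code[syndrome] ^= 1                    # syndrome 0 = parity bit itself
        status = "corrected"
    else:
        return None, "uncorrectable"           # two bits wrong: detect only
    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return nibble, status


def store(code, stuck_pos, stuck_val):
    """Hypothetical faulty cell: one bit always reads back as stuck_val."""
    cell = list(code)
    cell[stuck_pos] = stuck_val
    return cell


# Bit 3 is stuck at 1.  Writing a word that already has a 1 there is harmless.
print(decode(store(encode(0b1011), 3, 1)))     # (11, 'ok')
# Writing a 0 there gives a single-bit error, which the code repairs.
print(decode(store(encode(0b1010), 3, 1)))     # (10, 'corrected')
# Two bad bits in one word are noticed but cannot be repaired.
bad = encode(0b1010)
bad[3] ^= 1
bad[6] ^= 1
print(decode(bad))                             # (None, 'uncorrectable')
```

The first two calls show the "lucky" and "unlucky" writes to the same
stuck bit; the last one is the multibit case that this kind of code can
only report.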

	If you see too many failures in running code, under strange and
non-repeatable conditions, you should begin to suspect memory errors,
whether or not you have ECC memory.
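
On a Linux machine that does have ECC and a loaded EDAC driver, one way
to check that suspicion is to look at the kernel's error counters.  The
sketch below assumes the usual EDAC sysfs layout
(/sys/devices/system/edac/mc/mcN/ce_count and ue_count); whether those
files exist depends on your kernel and chipset driver, and the function
name is my own.  Without ECC there is nothing to read here, and a
boot-time tester such as memtest86+ is the usual route.

```python
import os


def read_edac_counts(base="/sys/devices/system/edac/mc"):
    """Return {controller: {'ce_count': n, 'ue_count': n}} from EDAC sysfs.

    ce_count is corrected errors, ue_count is uncorrected errors.
    Returns an empty dict if the directory is missing (no EDAC driver).
    """
    counts = {}
    if not os.path.isdir(base):
        return counts
    for name in sorted(os.listdir(base)):
        # Memory controllers appear as mc0, mc1, ...
        if not (name.startswith("mc") and name[2:].isdigit()):
            continue
        entry = {}
        for field in ("ce_count", "ue_count"):
            try:
                with open(os.path.join(base, name, field)) as fh:
                    entry[field] = int(fh.read().strip())
            except (OSError, ValueError):
                entry[field] = None            # unreadable or malformed
        counts[name] = entry
    return counts


for controller, entry in read_edac_counts().items():
    print(controller, entry)
```

A corrected-error count that keeps climbing is a hint that a module is
on its way out, even though the programs themselves still run.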

Regards,
Les Howell

