On Sun, 2007-12-09 at 04:45 -0500, Chris Snook wrote:
> Timothy Murphy wrote:
> > I'm getting memory for a very old (P2B-LS) Asus motherboard,
> > and I see I can get ECC memory for some 20% more.
> >
> > Is there any point in getting this?
> > I see there is quite a lot of work
> > in getting ECC testing incorporated into the Linux kernel.
> > But even if it were there, would it be very valuable?
> >
> > I have a feeling that disk errors are far more likely
> > than RAM errors.
> > Is that right?
> >
> >
>
> Depends who's buying. Few people do anything on "personal" systems that really
> justifies ECC RAM, though I'm sure the exceptions are probably on this list. If
> you're doing any kind of business work where uptime is important, or any kind of
> technical work where bit flips could cause nasty side effects, it's probably
> worth buying the ECC, unless you're doing high-end graphics where a stray pixel
> won't make a difference and most of your power budget is going to the GPUs.
>
> -- Chris
>

You are assuming that only data resides in memory, which is not the
case. Code lives there too, and your program will act quite strangely
if its bits flip.

Some manufacturers have dropped ECC because leaving it out is cheaper.
Some other kinds of systems use an alternative scheme, but I prefer
ECC. That said, my current system doesn't have it because I misread
the spec.

Bits do not typically "flip occasionally". As memory ages, a "feature"
of CMOS is that bridges form inside the silicon, and these in turn
cause failures. If you are lucky, the failed bit will match what is
written to it, in which case no failure shows up. On the other hand, a
bit that has a bridge can change state, and if you wrote a 0 to a bit
that would bridge to a 1, then you will find that a program can run
for a while, then FLIP! and it doesn't work.
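
To illustrate the lucky and unlucky cases, here is a minimal Python
sketch of a single byte of memory with one bit bridged to 1. It is a
toy model, not a description of real hardware: the bit position
STUCK_BIT and the helper names store_and_load and is_corrupted are
made up for the example.

```python
# Toy model of one byte of memory in which a single bit is "bridged" to 1.
# STUCK_BIT and both helper names are hypothetical, chosen for illustration.
STUCK_BIT = 3  # assumed position of the faulty bit


def store_and_load(value: int, stuck_bit: int = STUCK_BIT) -> int:
    """Return what a read gives back after writing `value` (0..255)."""
    # The bridged bit always reads as 1, whatever was written to it.
    return (value | (1 << stuck_bit)) & 0xFF


def is_corrupted(value: int, stuck_bit: int = STUCK_BIT) -> bool:
    """True if the value read back differs from the value written."""
    return store_and_load(value, stuck_bit) != value


if __name__ == "__main__":
    # The first value already has the faulty bit set (the lucky case);
    # the second has it clear, so the fault shows up.
    for value in (0b00001111, 0b00000111):
        read = store_and_load(value)
        status = "corrupted" if read != value else "ok"
        print(f"wrote {value:08b}, read {read:08b}: {status}")
```

The fault is present the whole time, but it only becomes visible when
a value with a 0 in that position happens to land on the bad cell.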
You will get an error message that will probably say nothing about a
bit flipping, and worse, the failure will not be repeatable, because
programs are dynamically loaded, data is paged in and out, and memory
is occasionally moved around (depending on how the OS optimizes memory
access and how the language or OS handles garbage collection).

ECC memory means your programs will work because your system can
recover from these "single bit errors". Note that very few ECC systems
will correct multibit errors.

If running code fails too often, under strange and non-repeatable
conditions, you should begin to suspect memory errors, whether or not
you have ECC memory.

Regards,
Les Howell