Jack Howarth wrote:
We have a machine with ECC support enabled in the motherboard firmware
and ECC DIMMs installed. Recently this machine has suffered a couple of
random freezes, and yesterday it began to report the following kernel error...
kernel: EDAC MC0: UE page 0x8e0, offset 0x0, grain 4096, row 0, labels ":": i82875p UE
...indicating it had unrecoverable memory errors. However, when I run
memtest86+ by booting into it, the default settings with ECC disabled
don't report any memory errors during the test. If I enable the ECC
mode in memtest86+, I finally do see a bad memory location appear
repeatedly.
What exactly is happening in this situation? I am guessing that the
ECC-enabled memory is suppressing the bad memory location just enough
that it passes when the memtest86+ memory test is run with ECC disabled.
This would only make sense if memtest86+ somehow bypassed the ECC
correction when its ECC mode is enabled, so that it could see whether
ECC is silently correcting memory errors in the background. Is
this a correct read on the situation?

IMO, no. I think the ECC logic on your DIMMs or motherboard is itself the
culprit here, rather than ECC masking a bad memory location. That's just a
guess, though.
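
One thing that might help narrow it down is turning the page and offset
from the EDAC line into a physical address, so it can be compared with the
location memtest86+ reports in ECC mode. Below is a rough sketch in Python.
It assumes the "page" field is a page frame number and that pages are
4096 bytes (typical for x86); the function name decode_edac_ue and the
PAGE_SIZE constant are made up for illustration. Note that with a grain of
4096 the error is only localized to a 4096-byte range, so the offset of
0x0 is not an exact byte position.

```python
import re

# Assumption: 4 KiB pages, as is typical on x86.
PAGE_SIZE = 4096


def decode_edac_ue(line):
    """Parse an EDAC line and return (page, offset, physical_address)."""
    m = re.search(r"page (0x[0-9a-fA-F]+), offset (0x[0-9a-fA-F]+)", line)
    if m is None:
        raise ValueError("no page/offset fields found in line")
    page = int(m.group(1), 16)
    offset = int(m.group(2), 16)
    # Assumption: 'page' is a page frame number, so the physical
    # address is page * PAGE_SIZE + offset.
    return page, offset, page * PAGE_SIZE + offset


line = ('kernel: EDAC MC0: UE page 0x8e0, offset 0x0, grain 4096, '
        'row 0, labels ":": i82875p UE')
page, offset, addr = decode_edac_ue(line)
print(hex(addr))  # 0x8e0000, i.e. a little under 9 MiB into memory
```

If the failing location memtest86+ shows in ECC mode falls in the same
4096-byte range, both tools are at least pointing at the same spot.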
-- Rex