Brent Snow, Mr. wrote:
Hi All,
I am having a problem with a new Dell PowerEdge 1900 Server
running Fedora 8.
The System setup is as follows:
2 - Xeon E5310 (Quad-Core 1.6 GHz) processors
16 GB of RAM, 1 SATA 80 GB HDD.
------------------------------------------------------
The Error is as follows: EDAC i5000 MC0: nonfatal errors
found 0=800.
Is that the only error that you are getting? If EDAC is detecting enough
memory errors to slow a machine down, you should see enormous numbers of EDAC
errors in either dmesg or the messages file.
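One rough way to see how many EDAC messages have piled up is to count them
per driver and memory controller. This is only a sketch: the function name is
made up for illustration, and the pattern assumes the log lines look like the
"EDAC i5000 MC0: ..." line quoted above.

```python
import re
from collections import Counter

# Assumed line shape, based on the message quoted above:
#   "... EDAC i5000 MC0: nonfatal errors found ..."
EDAC_RE = re.compile(r"EDAC\s+(\S+)\s+(MC\d+):")


def count_edac_messages(lines):
    """Count EDAC log lines, keyed by (driver, memory controller)."""
    counts = Counter()
    for line in lines:
        match = EDAC_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts
```

You could feed it the lines of the messages file (commonly /var/log/messages)
or the saved output of dmesg. A handful of hits is one thing; thousands per
hour would fit the slowdown.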
The system runs very very slow (I have a p3 that is faster
than this system is).
I have installed Windows 2003 Server X_64 and it runs very
very quickly.
There are no errors under Windows, and there are no errors
reported by Dell's diagnostic tools.
I have run Memtest86+ (for 96 hours) and there are no errors
detected there as well.
Does the memtest program you are running actually register ECC errors for the
i5000 chipset? And is the ECC monitoring feature in memtest86 actually turned
on? It will show up in the menus; if it does not, then memtest is not
monitoring ECC errors and is useless for debugging this issue. If it does not
actually read those errors, you could be getting errors all over the place,
the hardware ECC would correct them, and memtest would think everything was
OK. I have seen this more than once.
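Under Linux, the EDAC driver keeps its own counters of corrected and
uncorrected errors in sysfs, so you can look at what the hardware ECC has
been fixing behind the scenes. The sketch below assumes the usual layout of
/sys/devices/system/edac/mc/mcN/ce_count and ue_count, and that the EDAC
driver is loaded; the function name is hypothetical.

```python
from pathlib import Path


def read_edac_counters(base="/sys/devices/system/edac/mc"):
    """Return {controller: {"ce_count": n, "ue_count": n}} from EDAC sysfs.

    ce_count = corrected errors, ue_count = uncorrected errors.
    Returns an empty dict if no memory controllers are found under base.
    """
    result = {}
    for mc_dir in sorted(Path(base).glob("mc[0-9]*")):
        counts = {}
        for name in ("ce_count", "ue_count"):
            counter_file = mc_dir / name
            if counter_file.is_file():
                counts[name] = int(counter_file.read_text().strip())
        result[mc_dir.name] = counts
    return result
```

Reading it twice a few minutes apart shows whether ce_count is climbing,
which is the kind of corrected-error traffic a test without ECC reporting
would never show you.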
As soon as I install Fedora 8, the errors show back up and
the system just bogs down.
I have tried aliasing the EDAC files thinking that this may
be the problem, but all that did was stop the log messages.
If EDAC was causing the problem, and you don't actually have bad memory, then
you would need to remove the EDAC module and/or turn it off to stop the
errors. But I have seen new RAM so bad that it gave persistent errors on
every access. This was the 2nd rev of memory from the company for a certain
MB (the 1st rev crashed under load within a very short time). That memory was
going to be passed by the MB vendor (not DELL) because they were using
memtest86, which ignored the actual errors they were getting since the HW
corrected them, so it looked fine to them.
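To check whether an EDAC module is still loaded after aliasing it, you can
look through /proc/modules. This is a small sketch with a hypothetical helper
name; the module names you will see (for example i5000_edac) are an
assumption based on the driver named in the log message and depend on the
kernel.

```python
def loaded_edac_modules(proc_modules_text):
    """Return names of loaded modules whose name contains 'edac'.

    proc_modules_text is the content of /proc/modules, where the first
    whitespace-separated field on each line is the module name.
    """
    names = []
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields and "edac" in fields[0]:
            names.append(fields[0])
    return names
```

Pass it the text read from /proc/modules. Anything it lists can then be
removed as root with modprobe -r, which actually turns the monitoring off
rather than just hiding the log messages.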
Roger