Re: EDAC error

Brent Snow, Mr. wrote:
Hi All,

            I am having a problem with a new Dell PowerEdge 1900 Server
running Fedora 8.

            The System setup is as follows:

            2 - Xeon  E5310 (Quad-Core 1.6 GHz) processors

16 GB of RAM, 1 SATA 80 GB HDD.
            ------------------------------------------------------

            The Error is as follows: EDAC i5000 MC0: nonfatal errors
found 0=800.


Is that the only error you are getting? If EDAC is detecting enough memory errors to slow the machine down, you should see an enormous number of EDAC messages in either dmesg or the messages file.
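One rough way to see how many EDAC messages have piled up is to count them per memory controller. The following is only a sketch: the function name is made up for illustration, and the regular expression assumes the messages look like the "EDAC i5000 MC0: ..." line quoted above.

```python
import re
from collections import Counter

# Assumed message shape: "... EDAC <driver> MC<n>: ..." as in the quoted error.
EDAC_LINE = re.compile(r"\bEDAC\b.*?\b(MC\d+)\b")

def count_edac_messages(lines):
    """Count EDAC log lines per memory controller (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = EDAC_LINE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

It can be fed an open log file, for example `count_edac_messages(open("/var/log/messages", errors="replace"))`, assuming that is where the messages file lives on the system in question.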

            The system runs very very slow (I have a p3 that is faster
than this system is).

            I have installed Windows 2003 Server X_64 and it runs very
very quick.
            There are no errors under Windows, and there are no errors
reported by Dell's diagnostic tools.

            I have run Memtest86+ (for 96 hours) and there are no errors
detected there as well.


Does the memtest program you are running actually report ECC errors for the i5000 chipset? And is the ECC monitoring feature in memtest86 actually turned on? It will show up in the menus; if it does not, then memtest is not monitoring ECC errors and is useless for debugging this issue. If it does not actually read those errors, you could be getting errors all over the place, the hardware ECC would correct them, and memtest would think everything was OK. I have seen this more than once.
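Under Linux, the corrected-versus-uncorrected distinction can be checked directly if the kernel's EDAC driver exposes its counters in sysfs. The sketch below assumes the usual layout of /sys/devices/system/edac/mc/mcN/ with ce_count (corrected) and ue_count (uncorrected) files; whether a given Fedora 8 kernel provides them for the i5000 is an assumption, and the function name is hypothetical.

```python
from pathlib import Path

def read_edac_counts(base="/sys/devices/system/edac/mc"):
    """Return {controller: (corrected, uncorrected)} from EDAC sysfs counters.

    The default path is an assumption about where the kernel exposes EDAC.
    """
    counts = {}
    for mc in sorted(Path(base).glob("mc[0-9]*")):
        try:
            ce = int((mc / "ce_count").read_text().strip())
            ue = int((mc / "ue_count").read_text().strip())
        except (OSError, ValueError):
            continue  # counter missing or unreadable on this kernel
        counts[mc.name] = (ce, ue)
    return counts
```

A corrected count that keeps climbing while memtest reports nothing would fit the situation described above, where the hardware silently fixes the errors.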

            As soon as I install Fedora 8, the errors show back up and
the system just bogs down.
            I have tried aliasing the EDAC files thinking that this may
be the problem, but all that did was stop the log messages.

If EDAC were causing the problem and you don't actually have bad memory, then you would need to remove the EDAC module and/or turn it off to stop the errors. But I have seen new RAM so bad that it gave persistent errors on every access. (This was the 2nd rev of memory from the company for a certain MB; the 1st rev crashed under load within a very short time.) That memory was going to be passed by the MB vendor (not Dell) because they were using memtest86, which ignored the actual errors they were getting since the HW corrected them, so it looked fine to them.
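Before and after trying to remove the module, it may help to confirm which EDAC modules are actually loaded. This is a minimal sketch that parses text in the /proc/modules format, where the first field on each line is the module name; the helper name is made up, and matching on the substring "edac" is an assumption about how the modules are named.

```python
def loaded_edac_modules(proc_modules_text):
    """Return names of loaded modules whose name contains 'edac'."""
    names = []
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields and "edac" in fields[0]:
            names.append(fields[0])
    return names
```

For example, `loaded_edac_modules(open("/proc/modules").read())` returning an empty list would suggest the EDAC driver is no longer loaded. If the slowdown persists in that state, EDAC itself is probably not the cause.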

                                 Roger

