Handle X wrote:
On 7/27/06, Robert Hancock <[email protected]> wrote:
Vikas Kedia wrote:
> The server seems to be running fine. A. can I ignore the following
> mcelog errors ? B. If not what should i do to stop the server from
> reporting mcelog errors.
Looks like data cache ECC errors, meaning the CPU 0 is faulty.
Eventually if it's not replaced there will likely be some uncorrectable
errors and the system will likely crash.
I am facing similar, but different errors.
[root@turyxsrv ~]# mcelog
MCE 0
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 1 4 northbridge TSC 89a560bb249
ADDR 1dfa49690
Northbridge Chipkill ECC error
Chipkill ECC syndrome = 2021
bit46 = corrected ecc error
bus error 'local node response, request didn't time out
generic read mem transaction
memory access, level generic'
STATUS 9410c00020080a13 MCGSTATUS 0
Repeats whenever I do any kind of operations...
How severe is ChipKill errors? Should I consider throwing away CPU 1
and get another one.
That sounds to me more like some of the RAM attached to CPU1 is bad..
--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [email protected]
Home Page: http://www.roberthancock.com/
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[Index of Archives]
[Kernel Newbies]
[Netfilter]
[Bugtraq]
[Photo]
[Stuff]
[Gimp]
[Yosemite News]
[MIPS Linux]
[ARM Linux]
[Linux Security]
[Linux RAID]
[Video 4 Linux]
[Linux for the blind]
[Linux Resources]