Re: kernel panic at load average of 24 is it acceptable ?

> Read up on MCE debugging methods on Linux or so, that should hopefully help.

Here is the output of mcelog:
root@srv1:~# less /var/log/mcelog
MCE 0
CPU 0 0 data cache TSC 6988ae18046
ADDR f87f5ec0
 Data cache ECC error (syndrome ce)
      bit46 = corrected ecc error
 bus error 'local node origin, request didn't time out
     data read mem transaction
     memory access, level generic'
STATUS 9467400000000833 MCGSTATUS 0
MCE 0
CPU 0 0 data cache TSC 723b38a3633
ADDR 3d9fc0
 Data cache ECC error (syndrome ce)
      bit46 = corrected ecc error
      bit62 = error overflow (multiple errors)
 bus error 'local node origin, request didn't time out
     data read mem transaction
     memory access, level generic'
STATUS d467400000000833 MCGSTATUS 0
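
As a rough cross-check of the two STATUS words above, the sketch below pulls the individual flag bits out of each value. The function name `decode_mc_status` is hypothetical. Bits 63 to 57 follow the common machine-check status layout; treating bit 46 as "corrected ECC" and bits 54:47 as the ECC syndrome is an assumption taken from the mcelog annotations ("bit46 = corrected ecc error", "syndrome ce") and may not hold on other CPU families.

```python
def decode_mc_status(status: int) -> dict:
    """Split a 64-bit machine-check STATUS word into a few named fields.

    Assumption: bit 46 (corrected ECC) and bits 54:47 (syndrome) match
    the layout implied by the mcelog output in this thread.
    """
    def bit(n: int) -> bool:
        return bool((status >> n) & 1)

    return {
        "valid": bit(63),             # register holds a valid error
        "overflow": bit(62),          # at least one further error was lost
        "uncorrected": bit(61),       # error was NOT corrected by hardware
        "enabled": bit(60),           # reporting was enabled for this error
        "addr_valid": bit(58),        # the ADDR field is meaningful
        "context_corrupt": bit(57),   # processor context may be corrupt
        "corrected_ecc": bit(46),     # assumed: corrected ECC error
        "syndrome": (status >> 47) & 0xFF,  # assumed: ECC syndrome, bits 54:47
        "error_code": status & 0xFFFF,      # low 16 bits: error code
    }


if __name__ == "__main__":
    # The two STATUS values from the log above.
    for raw in ("9467400000000833", "d467400000000833"):
        fields = decode_mc_status(int(raw, 16))
        print(raw)
        for name, value in fields.items():
            shown = format(value, "x") if not isinstance(value, bool) else value
            print(f"  {name:16} {shown}")
```

Under these assumptions both records decode as corrected errors (the uncorrected bit is clear), the syndrome comes out as ce in both, and only the second record has the overflow bit set, which agrees with what mcelog printed.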

Since it shows an ECC error, is the hypothesis correct that it's a RAM
problem, and that replacing the RAM should solve it?

Regards,

Vikas

On 7/17/06, Andreas Mohr <[email protected]> wrote:
Hi,

On Mon, Jul 17, 2006 at 12:08:41AM -0700, Vikas Kedia wrote:
> The memtest ran fine for 8 hours:
> http://www.visitlab.com/styles/basic/images/memtest.JPG
>
> My questions are:
> 1. Kernel panic at load average of 24 is it acceptable ?

Kernel panic is _NEVER_ acceptable.
I've seen loads in the couple hundreds with no problem.

However you actually have a mce_panic() crash here.
Make sure to figure out why this Machine Check Exception got raised,
otherwise you might hose the box if you continue without investigation.
It could easily be due to a malfunctioning CPU fan or the like, especially since it
happened exactly while you stress-tested the machine.

> 2. If not how do I go about debugging this kernel panic ?

Read up on MCE debugging methods on Linux or so, that should hopefully help.

Good luck!

Andreas Mohr
