machine check errors

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



I have a Tyan S2892 board with a pair Opteron 288 dual core cpus and 16GB dram. I receive the errors shown in the attached file, mcelog. It appears that these occur when the free memory becomes small, there is a lot in the cache, and a lot of IO.

The Tyan S2892 has an Nvidia Crush K8-04, which I think they call the southbridge. My errors appear to be related to the north bridge. There is an AMD 8131 PCI-X controller that runs the PCI slots. There is a 3WARE 9500-12 located in one of the PCI-X slots.

I have run Memtest86+-1.65 for 24 hours without errors. I recently upgraded the BIOS to V2.00 without any remarkable changes.

I am running 2.6.15 within a current Fedora Core4 configuration.

I would appreciate any advice as to how to proceed. I have not noticed any adverse behavior from the mce's. But that could be masked is data transfered or ???.

Could there be any connection with the memory cache? Thanks in advance for your assistance.

don
--

MCE 0
CPU 2 4 northbridge TSC 967b1992c66 
ADDR 2a52cb5f0 
  Northbridge Chipkill ECC error
  Chipkill ECC syndrome = 40b9
       bit32 = err cpu0
       bit46 = corrected ecc error
       bit62 = error overflow (multiple errors)
  bus error 'local node origin, request didn't time out
      generic read mem transaction
      memory access, level generic'
STATUS d45cc00140080813 MCGSTATUS 0
MCE 1
CPU 2 4 northbridge TSC a101bbf7338 
ADDR 2922df698 
  Northbridge Chipkill ECC error
  Chipkill ECC syndrome = 6051
       bit46 = corrected ecc error
       bit62 = error overflow (multiple errors)
  bus error 'local node response, request didn't time out
      generic read mem transaction
      memory access, level generic'
STATUS d428c00060080a13 MCGSTATUS 0
MCE 2
CPU 2 4 northbridge TSC ab885e5bdbe 
ADDR 2922df698 
  Northbridge Chipkill ECC error
  Chipkill ECC syndrome = 6051
       bit46 = corrected ecc error
       bit62 = error overflow (multiple errors)
  bus error 'local node response, request didn't time out
      generic read mem transaction
      memory access, level generic'
STATUS d428c00060080a13 MCGSTATUS 0
MCE 0
CPU 2 4 northbridge TSC b60f17bf394 
ADDR 2918d98d0 
  Northbridge Chipkill ECC error
  Chipkill ECC syndrome = 6051
       bit46 = corrected ecc error
       bit62 = error overflow (multiple errors)
  bus error 'local node response, request didn't time out
      generic read mem transaction
      memory access, level generic'
STATUS d428c00060080a13 MCGSTATUS 0
MCE 1
CPU 2 4 northbridge TSC c095ba23a7e 
ADDR 2918cdff0 
  Northbridge Chipkill ECC error
  Chipkill ECC syndrome = 40b9
       bit46 = corrected ecc error
       bit62 = error overflow (multiple errors)
  bus error 'local node response, request didn't time out
      generic read mem transaction
      memory access, level generic'
STATUS d45cc00040080a13 MCGSTATUS 0
MCE 2
CPU 2 4 northbridge TSC cb1c7387269 
ADDR 2bf0cfa50 
  Northbridge Chipkill ECC error
  Chipkill ECC syndrome = 6051
       bit46 = corrected ecc error
       bit62 = error overflow (multiple errors)
  bus error 'local node response, request didn't time out
      generic read mem transaction
      memory access, level generic'
STATUS d428c00060080a13 MCGSTATUS 0
MCE 3
CPU 2 4 northbridge TSC d5a315f1a34 
ADDR 2900df990 
  Northbridge Chipkill ECC error
  Chipkill ECC syndrome = 20e8
       bit46 = corrected ecc error
       bit62 = error overflow (multiple errors)
  bus error 'local node response, request didn't time out
      generic read mem transaction
      memory access, level generic'
STATUS d474400020080a13 MCGSTATUS 0
MCE 4
CPU 2 4 northbridge TSC e029b8504a6 
ADDR 2900dd030 
  Northbridge Chipkill ECC error
  Chipkill ECC syndrome = 6051
       bit46 = corrected ecc error
       bit62 = error overflow (multiple errors)
  bus error 'local node response, request didn't time out
      generic read mem transaction
      memory access, level generic'
STATUS d428c00060080a13 MCGSTATUS 0
MCE 5
CPU 2 4 northbridge TSC eab071b5316 
ADDR 291ac9d98 
  Northbridge Chipkill ECC error
  Chipkill ECC syndrome = 6051
       bit46 = corrected ecc error
       bit62 = error overflow (multiple errors)
  bus error 'local node response, request didn't time out
      generic read mem transaction
      memory access, level generic'
STATUS d428c00060080a13 MCGSTATUS 0
MCE 6
CPU 2 4 northbridge TSC f537141ab1c 
ADDR 2918dfe78 
  Northbridge Chipkill ECC error
  Chipkill ECC syndrome = 40b9
       bit33 = err cpu1
       bit46 = corrected ecc error
       bit62 = error overflow (multiple errors)
  bus error 'local node origin, request didn't time out
      generic read mem transaction
      memory access, level generic'
STATUS d45cc00240080813 MCGSTATUS 0
MCE 7
CPU 2 4 northbridge TSC ffbdb67fd26 
ADDR 2beac9010 
  Northbridge Chipkill ECC error
  Chipkill ECC syndrome = 40b9
       bit46 = corrected ecc error
       bit62 = error overflow (multiple errors)
  bus error 'local node response, request didn't time out
      generic read mem transaction
      memory access, level generic'
STATUS d45cc00040080a13 MCGSTATUS 0
MCE 0
CPU 2 4 northbridge TSC 10a446fe04fa 
ADDR 2cfcc9870 
  Northbridge Chipkill ECC error
  Chipkill ECC syndrome = 40b9
       bit32 = err cpu0
       bit46 = corrected ecc error
       bit62 = error overflow (multiple errors)
  bus error 'local node origin, request didn't time out
      generic read mem transaction
      memory access, level generic'
STATUS d45cc00140080813 MCGSTATUS 0
MCE 1
CPU 2 4 northbridge TSC 114cb12451a2 
ADDR 291ac9630 
  Northbridge Chipkill ECC error
  Chipkill ECC syndrome = 6051
       bit33 = err cpu1
       bit46 = corrected ecc error
       bit62 = error overflow (multiple errors)
  bus error 'local node origin, request didn't time out
      generic read mem transaction
      memory access, level generic'
STATUS d428c00260080813 MCGSTATUS 0
MCE 2
CPU 2 4 northbridge TSC 11f51cba9b82 
ADDR 2c10cb010 
  Northbridge Chipkill ECC error
  Chipkill ECC syndrome = 20e8
       bit32 = err cpu0
       bit46 = corrected ecc error
       bit62 = error overflow (multiple errors)
  bus error 'local node origin, request didn't time out
      generic read mem transaction
      memory access, level generic'
STATUS d474400120080813 MCGSTATUS 0
MCE 3
CPU 2 4 northbridge TSC 129d86e0d26a 
ADDR 294ec9390 
  Northbridge Chipkill ECC error
  Chipkill ECC syndrome = 6051
       bit32 = err cpu0
       bit46 = corrected ecc error
       bit62 = error overflow (multiple errors)
  bus error 'local node origin, request didn't time out
      generic read mem transaction
      memory access, level generic'
STATUS d428c00160080813 MCGSTATUS 0

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux