RE: MCE problem on dual Opteron

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 4 Aug 2005, Roger Heflin wrote:

> If this does not happen immediately at boot up (before the machine
> finished all init stuff), it is generally a hardware problem.  In
> my experience with new machines 75% of the time it will be the cpu
> itself, and another 25% it will be a serious memory error.

Unfortunatelly this seems to appear during bootup. Allways at the 
exact same place. Attached is a dmesg output that precedes the below 
mentioned MCE error. (BTW: The timestamp there isnt actually 847.7... but 
84.7... I got it wrong when I was copying it from the screen by hand.) The 
attached dmesg output is from the 'nomce' run of the kernel and it is 
snipped so that it ends just where it ends without the 'nomce' option and 
then follows the  below mentioned MCE error.

I don't think it is a HW problem, or am I wrong?

Martin

> 
> > -----Original Message-----
> > From: [email protected] 
> > [mailto:[email protected]] On Behalf Of Martin Drab
> > Sent: Thursday, August 04, 2005 8:55 AM
> > To: Linux Kernel Mailing List
> > Subject: MCE problem on dual Opteron
> > 
> > Hi,
> > 
> > I get the following problem with 2.6.13-rc5-git1 on a dual Opteron
> > machine:
> > 
> > ---------
> > ...
> > [   847.745921] CPU 0: Machine Check Exception:               
> >  7 Bank 3: b40000000000083b
> > [   847.746066] RIP 10:<ffffffff802c04ee> {pci_conf1_read+0xbe/0x110}
> > [   847.746149] TSC 189fe311d3f ADDR fdfc000cfe
> > [   847.746218] Kernel panic - not syncing: Uncorrected machine check
> > ---------
> > 
> > This appears during bootup and it hangs. So my question is: 
> > Is this a HW problem or is it some kernel (MCE ?) bug? If it 
> > is a HW problem is it possible to determine what's wrong somehow?
...
[    0.000000] Bootdata ok (command line is ro root=LABEL=/ resume=/dev/sda2 rhgb vga=794 ipmi_watchdog.start_now=0 ipmi_watchdog.timeout=0 nomce)
[    0.000000] Linux version 2.6.13-rc5-git1 ([email protected]) (gcc version 4.0.0 20050519 (Red Hat 4.0.0-8)) #2 SMP Wed Aug 3 19:41:13 CEST 2005
[    0.000000] BIOS-provided physical RAM map:
[    0.000000]  BIOS-e820: 0000000000000000 - 000000000009b000 (usable)
[    0.000000]  BIOS-e820: 000000000009b000 - 00000000000a0000 (reserved)
[    0.000000]  BIOS-e820: 00000000000d0000 - 0000000000100000 (reserved)
[    0.000000]  BIOS-e820: 0000000000100000 - 000000007ff70000 (usable)
[    0.000000]  BIOS-e820: 000000007ff70000 - 000000007ff7b000 (ACPI data)
[    0.000000]  BIOS-e820: 000000007ff7b000 - 000000007ff80000 (ACPI NVS)
[    0.000000]  BIOS-e820: 000000007ff80000 - 0000000080000000 (reserved)
[    0.000000]  BIOS-e820: 00000000fec00000 - 00000000fec00400 (reserved)
[    0.000000]  BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
[    0.000000]  BIOS-e820: 00000000fff80000 - 0000000100000000 (reserved)
[    0.000000] ACPI: RSDP (v002 PTLTD                                 ) @ 0x00000000000f78e0
[    0.000000] ACPI: XSDT (v001 PTLTD  	 XSDT   0x06040000  LTP 0x00000000) @ 0x000000007ff78d11
[    0.000000] ACPI: FADT (v003 AMD    HAMMER   0x06040000 PTEC 0x000f4240) @ 0x000000007ff7ae0e
[    0.000000] ACPI: HPET (v001 AMD    HAMMER   0x06040000 PTEC 0x00000000) @ 0x000000007ff7af02
[    0.000000] ACPI: MADT (v001 PTLTD  	 APIC   0x06040000  LTP 0x00000000) @ 0x000000007ff7af3a
[    0.000000] ACPI: SPCR (v001 PTLTD  $UCRTBL$ 0x06040000 PTL  0x00000001) @ 0x000000007ff7afb0
[    0.000000] ACPI: DSDT (v001 AMD-K8  AMDACPI 0x06040000 MSFT 0x0100000e) @ 0x0000000000000000
[    0.000000] Scanning NUMA topology in Northbridge 24
[    0.000000] Number of nodes 2
[    0.000000] Node 0 using interleaving mode 1/0
[    0.000000] No NUMA configuration found
[    0.000000] Faking a node at 0000000000000000-000000007ff70000
[    0.000000] Bootmem setup node 0 0000000000000000-000000007ff70000
[    0.000000] On node 0 totalpages: 524144
[    0.000000]   DMA zone: 4096 pages, LIFO batch:1
[    0.000000]   Normal zone: 520048 pages, LIFO batch:31
[    0.000000]   HighMem zone: 0 pages, LIFO batch:1
[    0.000000] ACPI: Local APIC address 0xfee00000
[    0.000000] ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
[    0.000000] Processor #0 15:5 APIC version 16
[    0.000000] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
[    0.000000] Processor #1 15:5 APIC version 16
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
[    0.000000] ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
[    0.000000] IOAPIC[0]: apic_id 2, version 17, address 0xfec00000, GSI 0-23
[    0.000000] ACPI: IOAPIC (id[0x03] address[0xfc000000] gsi_base[24])
[    0.000000] IOAPIC[1]: apic_id 3, version 17, address 0xfc000000, GSI 24-27
[    0.000000] ACPI: IOAPIC (id[0x04] address[0xfc001000] gsi_base[28])
[    0.000000] IOAPIC[2]: apic_id 4, version 17, address 0xfc001000, GSI 28-31
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 high edge)
[    0.000000] ACPI: IRQ0 used by override.
[    0.000000] ACPI: IRQ2 used by override.
[    0.000000] ACPI: IRQ9 used by override.
[    0.000000] Setting APIC routing to flat
[    0.000000] ACPI: HPET id: 0x102282a0 base: 0xfed00000
[    0.000000] Using ACPI (MADT) for SMP configuration information
[    0.000000] Allocating PCI resources starting at 80000000 (gap: 80000000:7ec00000)
[    0.000000] Built 1 zonelists
[    0.000000] Kernel command line: ro root=LABEL=/ resume=/dev/sda2 rhgb vga=794 ipmi_watchdog.start_now=0 ipmi_watchdog.timeout=0 nomce
[    0.000000] Initializing CPU#0
[    0.000000] PID hash table entries: 4096 (order: 12, 131072 bytes)
[    0.000000] time.c: Using 14.318180 MHz HPET timer.
[    0.000000] time.c: Detected 1994.064 MHz processor.
[   94.964587] Console: colour dummy device 80x25
[   94.967418] Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes)
[   94.971633] Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes)
[   94.995579] Memory: 2055484k/2096576k available (2263k kernel code, 0k reserved, 1473k data, 220k init)
[   95.144209] Calibrating delay using timer specific routine.. 3992.44 BogoMIPS (lpj=19962218)
[   95.144251] Security Framework v1.0.0 initialized
[   95.144263] SELinux:  Initializing.
[   95.144279] SELinux:  Starting in permissive mode
[   95.144286] selinux_register_security:  Registering secondary module capability
[   95.144291] Capability LSM initialized as secondary
[   95.144305] Mount-cache hash table entries: 256
[   95.144456] CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
[   95.144461] CPU: L2 Cache: 1024K (64 bytes/line)
[   95.144465] CPU 0(1) -> Node 0 -> Core 0
[   95.144470] mtrr: v2.0 (20020519)
[   95.254026] Using local APIC timer interrupts.
[   95.304130] Detected 12.462 MHz APIC timer.
[   95.304237] Booting processor 1/2 APIC 0x1
[   95.314544] Initializing CPU#1
[   95.463830] Calibrating delay using timer specific routine.. 3987.63 BogoMIPS (lpj=19938185)
[   95.463838] CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
[   95.463840] CPU: L2 Cache: 1024K (64 bytes/line)
[   95.463842] CPU 1(1) -> Node 0 -> Core 0
[   95.463968] AMD Opteron(tm) Processor 246 stepping 0a
[   95.463979] CPU 1: Syncing TSC to CPU 0.
[   95.464001] Brought up 2 CPUs
[   95.464240] CPU 1: synchronized TSC with CPU 0 (last diff -23 cycles, maxerr 1124 cycles)
[   95.464277] time.c: Using HPET based timekeeping.
[   95.464289] testing NMI watchdog ... OK.
[   95.564386] checking if image is initramfs... it is
[   95.647764] NET: Registered protocol family 16
[   95.647808] ACPI: bus type pci registered

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]
  Powered by Linux