2.6.11 x86_64 NUMA slightly broken

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hello!

I've upgraded the kernel from 2.6.10 to 2.6.11 (both from Fedora Core 3
RPMs) on one of my servers, based on a Tyan S2882 board with two Opteron
248's and two 1Gb modules inserted into the slots next to *second* CPU
(they just happened to be there). And the kernel didn't booted. I've
also tried to compile the kernel myself -- with same result.

I'll skip the whole detective story and will describe just the net
result. It looks like the bug is somewhere in the NUMA handling code,
because:

a) The kernel boots fine with numa=off

b) The kernel started working even without numa=off after I've moved the
DRAM from the slots bound to CPU#2 to the slots bound to CPU#1.

c) The output with kernel 2.6.10 (which worked fine even with DRAM in
slots #2) was as follows:

Bootdata ok (command line is ro root=/dev/hda1 selinux=0)
Linux version 2.6.10-1.741_FC3smp ([email protected])
(gcc version 3.4.2 20041017 (Red Hat 3.4.2-6.fc3)) #1 SMP
Thu Jan 13 16:58:29 EST 2005
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
 BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000007fff0000 (usable)
 BIOS-e820: 000000007fff0000 - 000000007ffff000 (ACPI data)
 BIOS-e820: 000000007ffff000 - 0000000080000000 (ACPI NVS)
 BIOS-e820: 00000000ff780000 - 0000000100000000 (reserved)
Scanning NUMA topology in Northbridge 24
Number of nodes 2 (10010)
Skipping disabled node 0
Node 1 MemBase 0000000000000000 Limit 000000007fff0000
Using node hash shift of 24
Bootmem setup node 1 0000000000000000-000000007fff0000
No mptable found.
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
Processor #0 15:5 APIC version 16
...

---------------------------------------------------
Kernel 2.6.11 with numa=off:

Linux version 2.6.11-1.7_FC3smp ([email protected])
(gcc version 3.4.3 20050227 (Red Hat 3.4.3-22)) #1 SMP
Sat Mar 19 20:16:43 EST 2005
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
 BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000007fff0000 (usable)
 BIOS-e820: 000000007fff0000 - 000000007ffff000 (ACPI data)
 BIOS-e820: 000000007ffff000 - 0000000080000000 (ACPI NVS)
 BIOS-e820: 00000000ff780000 - 0000000100000000 (reserved)
NUMA turned off
Faking a node at 0000000000000000-000000007fff0000
Bootmem setup node 0 0000000000000000-000000007fff0000
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
Processor #0 15:5 APIC version 16
...

---------------------------------------------------
Kernel 2.6.11 with enabled numa, crashing (transcribed off the
screen with the earlyprintk=vga option):

Linux version 2.6.11-1.7_FC3smp ([email protected])
(gcc version 3.4.3 20050227 (Red Hat 3.4.3-22)) #1 SMP
Sat Mar 19 20:16:43 EST 2005
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
 BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000007fff0000 (usable)
 BIOS-e820: 000000007fff0000 - 000000007ffff000 (ACPI data)
 BIOS-e820: 000000007ffff000 - 0000000080000000 (ACPI NVS)
 BIOS-e820: 00000000ff780000 - 0000000100000000 (reserved)
kernel direct mapping tables upto ffff810100000000 @ 8000-d000
Scanning NUMA topology in Northbridge 24
Number of nodes 2
Skipping disabled node 0
Node 1 MemBase 0000000000000000-000000007fff0000
Using node hash shift of 24
Bootmem setup node 1 0000000000000000-000000007fff0000
PANIC: early exception rip ffffffff8020427b error 0 cr2 0
...

The address ffffffff8020427b is:

ffffffff80204210 T find_next_zero_bit
ffffffff802042a0 T find_next_bit

Is this information enough for catching the bug? I'll do my best to
gather more info although it's a production server.

-- 
Greetings,
   Andrew

P.S. Please Cc: to me.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux