Re: [Bug] [2.6.18-rc5-mm1] system no boot early death x86_64

On Thu, 7 Sep 2006, keith mannthey wrote:

On Thu, 2006-09-07 at 11:20 -0700, keith mannthey wrote:
Hello,
  I was booting rc4-mm3.  With rc5-mm1 I am hanging early... Mel I don't
know if this is related to your code but I will soon know. (I don't get
your debug info in early console.)
  I was working on patches for the reserve based memory hot add path in
srat.c (the initial error is fixed by Mel's patches but there is more to
do)

That is some good news at least.

and was just moving to rc5-mm1 to sync up and then more trouble.
This is with reserve based hot-add not enabled at the command line.


Well this isn't fully adding up but here is what I found.

If I drop
x86_64-mm-drop-640k-reservation.patch
x86_64-mm-remove-e820-fallback.patch
and
x86_64-mm-remove-e820-fallback-fix.patch

I build and boot.  All files in the series up to
x86_64-mm-drop-640k-reservation.patch work just fine.  Dropping this
patch makes things better. The e820 patches were removed to make the
rest of the series apply.


I am having trouble reproducing this. However, I recently got access to a machine similar to yours. I can say that 2.6.18-rc4-mm3 and 2.6.18-rc5-mm1 were at times badly unstable on it (although the symptoms were different from yours) and the box would easily crash for reasons I could not pin down. As stability problems had been reported on the machine earlier by other users, I was inclined to blame the hardware. Now I'm not sure.

It is not clear what changes would cause me to die setting up the
bootmem allocator on my first node...


Unless your machine really has something special in the low 640K that is required and bad things happen if it's written to at a bad time.
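
As a rough illustration of what "something special in the low 640K" could look like, the sketch below lists the non-usable e820 entries from the boot log in this mail that start below 640K. The entry values are copied from that log; the helper name `reserved_below` and the 0xA0000 limit constant are made up for this example. It only shows what the firmware reports as reserved and says nothing about what the patch actually changes or whether this region is the cause of the hang.

```python
# 640 KiB boundary; named constant introduced for this example only.
LOW_MEM_LIMIT = 0xA0000

# First three BIOS-e820 entries from the boot log in this mail.
E820_LOW = [
    (0x00000000, 0x00098400, "usable"),
    (0x00098400, 0x000A0000, "reserved"),
    (0x000E0000, 0x00100000, "reserved"),
]

def reserved_below(limit, entries):
    """Return (start, end) of non-usable ranges that begin below limit,
    clipped to the limit. Hypothetical helper, not kernel code."""
    found = []
    for start, end, kind in entries:
        if kind != "usable" and start < limit:
            found.append((start, min(end, limit)))
    return found

low_reserved = reserved_below(LOW_MEM_LIMIT, E820_LOW)
for start, end in low_reserved:
    print(f"{start:#x}-{end:#x}: {end - start} bytes reserved below 640K")
```

On this map the only match is 0x98400-0xa0000, so roughly the top 31K below the 640K mark is flagged as reserved by the BIOS.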

I know x86_64-mm-drop-640k-reservation.patch has been around for a
while.

any ideas?


None so far. I'll keep hitting the machine I have to see if I can find something more useful, but I'm not very optimistic that I'll pin it down.

Thanks,
 Keith

(from a working boot)

disabling early console
Linux version 2.6.18-rc5-mm1-smp (root@elm3a153) (gcc version 4.1.0 (SUSE Linux)) #13 SMP Thu Sep 7 19:15:00 EDT 2006
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 0000000000098400 (usable)
BIOS-e820: 0000000000098400 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 000000007ff85e00 (usable)
BIOS-e820: 000000007ff85e00 - 000000007ff98880 (ACPI data)
BIOS-e820: 000000007ff98880 - 0000000080000000 (reserved)
BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
BIOS-e820: 0000000100000000 - 0000000470000000 (usable)
BIOS-e820: 0000001070000000 - 0000001160000000 (usable)
Entering add_active_range(0, 0, 152) 0 entries of 3200 used
Entering add_active_range(0, 256, 524165) 1 entries of 3200 used
Entering add_active_range(0, 1048576, 4653056) 2 entries of 3200 used
Entering add_active_range(0, 17235968, 18219008) 3 entries of 3200 used
end_pfn_map = 18219008
DMI 2.3 present.
ACPI: RSDP (v000 IBM                                   ) @ 0x00000000000fdcf0
ACPI: RSDT (v001 IBM    EXA01ZEU 0x00001000 IBM  0x45444f43) @ 0x000000007ff98800
ACPI: FADT (v001 IBM    EXA01ZEU 0x00001000 IBM  0x45444f43) @ 0x000000007ff98780
ACPI: MADT (v001 IBM    EXA01ZEU 0x00001000 IBM  0x45444f43) @ 0x000000007ff98600
ACPI: SRAT (v001 IBM    EXA01ZEU 0x00001000 IBM  0x45444f43) @ 0x000000007ff983c0
ACPI: HPET (v001 IBM    EXA01ZEU 0x00001000 IBM  0x45444f43) @ 0x000000007ff98380
ACPI: SSDT (v001 IBM    VIGSSDT0 0x00001000 INTL 0x20030122) @ 0x000000007ff90780
ACPI: SSDT (v001 IBM    VIGSSDT1 0x00001000 INTL 0x20030122) @ 0x000000007ff88bc0
ACPI: DSDT (v001 IBM    EXA01ZEU 0x00001000 INTL 0x20030122) @ 0x0000000000000000
SRAT: PXM 0 -> APIC 0 -> Node 0
SRAT: PXM 0 -> APIC 1 -> Node 0
SRAT: PXM 0 -> APIC 2 -> Node 0
SRAT: PXM 0 -> APIC 3 -> Node 0
SRAT: PXM 0 -> APIC 38 -> Node 0
SRAT: PXM 0 -> APIC 39 -> Node 0
SRAT: PXM 0 -> APIC 36 -> Node 0
SRAT: PXM 0 -> APIC 37 -> Node 0
SRAT: PXM 1 -> APIC 64 -> Node 1
SRAT: PXM 1 -> APIC 65 -> Node 1
SRAT: PXM 1 -> APIC 66 -> Node 1
SRAT: PXM 1 -> APIC 67 -> Node 1
SRAT: PXM 1 -> APIC 102 -> Node 1
SRAT: PXM 1 -> APIC 103 -> Node 1
SRAT: PXM 1 -> APIC 100 -> Node 1
SRAT: PXM 1 -> APIC 101 -> Node 1
SRAT: Node 0 PXM 0 0-80000000
Entering add_active_range(0, 0, 152) 0 entries of 3200 used
Entering add_active_range(0, 256, 524165) 1 entries of 3200 used
SRAT: Node 0 PXM 0 0-470000000
Entering add_active_range(0, 0, 152) 2 entries of 3200 used
Entering add_active_range(0, 256, 524165) 2 entries of 3200 used
Entering add_active_range(0, 1048576, 4653056) 2 entries of 3200 used
SRAT: Node 1 PXM 1 1070000000-1160000000
Entering add_active_range(1, 17235968, 18219008) 3 entries of 3200 used
NUMA: Using 36 for the hash shift.
Bootmem setup node 0 0000000000000000-0000000470000000
Bootmem setup node 1 0000001070000000-0000001160000000
Zone PFN ranges:
 DMA             0 ->     4096
 DMA32        4096 ->  1048576
 Normal    1048576 -> 18219008
early_node_map[4] active PFN ranges
   0:        0 ->      152
   0:      256 ->   524165
   0:  1048576 ->  4653056
   1: 17235968 -> 18219008
On node 0 totalpages: 4128541
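
For anyone cross-checking the numbers in the log above, here is a small sketch of how the usable e820 ranges line up with the add_active_range() arguments and the node 0 page count. It assumes 4K pages (a shift of 12) and that partial pages are dropped by rounding the start up and the end down; the helper names `to_pfn_range` and `pages_per_node` are invented for this example and are not the kernel's functions. The node assignment is copied from the early_node_map lines.

```python
PAGE_SHIFT = 12  # assumes 4 KiB pages

# Usable BIOS-e820 entries (start, end), copied from the boot log above.
E820_USABLE = [
    (0x0000000000000000, 0x0000000000098400),
    (0x0000000000100000, 0x000000007FF85E00),
    (0x0000000100000000, 0x0000000470000000),
    (0x0000001070000000, 0x0000001160000000),
]

# Node id of each range, taken from early_node_map in the log.
NODE_OF_RANGE = [0, 0, 0, 1]

def to_pfn_range(start, end):
    """Convert a byte range to page frame numbers, keeping whole pages only
    (start rounded up, end rounded down). Hypothetical helper."""
    page_size = 1 << PAGE_SHIFT
    start_pfn = (start + page_size - 1) >> PAGE_SHIFT
    end_pfn = end >> PAGE_SHIFT
    return start_pfn, end_pfn

def pages_per_node(ranges, nodes):
    """Sum the number of pages in each PFN range per node id."""
    totals = {}
    for (start_pfn, end_pfn), nid in zip(ranges, nodes):
        totals[nid] = totals.get(nid, 0) + (end_pfn - start_pfn)
    return totals

pfn_ranges = [to_pfn_range(s, e) for s, e in E820_USABLE]
node_pages = pages_per_node(pfn_ranges, NODE_OF_RANGE)

for (start_pfn, end_pfn), nid in zip(pfn_ranges, NODE_OF_RANGE):
    print(f"node {nid}: {start_pfn} -> {end_pfn}")
print(node_pages)
```

The four ranges this produces match the early_node_map entries, and the node 0 sum matches the "On node 0 totalpages: 4128541" line, which suggests the active ranges were registered as expected on the working boot.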




--
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab