Re: [PATCH 4/6] Have x86_64 use add_active_range() and free_area_init_nodes

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 31 Aug 2006, Keith Mannthey wrote:

On 8/31/06, Mel Gorman <[email protected]> wrote:
On Thu, 31 Aug 2006, Keith Mannthey wrote:
> On 8/31/06, Mel Gorman <[email protected]> wrote:
>> On (30/08/06 13:57), Keith Mannthey didst pronounce:
>> > On 8/21/06, Mel Gorman <[email protected]> wrote:
>> > >

Can you confirm that happens by applying the patch I sent to you and
checking the output? When the reserve fails, it should print out what
range it actually checked. I want to be sure it's not checking the
addresses 0->0x1070000000

See below


Perfect, thanks a lot. I should have enough to reproduce without a test machine what is going on and develop the required patches.

>> > >@@ -329,6 +330,8 @@ acpi_numa_memory_affinity_init(struct ac
>> > >
>> > >        printk(KERN_INFO "SRAT: Node %u PXM %u %Lx-%Lx\n", node, pxm,
>> > >               nd->start, nd->end);
>> > >+       e820_register_active_regions(node, nd->start >> PAGE_SHIFT,
>> > >+ nd->end >> PAGE_SHIFT);
>> >
>> > A node chunk in this section of code may be a hot-pluggable zone. With
>> > MEMORY_HOTPLUG_SPARSE we don't want to register these regions.
>> >
>>
>> The ranges should not get registered as active memory by
>> e820_register_active_regions() unless they are marked E820_RAM. My
>> understanding is that the regions for hotadd would be marked "reserved"
>> in the e820 map. Is that wrong?
>
> This is wrong.  In a mult-node system that last node add area will not
> be marked reserved by the e820.  The e820 only defines memory <
> end_pfn. the last node add area is > end_pfn.
>

ok, that should still be fine. As long as the ranges are not marked
"usable", add_active_range() will not be called and the holes should be
counted correctly with the patch I sent you.

> With RESERVE based add-memory you want the add-areas repored by the
> srat to be setup during boot like all the other pages.
>

So, do you actally expect a lot of unused mem_map to be allocated with
struct pages that are inactive until memory is hot-added in an
x86_64-specific manner? The arch-independent stuff currently will not do
that. It sets up memmap for where memory really exists. If that is not
what you expect, it will hit issues at hotadd time which is not the
current issue but one that can be fixed.

Yes. RESERVED based is a big waste of mem_map space.

Right, it's all very clear now. At some point in the future, I'd like to visit why SPARSEMEM-based hot-add is not always used but it's a separate issue.

The add areas
are marked as RESERVED during boot and then later onlined during add.

That explains the reserve_bootmem_node()

It might be ok.  I will play with tomorrow.  I might just need to
call add_active_range in the right spot :)


I'll play with this from the opposite perspective - what is required for any arch using this API to have spare memmap for reserve-based hot-add.

>> > > if (ma->flags.hot_pluggable && !reserve_hotadd(node, start, end)
>> <
>> > >        0) {
>> > >                /* Ignore hotadd region. Undo damage */
>> >
>> >  I have but the e820_register_active_regions as a else to this
>> > statment the absent pages check fails.
>> >
>>
>> The patch below omits this change because I think
>> e820_register_active_regions() will still have got called by the time
>> you encounter a hotplug area.
>
> called but then removed in setup arch.

By "removed", I assume you mean the active regions removed by the call
to remove_all_active_regions() in setup_arch(). Before reserve_hotadd() is
called, e820_register_active_regions() will have reregistered the active
regions with the NUMA node id.

I see e820_register_active_regions is acting as a filter against the e820


Yes. A range of pfn's in given to e820_register_active_regions() and it reads the e820 for E820_RAM sections within that range.

>> > Also nodes_cover_memory and alot of these check were based against
>> > comparing the srat data against the e820.  Now all this code is
>> > comparing SRAT against SRAT....
>> >
>>
>> I don't see why. The SRAT table passes a range to
>> e820_register_active_regions() so should be comparing SRAT to e820
>
> let me go off and look at e820_register_active_regions() some more.
Things get clear :)

Should be ok.


nice.

> Sure thing.  It is just the hot-add area I am guessing it is an off by
> one error of some sort.
>
See below. I do my e820_register_active_area as an else to to if
(hotplug.....!reserve) and the prink is easy to sort out.


yep, it should be easy to put this into a test program.

I see your pfn are in base 10.  Looks like it considers the last
addres to be a present page. (off by one thing).


Probably. I'll start poking now. Thanks a lot.

Thanks,
Keith

Output below
disabling early console
Linux version 2.6.18-rc4-mm3-smp (root@elm3a153) (gcc version 4.1.0
(SUSE Linux)) #6 SMP Thu Aug 31 22:06:00 EDT 2006
Command line: root=/dev/sda3
ip=9.47.66.153:9.47.66.169:9.47.66.1:255.255.255.0 resume=/dev/sda2
showopts earlyprintk=ttyS0,115200 console=ttyS0,115200 console=tty0
debug numa=hotadd=100
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 0000000000098400 (usable)
BIOS-e820: 0000000000098400 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 000000007ff85e00 (usable)
BIOS-e820: 000000007ff85e00 - 000000007ff98880 (ACPI data)
BIOS-e820: 000000007ff98880 - 0000000080000000 (reserved)
BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
BIOS-e820: 0000000100000000 - 0000000470000000 (usable)
BIOS-e820: 0000001070000000 - 0000001160000000 (usable)
Entering add_active_range(0, 0, 152) 0 entries of 3200 used
Entering add_active_range(0, 256, 524165) 1 entries of 3200 used
Entering add_active_range(0, 1048576, 4653056) 2 entries of 3200 used
Entering add_active_range(0, 17235968, 18219008) 3 entries of 3200 used
end_pfn_map = 18219008
DMI 2.3 present.
ACPI: RSDP (v000 IBM                                   ) @ 0x00000000000fdcf0
ACPI: RSDT (v001 IBM    EXA01ZEU 0x00001000 IBM  0x45444f43) @
0x000000007ff98800
ACPI: FADT (v001 IBM    EXA01ZEU 0x00001000 IBM  0x45444f43) @
0x000000007ff98780
ACPI: MADT (v001 IBM    EXA01ZEU 0x00001000 IBM  0x45444f43) @
0x000000007ff98600
ACPI: SRAT (v001 IBM    EXA01ZEU 0x00001000 IBM  0x45444f43) @
0x000000007ff983c0
ACPI: HPET (v001 IBM    EXA01ZEU 0x00001000 IBM  0x45444f43) @
0x000000007ff98380
ACPI: SSDT (v001 IBM    VIGSSDT0 0x00001000 INTL 0x20030122) @
0x000000007ff90780
ACPI: SSDT (v001 IBM    VIGSSDT1 0x00001000 INTL 0x20030122) @
0x000000007ff88bc0
ACPI: DSDT (v001 IBM    EXA01ZEU 0x00001000 INTL 0x20030122) @
0x0000000000000000
SRAT: PXM 0 -> APIC 0 -> Node 0
SRAT: PXM 0 -> APIC 1 -> Node 0
SRAT: PXM 0 -> APIC 2 -> Node 0
SRAT: PXM 0 -> APIC 3 -> Node 0
SRAT: PXM 0 -> APIC 38 -> Node 0
SRAT: PXM 0 -> APIC 39 -> Node 0
SRAT: PXM 0 -> APIC 36 -> Node 0
SRAT: PXM 0 -> APIC 37 -> Node 0
SRAT: PXM 1 -> APIC 64 -> Node 1
SRAT: PXM 1 -> APIC 65 -> Node 1
SRAT: PXM 1 -> APIC 66 -> Node 1
SRAT: PXM 1 -> APIC 67 -> Node 1
SRAT: PXM 1 -> APIC 102 -> Node 1
SRAT: PXM 1 -> APIC 103 -> Node 1
SRAT: PXM 1 -> APIC 100 -> Node 1
SRAT: PXM 1 -> APIC 101 -> Node 1
SRAT: Node 0 PXM 0 0-80000000
Entering add_active_range(0, 0, 152) 0 entries of 3200 used
Entering add_active_range(0, 256, 524165) 1 entries of 3200 used
SRAT: Node 0 PXM 0 0-470000000
Entering add_active_range(0, 0, 152) 2 entries of 3200 used
Entering add_active_range(0, 256, 524165) 2 entries of 3200 used
Entering add_active_range(0, 1048576, 4653056) 2 entries of 3200 used
SRAT: Node 0 PXM 0 0-1070000000
reserve_hotadd called with node 0 sart 470000000 end 1070000000
SRAT: Hotplug area has existing memory
Entering add_active_range(0, 0, 152) 3 entries of 3200 used
Entering add_active_range(0, 256, 524165) 3 entries of 3200 used
Entering add_active_range(0, 1048576, 4653056) 3 entries of 3200 used
SRAT: Node 1 PXM 1 1070000000-1160000000
Entering add_active_range(1, 17235968, 18219008) 3 entries of 3200 used
SRAT: Node 1 PXM 1 1070000000-3200000000
reserve_hotadd called with node 1 sart 1160000000 end 3200000000
SRAT: Hotplug area has existing memory
Entering add_active_range(1, 17235968, 18219008) 4 entries of 3200 used
NUMA: Using 28 for the hash shift.
Bootmem setup node 0 0000000000000000-0000001070000000
Bootmem setup node 1 0000001070000000-0000001160000000
Zone PFN ranges:
DMA             0 ->     4096
DMA32        4096 ->  1048576
Normal    1048576 -> 18219008
early_node_map[4] active PFN ranges
  0:        0 ->      152
  0:      256 ->   524165
  0:  1048576 ->  4653056
  1: 17235968 -> 18219008
On node 0 totalpages: 4128541
0 pages used for SPARSE memmap
1149 pages DMA reserved
DMA zone: 2843 pages, LIFO batch:0
0 pages used for SPARSE memmap
DMA32 zone: 520069 pages, LIFO batch:31
0 pages used for SPARSE memmap
Normal zone: 3604480 pages, LIFO batch:31
On node 1 totalpages: 983040
0 pages used for SPARSE memmap
0 pages used for SPARSE memmap
0 pages used for SPARSE memmap
Normal zone: 983040 pages, LIFO batch:31


--
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux