[PATCH] x86_64: NUMA range fixes

The current x86_64 NUMA memory code is inconsistent when it comes to node
memory ranges. The exact behaviour varies depending on which config option
is used.

setup_node_bootmem() takes start and end as arguments and uses them to
calculate the size of the node as (end - start). This is fine if end points
to the first non-available byte. The problem is that the current x86_64 code
sometimes treats end as the last present byte and sometimes as the first
non-available byte. As a result, some configurations lose a page at the end
of the range.
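To make the off-by-one concrete, here is a minimal sketch of the (end - start)
size calculation described above. node_span_pages() is a hypothetical helper,
not the kernel function, and 4 KiB pages are assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed: 4 KiB pages, as on x86_64. */
#define PAGE_SHIFT 12

/*
 * Hypothetical helper: pages spanned by a node whose size is computed
 * as (end - start). This is only correct if end is the first byte that
 * is NOT part of the node.
 */
static uint64_t node_span_pages(uint64_t start, uint64_t end)
{
	assert(start <= end);
	return (end - start) >> PAGE_SHIFT;
}
```

For a 1 GiB node at address 0, node_span_pages(0, 0x40000000) gives 262144
pages, while passing the last present byte, node_span_pages(0, 0x3fffffff),
gives 262143: exactly one page is lost.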

This patch tries to fix CONFIG_ACPI_NUMA, CONFIG_K8_NUMA and CONFIG_NUMA_EMU
so that they all treat end as the first non-available byte, which is how the
single node code already treats it.

The patch has been boot tested on dual x86_64 hardware with each of the
configurations above. Is the removed code perhaps needed as a workaround for
something?

Signed-off-by: Magnus Damm <[email protected]>
---

BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
 BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000007fff0000 (usable)
 BIOS-e820: 000000007fff0000 - 000000007ffff000 (ACPI data)
 BIOS-e820: 000000007ffff000 - 0000000080000000 (ACPI NVS)
 BIOS-e820: 00000000ff780000 - 0000000100000000 (reserved)
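The e820 lines already use exclusive ends (each range ends where the next one
starts), and the highest usable range ends at 0x7fff0000, which is where the
last node ends in every "with patch" log below. A sketch of that lookup,
assuming end_pfn is derived from the end of the highest usable e820 range (an
assumption consistent with these logs); struct e820_range and
usable_end_pfn() are hypothetical, simplified names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12

/* Hypothetical, simplified e820 entry; end is exclusive. */
struct e820_range {
	uint64_t start;
	uint64_t end;
	int usable;
};

/* The BIOS-provided map above, transcribed by hand. */
static const struct e820_range boot_map[] = {
	{ 0x0000000000000000ULL, 0x000000000009fc00ULL, 1 }, /* usable */
	{ 0x000000000009fc00ULL, 0x00000000000a0000ULL, 0 }, /* reserved */
	{ 0x00000000000e0000ULL, 0x0000000000100000ULL, 0 }, /* reserved */
	{ 0x0000000000100000ULL, 0x000000007fff0000ULL, 1 }, /* usable */
	{ 0x000000007fff0000ULL, 0x000000007ffff000ULL, 0 }, /* ACPI data */
	{ 0x000000007ffff000ULL, 0x0000000080000000ULL, 0 }, /* ACPI NVS */
	{ 0x00000000ff780000ULL, 0x0000000100000000ULL, 0 }, /* reserved */
};

#define BOOT_MAP_LEN (sizeof(boot_map) / sizeof(boot_map[0]))

/* Highest page frame number (exclusive) covered by a usable range. */
static uint64_t usable_end_pfn(const struct e820_range *map, size_t n)
{
	uint64_t end_pfn = 0;

	assert(map != NULL);
	for (size_t i = 0; i < n; i++) {
		uint64_t pfn;

		if (!map[i].usable)
			continue;
		pfn = map[i].end >> PAGE_SHIFT;
		if (pfn > end_pfn)
			end_pfn = pfn;
	}
	return end_pfn;
}
```

For this map the result is 0x7fff0, so end_pfn << PAGE_SHIFT is 0x7fff0000,
the first non-available byte of the last node.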

CONFIG_ACPI_NUMA:
-----------------
(without patch)
Bootmem setup node 0 0000000000000000-000000003fffffff
Bootmem setup node 1 0000000040000000-000000007ffeffff
On node 0 totalpages: 262046
  DMA zone: 3999 pages, LIFO batch:1
  Normal zone: 258047 pages, LIFO batch:31
  HighMem zone: 0 pages, LIFO batch:1
On node 1 totalpages: 262127
  DMA zone: 0 pages, LIFO batch:1
  Normal zone: 262127 pages, LIFO batch:31
  HighMem zone: 0 pages, LIFO batch:1
...
(with patch)
Bootmem setup node 0 0000000000000000-0000000040000000
Bootmem setup node 1 0000000040000000-000000007fff0000
On node 0 totalpages: 262047
  DMA zone: 3999 pages, LIFO batch:1
  Normal zone: 258048 pages, LIFO batch:31
  HighMem zone: 0 pages, LIFO batch:1
On node 1 totalpages: 262128
  DMA zone: 0 pages, LIFO batch:1
  Normal zone: 262128 pages, LIFO batch:31
  HighMem zone: 0 pages, LIFO batch:1
...
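Node 1 has no holes, so its page count follows directly from the printed
range: 262127 pages without the patch and 262128 with it. The sketch below
reproduces that arithmetic. old_srat_end() and span_pages() are hypothetical
names; the body of old_srat_end() is the two-line adjustment this patch
removes from srat.c.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12

/* The removed srat.c adjustment as a stand-alone helper: a page-aligned
 * (exclusive) end is turned into the last present byte. */
static uint64_t old_srat_end(uint64_t end)
{
	if (!(end & 0xfff))
		end--;
	return end;
}

/* Pages spanned when the node size is computed as (end - start). */
static uint64_t span_pages(uint64_t start, uint64_t end)
{
	assert(start <= end);
	return (end - start) >> PAGE_SHIFT;
}
```

span_pages(0x40000000, old_srat_end(0x7fff0000)) gives 262127, matching the
"without patch" log, while span_pages(0x40000000, 0x7fff0000) gives 262128.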

CONFIG_K8_NUMA:
---------------
(without patch)
Bootmem setup node 0 0000000000000000-000000003fffffff
Bootmem setup node 1 0000000040000000-000000007fff0000
On node 0 totalpages: 262046
  DMA zone: 3999 pages, LIFO batch:1
  Normal zone: 258047 pages, LIFO batch:31
  HighMem zone: 0 pages, LIFO batch:1
On node 1 totalpages: 262128
  DMA zone: 0 pages, LIFO batch:1
  Normal zone: 262128 pages, LIFO batch:31
  HighMem zone: 0 pages, LIFO batch:1
...
(with patch)
Bootmem setup node 0 0000000000000000-0000000040000000
Bootmem setup node 1 0000000040000000-000000007fff0000
On node 0 totalpages: 262047
  DMA zone: 3999 pages, LIFO batch:1
  Normal zone: 258048 pages, LIFO batch:31
  HighMem zone: 0 pages, LIFO batch:1
On node 1 totalpages: 262128
  DMA zone: 0 pages, LIFO batch:1
  Normal zone: 262128 pages, LIFO batch:31
  HighMem zone: 0 pages, LIFO batch:1
...
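With CONFIG_K8_NUMA only node 0 changes. A sketch of the limit decoding in
k8_scan_nodes() with the added limit++ suggests why: node 1's limit is capped
to end_pfn << PAGE_SHIFT, which was already exclusive. k8_limit_exclusive()
is a hypothetical wrapper, and the register values in the usage note are
invented for illustration; the code assumes, as the shifts suggest, that bits
31:16 of the register hold address bits 39:24 of the node's last byte.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12

/* Hypothetical stand-alone version of the limit calculation. */
static uint64_t k8_limit_exclusive(uint32_t reg, uint64_t end_pfn)
{
	uint64_t limit = reg;

	limit >>= 16;
	limit <<= 24;
	limit |= (1 << 24) - 1;	/* last present byte (inclusive) */
	limit++;		/* first non-available byte (this patch) */
	assert((limit & 0xffffffULL) == 0);

	/* Clamp to the end of RAM, which is already exclusive. */
	if (limit > end_pfn << PAGE_SHIFT)
		limit = end_pfn << PAGE_SHIFT;
	return limit;
}
```

With a hypothetical register value of 0x003f0000, the limit becomes
0x40000000 instead of 0x3fffffff. With 0x007f0000 the decoded limit,
0x80000000, exceeds the end of RAM and is clamped to 0x7fff0000 whether or
not limit++ is applied.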

CONFIG_NUMA_EMU: (passing numa=fake=4 to the kernel)
----------------
(without patch)
Bootmem setup node 0 0000000000000000-000000000fffffff
Bootmem setup node 1 0000000010000000-000000001fffffff
Bootmem setup node 2 0000000020000000-000000002fffffff
Bootmem setup node 3 0000000030000000-000000007fff0000
On node 0 totalpages: 65438
  DMA zone: 3999 pages, LIFO batch:1
  Normal zone: 61439 pages, LIFO batch:31
  HighMem zone: 0 pages, LIFO batch:1
On node 1 totalpages: 65535
  DMA zone: 0 pages, LIFO batch:1
  Normal zone: 65535 pages, LIFO batch:31
  HighMem zone: 0 pages, LIFO batch:1
On node 2 totalpages: 65535
  DMA zone: 0 pages, LIFO batch:1
  Normal zone: 65535 pages, LIFO batch:31
  HighMem zone: 0 pages, LIFO batch:1
On node 3 totalpages: 327664
  DMA zone: 0 pages, LIFO batch:1
  Normal zone: 327664 pages, LIFO batch:31
  HighMem zone: 0 pages, LIFO batch:1
...
(with patch)
Bootmem setup node 0 0000000000000000-0000000010000000
Bootmem setup node 1 0000000010000000-0000000020000000
Bootmem setup node 2 0000000020000000-0000000030000000
Bootmem setup node 3 0000000030000000-000000007fff0000
On node 0 totalpages: 65439
  DMA zone: 3999 pages, LIFO batch:1
  Normal zone: 61440 pages, LIFO batch:31
  HighMem zone: 0 pages, LIFO batch:1
On node 1 totalpages: 65536
  DMA zone: 0 pages, LIFO batch:1
  Normal zone: 65536 pages, LIFO batch:31
  HighMem zone: 0 pages, LIFO batch:1
On node 2 totalpages: 65536
  DMA zone: 0 pages, LIFO batch:1
  Normal zone: 65536 pages, LIFO batch:31
  HighMem zone: 0 pages, LIFO batch:1
On node 3 totalpages: 327664
  DMA zone: 0 pages, LIFO batch:1
  Normal zone: 327664 pages, LIFO batch:31
  HighMem zone: 0 pages, LIFO batch:1
...

 k8topology.c |    1 +
 numa.c       |    2 --
 srat.c       |    4 ----
 3 files changed, 1 insertion(+), 6 deletions(-)

diff -urNp linux-2.6.14-rc5-git5/arch/x86_64/mm/k8topology.c linux-2.6.14-rc5-git5-x86_64_numa_range_fixes/arch/x86_64/mm/k8topology.c
--- linux-2.6.14-rc5-git5/arch/x86_64/mm/k8topology.c	2005-10-24 15:37:44.000000000 +0900
+++ linux-2.6.14-rc5-git5-x86_64_numa_range_fixes/arch/x86_64/mm/k8topology.c	2005-10-27 17:03:49.000000000 +0900
@@ -108,6 +108,7 @@ int __init k8_scan_nodes(unsigned long s
 		limit >>= 16; 
 		limit <<= 24; 
 		limit |= (1<<24)-1;
+		limit++;
 
 		if (limit > end_pfn << PAGE_SHIFT)
 			limit = end_pfn << PAGE_SHIFT;
diff -urNp linux-2.6.14-rc5-git5/arch/x86_64/mm/numa.c linux-2.6.14-rc5-git5-x86_64_numa_range_fixes/arch/x86_64/mm/numa.c
--- linux-2.6.14-rc5-git5/arch/x86_64/mm/numa.c	2005-10-24 15:37:44.000000000 +0900
+++ linux-2.6.14-rc5-git5-x86_64_numa_range_fixes/arch/x86_64/mm/numa.c	2005-10-27 17:03:53.000000000 +0900
@@ -205,8 +205,6 @@ static int numa_emulation(unsigned long 
  		if (i == numa_fake-1)
  			sz = (end_pfn<<PAGE_SHIFT) - nodes[i].start;
  		nodes[i].end = nodes[i].start + sz;
- 		if (i != numa_fake-1)
- 			nodes[i].end--;
  		printk(KERN_INFO "Faking node %d at %016Lx-%016Lx (%LuMB)\n",
  		       i,
  		       nodes[i].start, nodes[i].end,
diff -urNp linux-2.6.14-rc5-git5/arch/x86_64/mm/srat.c linux-2.6.14-rc5-git5-x86_64_numa_range_fixes/arch/x86_64/mm/srat.c
--- linux-2.6.14-rc5-git5/arch/x86_64/mm/srat.c	2005-10-24 15:37:44.000000000 +0900
+++ linux-2.6.14-rc5-git5-x86_64_numa_range_fixes/arch/x86_64/mm/srat.c	2005-10-27 17:03:55.000000000 +0900
@@ -71,8 +71,6 @@ static __init void cutoff_node(int i, un
 			nd->start = nd->end;
 	}
 	if (nd->end > end) {
-		if (!(end & 0xfff))
-			end--;
 		nd->end = end;
 		if (nd->start > nd->end)
 			nd->start = nd->end;
@@ -166,8 +164,6 @@ acpi_numa_memory_affinity_init(struct ac
 		if (nd->end < end)
 			nd->end = end;
 	}
-	if (!(nd->end & 0xfff))
-		nd->end--;
 	printk(KERN_INFO "SRAT: Node %u PXM %u %Lx-%Lx\n", node, pxm,
 	       nd->start, nd->end);
 }