Re: assert/crash in __rmqueue() when enabling CONFIG_NUMA

Nick Piggin wrote:	[Tue May 02 2006, 10:25:57AM EDT]
> Martin J. Bligh wrote:
> >>Oh that's a 32bit kernel. I don't think the 32bit NUMA has ever worked
> >>anywhere but some Summit systems (at least every time I tried it it 
> >>blew up on me and nobody seems to use it regularly). Maybe it would be 
> >>finally time to mark it CONFIG_BROKEN though or just remove it (even 
> >>by design it doesn't work very well) 
> >
> >
> >Bollocks. It works fine, and is tested every single day, on every git
> >release, and every -mm tree.
> 
> Whatever the case, there definitely does not appear to be sufficient
> zone alignment enforced for the buddy allocator. I cannot see how it
> could work if zones are not aligned on 4MB boundaries.
> 
> Maybe some architectures / subarch code naturally does this for us,
> but Ingo is definitely hitting this bug because his config does not
> (align, that is).
> 
> I've randomly added a couple more cc's.
> 
The patch below isn't compile tested, and it isn't correct for the cases
where alloc_remap is called or where arch code has already allocated
node_mem_map for CONFIG_FLAT_NODE_MEM_MAP. It's only meant to convey what
I believe the issue is.

Andy added code to the buddy allocator so that it no longer requires the
zone's endpoints to be aligned to MAX_ORDER. I think the issue is that the
buddy allocator still requires the node_mem_map's endpoints to be MAX_ORDER
aligned. Otherwise __page_find_buddy could compute a buddy that is not in
node_mem_map for partial MAX_ORDER regions at the zone's endpoints. With an
aligned node_mem_map, page_is_buddy will detect that these pages at the
endpoints aren't PG_buddy (they were zeroed out by the bootmem allocator and
are not part of the zone).  Of course the negative here is that we could
waste a little memory, but the positive is eliminating all the old checks
for zone boundary conditions.
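
To make the failure case concrete, here is a small userspace sketch (not
kernel code) of the buddy lookup arithmetic. It assumes the buddy of a
2^order block is found by flipping bit "order" of its pfn, which is my
reading of __page_find_buddy; buddy_pfn, buddy_in_map and the pfn values
in the note are hypothetical names and numbers chosen for illustration.

```c
#include <stdbool.h>

/* pfn of the buddy of the 2^order block starting at pfn
 * (assumed to mirror the XOR used by __page_find_buddy). */
static unsigned long buddy_pfn(unsigned long pfn, unsigned int order)
{
	return pfn ^ (1UL << order);
}

/* True if the whole buddy block lies inside the pfn range
 * [map_start, map_end) covered by the mem_map. */
static bool buddy_in_map(unsigned long pfn, unsigned int order,
			 unsigned long map_start, unsigned long map_end)
{
	unsigned long b = buddy_pfn(pfn, order);

	return b >= map_start && b + (1UL << order) <= map_end;
}
```

With a map that starts at pfn 0x1100, the order-8 block at 0x1100 has its
buddy at 0x1000, so buddy_in_map(0x1100, 8, 0x1100, 0x2000) is false. If
the map is extended down to 0x1000, the same lookup stays inside it.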

SPARSEMEM won't encounter this issue because of the MAX_ORDER size
constraint that applies when SPARSEMEM is configured. ia64 VIRTUAL_MEM_MAP
doesn't need the logic either because the holes and endpoints are handled
differently.  That leaves alloc_remap and the other arches which privately
allocate node_mem_map still to be checked.

Anyhow, I could be totally wrong, but like I said this requires more
thought.

bob


Index: linux-2.6.17-rc3/mm/page_alloc.c
===================================================================
--- linux-2.6.17-rc3.orig/mm/page_alloc.c	2006-04-27 09:44:02.000000000 -0400
+++ linux-2.6.17-rc3/mm/page_alloc.c	2006-05-03 14:50:13.000000000 -0400
@@ -2123,14 +2123,23 @@ static void __init alloc_node_mem_map(st
 #ifdef CONFIG_FLAT_NODE_MEM_MAP
 	/* ia64 gets its own node_mem_map, before this, without bootmem */
 	if (!pgdat->node_mem_map) {
-		unsigned long size;
+		unsigned long size, start, end;
 		struct page *map;
 
-		size = (pgdat->node_spanned_pages + 1) * sizeof(struct page);
+		/*
+		 * The zone's endpoints aren't required to be MAX_ORDER
+		 * aligned but the node_mem_map endpoints must be in order
+		 * for the buddy allocator to function correctly.
+		 */
+		start = pgdat->node_start_pfn & ~((1 << (MAX_ORDER - 1)) - 1);
+		end = pgdat->node_start_pfn + pgdat->node_spanned_pages;
+		end = (end + ((1 << (MAX_ORDER - 1)) - 1)) &
+			~((1 << (MAX_ORDER - 1)) - 1);
+		size =  (end - start) * sizeof(struct page);
 		map = alloc_remap(pgdat->node_id, size);
 		if (!map)
 			map = alloc_bootmem_node(pgdat, size);
-		pgdat->node_mem_map = map;
+		pgdat->node_mem_map = map + ( pgdat->node_start_pfn - start);
 	}
 #ifdef CONFIG_FLATMEM
 	/*
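
The rounding the hunk is aiming for can be checked in userspace with the
sketch below. It is not kernel code: SKETCH_MAX_ORDER is assumed to be 11
(1024 pages per largest block, i.e. 4MB with 4K pages), struct map_span
and aligned_map_span are hypothetical names, and the end is taken from
the node's real last pfn (node_start_pfn + spanned pages) before it is
rounded up.

```c
/* Assumed value for illustration; the real MAX_ORDER is configurable. */
#define SKETCH_MAX_ORDER		11
#define SKETCH_MAX_ORDER_NR_PAGES	(1UL << (SKETCH_MAX_ORDER - 1))

struct map_span {
	unsigned long start;	/* first pfn covered by the padded map */
	unsigned long end;	/* one past the last pfn covered */
	unsigned long offset;	/* node_start_pfn - start, in pages */
};

/* Round the node's pfn range outwards to largest-block boundaries. */
static struct map_span aligned_map_span(unsigned long node_start_pfn,
					unsigned long spanned_pages)
{
	unsigned long mask = SKETCH_MAX_ORDER_NR_PAGES - 1;
	struct map_span s;

	s.start = node_start_pfn & ~mask;
	s.end = (node_start_pfn + spanned_pages + mask) & ~mask;
	s.offset = s.start <= node_start_pfn ? node_start_pfn - s.start : 0;
	return s;
}
```

For a node starting at pfn 4352 and spanning 3000 pages this gives
start 4096, end 8192 and offset 256, so the map would be sized for 4096
struct pages and node_mem_map would point 256 entries into it.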