On Mon, 2005-05-16 at 09:47 -0700, Christoph Lameter wrote:
> On Mon, 16 May 2005, Dave Hansen wrote:
> > There are some broken assumptions in the kernel that
> > CONFIG_DISCONTIG==CONFIG_NUMA. These usually manifest when code assumes
> > that one pg_data_t means one NUMA node.
> >
> > However, NUMA node ids are actually distinct from "discontigmem nodes".
> > A "discontigmem node" is just one physically contiguous area of memory,
> > thus one pg_data_t. Some (non-NUMA) Mac G5's have a gap in their
> > address space, so they get two discontigmem nodes.
>
> I thought the discontiguous memory in one node was handled through zones?
> I.e. ZONE_HIGHMEM in i386?

You can only have one zone of each type under each pg_data_t. For
instance, you can't properly represent (DMA, NORMAL, HIGHMEM, <GAP>,
HIGHMEM) in a single pg_data_t without wasting node_mem_map[] space.
The "proper" discontig way of representing that is like this:

	pg_data_t[0] (DMA, NORMAL, HIGHMEM)
	<GAP>
	pg_data_t[1] (---, ------, HIGHMEM)

Where pg_data_t[1] has empty DMA and NORMAL zones. Also, remember that
both of these could theoretically be on the same NUMA node. But, I
don't think we ever do that in practice.

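
To make that layout concrete, here is a rough userspace sketch. It is not
kernel code: every toy_* name is made up for illustration, and the page
frame numbers are invented (4K pages, 2GB of RAM, a 2GB hole, then 1GB
more). It only shows two pg_data_t-like structures, each with one zone of
each type, and how many page frames a single merged pg_data_t would have
to cover with unused mem_map entries.

```c
#include <assert.h>

/* Toy model only; these are NOT the kernel's real definitions. */
enum toy_zone_type { TOY_DMA, TOY_NORMAL, TOY_HIGHMEM, TOY_NR_ZONES };

struct toy_zone {
	unsigned long start_pfn;	/* first page frame of the zone */
	unsigned long nr_pages;		/* 0 means the zone is empty */
};

struct toy_pgdat {
	struct toy_zone zones[TOY_NR_ZONES];	/* one zone of each type */
	int numa_node;				/* NUMA node it lives on */
};

/* Invented numbers: both contiguous ranges sit on NUMA node 0. */
static const struct toy_pgdat toy_pgdats[2] = {
	{ .zones = { {0, 4096}, {4096, 225280}, {229376, 294912} },
	  .numa_node = 0 },
	{ .zones = { {0, 0}, {0, 0}, {1048576, 262144} },
	  .numa_node = 0 },
};

/* First page frame of the first non-empty zone (zones are in order). */
static unsigned long toy_pgdat_start_pfn(const struct toy_pgdat *p)
{
	for (int i = 0; i < TOY_NR_ZONES; i++)
		if (p->zones[i].nr_pages)
			return p->zones[i].start_pfn;
	return 0;
}

/* One past the last page frame covered by any non-empty zone. */
static unsigned long toy_pgdat_end_pfn(const struct toy_pgdat *p)
{
	unsigned long end = 0;

	for (int i = 0; i < TOY_NR_ZONES; i++) {
		const struct toy_zone *z = &p->zones[i];

		if (z->nr_pages && z->start_pfn + z->nr_pages > end)
			end = z->start_pfn + z->nr_pages;
	}
	return end;
}

/* Page frames in the hole that a single merged pg_data_t would span. */
static unsigned long toy_gap_pages(const struct toy_pgdat *lo,
				   const struct toy_pgdat *hi)
{
	assert(toy_pgdat_start_pfn(hi) >= toy_pgdat_end_pfn(lo));
	return toy_pgdat_start_pfn(hi) - toy_pgdat_end_pfn(lo);
}
```

With these made-up numbers, toy_gap_pages(&toy_pgdats[0], &toy_pgdats[1])
is the size of the hole in pages, which is the mem_map space you avoid
allocating by using two pg_data_t's.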
> > So, that #error is bogus. It's perfectly valid to have multiple
> > discontigmem nodes, when the number of NUMA nodes is 1. MAX_NUMNODES
> > refers to discontigmem nodes, not NUMA nodes.
>
> Ok. We looked through the code and saw that the check may be removed
> without causing problems. However, there is still a feeling of uneasiness
> about this.
I don't blame you :)
> To what node does numa_node_id() refer?
That refers to the NUMA node that you're thinking of: CPUs, memory and
I/O that are close to each other, etc...
> And it is legit to use
> numa_node_id() to index cpu maps and stuff?
Yes, those are all NUMA nodes.
> How do the concepts of numa node id relate to discontig node ids?
I believe quite a few architectures assume that, when NUMA is on, the
two are equivalent. It appears to be pretty much assumed everywhere
that CONFIG_NUMA=y means one pg_data_t per NUMA node.
Remember, as you saw, you can't assume that MAX_NUMNODES=1 when NUMA=n
because of the DISCONTIG=y case.
So, in summary, if you want to do it right: use the
CONFIG_NEED_MULTIPLE_NODES that you see in -mm. As plain DISCONTIG=y
gets replaced by sparsemem, any code that tests
CONFIG_NEED_MULTIPLE_NODES is likely to stay working.
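
Something like the following is what I mean, as a standalone sketch
rather than a patch. The assumption here is that
CONFIG_NEED_MULTIPLE_NODES ends up set whenever DISCONTIGMEM or NUMA is
set; the #define block below only imitates that for a DISCONTIGMEM=y,
NUMA=n build. TOY_MAX_PGDATS and toy_pgdat_to_numa_node() are
hypothetical names, not anything in the tree.

```c
#include <assert.h>

/* Hypothetical configuration for this sketch: DISCONTIGMEM=y, NUMA=n. */
#define CONFIG_DISCONTIGMEM 1
/* #define CONFIG_NUMA 1 */

/* Assumption: NEED_MULTIPLE_NODES is set if DISCONTIGMEM or NUMA is. */
#if defined(CONFIG_DISCONTIGMEM) || defined(CONFIG_NUMA)
#define CONFIG_NEED_MULTIPLE_NODES 1
#endif

/* Size per-pg_data_t arrays on this symbol, not on CONFIG_NUMA. */
#ifdef CONFIG_NEED_MULTIPLE_NODES
#define TOY_MAX_PGDATS 4	/* more than one pg_data_t may exist */
#else
#define TOY_MAX_PGDATS 1	/* flat memory: exactly one pg_data_t */
#endif

/* NUMA node that owns a given pg_data_t index (toy lookup). */
static int toy_pgdat_to_numa_node(int pgdat_id)
{
	assert(pgdat_id >= 0 && pgdat_id < TOY_MAX_PGDATS);
#ifdef CONFIG_NUMA
	/* The common assumption: one pg_data_t per NUMA node. */
	return pgdat_id;
#else
	/* Non-NUMA: every discontigmem node is on the only NUMA node. */
	return 0;
#endif
}
```

The point is that the array bound follows the number of pg_data_t's,
while the NUMA node id is looked up separately, so the DISCONTIG=y,
NUMA=n case does not fall over.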
-- Dave