Re: [discuss] Re: x86_64: 2.6.14-rc4 swiotlb broken

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hello. Linus-san.

> NOTE! Even if the machine has 4GB or more of memory, it's entirely likely 
> that the quick "use NODE(0)" hack will work fine. 
> 
> Why? Because the bootmem memory should still be allocated low-to-high by 
> default, which means that as logn as NODE(0) has _enough_ memory in the 
> DMA range, we should be ok.
> 
> So I _think_ the simple one-liner NODE(0) patch is sufficient, and should 
> work (and is a lot more acceptable for 2.6.14 than switching the node 
> ordering around yet again, or doing bigger surgery on the bootmem code).
> 
> So the only thing that worried me (and made me ask whether there might be 
> machines where it doesn't work) is if some machines might have their high 
> memory (or no memory at all) on NODE(0). It does sound unlikely, but I 
> simple don't know what kind of strange NUMA configs there are out there.
> 
> And I'm definitely only interested in machines that are out there, not 
> some theoretical issues.

In our making IA64 machine node 0 might not have any low-memory, and
another node can have low-memory instead.

This cause comes from hotplug whole of one node.
For example, please imagine following case.

1) In this case, firmware remembers pxm 1's node has low memory.

                 node 0             node 1 
               +--------------+  +-----------+
               |  pxm = 1     |  | pxm = 2   |
               |  low memory  |  |           |
               +--------------+  +-----------+


2) If one node is hot-added at pxm = 0 (pxm is decided from physical
   locate by firmware.), new node will be node 2.

  node 2          node 0          node 1 
+-----------+  +--------------+  +-----------+
| pxm = 0   |  |  pxm = 1     |  | pxm = 2   |
|           |  |  low memory  |  |           |
+-----------+  +--------------+  +-----------+

3) If user reboots the machine, Linux decides node id from pxm's order.
   But firmware still remembers which node has low memory.
   So, node 0 will not have any low memory.

  node 0          node 1          node 2
+-----------+  +--------------+  +-----------+
| pxm = 0   |  |  pxm = 1     |  | pxm = 2   |
|           |  |  low memory  |  |           |
+-----------+  +--------------+  +-----------+

So, just "use NODE(0)" is not enough hack for our machine.
If "use NODE(0)" is selected, kernel must sort pgdat link and
node id by memory address. I think that hot add code will be a 
bit messy instead.

Thanks.

-- 
Yasunori Goto 

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux