From: Christoph Lameter <[email protected]>

Make alloc_pages_node() ignore cpusets.

Currently alloc_pages_node() obeys cpusets. If you ask for a page
on a node outside the current task's cpuset, you will be forced to
take a page within your cpuset instead.

Several kernel mechanisms use alloc_pages_node(), directly or
indirectly, including the NUMA-aware slab allocator, various device
and bus controller drivers, the page migration facility, the hugetlb
allocator, node-local data, the NUMA-aware block I/O scheduler, per-node
mmtimers, per-node network buffers, per-node oprofile buffers, memory
pools, some netfilter counters, and any other caller of kmalloc_node()
or vmalloc_node().

These mechanisms expect to get memory on the node they asked
for, regardless of user-imposed cpuset memory placement constraints.

This patch adds a __GFP_NOCPUSET flag that bypasses cpuset memory
placement for an allocation. It is set in alloc_pages_node() and
checked in __cpuset_zone_allowed(). alloc_pages_node() is the
common routine that all node-specific allocation calls resolve to, and
__cpuset_zone_allowed() is called from the hook beneath __alloc_pages()
to enforce cpuset memory constraints.

Signed-off-by: Christoph Lameter <[email protected]>
Signed-off-by: Paul Jackson <[email protected]>
---
Andrew,

This is needed for memory migration to work if it is invoked from
a task that is cpuset memory constrained. Without this, writing a
cpuset's 'mems' file (when its memory_migrate flag is set to '1') from
a task that is in some limited cpuset (not all memory nodes allowed)
causes the migration to go to the memory nodes in that writing task's
cpuset, not to the requested memory nodes in the 'mems' value written.

I recommend it as a fix in 2.6.16. -pj

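For anyone who wants to poke at the check order without booting a
kernel, here is a minimal userspace sketch of the behavior described
above. It is not kernel code: struct model_task, model_zone_allowed(),
model_alloc_pages_node(), MODEL_MAX_NODES and the 'patched' parameter
are all invented for illustration. The toy zonelist (requested node
first, then the rest in ascending order) is an assumption, every node
is assumed to have free memory, and the in_interrupt() test and the
ancestor cpuset scan of the real function are left out. Only the two
flag values are taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values copied from the patch; everything else is hypothetical. */
#define MODEL_GFP_HARDWALL 0x20000u
#define MODEL_GFP_NOCPUSET 0x40000u
#define MODEL_MAX_NODES    8

struct model_task {
	unsigned int mems_allowed;	/* bit n set: node n is in the task's cpuset */
};

/* Same order of checks as the patched __cpuset_zone_allowed(). */
static bool model_zone_allowed(const struct model_task *t, int node,
			       unsigned int gfp_mask)
{
	assert(node >= 0 && node < MODEL_MAX_NODES);

	if (t->mems_allowed & (1u << node))
		return true;
	if (gfp_mask & MODEL_GFP_NOCPUSET)	/* new check from the patch */
		return true;
	if (gfp_mask & MODEL_GFP_HARDWALL)	/* hardwall request, stop here */
		return false;
	/* The real function scans ancestor cpusets here; the model refuses. */
	return false;
}

/*
 * Return the node a page would come from, or -1 if no node is allowed.
 * 'patched' selects whether the model ORs in the NOCPUSET flag, as the
 * patched alloc_pages_node() does.
 */
static int model_alloc_pages_node(const struct model_task *t, int nid,
				  unsigned int gfp_mask, bool patched)
{
	int n;

	if (patched)
		gfp_mask |= MODEL_GFP_NOCPUSET;

	/* Toy zonelist: the requested node first ... */
	if (model_zone_allowed(t, nid, gfp_mask))
		return nid;
	/* ... then every other node in ascending order. */
	for (n = 0; n < MODEL_MAX_NODES; n++)
		if (n != nid && model_zone_allowed(t, n, gfp_mask))
			return n;
	return -1;
}
```

With mems_allowed = 0x3 (nodes 0 and 1), a request for node 3 lands on
node 0 in the unpatched model and on node 3 in the patched one, which
is the migration misdirection described above.
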
include/linux/gfp.h | 6 ++++++
kernel/cpuset.c | 2 ++
2 files changed, 8 insertions(+)
--- 2.6.16-rc6.orig/include/linux/gfp.h 2006-03-13 20:19:30.000000000 -0800
+++ 2.6.16-rc6/include/linux/gfp.h 2006-03-17 21:52:03.000000000 -0800
@@ -47,6 +47,7 @@ struct vm_area_struct;
 #define __GFP_ZERO ((__force gfp_t)0x8000u)/* Return zeroed page on success */
 #define __GFP_NOMEMALLOC ((__force gfp_t)0x10000u) /* Don't use emergency reserves */
 #define __GFP_HARDWALL ((__force gfp_t)0x20000u) /* Enforce hardwall cpuset memory allocs */
+#define __GFP_NOCPUSET ((__force gfp_t)0x40000u)/* Ignore cpuset constraints */
 
 #define __GFP_BITS_SHIFT 20 /* Room for 20 __GFP_FOO bits */
 #define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1))
@@ -113,6 +114,11 @@ static inline struct page *alloc_pages_n
 	/* Unknown node is current node */
 	if (nid < 0)
 		nid = numa_node_id();
+	/*
+	 * Specified (or implied by nid < 0) node overrides cpuset placement.
+	 * Various slab, page and device node specific allocations need this.
+	 */
+	gfp_mask |= __GFP_NOCPUSET;
 
 	return __alloc_pages(gfp_mask, order,
 		NODE_DATA(nid)->node_zonelists + gfp_zone(gfp_mask));
--- 2.6.16-rc6.orig/kernel/cpuset.c 2006-03-13 20:19:36.000000000 -0800
+++ 2.6.16-rc6/kernel/cpuset.c 2006-03-17 21:52:18.000000000 -0800
@@ -2164,6 +2164,8 @@ int __cpuset_zone_allowed(struct zone *z
 	node = z->zone_pgdat->node_id;
 	if (node_isset(node, current->mems_allowed))
 		return 1;
+	if (gfp_mask & __GFP_NOCPUSET)
+		return 1;
 	if (gfp_mask & __GFP_HARDWALL)	/* If hardwall request, stop here */
 		return 0;
 
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.650.933.1373, 1.925.600.0401