I'm probably going to be ok with this ... after a bit.
1) First concern - my primary issue:
One thing I really want to change, the name of the per-cpuset file
that controls this option. You call it "interleave_over_allowed".
Take a look at the existing per-cpuset file names:
$ grep 'name = "' kernel/cpuset.c
.name = "cpuset",
.name = "cpus",
.name = "mems",
.name = "cpu_exclusive",
.name = "mem_exclusive",
.name = "sched_load_balance",
.name = "memory_migrate",
.name = "memory_pressure_enabled",
.name = "memory_pressure",
.name = "memory_spread_page",
.name = "memory_spread_slab",
.name = "cpuset",
The name of every memory related option starts with "mem" or "memory",
and the name of every memory interleave related option starts with
"memory_spread_*".
Can we call this "memory_spread_user" instead, or something else
matching "memory_spread_*" ?
The names of things in the public API's are a big issue of mine.
2) Second concern - lessor code clarity issue:
The logic surrounding current_cpuset_interleaved_mems() seems a tad
opaque to me. It appears on the surface as if the memory policy code,
in mm/mempolicy.c, is getting a nodemask from the cpuset code by
calling this routine, as if there were an independent per-cpuset
nodemask stating over what nodes to interleave for MPOL_INTERLEAVE.
But all that is returned is either (1) an empty node mask or (2) the
current tasks allowed cpu mask. If an empty mask is returned, this
tells the MPOL_INTERLEAVE code to use the mask the user specified in
an earlier set_mempolicy MPOL_INTERLEAVE call. If a non-empty mask
is returned, then the previous user specified mask is ignored and
that non-empty mask (just all the current cpusets allowed nodes) is
used instead.
Restating this in pseudo code, from your patch, the mempolicy.c
MPOL_INTERLEAVE code to rebind memory policies after a cpuset
changes reads:
tmp = current_cpuset_interleaved_mems();
if tmp empty:
rebind over tmp (all the cpusets allowed nodes)
break;
rebind over the set_mempolicy MPOL_INTERLEAVE specified mask
break;
The above code is assymmetric, and the returning of a nodemask is
an illusion, suggesting that cpusets might have an interleaved
nodemask separate from the allowed memory nodemask.
How about instead of your current_cpuset_interleaved_mems() routine
that returns a nodemask, rather have a routine that returns a Boolean,
indicating whether this new flag is set, used as in:
if (cpuset_is_memory_spread_user())
tmp = cpuset_current_mems_allowed();
else
nodes_remap(tmp, pol->v.nodes, *mpolmask, *newmask);
pol->v.nodes = tmp;
I'll wager this saves a few bytes of kernel text space as well.
3) Maybe I haven't had enough caffiene yet third issue:
The existing kernel code for mm/mempolicy.c:mpol_rebind_policy()
looks buggy to me. The node_remap() call for the MPOL_INTERLEAVE
case seems like it should come before, not after, updating mpolmask
to the newmask. Fixing that, and consolidating the multiple lines
doing "*mpolmask = *newmask" for each case, into a single such line
at the end of the switch(){} statement, results in the following
patch. Could you confirm my suspicions and push this one too.
It should be a part of your patch set, so we don't waste Andrew's
time resolving the inevitable patch collisions we'll see otherwise.
--- 2.6.23-mm1.orig/mm/mempolicy.c 2007-10-16 18:55:34.745039423 -0700
+++ 2.6.23-mm1/mm/mempolicy.c 2007-10-25 18:06:08.474742762 -0700
@@ -1741,14 +1741,12 @@ static void mpol_rebind_policy(struct me
case MPOL_INTERLEAVE:
nodes_remap(tmp, pol->v.nodes, *mpolmask, *newmask);
pol->v.nodes = tmp;
- *mpolmask = *newmask;
current->il_next = node_remap(current->il_next,
*mpolmask, *newmask);
break;
case MPOL_PREFERRED:
pol->v.preferred_node = node_remap(pol->v.preferred_node,
*mpolmask, *newmask);
- *mpolmask = *newmask;
break;
case MPOL_BIND: {
nodemask_t nodes;
@@ -1773,13 +1771,14 @@ static void mpol_rebind_policy(struct me
kfree(pol->v.zonelist);
pol->v.zonelist = zonelist;
}
- *mpolmask = *newmask;
break;
}
default:
BUG();
break;
}
+
+ *mpolmask = *newmask;
}
/*
Thanks.
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.925.600.0401
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[Index of Archives]
[Kernel Newbies]
[Netfilter]
[Bugtraq]
[Photo]
[Stuff]
[Gimp]
[Yosemite News]
[MIPS Linux]
[ARM Linux]
[Linux Security]
[Linux RAID]
[Video 4 Linux]
[Linux for the blind]
[Linux Resources]