Andrew,
This one fixes a problem that actual cpuset users are seeing
crash their system. I hope it will be pushed along fairly soon.
There is a race in the kernel cpuset code, between the code
to handle notify_on_release, and the code to remove a cpuset.
The notify_on_release code can end up trying to access a
cpuset that has been removed. In the most common case, this
causes a NULL pointer dereference from the routine cpuset_path.
However all manner of bad things are possible, in theory at least.
The existing code decrements the cpuset use count, and if the
count goes to zero, processes the notify_on_release request,
if appropriate. However, once the count goes to zero, unless we
are holding the global cpuset_sem semaphore, there is nothing to
stop another task from immediately removing the cpuset entirely,
and recycling its memory.
The obvious fix would be to always hold the cpuset_sem
semaphore while decrementing the use count and dealing with
notify_on_release. However we don't want to force a global
semaphore into the mainline task exit path, as that might create
a scaling problem.
The actual fix is almost as easy - since this is only an issue
for cpusets using notify_on_release, which the top level big
cpusets don't normally need to use, only take the cpuset_sem
for cpusets using notify_on_release.
This code has been run for hours without a hiccup, while running
a cpuset create/destroy stress test that could crash the existing
kernel in seconds. This patch applies to the current -linus
git kernel.
Signed-off-by: Paul Jackson <[email protected]>
Index: 2.6-cpuset_path_fix/kernel/cpuset.c
===================================================================
--- 2.6-cpuset_path_fix.orig/kernel/cpuset.c 2005-05-25 19:20:27.000000000 -0700
+++ 2.6-cpuset_path_fix/kernel/cpuset.c 2005-05-26 00:50:32.000000000 -0700
@@ -166,9 +166,8 @@ static struct super_block *cpuset_sb = N
* The hooks from fork and exit, cpuset_fork() and cpuset_exit(), don't
* (usually) grab cpuset_sem. These are the two most performance
* critical pieces of code here. The exception occurs on exit(),
- * if the last task using a cpuset exits, and the cpuset was marked
- * notify_on_release. In that case, the cpuset_sem is taken, the
- * path to the released cpuset calculated, and a usermode call made
+ * when a task in a notify_on_release cpuset exits. Then cpuset_sem
+ * is taken, and if the cpuset count is zero, a usermode call made
* to /sbin/cpuset_release_agent with the name of the cpuset (path
* relative to the root of cpuset file system) as the argument.
*
@@ -1404,6 +1403,18 @@ void cpuset_fork(struct task_struct *tsk
*
* Description: Detach cpuset from @tsk and release it.
*
+ * Note that cpusets marked notify_on_release force every task
+ * in them to take the global cpuset_sem semaphore when exiting.
+ * This could impact scaling on very large systems. Be reluctant
+ * to use notify_on_release cpusets where very high task exit
+ * scaling is required on large systems.
+ *
+ * Don't even think about derefencing 'cs' after the cpuset use
+ * count goes to zero, except inside a critical section guarded
+ * by the cpuset_sem semaphore. If you don't hold cpuset_sem,
+ * then a zero cpuset use count is a license to any other task to
+ * nuke the cpuset immediately.
+ *
**/
void cpuset_exit(struct task_struct *tsk)
@@ -1415,10 +1426,13 @@ void cpuset_exit(struct task_struct *tsk
tsk->cpuset = NULL;
task_unlock(tsk);
- if (atomic_dec_and_test(&cs->count)) {
+ if (notify_on_release(cs)) {
down(&cpuset_sem);
- check_for_release(cs);
+ if (atomic_dec_and_test(&cs->count))
+ check_for_release(cs);
up(&cpuset_sem);
+ } else {
+ atomic_dec(&cs->count);
}
}
--
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.650.933.1373
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[Index of Archives]
[Kernel Newbies]
[Netfilter]
[Bugtraq]
[Photo]
[Stuff]
[Gimp]
[Yosemite News]
[MIPS Linux]
[ARM Linux]
[Linux Security]
[Linux RAID]
[Video 4 Linux]
[Linux for the blind]
[Linux Resources]