Hi,
I wrote patches which fixes the problem regarding stop_machine_run() and
cpu hotplug.
stop_machine_run() can't accomplish its work if there is a real time process
on the CPU on which "kstopmachine" kernel thread is running. For more details,
please refer to the following thread:
http://lkml.org/lkml/2007/5/7/41
TEST RESULT:
I did the following test on my ia64 box. It works fine:
-------------------------------------------------------------------------------
# cat loop.sh
while true ; do
:
done
-------------------------------------------------------------------------------
# cat test_stop_machine_run_with_rt_proc.sh
#!/bin/sh
taskset 0x2 chrt -f 98 ./loop.sh &
PID=${!}
echo 0 >/sys/devices/system/cpu/cpu1/online
kill ${PID}
echo 1 >/sys/devices/system/cpu/cpu1/online
-------------------------------------------------------------------------------
To do the test, just issue the following command.
# ./test_stop_machine_run_with_rt_proc.sh
#
TODO list
=========
Some more works are needed. See the TODO list.
- If there is a SCHED_FIFO process having max priority, stop_machine_run doesn't
work because kstopmachine doesn't be scheduled.
-> I'm trying to fix this problem, see the followings:
http://lkml.org/lkml/2007/5/8/620
I would submit RFC patches in 1 weeks.
- On CPU hot removal, if that RT process is migrated to the CPU on which
stop_machine_run() is running, stop_machine_run can't continue to run.
-> I'm trying to fix this problem.
- Other `stop_machine_run() with FIFO` problem might exist.
-> I've not research other subsystem using stop_machine_run yet.
# FYI, I'll be offline for 2 days.
Thanks,
Satoru
---
Fix stop_machine_run() problem with naughty real time process
stop_machine_run() does its work on "kstopmachine" thread having max priority.
However that thread get such priority after woken up. Therefore, in the
following case ...
- "kstopmachine" try to run on CPU1
- There is a real time process which doesn't relinquish CPU time voluntary on CPU1
... "kstopmachine" can't start to run and the CPU on which stop_machine_run() is runing
hangs up. To fix this problem, call sched_setscheduler() before waking up that thread.
Signed-off-by: Satoru Takeuchi <[email protected]>
Index: linux-2.6.21/kernel/stop_machine.c
===================================================================
--- linux-2.6.21.orig/kernel/stop_machine.c 2007-05-11 13:45:34.000000000 +0900
+++ linux-2.6.21/kernel/stop_machine.c 2007-05-11 14:49:17.000000000 +0900
@@ -89,10 +89,6 @@ static void stopmachine_set_state(enum s
static int stop_machine(void)
{
int i, ret = 0;
- struct sched_param param = { .sched_priority = MAX_RT_PRIO-1 };
-
- /* One high-prio thread per cpu. We'll do this one. */
- sched_setscheduler(current, SCHED_FIFO, ¶m);
atomic_set(&stopmachine_thread_ack, 0);
stopmachine_num_threads = 0;
@@ -184,6 +180,10 @@ struct task_struct *__stop_machine_run(i
p = kthread_create(do_stop, &smdata, "kstopmachine");
if (!IS_ERR(p)) {
+ struct sched_param param = { .sched_priority = MAX_RT_PRIO-1 };
+
+ /* One high-prio thread per cpu. We'll do this one. */
+ sched_setscheduler(p, SCHED_FIFO, ¶m);
kthread_bind(p, cpu);
wake_up_process(p);
wait_for_completion(&smdata.done);
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[Index of Archives]
[Kernel Newbies]
[Netfilter]
[Bugtraq]
[Photo]
[Stuff]
[Gimp]
[Yosemite News]
[MIPS Linux]
[ARM Linux]
[Linux Security]
[Linux RAID]
[Video 4 Linux]
[Linux for the blind]
[Linux Resources]