[PATCH 1/2] Fix stop_machine_run problem with naughty real time process

Hi,

I wrote patches which fix a problem with stop_machine_run() and
CPU hotplug.

stop_machine_run() can't complete its work if a real time process that never
yields is running on the CPU to which the "kstopmachine" kernel thread is
bound. For more details, please refer to the following thread:

  http://lkml.org/lkml/2007/5/7/41

TEST RESULT:

I did the following test on my ia64 box. It works fine:

-------------------------------------------------------------------------------
# cat loop.sh
while true ; do
	:
done
-------------------------------------------------------------------------------
# cat test_stop_machine_run_with_rt_proc.sh
#!/bin/sh

taskset 0x2 chrt -f 98 ./loop.sh &
PID=${!}
echo 0 >/sys/devices/system/cpu/cpu1/online
kill ${PID}
echo 1 >/sys/devices/system/cpu/cpu1/online
-------------------------------------------------------------------------------

To run the test, just issue the following command:

# ./test_stop_machine_run_with_rt_proc.sh
# 
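
For reference, the CPU hog could also be written in C instead of loop.sh plus
chrt. This is only a minimal sketch and not part of the patches; the helper
names fifo_priority_in_range(), become_fifo() and spin_forever() are made up
for illustration. It uses only the POSIX sched_setscheduler() and
sched_get_priority_min()/sched_get_priority_max() calls.

```c
#include <errno.h>
#include <sched.h>
#include <stdbool.h>

/* Hypothetical helper: is prio a valid SCHED_FIFO priority on this system? */
bool fifo_priority_in_range(int prio)
{
	int lo = sched_get_priority_min(SCHED_FIFO);
	int hi = sched_get_priority_max(SCHED_FIFO);

	return lo != -1 && hi != -1 && prio >= lo && prio <= hi;
}

/*
 * Hypothetical helper: switch the calling process to SCHED_FIFO at prio,
 * roughly what "chrt -f <prio>" does before exec'ing its command.
 * Returns 0 on success or a negative errno value.
 */
int become_fifo(int prio)
{
	struct sched_param param = { .sched_priority = prio };

	if (!fifo_priority_in_range(prio))
		return -EINVAL;
	if (sched_setscheduler(0, SCHED_FIFO, &param) == -1)
		return -errno;	/* usually -EPERM when not privileged */
	return 0;
}

/* Busy loop that never relinquishes the CPU voluntarily, like loop.sh. */
void spin_forever(void)
{
	for (;;)
		;
}
```

A main() that calls become_fifo(98) and then spin_forever(), started as root
under "taskset 0x2", should behave much like the
"taskset 0x2 chrt -f 98 ./loop.sh" line in the script above.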

TODO list
=========

Some more work is needed. See the TODO list below.

 - If there is a SCHED_FIFO process having max priority, stop_machine_run()
   doesn't work because kstopmachine is never scheduled.

     -> I'm trying to fix this problem, see the following:

        http://lkml.org/lkml/2007/5/8/620

        I will submit RFC patches within a week.

 - On CPU hot removal, if that RT process is migrated to the CPU on which
   stop_machine_run() is running, stop_machine_run() can't continue to run.

     -> I'm trying to fix this problem.

 - Other `stop_machine_run() with FIFO` problems might exist.

     -> I have not yet investigated the other subsystems using
        stop_machine_run().
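
The first TODO item above follows from the SCHED_FIFO rule that a task which
becomes runnable preempts the running task only if its priority is strictly
higher. The sketch below just models that rule. It assumes MAX_RT_PRIO is 100
(so kstopmachine gets priority 99 from MAX_RT_PRIO-1); the names
ASSUMED_MAX_RT_PRIO, fifo_preempts() and kstopmachine_can_run() are
hypothetical and are not kernel symbols.

```c
#include <stdbool.h>

/* Assumption: MAX_RT_PRIO is 100, so MAX_RT_PRIO-1 is 99. */
enum { ASSUMED_MAX_RT_PRIO = 100 };
enum { KSTOPMACHINE_PRIO = ASSUMED_MAX_RT_PRIO - 1 };

/*
 * SCHED_FIFO rule: a task that becomes runnable preempts the running
 * task only if its priority is strictly higher; an equal priority task
 * has to wait until the running task blocks or yields.
 */
bool fifo_preempts(int woken_prio, int running_prio)
{
	return woken_prio > running_prio;
}

/* Can kstopmachine get the CPU from a FIFO hog that never yields? */
bool kstopmachine_can_run(int hog_prio)
{
	return fifo_preempts(KSTOPMACHINE_PRIO, hog_prio);
}
```

Under these assumptions kstopmachine_can_run(98) is true, which matches the
test above, and kstopmachine_can_run(99) is false, which is the max priority
case in the first TODO item.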


# FYI, I'll be offline for 2 days.

Thanks,

Satoru

---
Fix stop_machine_run() problem with naughty real time process

stop_machine_run() does its work on the "kstopmachine" thread, which runs at
max priority. However, that thread gets that priority only after it has been
woken up. Therefore, in the following case ...

  - "kstopmachine" tries to run on CPU1
  - There is a real time process on CPU1 which doesn't relinquish CPU time
    voluntarily

... "kstopmachine" can't start to run and the CPU on which stop_machine_run()
is running hangs up. To fix this problem, call sched_setscheduler() before
waking up that thread.

Signed-off-by: Satoru Takeuchi <[email protected]>

Index: linux-2.6.21/kernel/stop_machine.c
===================================================================
--- linux-2.6.21.orig/kernel/stop_machine.c	2007-05-11 13:45:34.000000000 +0900
+++ linux-2.6.21/kernel/stop_machine.c	2007-05-11 14:49:17.000000000 +0900
@@ -89,10 +89,6 @@ static void stopmachine_set_state(enum s
 static int stop_machine(void)
 {
 	int i, ret = 0;
-	struct sched_param param = { .sched_priority = MAX_RT_PRIO-1 };
-
-	/* One high-prio thread per cpu.  We'll do this one. */
-	sched_setscheduler(current, SCHED_FIFO, &param);
 
 	atomic_set(&stopmachine_thread_ack, 0);
 	stopmachine_num_threads = 0;
@@ -184,6 +180,10 @@ struct task_struct *__stop_machine_run(i
 
 	p = kthread_create(do_stop, &smdata, "kstopmachine");
 	if (!IS_ERR(p)) {
+		struct sched_param param = { .sched_priority = MAX_RT_PRIO-1 };
+
+		/* One high-prio thread per cpu.  We'll do this one. */
+		sched_setscheduler(p, SCHED_FIFO, &param);
 		kthread_bind(p, cpu);
 		wake_up_process(p);
 		wait_for_completion(&smdata.done);