Re: RT and Cascade interrupts

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



john cooper wrote:
I'm seeing the BUG assert in kernel/timers.c:cascade()
kick in (tmp->base is somehow 0) during a test which
creates a few tasks of priority higher than ksoftirqd.
This race doesn't happen if ksoftirqd's priority is
elevated (eg: chrt -f -p 75 2) so the -RT patch might
be opening up a window here.

There is a window in rpc_run_timer() which allows
it to lose track of timer ownership when ksoftirqd
(and thus itself) are preempted.  This doesn't
immediately cause a problem but does corrupt
the timer cascade list when the timer struct is
recycled/requeued.  This shows up some time later
as the list is processed.  The failure mode is cascade()
attempting to percolate a timer with poisoned
next/prev *s and a NULL base causing the assertion
BUG(tmp->base != base) to kick in.

The RPC code is attempting to replicate state of
timer ownership for a given rpc_task via RPC_TASK_HAS_TIMER
in rpc_task.tk_runstate.  Besides not working
correctly in the case of preemptable context it is
a replication of state of a timer pending in the
cascade structure (ie: timer->base).  The fix
changes the RPC code to use timer->base when
deciding whether an outstanding timer registration
exists during rpc_task tear down.

Note: this failure occurred in the 40-04 version of
the patch though it applies to more current versions.
It was seen when executing stress tests on a number
of PPC targets running on an NFS mounted root though
was not observed on a x86 target under similar
conditions.

-john


--
[email protected]
./include/linux/sunrpc/sched.h
./net/sunrpc/sched.c
=================================================================
--- ./include/linux/sunrpc/sched.h.ORG	2005-05-24 10:29:24.000000000 -0400
+++ ./include/linux/sunrpc/sched.h	2005-05-24 10:47:56.000000000 -0400
@@ -142,7 +142,6 @@ typedef void			(*rpc_action)(struct rpc_
 #define RPC_TASK_RUNNING	0
 #define RPC_TASK_QUEUED		1
 #define RPC_TASK_WAKEUP		2
-#define RPC_TASK_HAS_TIMER	3
 
 #define RPC_IS_RUNNING(t)	(test_bit(RPC_TASK_RUNNING, &(t)->tk_runstate))
 #define rpc_set_running(t)	(set_bit(RPC_TASK_RUNNING, &(t)->tk_runstate))
=================================================================
--- ./net/sunrpc/sched.c.ORG	2005-05-24 10:29:52.000000000 -0400
+++ ./net/sunrpc/sched.c	2005-05-24 11:02:44.000000000 -0400
@@ -103,9 +103,6 @@ static void rpc_run_timer(struct rpc_tas
 		dprintk("RPC: %4d running timer\n", task->tk_pid);
 		callback(task);
 	}
-	smp_mb__before_clear_bit();
-	clear_bit(RPC_TASK_HAS_TIMER, &task->tk_runstate);
-	smp_mb__after_clear_bit();
 }
 
 /*
@@ -124,7 +121,6 @@ __rpc_add_timer(struct rpc_task *task, r
 		task->tk_timeout_fn = timer;
 	else
 		task->tk_timeout_fn = __rpc_default_timer;
-	set_bit(RPC_TASK_HAS_TIMER, &task->tk_runstate);
 	mod_timer(&task->tk_timer, jiffies + task->tk_timeout);
 }
 
@@ -135,7 +131,7 @@ __rpc_add_timer(struct rpc_task *task, r
 static inline void
 rpc_delete_timer(struct rpc_task *task)
 {
-	if (test_and_clear_bit(RPC_TASK_HAS_TIMER, &task->tk_runstate)) {
+	if (task->tk_timer.base) {
 		del_singleshot_timer_sync(&task->tk_timer);
 		dprintk("RPC: %4d deleting timer\n", task->tk_pid);
 	}

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux