Re: del_timer_sync needed for UP RT systems.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Basically , there is a race condition in sys_timer_delete() and
posix_timer_event() .

>From sys_timer_delete() :

   /*
	 * This keeps any tasks waiting on the spin lock from thinking
	 * they got something (see the lock code above).
	 */
	if (timer->it_process) {
		if (timer->it_sigev_notify ==
(SIGEV_SIGNAL|SIGEV_THREAD_ID))
			put_task_struct(timer->it_process);
		timer->it_process = NULL;
	}
	unlock_timer(timer, flags);
	/* Preemption happens here. */
	release_posix_timer(timer, IT_ID_SET);


So when the timer is getting triggered , right before it takes the timer
lock, preemption happens. You finish the code above. Then your preempted
again right after unlock timer, shown above.

At this point, your triggering a timer that is half deleted, in posix_timer_fn() .
timer->it_process = NULL , so when you try to send the signal to the
timer owner you crash with an OOPS , cause the timer owner was just set
to NULL.

George, at least CC me, after all I found/documented this bug ..

Preliminary fix included ..

Daniel

On Tue, 2005-04-26 at 16:42, George Anzinger wrote:
> Ingo,
> 
> In tracking down the failure of a system running the RT patch we have found a 
> preemption between the time run_timer_list clears its spinlock and the call back 
> function (in this case in posix-timers.c) gets its spinlock.  The bad news is 
> that it is possible for the timer to be released at this point leaving the call 
> back code with a pointer to a bogus timer.
> 
> This was/is possible, of course, in SMP systems and is why del_timer_sync() 
> exists.  I suspect that del_timer_sync() needs to also do the "right thing" in 
> UP RT systems.
> 
> This means removing the #ifdef CONFIG_SMP at about line 56 of kernel/timer.c 
> thus setting up base->running_timer in all cases (or at least in SMP and RT 
> cases) and also the #ifdef CONFIG_SMP around del_timer_sync() and, of course, 
> the defines that redirect calls to these functions.
> 
> Does this make sense?
Source: MontaVista Software, Inc.
MR: 11506 
Type: Defect Fix
Disposition: needs submitting to LKML 
Signed-off-by: Daniel Walker <[email protected]>
Description:
	Ok, so here is the run down.. Basically , there is a race condition in
sys_timer_delete() and posix_timer_event() .

>From sys_timer_delete() :
		timer->it_process = NULL;
	}
	unlock_timer(timer, flags);
	/* Preemption happens here. */
	release_posix_timer(timer, IT_ID_SET);


So when the timer is getting triggered , right before it takes the timer
lock, preemption happens. You finish the code above. Then your preempted
again right after unlock timer, shown above.

At this point, your triggering a timer that is half deleted, in posix_timer_fn() .
timer->it_process = NULL , so when you try to send the signal to the
timer owner you crash with an OOPS , because the timer owner was just set
to NULL.

Index: linux-2.6.10/kernel/posix-timers.c
===================================================================
--- linux-2.6.10.orig/kernel/posix-timers.c	2005-04-26 17:38:25.000000000 +0000
+++ linux-2.6.10/kernel/posix-timers.c	2005-04-26 17:53:54.000000000 +0000
@@ -433,6 +433,14 @@ exit:
 
 int posix_timer_event(struct k_itimer *timr,int si_private)
 {
+	/*
+	 * If it_process is NULL then this timer is
+	 * in the process of being deleted. At this
+	 * point we can't do very much. So we
+	 * try to return gracefuly.
+	 */
+	if (timr->it_process == NULL) return 1;
+
 	memset(&timr->sigq->info, 0, sizeof(siginfo_t));
 	timr->sigq->info.si_sys_private = si_private;
 	/*

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux