Re: [PATCH -rt] Fix TASKLET_STATE_SCHED WARN_ON()

On Tue, 2007-06-05 at 19:17 -0700, john stultz wrote:
> Hey Ingo,
> 	So we've been seeing the following trace fairly frequently on our SMP
> boxes when running kernbench:
> 
> BUG: at kernel/softirq.c:639 __tasklet_action()
> 
> Call Trace:
>  [<ffffffff8106d5da>] dump_trace+0xaa/0x32a
>  [<ffffffff8106d89b>] show_trace+0x41/0x5c
>  [<ffffffff8106d8cb>] dump_stack+0x15/0x17
>  [<ffffffff81094a97>] __tasklet_action+0xdf/0x12e
>  [<ffffffff81094f76>] tasklet_action+0x27/0x29
>  [<ffffffff8109530a>] ksoftirqd+0x16c/0x271
>  [<ffffffff81033d4d>] kthread+0xf5/0x128
>  [<ffffffff8105ff68>] child_rip+0xa/0x12
> 
> 
> Paul also pointed this out a while back: http://lkml.org/lkml/2007/2/25/1
> 
> 
> Anyway, I think I finally found the issue. It's a bit hard to explain,
> but the idea is: while __tasklet_action is running the tasklet function
> on CPU1, a call to tasklet_schedule() can be made on CPU2, and if right
> after we mark the TASKLET_STATE_SCHED bit we are preempted,
> __tasklet_action on CPU1 might be able to re-run the function, clear the
> bit and unlock the tasklet before CPU2 enters __tasklet_common_schedule.
> Once __tasklet_common_schedule locks the tasklet, we will add the
> tasklet to the list with the TASKLET_STATE_SCHED bit *unset*.
> 
> I've verified this race occurs w/ a WARN_ON in
> __tasklet_common_schedule().
> 
> 
> This fix avoids the race by making sure, *after* we've locked the
> tasklet, that the STATE_SCHED bit is set before adding it to the list.
> 
> Does it look ok to you?
> 
> thanks
> -john
> 
> Signed-off-by: John Stultz <[email protected]>
> 
> Index: 2.6-rt/kernel/softirq.c
> ===================================================================
> --- 2.6-rt.orig/kernel/softirq.c	2007-06-05 18:30:54.000000000 -0700
> +++ 2.6-rt/kernel/softirq.c	2007-06-05 18:36:44.000000000 -0700
> @@ -544,10 +544,17 @@ static void inline
>  __tasklet_common_schedule(struct tasklet_struct *t, struct tasklet_head *head, unsigned int nr)
>  {
>  	if (tasklet_trylock(t)) {
> -		WARN_ON(t->next != NULL);
> -		t->next = head->list;
> -		head->list = t;
> -		raise_softirq_irqoff(nr);
> +		/* We may have been preempted before tasklet_trylock
> +		 * and __tasklet_action may have already run.
> +		 * So double check the sched bit while the tasklet
> +		 * is locked before adding it to the list.
> +		 */
> +		if (test_bit(TASKLET_STATE_SCHED, &t->state)) {
> +			WARN_ON(t->next != NULL);
> +			t->next = head->list;
> +			head->list = t;
> +			raise_softirq_irqoff(nr);
> +		}
>  		tasklet_unlock(t);
>  	}
>  }
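
To make the window concrete: tasklet_schedule() itself looks roughly like
this (this is the mainline wrapper from include/linux/interrupt.h; as far
as I can tell the -rt tree keeps it and just routes __tasklet_schedule()
into __tasklet_common_schedule()):

	static inline void tasklet_schedule(struct tasklet_struct *t)
	{
		/* only the first scheduler actually queues the tasklet */
		if (!test_and_set_bit(TASKLET_STATE_SCHED, &t->state))
			__tasklet_schedule(t);
	}

The race lives in the gap after test_and_set_bit() succeeds but before
__tasklet_common_schedule() takes the tasklet lock: on -rt that path can
be preempted, so __tasklet_action() on the other CPU can run the function,
clear TASKLET_STATE_SCHED and drop the RUN lock in the meantime.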

So while digging on a strange OOM issue we were seeing (which actually
ended up being fixed by Steven's softirq patch), I noticed that the fix
above is incomplete. With only the patch above, we may no longer have
unscheduled tasklets added to the list, but we may end up with scheduled
tasklets that are not on the list (and will stay that way!).
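
To spell out the hole: with only the patch above applied,
__tasklet_common_schedule() ends up looking like this (just the code from
the first patch, with the problem spot annotated right before the unlock):

	if (tasklet_trylock(t)) {
		if (test_bit(TASKLET_STATE_SCHED, &t->state)) {
			WARN_ON(t->next != NULL);
			t->next = head->list;
			head->list = t;
			raise_softirq_irqoff(nr);
		}
		/*
		 * If the SCHED test above just failed and we get preempted
		 * here, another task can call tasklet_schedule(): it sets
		 * TASKLET_STATE_SCHED, reaches this function, and fails the
		 * trylock because we still hold the RUN bit, so it never
		 * puts the tasklet on the list. When we then unlock, the
		 * tasklet is left with SCHED set but queued nowhere.
		 */
		tasklet_unlock(t);
	}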

The following additional patch should correct this issue. Since we weren't
actually hitting it, though, the issue is a bit theoretical, and I've not
been able to prove it really fixes anything.

thanks
-john

Index: 2.6-rt/kernel/softirq.c
===================================================================
--- 2.6-rt.orig/kernel/softirq.c	2007-06-12 20:30:11.000000000 -0700
+++ 2.6-rt/kernel/softirq.c	2007-06-12 20:37:22.000000000 -0700
@@ -544,6 +544,7 @@
 __tasklet_common_schedule(struct tasklet_struct *t, struct tasklet_head *head, unsigned int nr)
 {
 	if (tasklet_trylock(t)) {
+again:
 		/* We may have been preempted before tasklet_trylock
 		 * and __tasklet_action may have already run.
 		 * So double check the sched bit while the tasklet
@@ -554,8 +555,21 @@
 			t->next = head->list;
 			head->list = t;
 			raise_softirq_irqoff(nr);
+			tasklet_unlock(t);
+		} else {
+			/* This is subtle. If we hit the corner case above,
+			 * it is possible that we get preempted right here,
+			 * and another task has successfully called
+			 * tasklet_schedule(), then called this function,
+			 * and failed on the trylock. Thus we must be sure,
+			 * before releasing the tasklet lock, that the
+			 * SCHED bit is clear. Otherwise the tasklet may
+			 * get its SCHED bit set, but not be added to the
+			 * list.
+			 */
+			if (!tasklet_tryunlock(t))
+				goto again;
 		}
-		tasklet_unlock(t);
 	}
 }
 

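For reference, the reason the goto-again loop above is safe: if I'm reading
the -rt locking helpers right, they are something along these lines (the
exact -rt code may differ slightly):

	static inline int tasklet_trylock(struct tasklet_struct *t)
	{
		/* grab the RUN bit; fails if someone else holds it */
		return !test_and_set_bit(TASKLET_STATE_RUN, &t->state);
	}

	static inline int tasklet_tryunlock(struct tasklet_struct *t)
	{
		/*
		 * Drop the lock only if RUN is the sole bit set, i.e.
		 * nobody set TASKLET_STATE_SCHED while we held it.
		 */
		return cmpxchg(&t->state, 1UL << TASKLET_STATE_RUN, 0UL) ==
				(1UL << TASKLET_STATE_RUN);
	}

So if a racing tasklet_schedule() sets the SCHED bit while we hold the RUN
bit, tasklet_tryunlock() fails, we go back to "again:", see the SCHED bit
set, and queue the tasklet ourselves instead of dropping it on the floor.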
