Re: Soft lockup with -mm

Brice Goglin <[email protected]> wrote:
>
> I was seeing a lockup with several -mm releases since 2.6.12-rc2-mm1
> (IIRC). With 2.6.12-rc2-mm1, I remember getting the lockup a few minutes
> after boot time.
> With 2.6.12-rc3-mm1, I waited several days before getting it.
> But, I finally caught this one with netconsole. So here it is:
> 
> BUG: soft lockup detected on CPU#0!
> Pid: 0, comm:              swapper
> EIP: 0060:[<c02d40a5>] CPU: 0
> EIP is at _spin_unlock_irqrestore+0x5/0x30
>  EFLAGS: 00000286    Not tainted  (2.6.12-rc3-mm1=Pignouf)
> EAX: c18e8160 EBX: c18e8160 ECX: 00000001 EDX: 00000286
> ESI: c18e0160 EDI: dbf96c64 EBP: ffffffff DS: 007b ES: 007b
> CR0: 8005003b CR2: b6e43000 CR3: 0e912000 CR4: 00000690
>  [<c012a635>] __mod_timer+0xc5/0xf0

It could be the timer bug.  Can you try it with Oleg's fix?


From: Oleg Nesterov <[email protected]>

The bug was identified by Maneesh Soni.

When __mod_timer() changes a timer's base it waits for the completion of
timer->function.  It is just stupid: the caller of __mod_timer() can hold
locks which would prevent completion of the timer's handler.

Solution: do not change the base of the currently running timer.

Side effect: __mod_timer() no longer guarantees that the timer will run on
the local cpu.

Signed-off-by: Oleg Nesterov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
---

 kernel/timer.c |   42 ++++++++++++++++++++----------------------
 1 files changed, 20 insertions(+), 22 deletions(-)

diff -puN kernel/timer.c~timers-fix-__mod_timer-vs-__run_timers-deadlock kernel/timer.c
--- 25/kernel/timer.c~timers-fix-__mod_timer-vs-__run_timers-deadlock	2005-05-01 02:20:28.415889280 -0700
+++ 25-akpm/kernel/timer.c	2005-05-01 02:20:28.420888520 -0700
@@ -211,41 +211,39 @@ int __mod_timer(struct timer_list *timer
 	timer_base_t *base;
 	tvec_base_t *new_base;
 	unsigned long flags;
-	int ret = -1;
+	int ret;
 
 	BUG_ON(!timer->function);
 	check_timer(timer);
 
-	do {
-		base = lock_timer_base(timer, &flags);
-		new_base = &__get_cpu_var(tvec_bases);
+	base = lock_timer_base(timer, &flags);
 
-		/* Ensure the timer is serialized. */
-		if (base != &new_base->t_base
-			&& base->running_timer == timer)
-			goto unlock;
+	ret = 0;
+	if (timer_pending(timer)) {
+		detach_timer(timer, 0);
+		ret = 1;
+	}
 
-		ret = 0;
-		if (timer_pending(timer)) {
-			detach_timer(timer, 0);
-			ret = 1;
-		}
+	new_base = &__get_cpu_var(tvec_bases);
 
-		if (base != &new_base->t_base) {
+	if (base != &new_base->t_base) {
+		if (unlikely(base->running_timer == timer))
+			/* Don't change timer's base while it is running.
+			 * Needed for serialization of timer wrt itself. */
+			new_base = container_of(base, tvec_base_t, t_base);
+		else {
 			timer->base = NULL;
 			/* Safe: the timer can't be seen via ->entry,
 			 * and lock_timer_base checks ->base != 0. */
 			spin_unlock(&base->lock);
-			base = &new_base->t_base;
-			spin_lock(&base->lock);
-			timer->base = base;
+			spin_lock(&new_base->t_base.lock);
+			timer->base = &new_base->t_base;
 		}
+	}
 
-		timer->expires = expires;
-		internal_add_timer(new_base, timer);
-unlock:
-		spin_unlock_irqrestore(&base->lock, flags);
-	} while (ret < 0);
+	timer->expires = expires;
+	internal_add_timer(new_base, timer);
+	spin_unlock_irqrestore(&new_base->t_base.lock, flags);
 
 	return ret;
 }
_
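
For readers who want the new base-selection rule without the locking
around it, here is a small user-space model. It is only a sketch of the
decision described in the changelog above, not kernel code: the names
model_base, model_timer and model_pick_base are made up for illustration,
no spinlocks are modelled, and timer->base is assumed to be non-NULL on
entry.

```c
#include <assert.h>
#include <stddef.h>

struct model_timer;

/* Hypothetical stand-in for tvec_base_t: one per CPU. */
struct model_base {
	struct model_timer *running_timer;	/* timer whose handler runs now, or NULL */
};

/* Hypothetical stand-in for struct timer_list. */
struct model_timer {
	struct model_base *base;		/* base the timer currently belongs to */
};

/*
 * Model of the base selection in the patched __mod_timer(): return the
 * base the timer ends up on when it is re-armed from the CPU that owns
 * 'local'.  Locking is deliberately left out.
 */
struct model_base *model_pick_base(struct model_timer *timer,
				   struct model_base *local)
{
	struct model_base *old = timer->base;

	assert(old != NULL);	/* assumption of this sketch */

	/*
	 * Migrate to the local base only if the timer's handler is not
	 * currently running on the old base.  Otherwise stay put, so the
	 * timer remains serialized with respect to itself and nobody has
	 * to wait for the handler to finish.
	 */
	if (old != local && old->running_timer != timer)
		timer->base = local;

	return timer->base;
}
```

With two bases a and b, a timer that sits idle on a moves to b when
re-armed from b's CPU. If a.running_timer points at the timer, the same
call leaves it on a, which is the side effect the changelog mentions.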
