Re: [ANNOUNCE] 2.6.15-rc5-hrt2 - hrtimers based high resolution patches

On Wed, 2005-12-14 at 09:43 +0100, Ingo Molnar wrote:
> * Ingo Molnar <[email protected]> wrote:
> 
> > hm, in that case it should be base->running that protects against this 
> > case. Did you run an UP PREEMPT_RT kernel? In that case could you 
> > check whether the fix below resolves the crash too?
> 
> try the patch below instead. (Thomas pointed out that the correct 
> dependency in this case is PREEMPT_SOFTIRQS)
> 
> 	Ingo
> 
> Index: linux/kernel/hrtimer.c
> ===================================================================
> --- linux.orig/kernel/hrtimer.c
> +++ linux/kernel/hrtimer.c
> @@ -118,7 +118,7 @@ static DEFINE_PER_CPU(struct hrtimer_bas
>   * Functions and macros which are different for UP/SMP systems are kept in a
>   * single place
>   */
> -#ifdef CONFIG_SMP
> +#if defined(CONFIG_SMP) || defined(CONFIG_PREEMPT_SOFTIRQS)
>  
>  #define set_curr_timer(b, t)		do { (b)->curr_timer = (t); } while (0)
>  

With the above patch, I get the following soft lockup.

BUG: jitter_inc:3085, possible softlockup detected on CPU#0!
 [<c0104223>] dump_stack+0x23/0x30 (20)
 [<c014c226>] softlockup_detected+0x46/0x60 (24)
 [<c014c301>] softlockup_tick+0xc1/0xd0 (20)
 [<c0126aaa>] update_process_times+0x8a/0x180 (32)
 [<c013e79a>] handle_tick_update+0x5a/0x70 (16)
 [<c013e609>] timer_interrupt+0x39/0x110 (24)
 [<c014c77d>] handle_IRQ_event+0x7d/0x110 (52)
 [<c014cbfd>] __do_IRQ+0xcd/0x170 (44)
 [<c0105724>] do_IRQ+0x44/0x60 (28)
 [<c0103ce6>] common_interrupt+0x1a/0x20 (80)
 [<c0136b98>] hrtimer_cancel+0x18/0x20 (16)
 [<c0339d6d>] schedule_hrtimer+0x4d/0xb0 (44)
 [<c013723e>] hrtimer_nanosleep+0x5e/0x130 (104)
 [<c0137383>] sys_nanosleep+0x73/0x80 (36)
 [<c010329a>] syscall_call+0x7/0xb (-8116)
---------------------------
| preempt count: 00010001 ]
| 1-level deep critical section nesting:
----------------------------------------
.. [<c0142b8c>] .... add_preempt_count+0x1c/0x20
.....[<c0136264>] ..   ( <= lock_hrtimer_base+0x34/0x80)

The system is locked up, periodically spitting out the above.  My only
recourse is to hit the big P button (or the little r one).

And going into gdb, I get:

(gdb) li *0xc0136b98
0xc0136b98 is in hrtimer_cancel (kernel/hrtimer.c:671).
666     int hrtimer_cancel(struct hrtimer *timer)
667     {
668             for (;;) {
669                     int ret = hrtimer_try_to_cancel(timer);
670
671                     if (ret >= 0)
672                             return ret;
673             }
674     }
675

So it may not really be locked, and if I waited a couple of hours, it
might actually finish (the test usually takes a couple of minutes to
run, and I let it run here for about 20 minutes).

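Yeah, the above loop is very bad if it runs after preempting the
running timer: as far as I can tell, hrtimer_try_to_cancel() keeps
returning -1 while the timer's callback is marked as running, and the
callback can't make progress because we are spinning on top of it.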

So the fix for this, in the case of preempting the softirq, is to
introduce some wait queue that allows processes to wait for the
softirq to finish; the softirq would then wake up all of the waiters.
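
Roughly what I have in mind, as a userspace sketch only. pthreads stand
in for the kernel primitives, the condition variable plays the role of
the wait queue, and every name here (fake_base, fake_try_to_cancel,
fake_cancel, fake_softirq, demo_cancel_while_running) is made up for
illustration and is not from kernel/hrtimer.c. The only thing borrowed
from the real code is the convention that try-to-cancel returns -1
while the callback is running.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a timer base; NOT the kernel's structure. */
struct fake_base {
	pthread_mutex_t lock;
	pthread_cond_t  wait;     /* plays the role of the proposed wait queue */
	bool            running;  /* true while the "softirq" runs the callback */
	bool            queued;   /* true while the timer is still pending */
};

/*
 * Same return convention as hrtimer_try_to_cancel():
 *   -1 callback currently running, 0 timer not active, 1 timer cancelled.
 * Caller must hold b->lock.
 */
static int fake_try_to_cancel(struct fake_base *b)
{
	if (b->running)
		return -1;
	if (b->queued) {
		b->queued = false;
		return 1;
	}
	return 0;
}

/* Sleep until the callback is done instead of spinning in for (;;). */
static int fake_cancel(struct fake_base *b)
{
	int ret;

	pthread_mutex_lock(&b->lock);
	while ((ret = fake_try_to_cancel(b)) < 0)
		pthread_cond_wait(&b->wait, &b->lock);  /* drops the lock while asleep */
	pthread_mutex_unlock(&b->lock);
	return ret;
}

/* "Softirq" side: finish the callback, then wake up every waiter. */
static void *fake_softirq(void *arg)
{
	struct fake_base *b = arg;

	pthread_mutex_lock(&b->lock);
	b->running = false;
	pthread_cond_broadcast(&b->wait);
	pthread_mutex_unlock(&b->lock);
	return NULL;
}

/* Cancel a timer whose callback is marked running; returns fake_cancel()'s result. */
int demo_cancel_while_running(void)
{
	struct fake_base b = { .running = true, .queued = false };
	pthread_t t;
	int ret;

	pthread_mutex_init(&b.lock, NULL);
	pthread_cond_init(&b.wait, NULL);
	if (pthread_create(&t, NULL, fake_softirq, &b) != 0)
		return -2;  /* could not start the "softirq" thread */

	ret = fake_cancel(&b);

	pthread_join(t, NULL);
	pthread_cond_destroy(&b.wait);
	pthread_mutex_destroy(&b.lock);
	return ret;
}
```

The point is just that the canceller gives up the CPU while the callback
is running, so a preempted softirq gets a chance to finish. In the
sketch the timer is already off the queue when the callback runs, so
the cancel comes back with 0 once it has been woken.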

Ingo, would something like that fly?

-- Steve


