Re: Realtime Preemption, 2.6.12, Beginners Guide?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wednesday 06 Jul 2005 17:24, Ingo Molnar wrote:
> * Alistair John Strachan <[email protected]> wrote:
> > > (generally i try to mark every message in the -RT kernel that signals
> > > some sort of anomaly with a 'BUG:' prefix - that makes it easy to do a
> > > 'dmesg | grep BUG:' to find out whether anything bad is going on. All
> > > other messages should be benign.)
> >
> > Okay, I've got multiple other BUG: messages within 2-3 minutes of
> > booting that highlight problems in areas other than ACPI. Are they
> > useful to you?
>
> yeah, please send them too - typically it's the first message that
> matters most (especially if the system crashes - which isnt the case
> here) - but sometimes the other ones are independent.

BUG: soft lockup detected on CPU#0!
 [<c010421f>] dump_stack+0x1f/0x30 (20)
 [<c0144a3a>] softlockup_tick+0x8a/0xb0 (24)
 [<c0108607>] timer_interrupt+0x57/0x100 (20)
 [<c0144ce5>] handle_IRQ_event+0x85/0x110 (52)
 [<c0144e4a>] __do_IRQ+0xda/0x170 (44)
 [<c0105b15>] do_IRQ+0x65/0xa0 (-196752)
 =======================
 [<c0103ceb>] common_interrupt+0x1f/0x24 (96)
 [<c0101145>] cpu_idle+0x65/0x90 (16)
 [<c03b0902>] start_kernel+0x182/0x1c0 (32)
 [<c010019f>] 0xc010019f (1076011023)

(I get this one semi-frequently)

>
> > > yes, this is a problem. You can probably work it around by disabling
> > > ACPI, but it would be better to debug and fix it. The message was
> > > generated because the kernel spent too much time [more than 10 seconds]
> > > in acpi_processor_idle(), and the softlockup-thread (which runs at
> > > SCHED_FIFO prio 99) was not scheduled for that amount of time. [or it
> > > thought it was not scheduled.] Was there any suspend/resume activity
> > > while you got that message?
> >
> > No, this is during bootup that it occurs. Suspend and resume do work
> > and are compiled in on my laptop, but were never utilised.
>
> was there a 10 seconds delay during bootup for such a message to be
> generated? Nothing should delay the softlockup threads. (but maybe their
> initial timekeeping is somehow impacted.)

I don't think so, the whole boot process takes about 11 seconds. I have 
noticed boot-time slow down since a few kernels ago, but I was never able to 
identify whether it's just hotplug or a real freeze.

I'll get back to you on this if I find anything definitive. The later 
messages, however, appear to happen sporadically and the machine does not 
freeze or idle at the same time.

>
> > I've got a pair of nearly identical traces that highlight a 2645us
> > latency in smp_apic_timer_interrupt. I don't know how the trace is
> > formatted, so I can't tell if it occurred before or after this
> > function call. I also can't see how a ~1000us latency translates to a
> > ~2600us latency in the trace.
> >
> > Since both traces were under 6K each, and I think the list limit is
> > higher, I risked it and have attached both.
>
> thanks. They do show a real regression. softirq--3 got woken up at
> timestamp 1us:
>
>   <idle>-0     0dnh2    1us : try_to_wake_up <softirq--3> (69 8c)
>
> then we return from the (presumably timer) interrupt at timestamp 3us:
>
>   <idle>-0     0dnh.    3us < (608)
>
> ( '<' in the trace signals return activity - can happen for syscalls and
> for interrupts.)
>
> but the ACPI code is busy going to sleep:
>
>   <idle>-0     0dn..    3us : acpi_hw_register_write (acpi_set_register)
>   <idle>-0     0dn..    4us : acpi_hw_low_level_write
> (acpi_hw_register_write) <idle>-0     0dn..    4us+: acpi_os_write_port
> (acpi_hw_low_level_write) <idle>-0     0dn..    7us!:
> acpi_hw_low_level_write (acpi_hw_register_write) <idle>-0     0dnh. 2645us
> : smp_apic_timer_interrupt (c0252485 0 0)
>
> and doesnt return for another 2638 microseconds!
>
> the bug is probably that the ACPI code became interruptible when we
> introduced the IRQ soft-flag. This is clearly visible from the "d" flag:
>
>                  _------=> CPU#
>                 / _-----=> irqs-off
>
>                | / _----=> need-resched
>                |
>                || / _---=> hardirq/softirq
>                ||
>                ||| / _--=> preempt-depth
>                |||
>                |||| /
>                ||||
>                |||||     delay
>
>    cmd     pid ||||| time  |   caller
>       \   /    |||||   \   |   /
>   <idle>-0     0dn..    3us : acpi_hw_register_write (acpi_set_register)
>     /-----------^
>  here
>
> small 'd' means the soft IRQ-flag was disabled. Capital 'D' means that
> the CPU's irq-flag got disabled too. The ACPI code, when it prepares to
> sleep, has to disable direct interrupts too, otherwise the above
> scenario may occur. If a timer interrupt hits the ACPI code in that
> small window where it has already checked for need_resched(), but has
> not gone to sleep yet (so it cannot react to the timer IRQ by waking
> up), then we lose the wakeup.
>
> could you try the patch below (or the -51-05 patch that i just
> uploaded), does it fix this latency?
>
> 	Ingo

I'm beginning to understand the issue, and I see why you think the proposed 
patch fixes it. I'll compile and boot V0.7.51-05 now.

-- 
Cheers,
Alistair.

personal:   alistair()devzero!co!uk
university: s0348365()sms!ed!ac!uk
student:    CS/CSim Undergraduate
contact:    1F2 55 South Clerk Street,
            Edinburgh. EH8 9PP.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]
  Powered by Linux