On Wednesday 06 Jul 2005 17:24, Ingo Molnar wrote:
> * Alistair John Strachan <[email protected]> wrote:
> > > (generally i try to mark every message in the -RT kernel that signals
> > > some sort of anomaly with a 'BUG:' prefix - that makes it easy to do a
> > > 'dmesg | grep BUG:' to find out whether anything bad is going on. All
> > > other messages should be benign.)
> >
> > Okay, I've got multiple other BUG: messages within 2-3 minutes of
> > booting that highlight problems in areas other than ACPI. Are they
> > useful to you?
>
> yeah, please send them too - typically it's the first message that
> matters most (especially if the system crashes - which isnt the case
> here) - but sometimes the other ones are independent.
BUG: soft lockup detected on CPU#0!
[<c010421f>] dump_stack+0x1f/0x30 (20)
[<c0144a3a>] softlockup_tick+0x8a/0xb0 (24)
[<c0108607>] timer_interrupt+0x57/0x100 (20)
[<c0144ce5>] handle_IRQ_event+0x85/0x110 (52)
[<c0144e4a>] __do_IRQ+0xda/0x170 (44)
[<c0105b15>] do_IRQ+0x65/0xa0 (-196752)
=======================
[<c0103ceb>] common_interrupt+0x1f/0x24 (96)
[<c0101145>] cpu_idle+0x65/0x90 (16)
[<c03b0902>] start_kernel+0x182/0x1c0 (32)
[<c010019f>] 0xc010019f (1076011023)
(I get this one semi-frequently)
>
> > > yes, this is a problem. You can probably work it around by disabling
> > > ACPI, but it would be better to debug and fix it. The message was
> > > generated because the kernel spent too much time [more than 10 seconds]
> > > in acpi_processor_idle(), and the softlockup-thread (which runs at
> > > SCHED_FIFO prio 99) was not scheduled for that amount of time. [or it
> > > thought it was not scheduled.] Was there any suspend/resume activity
> > > while you got that message?
> >
> > No, this is during bootup that it occurs. Suspend and resume do work
> > and are compiled in on my laptop, but were never utilised.
>
> was there a 10 seconds delay during bootup for such a message to be
> generated? Nothing should delay the softlockup threads. (but maybe their
> initial timekeeping is somehow impacted.)
I don't think so, the whole boot process takes about 11 seconds. I have
noticed boot-time slow down since a few kernels ago, but I was never able to
identify whether it's just hotplug or a real freeze.
I'll get back to you on this if I find anything definitive. The later
messages, however, appear to happen sporadically and the machine does not
freeze or idle at the same time.
>
> > I've got a pair of nearly identical traces that highlight a 2645us
> > latency in smp_apic_timer_interrupt. I don't know how the trace is
> > formatted, so I can't tell if it occurred before or after this
> > function call. I also can't see how a ~1000us latency translates to a
> > ~2600us latency in the trace.
> >
> > Since both traces were under 6K each, and I think the list limit is
> > higher, I risked it and have attached both.
>
> thanks. They do show a real regression. softirq--3 got woken up at
> timestamp 1us:
>
> <idle>-0 0dnh2 1us : try_to_wake_up <softirq--3> (69 8c)
>
> then we return from the (presumably timer) interrupt at timestamp 3us:
>
> <idle>-0 0dnh. 3us < (608)
>
> ( '<' in the trace signals return activity - can happen for syscalls and
> for interrupts.)
>
> but the ACPI code is busy going to sleep:
>
> <idle>-0 0dn.. 3us : acpi_hw_register_write (acpi_set_register)
> <idle>-0 0dn.. 4us : acpi_hw_low_level_write
> (acpi_hw_register_write) <idle>-0 0dn.. 4us+: acpi_os_write_port
> (acpi_hw_low_level_write) <idle>-0 0dn.. 7us!:
> acpi_hw_low_level_write (acpi_hw_register_write) <idle>-0 0dnh. 2645us
> : smp_apic_timer_interrupt (c0252485 0 0)
>
> and doesnt return for another 2638 microseconds!
>
> the bug is probably that the ACPI code became interruptible when we
> introduced the IRQ soft-flag. This is clearly visible from the "d" flag:
>
> _------=> CPU#
> / _-----=> irqs-off
>
> | / _----=> need-resched
> |
> || / _---=> hardirq/softirq
> ||
> ||| / _--=> preempt-depth
> |||
> |||| /
> ||||
> ||||| delay
>
> cmd pid ||||| time | caller
> \ / ||||| \ | /
> <idle>-0 0dn.. 3us : acpi_hw_register_write (acpi_set_register)
> /-----------^
> here
>
> small 'd' means the soft IRQ-flag was disabled. Capital 'D' means that
> the CPU's irq-flag got disabled too. The ACPI code, when it prepares to
> sleep, has to disable direct interrupts too, otherwise the above
> scenario may occur. If a timer interrupt hits the ACPI code in that
> small window where it has already checked for need_resched(), but has
> not gone to sleep yet (so it cannot react to the timer IRQ by waking
> up), then we lose the wakeup.
>
> could you try the patch below (or the -51-05 patch that i just
> uploaded), does it fix this latency?
>
> Ingo
I'm beginning to understand the issue, and I see why you think the proposed
patch fixes it. I'll compile and boot V0.7.51-05 now.
--
Cheers,
Alistair.
personal: alistair()devzero!co!uk
university: s0348365()sms!ed!ac!uk
student: CS/CSim Undergraduate
contact: 1F2 55 South Clerk Street,
Edinburgh. EH8 9PP.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[Index of Archives]
[Kernel Newbies]
[Netfilter]
[Bugtraq]
[Photo]
[Gimp]
[Yosemite News]
[MIPS Linux]
[ARM Linux]
[Linux Security]
[Linux RAID]
[Video 4 Linux]
[Linux for the blind]
|
|