Re: [RFC][PATCH] kdump: x86_64 timer interrupt lockup due to pending interrupt

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Vivek Goyal <[email protected]> writes:

> On Mon, Mar 06, 2006 at 10:43:32PM +0100, Andi Kleen wrote:
>> On Mon, Mar 06, 2006 at 11:40:34AM -0500, Vivek Goyal wrote:
>> > 
>> > o check_timer() routine fails while second kernel is booting after a crash
>> > on an opetron box. Problem happens because timer vector (0x31) seems to be
>> >   locked.
>> > 
>> > o After a system crash, it is not safe to service interrupts any more, hence
>> >   interrupts are disabled. This leads to pending interrupts at LAPIC. LAPIC
>> > sends these interrupts to the CPU during early boot of second kernel. Other
>> > pending interrupts are discarded saying unexpected trap but timer interrupt
>> >   is serviced and CPU does not issue an LAPIC EOI because it think this
>> > interrupt came from i8259 and sends ack to 8259. This leads to vector 0x31
>> >   locking as LAPIC does not clear respective ISR and keeps on waiting for
>> >   EOI.
>> > 
>> > o In this patch, one extra EOI is being issued in check_timer() to unlock
> the
>> >   vector. Please suggest if there is a better way to handle this situation.
>> 
>> Shouldn't we rather do this for all interrupts when the APIC is set up? 
>> I don't see how the timer is special here.
>>
>
> Timer is a special case here.
>
> In other cases, the moment interrupts are enabled on cpu, LAPIC pushes pending
> interrupts to cpu and it is ignored as bad irq using ack_bad_irq(). This
> still sends EOI to LAPIC if LPAIC support is compiled in.
>
> But for timer, the moment pending interrupt is pushed to cpu, it is handled
> as valid interrupt and cpu assumes that it came from 8259 and sends ack to
> 8259 and not to LAPIC. Hence leads to missing EOI for timer vector and 
> deadlock.
>
> But still doing it generic manner for all interrupts while setting up LAPIC
> probably makes more sense. Please find attached the patch.

A couple of questions. 

Does this need to be in #ifdef CONFIG_CRASS_DUMP?
If this code is truly safe I expect we could run it on every bootup
simply to be more robust.

Why is APIC_ISR_NR a hard code?  I think there is an apic register
that tells the count.

Does ack_APIC_irq take an argument?  I am confused that we are calling
ack_APIC_irq() potentially 8*32 times without passing it anything.


Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux