"Siddha, Suresh B" <[email protected]> writes:
> on x86_64 kernel, level triggered irq migration gets initiated in the context
> of that interrupt(after executing the irq handler) and following steps are
> followed to do the irq migration.
>
> 1. mask IOAPIC RTE entry; // write to IOAPIC RTE
> 2. EOI; // processor EOI write
> 3. reprogram IOAPIC RTE entry // write to IOAPIC RTE with new destination and
> // interrupt vector due to per cpu vector
> // allocation.
> 4. unmask IOAPIC RTE entry; // write to IOAPIC RTE
>
> Because of the per cpu vector allocation in x86_64 kernels, when the irq
> migrates to a different cpu, new vector(corresponding to the new cpu) will
> get allocated.
>
> An EOI write to local APIC has a side effect of generating an EOI write
> for level trigger interrupts (normally this is a broadcast to all IOAPICs).
> The EOI broadcast generated as a side effect of EOI write to processor may
> be delayed while the other IOAPIC writes (step 3 and 4) can go through.
>
> Normally, the EOI generated by local APIC for level trigger interrupt
> contains vector number. The IOAPIC will take this vector number and
> search the IOAPIC RTE entries for an entry with matching vector number and
> clear the remote IRR bit (indicate EOI). However, if the vector number is
> changed (as in step 3) the IOAPIC will not find the RTE entry when the EOI
> is received later. This will cause the remote IRR to get stuck causing the
> interrupt hang (no more interrupt from this RTE).
>
> Current x86_64 kernel assumes that remote IRR bit is cleared by the time
> IOAPIC RTE is reprogrammed. Fix this assumption by checking for remote IRR
> bit and if it still set, delay the irq migration to the next interrupt
> arrival event(hopefully, next time remote IRR bit will get cleared
> before the IOAPIC RTE is reprogrammed).
>
> Initial analysis and patch from Nanhai.

In essence this makes sense, and it may be the best workaround
available for buggy hardware. However, I am not convinced that the remote
IRR bit on ioapics works reliably enough to be used for anything. I
tested this earlier and could not successfully poll the remote IRR
bit to see whether an ioapic had received an irq acknowledgement. Instead
I locked up irq controllers.

If remote IRR worked reliably I could have pulled the irq migration
out of irq context. So this fix looks dubious to me.

Why is the EOI delayed? Can we work around that?

It would be nice if ioapics and local apics actually obeyed the pci
ordering rules, where a read would flush all traffic. And we do have a
read in there.

I'm assuming the symptom you are seeing on current kernels is that occasionally
the irq gets stuck and never fires again?
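
To check that I am reading the failure mode correctly, here is a minimal
user-space sketch of it in C. It is only a toy model of a single RTE, not
kernel code: struct toy_rte, the toy_* helpers and the vectors 0x31 and 0x41
are all made up for illustration, and "delayed" simply means the EOI
broadcast reaches the ioapic after step 4 instead of after step 2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of one IOAPIC RTE. Hypothetical layout, not the real register. */
struct toy_rte {
	uint8_t vector;
	uint8_t dest;
	bool masked;
	bool remote_irr;	/* set on delivery, cleared by a matching EOI */
};

/* Level-triggered delivery: blocked while masked or while remote IRR is set. */
static bool toy_deliver(struct toy_rte *rte)
{
	if (rte->masked || rte->remote_irr)
		return false;
	rte->remote_irr = true;
	return true;
}

/* EOI broadcast: the ioapic clears remote IRR only on a vector match. */
static void toy_eoi_broadcast(struct toy_rte *rte, uint8_t vector)
{
	if (rte->vector == vector)
		rte->remote_irr = false;
}

/*
 * Run steps 1-4 of the migration once. Returns true if the irq can still
 * be delivered afterwards, false if remote IRR is stuck.
 */
static bool toy_migrate(bool eoi_delayed)
{
	struct toy_rte rte = { .vector = 0x31, .dest = 0 };
	uint8_t old_vector = rte.vector;
	bool fired = toy_deliver(&rte);		/* irq fires, handler runs */

	assert(fired);
	(void)fired;

	rte.masked = true;			/* step 1: mask the RTE */
	if (!eoi_delayed)
		toy_eoi_broadcast(&rte, old_vector);	/* step 2 arrives in order */
	rte.vector = 0x41;			/* step 3: new vector ... */
	rte.dest = 1;				/* ... and new destination */
	rte.masked = false;			/* step 4: unmask the RTE */
	if (eoi_delayed)
		toy_eoi_broadcast(&rte, old_vector);	/* late EOI, old vector */

	return toy_deliver(&rte);
}
```

In this model toy_migrate(false) leaves the irq deliverable, while
toy_migrate(true) leaves remote IRR set with no EOI left that can clear it,
which would match an irq that never fires again.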

I'm not certain I like the patch either, but I need to look more closely.

You are mixing changes to generic and arch specific code together.
I think pending_eoi should not try to reuse __DO_ACTION as that
helper bit of code does not seem appropriate.

It would probably be best if the pending_eoi check was in
ack_apic_level with the rest of the weird logic we are working around.
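
Roughly the shape I have in mind, as a user-space sketch in C rather than
real kernel code. struct toy_irq and toy_try_migrate() are hypothetical
names, and the remote_irr field stands in for whatever pending_eoi would
read back from the RTE; the real ack_apic_level does considerably more
than this.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-irq state, for illustration only. */
struct toy_irq {
	uint8_t vector;
	uint8_t cpu;
	bool remote_irr;	/* stand-in for the RTE's remote IRR bit */
	bool move_pending;	/* a migration has been requested */
	uint8_t new_vector;
	uint8_t new_cpu;
};

/*
 * Called from the ack path after the EOI has been issued. Migrates only
 * when no EOI is outstanding; otherwise the request stays pending and is
 * retried on the next interrupt. Returns true if the migration happened.
 */
static bool toy_try_migrate(struct toy_irq *irq)
{
	if (!irq->move_pending)
		return false;
	if (irq->remote_irr)
		return false;	/* EOI not seen by the ioapic yet: defer */

	irq->vector = irq->new_vector;
	irq->cpu = irq->new_cpu;
	irq->move_pending = false;
	return true;
}
```

Keeping the check next to the migration decision means the deferral is a
single early return, and nothing outside the arch code has to know about
remote IRR. It still depends on remote IRR reading back reliably, which
is the part I am unsure about.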

Could we please have more detail on this hardware behavior? Why is
the EOI write delayed? Why can't we just issue reads in
appropriate places to flush the write?

I need to think about this some more to see if there are any other
possible ways we could address this issue that would be more robust.

Eric