Re: MSI problem since 2.6.21 for devices not providing a mask in their MSI capability

Loic Prylli <[email protected]> writes:

> Hi,
>
> We observe a problem with MSI since kernel 2.6.21 where interrupts would
> randomly stop working. We have tracked it down to the new
> msi_set_mask_bit definition in 2.6.21. In the MSI case with a device not
> providing a "native" MSI mask, it was a no-op before, and now it
> disables MSI in the MSI-ctl register which according to the PCI spec is
> interpreted as reverting the device to legacy interrupts. If such a
> device tries to generate a new interrupt during the "masked" window, the
> device will try a legacy interrupt, which is generally
> ignored/never-acked and causes interrupts to no longer work for the
> device/driver combination (even after the enable bit is restored).

We should also be leaving the INTx irqs disabled.  So no irq
should be generated.

If you have a mask bit implemented, you are required to be
able to refire the irq after the msi is enabled.  I don't recall
the requirements for when intx and msi irqs are both
disabled.  Intuitively I would expect no irq message to
be generated, and at most the card would need to be polled
manually to recognize that a device event happened.

Certainly firing an irq and having it get completely lost is
unfortunate, and a major pain if you are trying to use the
card.

As for the previous no-op behavior, that was a bug.

> Is there anything apart from irq migration that strongly requires
> masking? Is it possible to do the irq migration without masking?

enable_irq/disable_irq, although we can get away with a software
emulation there, and those are only needed if the driver calls them.
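
To make "software emulation" concrete, here is a minimal sketch in
plain C of what I have in mind.  Nothing here is kernel code: the
soft_irq structure and its functions are hypothetical names, and the
model assumes the message still reaches the cpu while the driver has
the irq disabled, so the hardware is never touched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-interrupt software state; not a kernel structure. */
struct soft_irq {
	bool disabled;	/* the driver asked for the irq to be disabled */
	bool pending;	/* a message arrived while it was disabled */
	int handled;	/* number of times the handler has run */
};

static void soft_irq_run_handler(struct soft_irq *s)
{
	s->handled++;
}

/* Called when the MSI message arrives at the cpu. */
static void soft_irq_arrive(struct soft_irq *s)
{
	assert(s != NULL);
	if (s->disabled) {
		/* Remember the event instead of masking the device. */
		s->pending = true;
		return;
	}
	soft_irq_run_handler(s);
}

static void soft_irq_disable(struct soft_irq *s)
{
	assert(s != NULL);
	s->disabled = true;
}

static void soft_irq_enable(struct soft_irq *s)
{
	assert(s != NULL);
	s->disabled = false;
	if (s->pending) {
		/* Replay the event that was held back. */
		s->pending = false;
		soft_irq_run_handler(s);
	}
}
```

Several messages that arrive while disabled collapse into one replay,
which is fine for an edge-style interrupt where the handler rechecks
the device anyway.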

The PCI spec requires disabling/masking the msi when reprogramming it.
So as a general rule we can not do better.  Further because we are
writing to multiple pci config registers the only way we can safely
reprogram the message is with the msi disabled/masked on the card in
some fashion.
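
The ordering can be sketched against a toy model of a 32-bit MSI
capability with no mask register.  The register offsets and the enable
bit follow pci_regs.h; everything prefixed toy_ is made up for
illustration, and the byte array merely stands in for config space.
The sketch only shows the sequence of writes.  It says nothing about
what a real device does while the enable bit is clear, which is
exactly the open question in this thread.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Offsets within the MSI capability and the enable bit (pci_regs.h). */
#define PCI_MSI_FLAGS		2
#define PCI_MSI_FLAGS_ENABLE	0x0001
#define PCI_MSI_ADDRESS_LO	4
#define PCI_MSI_DATA_32		8

/* Hypothetical toy device: the bytes of a 32-bit MSI capability. */
struct toy_msi_dev {
	uint8_t cap[12];
	int writes_while_enabled;	/* message writes seen with MSI on */
};

static uint16_t toy_read16(const struct toy_msi_dev *d, int off)
{
	uint16_t v;

	assert(off >= 0 && off + 2 <= (int)sizeof(d->cap));
	memcpy(&v, d->cap + off, sizeof(v));
	return v;
}

static uint32_t toy_read32(const struct toy_msi_dev *d, int off)
{
	uint32_t v;

	assert(off >= 0 && off + 4 <= (int)sizeof(d->cap));
	memcpy(&v, d->cap + off, sizeof(v));
	return v;
}

static void toy_write(struct toy_msi_dev *d, int off, const void *src, int len)
{
	assert(off >= 0 && off + len <= (int)sizeof(d->cap));
	/* Count address/data writes made while the device could fire. */
	if (off != PCI_MSI_FLAGS &&
	    (toy_read16(d, PCI_MSI_FLAGS) & PCI_MSI_FLAGS_ENABLE))
		d->writes_while_enabled++;
	memcpy(d->cap + off, src, (size_t)len);
}

static void toy_write16(struct toy_msi_dev *d, int off, uint16_t v)
{
	toy_write(d, off, &v, 2);
}

static void toy_write32(struct toy_msi_dev *d, int off, uint32_t v)
{
	toy_write(d, off, &v, 4);
}

/* Disable, rewrite address and data, then restore the old flags. */
static void toy_msi_retarget(struct toy_msi_dev *d, uint32_t addr, uint16_t data)
{
	uint16_t flags = toy_read16(d, PCI_MSI_FLAGS);

	toy_write16(d, PCI_MSI_FLAGS, (uint16_t)(flags & ~PCI_MSI_FLAGS_ENABLE));
	toy_write32(d, PCI_MSI_ADDRESS_LO, addr);
	toy_write16(d, PCI_MSI_DATA_32, data);
	toy_write16(d, PCI_MSI_FLAGS, flags);
}
```

Without the first write, the device could send a message built from
the new address and the old data between the two middle writes.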

I suspect what needs to happen is a spec search to verify that the
current linux behavior is at least reasonable within the spec.

Once we have verified that the generic code cannot do better,
we can look at work-arounds.  One possibility is for the generic
code to provide some overrides for the methods for masking and
reading/writing an msi message.
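
A rough sketch of what such an override could look like, covering only
the masking method.  This is not an existing kernel interface: the
ovr_ names, the ops table and the quirk are all hypothetical, and a
single flags word stands in for the MSI control register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PCI_MSI_FLAGS_ENABLE	0x0001	/* MSI enable bit (pci_regs.h) */

struct ovr_dev;

/* Hypothetical override table; the kernel has nothing by this name. */
struct ovr_msi_ops {
	void (*set_mask)(struct ovr_dev *dev, bool masked);
};

struct ovr_dev {
	uint16_t msi_flags;		/* stand-in for the MSI control register */
	bool sw_masked;			/* used only by the quirk below */
	const struct ovr_msi_ops *ops;	/* NULL selects the generic method */
};

/* Generic method for a device without a mask bit: toggle MSI enable. */
static void ovr_generic_set_mask(struct ovr_dev *dev, bool masked)
{
	if (masked)
		dev->msi_flags = (uint16_t)(dev->msi_flags & ~PCI_MSI_FLAGS_ENABLE);
	else
		dev->msi_flags = (uint16_t)(dev->msi_flags | PCI_MSI_FLAGS_ENABLE);
}

/* Quirk for a device that misbehaves when MSI enable is cleared:
 * record the request in software and leave the hardware alone. */
static void ovr_quirk_set_mask(struct ovr_dev *dev, bool masked)
{
	dev->sw_masked = masked;
}

static const struct ovr_msi_ops ovr_quirk_ops = {
	.set_mask = ovr_quirk_set_mask,
};

/* Single entry point: use the override when the device has one. */
static void ovr_set_mask(struct ovr_dev *dev, bool masked)
{
	assert(dev != NULL);
	if (dev->ops && dev->ops->set_mask)
		dev->ops->set_mask(dev, masked);
	else
		ovr_generic_set_mask(dev, masked);
}
```

The default stays in spec, and only a device known to need it would
point at something like ovr_quirk_ops.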

I don't want to break anyone's hardware, but at the same time I want us
to be careful and in spec for the default case.

Eric