Re: MSI interrupts and disable_irq

Ayaz Abdulla wrote:

I am trying to track down a forcedeth driver issue described by bug 9047 in bugzilla (2.6.23-rc7-git1 forcedeth w/ MCP55 oops under heavy load). I added a patch to synchronize the timer handlers so that one handler doesn't accidentally enable the IRQ while another timer handler is running (see attachment 'Add timer lock' in bug report), and to protect other processing.
However, the system still had an Oops. So I added a lock around nv_rx_process_optimized() and the Oops has not happened since (see attachment 'New patch for locking' in bug report). This would imply a synchronization issue. However, the only callers of that function are the IRQ handler and the timer handlers (in the non-NAPI case). The timer handlers use disable_irq() so that the IRQ handler does not contend with them. It looks as if disable_irq() is not working properly.
This issue reproduces only with MSI interrupts and not with legacy INTx interrupts. Any ideas?

(added linux-kernel to CC, since I think it's more of a general kernel issue)

To be brutally frank, I always thought this disable_irq() mess was a hack both ugly and fragile. This disable_irq() work that appeared in a couple net drivers was correct at the time, so I didn't feel I had the justification to reject it, but it still gave me a bad feeling.

I think the scenario you outline is an illustration of the approach's fragility: disable_irq() is a heavy hammer that originated with INTx, and it relies on a chip-specific disable method (kernel/irq/manage.c) that practically guarantees behavior will vary across MSI/INTx/etc.

Practices like forcedeth's unique locking work for a time, but it should be a warning sign any time you stray from the normal spin_lock_irqsave() method of synchronization.

Based on your report, it is certainly possible that there is a problem with MSI's desc->chip->disable() method... but I would actually recommend working around the problem by making the forcedeth locking more standardized by removing all those disable_irq() hacks.

Using spinlocks like other net drivers (note: avoid NETIF_F_LLTX drivers) has a high probability of both fixing your current problem, and giving forcedeth a more stable foundation for the long term. In my humble opinion :)
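
To make the idea concrete, here is a rough user-space model, not forcedeth code. It assumes two threads standing in for the IRQ handler and a timer handler, and a pthread mutex standing in for the driver spinlock that would be taken with spin_lock_irqsave() in the kernel. Every name in it (model_nic, model_rx_process, model_path, model_run) is hypothetical and exists only for illustration. The point is that both paths take the same lock around the receive processing, instead of one path trying to keep the other from running at all.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver's private state. */
struct model_nic {
	pthread_mutex_t lock;	/* plays the role of the driver spinlock */
	long rx_processed;	/* shared state touched by both paths */
};

struct model_path_arg {
	struct model_nic *nic;
	long iterations;
};

/* Stand-in for the rx processing routine; caller must hold nic->lock. */
static void model_rx_process(struct model_nic *nic)
{
	nic->rx_processed++;
}

/*
 * Used for both the "IRQ handler" thread and the "timer handler" thread:
 * each one takes the same lock before touching the shared rx state.
 */
static void *model_path(void *p)
{
	struct model_path_arg *arg = p;
	long i;

	for (i = 0; i < arg->iterations; i++) {
		pthread_mutex_lock(&arg->nic->lock);
		model_rx_process(arg->nic);
		pthread_mutex_unlock(&arg->nic->lock);
	}
	return NULL;
}

/*
 * Run both paths concurrently. Returns the total number of rx passes,
 * or -1 if a thread could not be created.
 */
long model_run(long iterations)
{
	struct model_nic nic = { .rx_processed = 0 };
	struct model_path_arg arg = { &nic, iterations };
	pthread_t irq_thread, timer_thread;

	pthread_mutex_init(&nic.lock, NULL);

	if (pthread_create(&irq_thread, NULL, model_path, &arg) != 0)
		return -1;
	if (pthread_create(&timer_thread, NULL, model_path, &arg) != 0) {
		pthread_join(irq_thread, NULL);
		return -1;
	}

	pthread_join(irq_thread, NULL);
	pthread_join(timer_thread, NULL);
	pthread_mutex_destroy(&nic.lock);

	return nic.rx_processed;
}
```

Calling model_run(n) from a small main() (built with something like "cc -pthread") should always return 2*n, because every update to the shared counter happens under the lock. This only models the locking discipline; it says nothing about what MSI's disable path actually does.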

	Jeff
