Re: [PATCH 6/7] ppc64: EEH Avoid racing reports of errors

On Wed, Oct 05, 2005 at 09:23:11PM +1000, Paul Mackerras was heard to remark:
> Linas writes:
> 
> > 06-eeh-report-race.patch
> 
> Shouldn't you pass in pe_dn->child here, or
> alternatively rearrange __eeh_mark_slot to do the node you give it
> plus its children (recursively)?

Yes, that's right; this gets fixed in a later patch in the series.
I guess this one snuck by while I was trying to sync up all the
different patches I was carrying :-/

> Two other comments about __eeh_mark_slot: (1) despite the comment, the
> function doesn't do anything to any pci_dev or pci_driver 

The comment is also a "back port" of a function that shows up in a later
patch, and so is indeed inappropriate for this patch. Again, my excuse
is that I got sloppy while juggling all of these patchlets. Sorry.

> (not that it
> should be touching any pci_driver), 

One problem I was seeing was that after getting an EEH error,
some device drivers would start spinning in their interrupt handlers.
I tried to break out of this spin loop by adding a call to a
function that asked "am I the victim of an EEH event?"
Unfortunately, the first implementation of this call was not
interrupt-safe (pci_device_to_OF_node calls traverse_pci_devices).
While scratching my head over how best to fix this, I decided that
the best thing to do would be to mark up the pci driver with a flag;
that way, the driver can look up the EEH state without any further ado.

One might be able to get rid of this state in pci_driver,
although it seemed generically useful to have. For example,
later on, I futzed with a version that disabled the irq line
for that adapter "as soon as possible", and that also seems to
work, at least on an SMP machine. On a non-SMP machine, there
is still the danger that the device driver is spinning with
interrupts disabled, waiting on a status register to change
that will never change. (And because of the deadlock, the
code to disable a given irq line never runs.) It all
depends on how the device driver was written.

> and (2) a recursive function can't
> really be inline 

Well, no, but at least the first-level call can be inlined; I assumed
that gcc would do at least that, but didn't check.

--linas
