Alan Cox wrote:
On Mer, 2006-05-03 at 21:25 +0100, Tim Small wrote:
something with NMI-signalled errors, I was wondering what the problems
with using NMI-signalled ECC errors were?
The big problem with NMI is that it can occur *during* a PCI
configuration sequence (i.e. during pci_config_* functions). That means we
can't safely do some I/O, especially configuration space I/O in an NMI
handler. At best we could set a flag and catch it afterwards.
I was assuming this was the case, but I don't think that deferring the
work until after the NMI handler has returned is necessarily a big
disadvantage, at least as far as checking the ECC status registers is
concerned. None of the hardware that I've looked at makes any sort of
guarantee about the timeliness of ECC-error-triggered NMI delivery, so
the really smart (and urgent) things that you could potentially do as
part of the ECC error handling (e.g. terminating a process if one of its
physical pages was mangled) can't be done reliably anyway.
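
The "set a flag and catch it afterwards" approach could look roughly like
the sketch below. It is a userspace model using C11 atomics, not kernel
code: ecc_nmi_pending, ecc_nmi_handler(), ecc_deferred_check() and
ecc_read_status_registers() are all hypothetical names, and the register
read is a stub that only counts how often it ran.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag: set in the NMI path, consumed later from a safe context. */
static atomic_bool ecc_nmi_pending = false;

/* Counts deferred polls; exists only so the sketch has observable behaviour. */
static int ecc_polls_done;

/* Stand-in for the NMI handler: no I/O, no locks, just record the event. */
void ecc_nmi_handler(void)
{
    atomic_store_explicit(&ecc_nmi_pending, true, memory_order_release);
}

/* Stub for reading the chipset's ECC status registers (config space I/O). */
static void ecc_read_status_registers(void)
{
    ecc_polls_done++;
}

/*
 * Called later from ordinary (non-NMI) context, where configuration space
 * access is safe. Returns true if a pending NMI was found and handled.
 */
bool ecc_deferred_check(void)
{
    /* Test and clear in one step so an NMI arriving right now is not lost. */
    if (!atomic_exchange_explicit(&ecc_nmi_pending, false,
                                  memory_order_acq_rel))
        return false;

    ecc_read_status_registers();
    return true;
}
```

Several NMIs that arrive before the deferred check collapse into a single
poll, which seems acceptable here since the status registers have to be
read to find out what actually happened anyway.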
About the best thing it is possible to do is to try to take the page(s)
in which an uncorrectable error occurred out of further use (and maybe
do the same for a physical page that sees repeated correctable errors),
plus maybe give the option of panicking if the page with the
uncorrectable error was in use by the kernel?
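
As a rough illustration of that policy, here is a small standalone C
sketch. Everything in it is an assumption made for the example: the
names (ecc_decide(), struct page_err), the fixed-size tracking table,
and the threshold of three correctable errors before a page is retired.
It only decides what to do; actually taking a page out of use is not
shown.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_TRACKED_PAGES   64  /* assumed table size */
#define CE_RETIRE_THRESHOLD 3   /* assumed: retire after 3 correctable errors */

enum ecc_action { ECC_NONE, ECC_RETIRE_PAGE, ECC_PANIC };

/* Per-page count of correctable errors seen so far. */
struct page_err {
    unsigned long pfn;
    unsigned int  ce_count;
    bool          used;
};

static struct page_err tracked[MAX_TRACKED_PAGES];

/* Find the entry for a page frame, or claim a free slot; NULL if full. */
static struct page_err *find_or_add(unsigned long pfn)
{
    struct page_err *free_slot = NULL;

    for (size_t i = 0; i < MAX_TRACKED_PAGES; i++) {
        if (tracked[i].used && tracked[i].pfn == pfn)
            return &tracked[i];
        if (!tracked[i].used && !free_slot)
            free_slot = &tracked[i];
    }
    if (free_slot) {
        free_slot->pfn = pfn;
        free_slot->ce_count = 0;
        free_slot->used = true;
    }
    return free_slot;
}

/* Decide what to do about one reported ECC error on page frame pfn. */
enum ecc_action ecc_decide(unsigned long pfn, bool uncorrectable,
                           bool kernel_page, bool panic_on_ue)
{
    struct page_err *e;

    /* Uncorrectable: always retire, or panic if configured and kernel-owned. */
    if (uncorrectable)
        return (kernel_page && panic_on_ue) ? ECC_PANIC : ECC_RETIRE_PAGE;

    /* Correctable: retire only once the same page keeps failing. */
    e = find_or_add(pfn);
    if (!e)
        return ECC_NONE;  /* table full: nothing sensible to track */
    if (++e->ce_count >= CE_RETIRE_THRESHOLD)
        return ECC_RETIRE_PAGE;
    return ECC_NONE;
}
```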
My first thought was to schedule a tasklet as part of the ECC-specific
NMI handling, or are there any gotchas with doing this from within an
NMI handler?
Cheers,
Tim.