RE: Problems with EDAC coexisting with BIOS

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




>-----Original Message-----
>From: Tim Small [mailto:[email protected]]
>Sent: Wednesday, May 03, 2006 1:25 PM
>To: Alan Cox
>Cc: Ong, Soo Keong; Gross, Mark; [email protected];
>LKML; Carbonari, Steven; Wang, Zhenyu Z
>Subject: Re: Problems with EDAC coexisting with BIOS
>
>Alan Cox wrote:
>
>>On Llu, 2006-04-24 at 22:15 +0800, Ong, Soo Keong wrote:
>>
>>
>>>To me, periodical is not a good design for error handling, it wastes
>>>transaction bandwidth that should be used for other more productive
>>>purposes.
>>>
>>>
>>
>>The periodical choice is mostly down to the brain damaged choice of
NMI
>>as the viable alternative, which is as good as 'not usable'
>>
>>
>Hi,
>
>As I believe that the majority of the bluesmoke/EDAC developers are
>(were) operating under the assumption that it would be possible to do
>something with NMI-signalled errors, I was wondering what the problems
>with using NMI-signalled ECC errors were?

Soo Keong or Steve, can you answer this one?

>
>Are there some systems states in which an incoming NMI throws a spanner
>in to the works in an unrecoverable way?  If this is the case, is it so
>on all x86/x86-64 systems, or just a subset, and is there no way to
>implement some sort of top half / bottom half style NMI handler
>cleanly?  As I am certainly not an x86 architecture expert, I would
>appreciate any input from the resident gurus ;o).

I don't think NMI's are so much the problem as those blasted SMI's.  And
SMI's are not for sharing :(

These problems are BIOS specific, Your Mileage Will Vary from one bios
to the next.

>
>Quickly returning to the original problem - I know this isn't a proper
>API by any stretch of the imagination, and would require changes to
>existing BIOSs, but the EDAC module could reprogram the chipset
>error-signalling registers, so that an ECC error no longer triggers an
>SMI.  The BIOS SMI handler could then read the signalling registers,
and
>leave the ECC registers well alone if ECC errors are not set to
generate
>an SMI.

It's not the SMI from ECC events that are the problem.  It's the BIOS
assuming no one else would want to use those registers on Dev0:Fun1, and
the fact that you still end up with a race between the OS and the BIOS
SMI to handle the events.

>
>Cheers,
>
>Tim.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux