RE: Problems with EDAC coexisting with BIOS

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




>-----Original Message-----
>From: Alan Cox [mailto:[email protected]]
>Sent: Monday, April 24, 2006 6:19 AM
>To: Gross, Mark
>Cc: [email protected]; LKML; Carbonari, Steven;
Ong,
>Soo Keong; Wang, Zhenyu Z
>Subject: Re: Problems with EDAC coexisting with BIOS
>
>On Gwe, 2006-04-21 at 09:01 -0700, Gross, Mark wrote:
>> 1) The default AMI BIOS behavior on SMI is to check the chipset error
>> registers (Dev0:Fun1) and re-hide them.
>
>The words "bad design" come to mind (followed by a large number of more
>accurate phrases that are inappropriate for a public list)
>

Yes, EDAC should not have existed without first working more closely
with the BIOS folks.  It's too bad we (Intel) didn't catch this.  The
BIOS folks where not in the loop with the driver folks making
contributions to EDAC previously.

>> Basically if device 0 : function 1 is hidden by the platform at boot
>> time un-hiding and using the device and function is a risky thing to
do,
>
>Intel provided patches that do exactly this for some of the chip
>workarounds. Are you saying the Intel chip work around also needs
>fixing ?
>

I think what I'm saying is pretty clear and I don't think it is related
to whatever workarounds where done earlier.

>> The driver should never get loaded by default or automatically.  If
the
>> user knows enough about there BIOS to trust that the SMI behavior
will
>> coexist with the driver then its OK to load otherwise using this
driver
>> is not a safe thing to do.
>
>So Intel and/or the BIOS vendors also forgot to put in any kind of
>indicator ? How do they expect end users to know this, or OS vendors ?
>Is there a technote that covers this mess ?
>

I don't know of any technote.  It took me working with Soo Keong for a
few weeks to chase this issue down to the level I have.  The short
answer is that the BIOS assumes the payload OS would not be fighting it
for hidden device access and the EDAC driver violates this assumption.  

>> I think the best thing to do is to have the driver error out in its
init
>> or probe code if the dev0:fun1 is hidden at boot time.
>>
>> Comments?
>
>Why did Intel bother implementing this functionality and then screwing
>it up so that OS vendors can't use it ? It seems so bogus.
>

It was just a screw up not to have identified this issue sooner.  

>At the very least we should print a warning advising the user that the
>BIOS is incompatible and to ask the BIOS vendor for an update so that
>they can enable error detection and management support.

I would place the warning in the probe or init code.

Attached is a test patch I'm testing now.  I don't like it, but it seems
to be working so far.  It basically fails the probe call leaving the
driver loaded.  I'm going to move the test to e752x_int so the driver
fails at init this am at restart my tests.


>
>Is only the AMI BIOS this braindamaged, should we just blacklist AMI
>bioses in EDAC or is this shared Intel supplied code that may be found
>in other vendors systems.
>

Unknown.  Also the BOIS teams for various platforms can modify the base
AMI functionality.  I know that at least one Intel e7520 based system
with AMI based bios seems to not expose this issue.  The point is that
without working out a handshake between the OS and the platform / BIOS
for this type of thing, loading EDAC without a patch like mine is
equivalent to playing Russian roulette.  You can't know which platform /
bios will blow up on you if you load the thing.

--mgross

Attachment: edac_2_6_16.patch
Description: edac_2_6_16.patch


[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux