Re: [patch] SMP alternatives

On Thu, 2005-11-24 at 15:22 +0100, Andi Kleen wrote:
> What do you need a special driver for if the northbridge just
> can do the scrubbing by itself?

You need a driver to collect and report all the ECC single bit errors to
the user so that they can decide if they have problem hardware.

EDAC is more than one thing:
	- Control response to a fatal error
	- Report non-fatal events for analysis/user decision making

Hardware scrubbing is good, but knowing the rate of non-fatal errors and
the trend in rate of errors is essential to planning and management of
systems.
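
As a rough illustration of why the rate and its trend matter, here is a
minimal Python sketch. It assumes you already collect periodic readings of
a cumulative corrected-error counter from whatever the driver reports. The
(timestamp, count) sample format and the names error_rates and trend are
hypothetical and are not part of EDAC.

```python
from typing import List, Tuple

# (time in seconds, cumulative corrected-error count) -- assumed sample format
Sample = Tuple[float, int]


def error_rates(samples: List[Sample]) -> List[float]:
    """Return corrected errors per hour between consecutive samples."""
    rates = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        hours = (t1 - t0) / 3600.0
        if hours <= 0:
            continue  # skip duplicate or out-of-order timestamps
        # A counter that went backwards is treated as a reset (e.g. reboot),
        # so the new value is taken as the number of errors since the reset.
        delta = c1 - c0 if c1 >= c0 else c1
        rates.append(delta / hours)
    return rates


def trend(rates: List[float]) -> float:
    """Least-squares slope of rate against sample index.

    A positive value means the error rate is rising over time.
    """
    n = len(rates)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2.0
    mean_y = sum(rates) / n
    num = sum((i - mean_x) * (r - mean_y) for i, r in enumerate(rates))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


if __name__ == "__main__":
    hourly = [(0, 0), (3600, 1), (7200, 3), (10800, 6)]
    rates = error_rates(hourly)
    print("errors/hour:", rates)
    print("trend (change in errors/hour per sample):", trend(rates))
```

A steady low rate and a rate that keeps climbing call for different
decisions, which is something scrubbing alone will not tell you.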

> On the modern systems I'm familiar with it's a machine check (although
> not necessarily a recoverable one and there might be other bad
> side effects) 

The Intel systems I have looked at generate an MCE on L2/L1/bus parity
errors, but not on a RAM ECC error, as that is at the memory controller
level, not the CPU level. That usually asserts NMI. The same goes for most
older chips (PIII, AMD Athlon, etc.).

> > The -mm EDAC code works on the basic assumption that unrecovered ECC is
> > a system halter although that is configurable.
> 
> I don't know what you could do over the default code for K8 at least.
> And on modern Intel server chipsets I would expect it also to not
> be needed.

That varies a lot again. Hopefully it will get simpler as/when/if Intel
puts the memory controller on the CPU.

Alan
