On Sun, 12 Aug 2007 18:51:31 +0200, Folkert van Heusden said:

> a question and an idea: Q: is ecc guaranteed to detect all bitflips?

No. It depends on the exact ECC function the hardware implements. Usually
it provides performance such as: "Correct all 1-bit errors. Detect, but
not correct, all 2-bit errors and most errors of 3 or more bits". (Of
course, "correct all 1 or 2 bit and detect all 3 bit" can be done, it
just takes more bits of ECC.)

> Idea: what about a multicore system (3 or more) that runs the same
> processes on 2 cores and a third core verifying that they both do the
> same? As I think it is not only ram that can become faulty.

This is actually done for high-reliability systems (Google for "tell me
twice" and "tell me three times"). The problem is that it takes a lot of
extra hardware. The G5 and later IBM Z-series mainframe chipsets (not to
be confused with the PowerPC G5) implemented dual computation units and a
comparator that signals a 'Machine Check' condition if the two CPUs don't
end up in the same exact state. As an added bonus, at the end of each
instruction where both *do* compare good, it latches the *entire* state
of the CPU out. When a miscompare occurs, it then does the following:

1) Retry the instruction on the same CPU - if it compares correctly, keep
   going and flag a "soft" error.

2) If it still fails, read out the last "known good" status latch, load
   it into a spare CPU, fire it up, and flag the failing one as bad.

http://www.research.ibm.com/journal/rd/435/spainhower.pdf
http://www.research.ibm.com/journal/rd/435/mueller.pdf

These guys have forgotten more about designing highly reliable systems
than most of us will ever know. ;)

Needless to say, not everybody is willing to pay the costs of the
hardware overhead of this approach.
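
To make the "correct all 1-bit, detect all 2-bit" behaviour concrete, here
is a minimal sketch of an extended Hamming (8,4) SECDED code in Python. It
is illustrative only and is not taken from any particular chipset: real
memory controllers do this in hardware on wider words (a (72,64) code is
common), and the names encode() and decode() are made up for this example.

```python
def encode(nibble):
    """Encode 4 data bits as an 8-bit extended Hamming (SECDED) codeword.

    Layout (hypothetical, chosen for this sketch): index 0 holds the
    overall parity bit, indices 1..7 hold a classic Hamming(7,4) word
    with check bits at 1, 2, 4 and data bits at 3, 5, 6, 7.
    """
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]   # covers positions with bit 0 set
    code[2] = code[3] ^ code[6] ^ code[7]   # covers positions with bit 1 set
    code[4] = code[5] ^ code[6] ^ code[7]   # covers positions with bit 2 set
    code[0] = sum(code[1:]) & 1             # make the whole word even parity
    return code


def decode(code):
    """Return (status, nibble).

    status is 'ok', 'corrected' (one bit was flipped and repaired) or
    'uncorrectable' (two flips detected; nibble is None in that case).
    """
    code = list(code)
    # XOR of the positions of all set bits is 0 for a valid Hamming word,
    # and equals the position of the flipped bit after a single flip.
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos
    overall = sum(code) & 1                 # 1 means an odd number of flips

    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Treated as a single flip; syndrome 0 means the overall parity
        # bit itself was hit.
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with even overall parity: two bits flipped.
        return "uncorrectable", None

    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return status, nibble


if __name__ == "__main__":
    word = encode(0b1011)
    word[5] ^= 1                            # one bitflip
    print(decode(word))                     # ('corrected', 11)
    word[2] ^= 1                            # a second bitflip
    print(decode(word))                     # ('uncorrectable', None)
```

Note that three flips leave the overall parity odd, so this decoder takes
them for a single flip and "corrects" to the wrong value without noticing.
That is why ECC cannot be relied on to catch every combination of bitflips.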