Phillip Susi wrote:
Your understanding of statistics leaves something to be desired. As you
add disks the probability of a single failure grows linearly, but the
probability of double failure grows much more slowly. For example:
If 1 disk has a 1/1000 chance of failure, then
2 disks have a (1/1000)^2 chance of double failure, and
3 disks have a (1/1000)^2 * 3 chance of double failure
4 disks have a (1/1000)^2 * 7 chance of double failure
After the first drive fails you have no redundancy, and the chance of an
additional failure is linear in the number of remaining drives.
Assume:
p - probability of a drive failing in unit time
n - number of drives
F - probability of double failure
The chance of a single drive failure is n*p. After that you have a new
"independent trial" for the failure of any one of the remaining n-1
drives, so the chance of a double drive failure is actually:
F = (n*p) * (n-1)*p
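To put numbers on the formula above, here is a minimal sketch that evaluates it for a few array sizes. The function name and the choice of array sizes are illustrative only, and p = 1/1000 is taken from the example earlier in the thread. The formula is a small-p approximation (n*p is only a sensible probability while it stays well below 1), and it counts the two failures in order, so it comes out at twice the number of distinct drive pairs times p^2.

```python
from fractions import Fraction

def double_failure_estimate(n, p):
    """Evaluate F = (n*p) * ((n-1)*p), the estimate given above.

    n: number of drives, p: per-drive failure probability in unit time.
    """
    if n < 2:
        # A double failure needs at least two drives.
        return Fraction(0)
    return (n * p) * ((n - 1) * p)

p = Fraction(1, 1000)  # per-drive probability from the original example
for n in (2, 3, 4, 8):
    f = double_failure_estimate(n, p)
    print(f"n={n}: F = {f} (about {float(f):.2e})")
```

Using Fraction keeps the arithmetic exact, which makes it easier to compare against the hand-worked figures.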
But wait, there's more:
p - chance of a drive failing in unit time
n - number of drives
R - the time to rebuild to a hot spare in the same units as p
F - probability of double failure
So:
F = n*p * (n-1)*(R * p)
If you rebuild a track at a time, each track takes the time to read the
slowest drive plus the time to write the spare. If the array remains in
use, the load increases those times.
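As a rough illustration of how the rebuild time R enters, the sketch below evaluates F = n*p * (n-1)*(R*p) for a 4-drive array over a few rebuild times. The function name and the particular R values are made up for illustration; R is expressed as a fraction of the unit time used for p, and p = 1/1000 is the figure from the original example.

```python
from fractions import Fraction

def double_failure_with_rebuild(n, p, r):
    """Evaluate F = n*p * (n-1)*(r*p).

    The second failure only counts if it lands inside the rebuild
    window r, measured in the same time units as p.
    """
    if n < 2:
        return Fraction(0)
    return (n * p) * ((n - 1) * (r * p))

p = Fraction(1, 1000)  # per-drive probability from the original example
n = 4                  # array size, chosen to match the 4-drive example
for r in (Fraction(1), Fraction(1, 2), Fraction(1, 10), Fraction(1, 100)):
    f = double_failure_with_rebuild(n, p, r)
    print(f"R={float(r):<5} F = {float(f):.2e}")
```

Under this model F scales linearly with R, so halving the rebuild time halves the estimated chance of a double failure.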
And the ugly part is that p is changing all the time: there's infant
mortality on new drives, a fairly constant probability of electronic
failure, and an increasing probability of mechanical failure over time.
If all of your drives are the same age they are less reliable than mixed
age drives.
Thus the probability of double failure on this 4 drive array is ~142
times less than the odds of a single drive failing. As the probability of
a single drive failing becomes more remote, then the ratio of that
probability to the probability of double fault in the array grows
exponentially.
( I think I did that right in my head... will check on a real calculator
later )
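For anyone who wants to run the check, here is a small sketch of the ratio arithmetic. The function name is hypothetical, and the multiplier is passed in explicitly: 7 is the figure used in the quoted 4-disk example, while math.comb(4, 2) = 6 is the number of distinct drive pairs in a 4-drive array. Algebraically the ratio is 1 / (multiplier * p), so it grows in inverse proportion to p.

```python
from math import comb

def single_to_double_ratio(p, multiplier):
    """Ratio of one drive's failure probability p to multiplier * p**2."""
    return p / (multiplier * p ** 2)

p = 1 / 1000  # per-drive probability from the original example

# Multiplier as used in the quoted 4-disk example.
print(single_to_double_ratio(p, 7))

# Multiplier taken as the number of distinct drive pairs among 4 drives.
print(single_to_double_ratio(p, comb(4, 2)))
```

With the multiplier of 7 this gives roughly 143, in line with the "~142" figure above. With 6 distinct pairs it comes out nearer 167.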
This is why raid-5 was created: because the array has a much lower
probability of double failure, and thus data loss, than a single drive.
Then of course, if you are really paranoid, you can go with raid-6 ;)
If you're paranoid you mirror over two RAID-5 arrays. The mirrors are on
independent controllers. RAID-10.
Michael Loftis wrote:
Absolutely not. The more spindles the more chances of a double
failure. Simple statistics will mean that unless you have mirrors the
more drives you add the more chance of two of them (really) failing at
once and choking the whole system.
That said, there very well could be (are?) cases where md needs to do
a better job of handling the world unravelling.
A small graph of the effect of rebuild time on RAID-5 is attached. It
assumes a probability of failure of 1/1000, per the original post, and
shows how the probability of double failure drops for various rebuild times.
--
-bill davidsen ([email protected])
"The secret to procrastination is to put things off until the
last possible moment - but no longer" -me