Re: FYI: RAID5 unusably unstable through 2.6.14

Cynbe ru Taren wrote:
> The current Linux kernel RAID5 implementation is just
> too fragile to be used for most of the applications
> where it would be most useful.

I'm not sure I agree.

> What happens repeatedly, at least in my experience over
> a variety of boxes running a variety of 2.4 and 2.6
> Linux kernel releases, is that any transient I/O problem
> results in a critical mass of RAID5 drives being marked
> 'failed', at which point there is no longer any supported

What "transient" I/O problem would this be? I've had loads of issues with
flaky motherboard/PCI bus implementations that make RAID using add-in cards
(all 5 slots filled with other devices) a nightmare. The built-in controllers
seem to be more reliable.

> way of retrieving the data on the RAID5 device, even
> though the underlying drives are all fine, and the underlying
> data on those drives almost certainly intact.

This is no problem; just use something like

	mdadm --assemble --force /dev/md5 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1

(Then, of course, run fsck.)

You can even do this with (nr. drives - 1) devices, then add the last one
back to be synced up in the background, as in the sketch below.
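
For example, a minimal sketch, assuming the same hypothetical device names
as above and that /dev/sde1 is the member being left out (the --run flag is
an assumption here, to start the array even though it is short one device):

	# Assemble degraded, with one member missing.
	mdadm --assemble --force --run /dev/md5 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1

	# Check the filesystem before trusting the data.
	fsck /dev/md5

	# Re-add the remaining member; it will be resynced in the background.
	mdadm --add /dev/md5 /dev/sde1

	# Watch the resync progress.
	cat /proc/mdstat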

> This has just happened to me for at least the sixth time,
> this time in a brand new RAID5 consisting of 8 200G hotswap
> SATA drives backing up the contents of about a dozen onsite
> and offsite boxes via dirvish, which took me the better part
> of December to get initialized and running, and now two weeks
> later I'm back to square one.

:-( .. maybe try the force assemble?

> I'm currently digging through the md kernel source code
> trying to work out some ad-hoc recovery method, but this
> level of flakiness just isn't acceptable on systems where
> reliable mass storage is a must -- and when else would
> one bother with RAID5?

It isn't flaky for me now that I'm using a better-quality motherboard; in fact,
it has saved me through three near-simultaneous failures of WD 250GB drives.

> We need RAID5 to be equally resilient in the face of
> real-world problems, people -- it isn't enough to
> just be able to function under ideal lab conditions!

I think it is. The automatics are paranoid (as they should be) when failures
are noticed, but the array can still be assembled manually; see the sketch
below for checking that it is safe to do so.
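
Before forcing an assembly, it is worth confirming that the members really
are intact. A minimal sketch, again using the hypothetical device names from
above:

	# Inspect each member's superblock; the Events counts should be
	# identical or very close if the drives only dropped out transiently.
	mdadm --examine /dev/sd[abcde]1 | grep -E '/dev/|Events'

	# After a forced assembly, confirm the array state.
	mdadm --detail /dev/md5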

> A design bug is -still- a bug, and -still- needs to
> get fixed.

It's not a design bug, in my opinion.

> Something HAS to be done to make the RAID5 logic
> MUCH more conservative about destroying RAID5
> systems in response to a transient burst of I/O
> errors, before it can in good conscience be declared

If such things are common, you should investigate the hardware.

> ready for production use -- or at MINIMUM to provide
> a SUPPORTED way of restoring a butchered RAID5 to
> last-known-good configuration or such once transient
> hardware issues have been resolved.

There is one. See the forced assembly above.

> In the meantime, IMHO Linux RAID5 should be prominently flagged
> EXPERIMENTAL -- NONCRITICAL USE ONLY or some such, to avoid
> building up ill-will and undeserved distrust of Linux
> software quality generally.

I'd calm down if I were you.

Cheers
David
