On Jan 17, 2006, at 14:35, Cynbe ru Taren wrote:
What happens repeatedly, at least in my experience over a variety
of boxes running a variety of 2.4 and 2.6 Linux kernel releases, is
that any transient I/O problem results in a critical mass of RAID5
drives being marked 'failed', at which point there is no longer any
supported way of retrieving the data on the RAID5 device, even
though the underlying drives are all fine, and the underlying data
on those drives almost certainly intact.
Insufficient detail. Please provide a full bug report detailing the
problem, then we can help you.
I've NEVER had a RAID1 throw a temper trantrum and go into
apoptosis mode the way RAID5s do given the slightest opportunity.
I've never had either RAID1 _or_ RAID5 throw temper tantrums on me,
_including_ during drive failures. In fact, I've dealt easily with
Linux RAID multi-drive failures that threw all our shiny 3ware RAID
hardware into fits it took me an hour to work out.
Something HAS to be done to make the RAID5 logic MUCH more
conservative about destroying RAID5
systems in response to a transient burst of I/O errors, before it
can in good conscience be declared ready for production use -- or
at MINIMUM to provide a SUPPORTED way of restoring a butchered
RAID5 to last-known-good configuration or such once transient
hardware issues have been resolved.
The problem is that such errors are _rarely_ transient, or indicate
deeper media problems. Have you then verified your disks using
smartctl? There already _is_ such a way to restore said "butchered"
RAID5: "mdadm --assemble --force" In any case, I suspect your RAID-
on-SATA problems are more due to the primitive nature of the SATA
error handling; much of the code does not do more than a basic bus
reset before failing the whole I/O.
In the meantime, IMHO Linux RAID5 should be prominently flagged
EXPERIMENTAL -- NONCRITICAL USE ONLY or some such, to avoid
building up ill-will and undeserved distrust of Linux software
quality generally.
It works great for me, and for a lot of other people too, including
production servers. In fact, I've had fewer issues with Linux RAID5
than with a lot of hardware RAIDs, especially when the HW raid
controller died and the company was no longer in business :-\. If
you can provide actual bug reports, we'd be happy to take a look at
your problems, but as it is, we can't help you.
Cheers,
Kyle Moffett
--
There is no way to make Linux robust with unreliable memory
subsystems, sorry. It would be like trying to make a human more
robust with an unreliable O2 supply. Memory just has to work.
-- Andi Kleen
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[Index of Archives]
[Kernel Newbies]
[Netfilter]
[Bugtraq]
[Photo]
[Stuff]
[Gimp]
[Yosemite News]
[MIPS Linux]
[ARM Linux]
[Linux Security]
[Linux RAID]
[Video 4 Linux]
[Linux for the blind]
[Linux Resources]