Gordon Messmer wrote:
> Philip A. Prindeville wrote:
>> If you're *not* a database weenie, and you're doing usual manly things
>> with your filesystem (like lots of compiles, for instance), you're
>> typically not going to be modifying files in place at all.
> That's not quite it. RAID 5 performance suffers because every write
> requires that the entire block that's being written be read from every
> drive in the array, parity calculated, and then the data and parity
> written out. For each block written, the array has to do N reads plus
> two writes.
No. Even in the worst case it would do N-2 reads (you are writing a new data
block and calculating new parity) plus two writes. But normally, when writing
sequential data, you can wait until you have enough data for an entire stripe,
read nothing, and write once to each drive. You should be able to do this
in parallel, but unlike RAID0 I've never measured it happening. Tuning
"stripe_cache_size" and creating the filesystem with the "stride=" option will help.
> It doesn't matter whether you're writing new files or modifying existing
> files, because all of this happens at the block level. It's especially
> bad on journalled filesystems, where writing to a file will update the
> file's blocks, plus the filesystem journal's blocks, and finally the
> filesystem's metadata blocks.
No again. You read the old parity block and the old data block, XOR first the
old and then the new data into the parity, and write the new data and new
parity: two reads and two writes, no matter how many drives are in the array.
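
A quick Python sketch of that XOR trick, purely illustrative (the block size
and three-data-disk layout are made up; the real md code works on kernel
pages, not Python bytes):

import os

def xor_blocks(a, b):
    """XOR two equal-length blocks byte by byte."""
    return bytes(x ^ y for x, y in zip(a, b))

BLOCK = 4096
stripe = [os.urandom(BLOCK) for _ in range(3)]  # 3 data blocks (4-drive array)

# Initial parity: XOR of every data block in the stripe.
parity = stripe[0]
for block in stripe[1:]:
    parity = xor_blocks(parity, block)

# Overwrite data block 1 with only two reads and two writes.
old_data = stripe[1]                             # read 1: old data
old_parity = parity                              # read 2: old parity
new_data = os.urandom(BLOCK)
parity = xor_blocks(xor_blocks(old_parity, old_data), new_data)  # write 1: parity
stripe[1] = new_data                             # write 2: data

# The updated parity matches a full recomputation over the whole stripe.
full = stripe[0]
for block in stripe[1:]:
    full = xor_blocks(full, block)
assert full == parity
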
>> So is it just the database-heads that are maligning RAID5, or are
>> there other performance issues I don't know about?
> Most of your comments don't reflect the way RAID 5 actually functions in
> any way.
>> Because my empirical experience has always been that when writing
>> large files, RAID5 performs on par with RAID0.
> The system on which you were testing was probably limited by other
> factors, if that was the case. A RAID 0 disk array will be much faster
> than a RAID 5 array.
> RAID 5 tends to be most appropriate when you're trying to get as much
> disk space as you can at the lowest cost, you won't be running
> multiple simultaneous jobs on the same disk array, and you'll be
> collecting data at a relatively low rate. Usually, that's
> backups. Your network is probably slower than your disk array (unless
> the array is very large -- array speed decreases with array size), so
> streaming data in over the network to your disk array won't bog it down.
> Virtually any interactive workload will benefit from a better disk
> configuration.
--
Bill Davidsen <davidsen@xxxxxxx>
"We have more to fear from the bungling of the incompetent than from
the machinations of the wicked." - from Slashdot