Bill Davidsen wrote:
David Greaves wrote:
[email protected] wrote:
On Fri, 22 Jun 2007, David Greaves wrote:
If you end up 'fiddling' in md because someone specified
--assume-clean on a raid5 [in this case just to save a few minutes
*testing time* on a system with a heavily choked bus!] then that adds
*even more* complexity and exception cases into all the stuff you
described.
A "few minutes?" Are you reading the times people are seeing with
multi-TB arrays? Let's see, 5TB at a rebuild rate of 20MB... three days.
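For reference, taking "20MB" to mean 20MB/s sustained over the whole array, the arithmetic works out to:

    5 TB = 5,000,000 MB;  5,000,000 MB / 20 MB/s = 250,000 s  ~= 69 hours  ~= 2.9 days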
Yes. But we are talking initial creation here.
And as soon as you believe that the array is actually "usable" you cut
that rebuild rate, perhaps in half, and get dog-slow performance from
the array. It's usable in the sense that reads and writes work, but for
useful work it's pretty painful. You either fail to understand the
magnitude of the problem or wish to trivialize it for some reason.
I do understand the problem and I'm not trying to trivialise it :)
I _suggested_ that it's worth thinking about things rather than jumping in to
say "oh, we can code up a clever algorithm that keeps track of what stripes have
valid parity and which don't and we can optimise the read/copy/write for valid
stripes and use the raid6 type read-all/write-all for invalid stripes and then
we can write a bit extra on the check code to set the bitmaps......"
Phew - and that lets us run the array at semi-degraded performance (raid6-like)
for 3 days rather than either waiting before we put it into production or
running it very slowly.
Now we run this system for 3 years and we saved 3 days - hmmm IS IT WORTH IT?
What happens in those 3 years when we have a disk fail? The solution doesn't
apply then - it's 3 days to rebuild - like it or not.
By delaying parity computation until the first write to a stripe, only
the growth of a filesystem is slowed, and all data are protected without
waiting for the lengthy check. The rebuild speed can be set very low,
because on-demand rebuild will do most of the work.
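A minimal sketch of that idea, assuming a per-stripe "parity valid" bitmap. Everything here (names, types, the bitmap itself) is invented for illustration and is not md code: the first write to a stripe takes a whole-stripe reconstruct-write that also establishes parity; later writes can use the normal read-modify-write path.

/* Illustrative sketch only -- not md code; names, types and the bitmap are invented. */

#include <stdbool.h>
#include <stdio.h>

#define NR_STRIPES      (1UL << 20)
#define BITS_PER_WORD   (8 * sizeof(unsigned long))

/* One bit per stripe, set once that stripe's parity is known to be valid. */
static unsigned long parity_valid[NR_STRIPES / BITS_PER_WORD];

static bool stripe_parity_valid(unsigned long stripe)
{
        return parity_valid[stripe / BITS_PER_WORD] &
               (1UL << (stripe % BITS_PER_WORD));
}

static void mark_parity_valid(unsigned long stripe)
{
        parity_valid[stripe / BITS_PER_WORD] |= 1UL << (stripe % BITS_PER_WORD);
}

/* Stand-ins for the real I/O paths; here they only document intent. */
static void read_modify_write(unsigned long stripe)
{
        printf("stripe %lu: normal rmw (read old data+parity, write new)\n", stripe);
}

static void reconstruct_write(unsigned long stripe)
{
        printf("stripe %lu: whole-stripe write, parity generated from scratch\n", stripe);
}

/* Called for every write that lands in 'stripe'. */
void handle_stripe_write(unsigned long stripe)
{
        if (stripe_parity_valid(stripe)) {
                read_modify_write(stripe);
        } else {
                reconstruct_write(stripe);
                mark_parity_valid(stripe);
        }
}

int main(void)
{
        handle_stripe_write(42);        /* first touch: reconstruct-write */
        handle_stripe_write(42);        /* afterwards: normal rmw path    */
        return 0;
}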
I am not saying you are wrong.
I ask merely whether the benefit outweighs the added complexity.
If the benefit applied 24x7 then sure - e.g. using hardware assist in the raid calcs
- very useful indeed.
I'm very much for the fs layer reading the lower block structure so I
don't have to fiddle with arcane tuning parameters - yes, *please*
help make xfs self-tuning!
Keeping life as straightforward as possible low down makes the upwards
interface more manageable and that goal more realistic...
Those two paragraphs are mutually exclusive. The fs can be simple
because it rests on a simple device, even if the "simple device" is
provided by LVM or md. And LVM and md can stay simple because they rest
on simple devices, even if they are provided by PATA, SATA, nbd, etc.
Independent layers make each layer more robust. If you want to
compromise the layer separation, some approach like ZFS with full
integration would seem to be promising. Note that layers allow
specialized features at each point, trading integration for flexibility.
That's a simplistic summary.
You *can* loosely couple the layers. But you can enrich the interface and
tightly couple them too - XFS is capable (I guess) of understanding md more
fully than say ext2.
XFS would still work on a less 'talkative' block device where performance wasn't
as important (USB flash maybe, dunno).
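To make that concrete: md already exports its geometry through sysfs, so a filesystem (or its mkfs) could pick stripe-aligned defaults by reading it. A rough sketch, assuming a RAID5 array at /sys/block/md0, the sysfs names chunk_size and raid_disks, and no error handling:

/* Rough sketch: read md geometry from sysfs so an fs/mkfs could choose
 * stripe-aligned defaults.  The md0 path, the RAID5 "one parity disk"
 * assumption and the lack of error handling are all simplifications. */

#include <stdio.h>

static long read_sysfs_long(const char *path)
{
        long val = 0;
        FILE *f = fopen(path, "r");

        if (f) {
                if (fscanf(f, "%ld", &val) != 1)
                        val = 0;
                fclose(f);
        }
        return val;
}

int main(void)
{
        long chunk = read_sysfs_long("/sys/block/md0/md/chunk_size");   /* bytes */
        long disks = read_sysfs_long("/sys/block/md0/md/raid_disks");
        long data_disks = disks > 1 ? disks - 1 : 1;    /* RAID5: one disk of parity */

        printf("stripe unit  = %ld KB\n", chunk / 1024);
        printf("stripe width = %ld KB\n", (chunk * data_disks) / 1024);
        return 0;
}

The two numbers printed are roughly what you'd otherwise work out and hand to mkfs.xfs by hand as su/sw.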
My feeling is that full integration and independent layers each have
benefits. As you connect the layers to expose operational details, you
need to handle changes in those details, which would seem to make the
layers more complex.
Agreed.
What I'm looking for here is better performance in one
particular layer, the md RAID5 layer. I like to avoid unnecessary
complexity, but I feel that the current performance suggests room for
improvement.
I agree there is room for improvement.
I suggest that it may be more fruitful to write a tool called "raid5prepare"
that writes zeroes/ones as appropriate to all component devices; then you can
use --assume-clean without concern. That tool could look to see whether the devices are
SCSI or whatever and take advantage of the hyperfast block writes that can be done.
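A crude sketch of what such a tool's core loop might look like, using nothing cleverer than large sequential writes. The name, the 1MB buffer and the command-line interface are just for illustration; a real tool would probe the device type and use a faster path (e.g. SCSI WRITE SAME) where available:

/* Sketch of a "raid5prepare" core loop: zero each component device with
 * large sequential writes.  Invented example -- a real tool would detect
 * the device type and use a faster path (e.g. SCSI WRITE SAME) when it can. */

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define BUF_SIZE (1024 * 1024)          /* 1MB of zeroes per write() */

int main(int argc, char **argv)
{
        static char buf[BUF_SIZE];      /* zero-filled, being static storage */
        int i;

        for (i = 1; i < argc; i++) {
                int fd = open(argv[i], O_WRONLY);

                if (fd < 0) {
                        perror(argv[i]);
                        continue;
                }
                /* Keep writing zeroes until the device is full (short write). */
                while (write(fd, buf, BUF_SIZE) == BUF_SIZE)
                        ;
                fsync(fd);
                close(fd);
                printf("zeroed %s\n", argv[i]);
        }
        return 0;
}

Run it over the components and then create the array with something like mdadm --create --level=5 --assume-clean. Once every component reads back as zeroes, the XOR parity of each stripe is trivially correct, so --assume-clean really is clean.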
David