Re: [RFD] BIO_RW_BARRIER - what it means for devices, filesystems, and dm/md.

Neil Brown wrote:

 md/dm modules could keep count of requests as has been suggested
 (though that would be a fairly big change for raid0 as it currently
 doesn't know when a request completes - bi_endio goes directly to the
 filesystem).

Are you sure? I believe that dm handles bi_endio because it waits for all in-progress bios to complete before switching tables.

2/ Maybe barriers provide stronger semantics than are required.

 All write requests are synchronised around a barrier write.  This is
 often more than is required and apparently can cause a measurable
 slowdown.

I'm not quite sure I understand this correctly, but the purpose of a barrier request is to prevent the elevator from reordering requests around a barrier. Previous requests must be completed before the barrier, and later requests must be executed after it. That is a sufficiently strong guarantee for careful-write or journaling filesystems to ensure that a log block hits the disk before the actual transaction blocks, and that the log block is marked as complete only after the actual transaction. This is a weaker guarantee than a flush, and allows for some reordering to improve performance.
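
To make the ordering rule above concrete, here is a toy check in C. It is only a sketch of the rule as stated in this mail, not kernel code: struct req, done_pos and full_barrier_order_ok() are made-up names for illustration, and nothing here touches the real struct bio or the elevator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one request; this is NOT the kernel's struct bio. */
struct req {
    bool is_barrier;
};

/*
 * reqs[i] is the i-th request in submission order.
 * done_pos[i] is the position at which request i completed (0 = first).
 *
 * Returns true if every request submitted before a barrier completed
 * before it, and every request submitted after it completed after it.
 * Requests between two barriers may complete in any order.
 */
static bool full_barrier_order_ok(const struct req *reqs,
                                  const size_t *done_pos, size_t n)
{
    assert(reqs != NULL && done_pos != NULL);

    for (size_t b = 0; b < n; b++) {
        if (!reqs[b].is_barrier)
            continue;
        for (size_t i = 0; i < n; i++) {
            if (i < b && done_pos[i] > done_pos[b])
                return false; /* earlier request finished after the barrier */
            if (i > b && done_pos[i] < done_pos[b])
                return false; /* later request finished before the barrier */
        }
    }
    return true;
}
```

With five requests where the third is the barrier, a completion order that swaps the first two writes and swaps the last two passes the check, while one that lets the fourth write finish ahead of the barrier fails it.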

 Also the FUA for the actual commit write might not be needed.  It is
 important for consistency that the preceding writes are in safe
 storage before the commit write, but it is not so important that the
 commit write is immediately safe on storage.  That isn't needed until
 a 'sync' or 'fsync' or similar.

Right, the barrier doesn't need to be flushed right away, so the elevator could complete writes after the barrier if it wishes, then complete the ones before, and finally the barrier itself. Not setting the FUA bit allows the disk to cache the barrier write so it can be completed sooner, but before the queue sends any more requests to the disk, it must be flushed to ensure that the barrier has hit the media before the new requests.
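
The difference between a cached write followed by a flush and a FUA write can be shown with a toy disk model. This is a sketch under my own assumptions: struct toy_disk and the toy_* functions are invented for illustration, and the model only tracks which of 32 blocks have their latest contents on stable media.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy disk with a volatile write cache; blocks 0..31 tracked as bitmasks. */
struct toy_disk {
    uint32_t cached;   /* latest contents are only in the volatile cache */
    uint32_t on_media; /* latest contents are on stable storage */
};

/* A FUA write goes straight to media; a plain write may sit in the cache. */
static void toy_write(struct toy_disk *d, unsigned block, bool fua)
{
    assert(block < 32);
    uint32_t bit = UINT32_C(1) << block;

    if (fua) {
        d->on_media |= bit;
        d->cached &= ~bit;
    } else {
        d->cached |= bit;
        d->on_media &= ~bit; /* newest data is not on media yet */
    }
}

/* A flush pushes everything in the cache to media. */
static void toy_flush(struct toy_disk *d)
{
    d->on_media |= d->cached;
    d->cached = 0;
}

static bool toy_is_durable(const struct toy_disk *d, unsigned block)
{
    assert(block < 32);
    return (d->on_media >> block) & 1u;
}
```

In this model a barrier write issued without FUA completes from the cache and only becomes durable at the next toy_flush(), along with everything else that was cached. A FUA write is durable at once but leaves other cached blocks where they are.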

 One possible alternative is:
   - writes can overtake barriers, but barrier cannot overtake writes.
   - flush before the barrier, not after.

 This is considerably weaker, and hence cheaper. But I think it is
 enough for all filesystems (providing it is still an option to call
 blkdev_issue_flush on 'fsync').

Again I am not sure I quite understand what you mean here, but only writes issued after the barrier can complete before the barrier. Those issued before the barrier cannot overtake it in the queue.
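
As I read the proposal, the weaker rule is one-way: a barrier may not complete ahead of any write issued before it, while writes issued after it are unconstrained. A toy check of that reading in C follows. It is a sketch of my interpretation only; struct wreq and one_way_barrier_order_ok() are hypothetical names, not anything in the block layer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one write; this is NOT the kernel's struct bio. */
struct wreq {
    bool is_barrier;
};

/*
 * w[i] is the i-th write in submission order.
 * done_pos[i] is the position at which write i completed (0 = first).
 *
 * One-way rule: every write issued before a barrier must complete
 * before that barrier does. Writes issued after the barrier may
 * complete on either side of it.
 */
static bool one_way_barrier_order_ok(const struct wreq *w,
                                     const size_t *done_pos, size_t n)
{
    assert(w != NULL && done_pos != NULL);

    for (size_t b = 0; b < n; b++) {
        if (!w[b].is_barrier)
            continue;
        for (size_t i = 0; i < b; i++) {
            if (done_pos[i] > done_pos[b])
                return false; /* the barrier overtook an earlier write */
        }
    }
    return true;
}
```

Under this rule a later write finishing ahead of the barrier is accepted, which the full barrier semantics would reject. The barrier finishing ahead of an earlier write is rejected in both cases.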

 Another alternative would be to tag each bio as being in a
 particular barrier-group.  Then bio's in different groups could
 overtake each other in either direction, but a BARRIER request must
 be totally ordered w.r.t. other requests in the barrier group.
 This would require an extra bio field, and would give the filesystem
 more appearance of control.  I'm not yet sure how much it would
 really help...
 It would allow us to set FUA on all bios with a non-zero
 barrier-group.  That would mean we don't have to flush the entire
 cache, just those blocks that are critical.... but I'm still not sure
 it's a good idea.


This all seems like unnecessary work.

