[PATCH linux-2.6-block:master 00/10] blk: reimplementation of I/O barrier

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



 Hello, Jens, James, Jeff and Bartlomiej.

 This is the third posting of blk ordered reimplementation.  Changes
since the scond posting are...

 * Dispatch queue patchset is reordered in front of this patchset.
 * Draining bug fix
 * Proper requeueing of requests in an ordered sequence for TAG
   ordered queues.
 * Fallback mechanism stripped out.  This also makes -EOPNOTSUPP
   changes in SCSI and IDE unnecessary.
 * TAG ordering request issue fix
 * SCSI/libata/IDE patches splitted as requested
 * Hopefully, changes to libata and IDE are more acceptable
 * Documentation/block/barrier.txt added

 This patchset is part of the patch series described in the following mail.
	http://marc.theaimsgroup.com/?l=linux-kernel&m=112238377602033&w=2

 As all previous patches don't really concern subsystems other than
block layer.  They are only sent to Jens and LKML.  If they're needed,
preceding patches are...

fix-elevator_find:
	http://marc.theaimsgroup.com/?l=linux-kernel&m=112238419731035&w=2

fix-cfq_find_next_crq:
	http://marc.theaimsgroup.com/?l=linux-kernel&m=112238456712203&w=2

generic-dispatch-queue:
	http://marc.theaimsgroup.com/?l=linux-kernel&m=112238633622498&w=2
	http://marc.theaimsgroup.com/?l=linux-kernel&m=112238647101632&w=2
	http://marc.theaimsgroup.com/?l=linux-kernel&m=112238693809889&w=2
	http://marc.theaimsgroup.com/?l=linux-kernel&m=112238675926411&w=2
	http://marc.theaimsgroup.com/?l=linux-kernel&m=112238660407245&w=2
	http://marc.theaimsgroup.com/?l=linux-kernel&m=112238708918364&w=2

reimplement-elevator-switch:
	http://marc.theaimsgroup.com/?l=linux-kernel&m=112238771305863&w=2

 For general description please read barrier.txt and the previous posting.
	http://marc.theaimsgroup.com/?l=linux-kernel&m=111795127124020&w=2

 ==== Some excerpts from barrier.txt. ====

* SCSI layer currently can't use TAG ordering even if the drive,
  controller and driver support it.  The problem is that SCSI midlayer
  request dispatch function is not atomic.  It releases queue lock and
  switch to SCSI host lock during issue and it's possible and likely
  to happen in time that requests change their relative positions.
  Once this problem is solved, TAG ordering can be enabled.

* Error handling.  Currently, block layer will report error to upper
  layer if any of requests in an ordered sequence fails.
  Unfortunately, this doesn't seem to be enough.  Look at the
  following request flow.  QUEUE_ORDERED_TAG_FLUSH is in use.

 [0] [1] [2] [3] [pre] [barrier] [post] < [4] [5] [6] ... >
                                          still in elevator

  Let's say request [2], [3] are write requests to update file system
  metadata (journal or whatever) and [barrier] is used to mark that
  those updates are valid.  Consider the following sequence.

 i.     Requests [0] ~ [post] leaves the request queue and enters
        low-level driver.
 ii.    After a while, unfortunately, something goes wrong and the
        drive fails [2].  Note that any of [0], [1] and [3] could have
        completed by this time, but [pre] couldn't have been finished
        as the drive must process it in order and it failed before
        processing that command.
 iii.   Error handling kicks in and determines that the error is
        unrecoverable and fails [2], and resumes operation.
 iv.    [pre] [barrier] [post] gets processed.
 v.     *BOOM* power fails

  The problem here is that the barrier request is *supposed* to
  indicate that filesystem update requests [2] and [3] made it safely
  to the physical medium and, if the machine crashes after the barrier
  is written, filesystem recovery code can depend on that.  Sadly,
  that isn't true in this case anymore.  IOW, the success of a I/O
  barrier should also be dependent on success of some of the preceding
  requests, where only upper layer (filesystem) knows what 'some' is.

  This can be solved by implementing a way to tell the block layer
  which requests affect the success of the following barrier request
  and making lower lever drivers to resume operation on error only
  after block layer tells it to do so.

  As the probability of this happening is very low and the drive
  should be faulty, implementing the fix is probably an overkill.
  But, still, it's there.

[ Start of patch descriptions ]

01_blk_add-uptodate-to-end_that_request_last.patch
	: add @uptodate to end_that_request_last() and @error to rq_end_io_fn()

	Add @uptodate argument to end_that_request_last() and @error
        to rq_end_io_fn().  There's no generic way to pass error code
        to request completion function, making generic error handling
        of non-fs request difficult (rq->errors is driver-specific and
        each driver uses it differently).  This patch adds @uptodate
        to end_that_request_last() and @error to rq_end_io_fn().

        For fs requests, this doesn't really matter, so just using the
        same uptodate argument used in the last call to
        end_that_request_first() should suffice.  IMHO, this can also
        help the generic command-carrying request Jens is working on.

02_blk_implement-init_request_from_bio.patch
	: separate out bio init part from __make_request

	Separate out bio initialization part from __make_request.  It
        will be used by the following blk_ordered_reimpl.

03_blk_reimplement-ordered.patch
	: reimplement handling of barrier request

	 Reimplement handling of barrier requests.

        * Flexible handling to deal with various capabilities of
          target devices.
	* Retry support for falling back.
	* Tagged queues which don't support ordered tag can do ordered.

04_blk_scsi-update-ordered.patch
	: update SCSI to use new blk_ordered

	All ordered request related stuff delegated to HLD.  Midlayer
        now doens't deal with ordered setting or prepare_flush
        callback.  sd.c updated to deal with blk_queue_ordered
        setting.  Currently, ordered tag isn't used as SCSI midlayer
        cannot guarantee request ordering.

05_blk_scsi-add-fua-support.patch
	: add FUA support to SCSI disk

	Add FUA support to SCSI disk.

06_blk_libata-update-ordered.patch
	: update libata to use new blk_ordered

	Reflect changes in SCSI midlayer and updated to use new
        ordered request implementation

07_blk_libata-add-fua-support.patch
	: add FUA support to libata

	Add FUA support to libata.

08_blk_ide-update-ordered.patch
	: update IDE to use new blk_ordered

	Update IDE to use new blk_ordered.

09_blk_ide-add-fua-support.patch
	: add FUA support to IDE

	Add FUA support to IDE

10_blk_add-barrier-doc.patch
	: I/O barrier documentation

	I/O barrier documentation

[ End of patch descriptions ]

 Thanks.

--
tejun

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]
  Powered by Linux