Re: [rfc][patch 2.6.18-rc7] block: explicit plugging

On 9/16/06, Nick Piggin <[email protected]> wrote:
Hi,

I've been tinkering with this idea for a while, and I'd be interested
in seeing what people think about it. The patch isn't in a great state
of commenting or splitting ;) but I'd be interested in feelings about the
general approach, and whether I'm going to hit any bad problems (eg.
with SCSI or IDE).

I *really* like this idea, and I would like to help get it merged.  I
can even run some benchmarks for you once I get my test rig running
again.


This is a patch to perform block device plugging explicitly in the submitting
process context rather than implicitly by the block device.

There are several advantages to plugging in process context over plugging
by the block device:

- Implicit plugging is only active when the queue empties, so any
  advantages are lost if there is parallel IO occurring. Not so with
  explicit plugging.

- Implicit plugging relies on a timer and watermarks and a kind-of-explicit
  directive in lock_page which directs plugging. These are heuristics and
  can cost performance due to holding a block device idle longer than it
  should be. Explicit plugging avoids most of these issues by only holding
  the device idle when it is known more requests will be submitted.

- This lock_page directive uses a roundabout way to attempt to minimise
  intrusiveness of plugging on the VM. In doing so, it gets needlessly
  complex: the VM really is in a good position to direct the block layer
  as to the nature of its requests, so there is no need to try to hide
  the fact.
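
Just to check that I'm picturing the interface right: the call sites
end up looking roughly like the sketch below.  The helper names
(plug_current()/unplug_current()) are my own placeholders, not
necessarily what your patch exports, so treat this as an illustration
of the idea rather than the real API.

/*
 * Sketch only: the submitter, which knows more bios are coming,
 * brackets the batch explicitly instead of relying on queue-empty
 * heuristics and unplug timers.  plug_current()/unplug_current()
 * are placeholder names for the per-process plug helpers.
 */
static void submit_bio_batch(struct bio **bios, int nr)
{
        int i;

        plug_current();                 /* start batching on the io_context */

        for (i = 0; i < nr; i++)
                submit_bio(WRITE, bios[i]);

        unplug_current();               /* batch complete, release the requests */
}

If that's the shape of it, the VM writeback paths look like the
obvious first users.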

I had a related idea that I have not been able to work on yet.  I
called it "kernel anticipation": the submitter explicitly tells the
I/O scheduler, at submission time, that subsequent I/O will depend on
the request being issued.  In other words, when we are composing a bio
and see the "BH_Boundary" flag in a buffer head, mark the bio for
mandatory anticipation, since we know a dependent request will follow.
This would enable the anticipation code to act in some cases even for
processes with very high thinktimes.
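
Very roughly, I imagine it looking something like this in the submit
path.  The BIO_RW_ANTIC bit is completely made up for the sketch -- no
such flag exists -- it just stands for "the submitter promises a
dependent request follows":

/*
 * Sketch of the "kernel anticipation" idea.  BIO_RW_ANTIC is a
 * hypothetical flag telling the anticipatory scheduler to keep
 * anticipating after this request completes, because the submitter
 * already knows a dependent read (eg. an indirect block) comes next.
 */
static void submit_bh_with_hint(int rw, struct buffer_head *bh,
                                struct bio *bio)
{
        if (buffer_boundary(bh))
                rw |= (1 << BIO_RW_ANTIC);      /* hypothetical */

        submit_bio(rw, bio);
}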

On a parallel tiobench benchmark, of the 800 000 calls to __make_request
performed, this patch avoids 490 000 (62%) of queue_lock acquisitions by
early merging on the private plugged list.

Have you run any FS performance benchmarks to check for regressions in
performance?  Who knows, you might even be able to show a visible
increase :)

@@ -2865,68 +2762,48 @@ static int __make_request(request_queue_
         */
        blk_queue_bounce(q, &bio);

-       spin_lock_prefetch(q->queue_lock);
-
        barrier = bio_barrier(bio);
        if (unlikely(barrier) && (q->next_ordered == QUEUE_ORDERED_NONE)) {
                err = -EOPNOTSUPP;
                goto end_io;
        }

+       /* Attempt merging with the plugged list before taking locks */
+       ioc = current->io_context;
+       if (ioc && ioc->plugged && !list_empty(&ioc->plugged_list)) {
+               struct request *rq;
+               rq = list_entry_rq(ioc->plugged_list.prev);
+
+               el_ret = elv_try_merge(rq, bio);
+               if (el_ret == ELEVATOR_BACK_MERGE) {
+                       if (bio_attempt_back_merge(q, rq, nr_sectors, bio))
+                               goto out;
+               } else if (el_ret == ELEVATOR_FRONT_MERGE) {
+                       if (bio_attempt_front_merge(q, rq, nr_sectors, bio))
+                               goto out;
+               }
+       }
+
        spin_lock_irq(q->queue_lock);

-       if (unlikely(barrier) || elv_queue_empty(q))
+       if (elv_queue_empty(q))
                goto get_rq;

        el_ret = elv_merge(q, &req, bio);
-       switch (el_ret) {

Have you considered skipping the queue merge entirely, if there is not
a hit in the plugged_list?  I would be interested to see a "hit rate"
of how many queue merge attempts are successful here.  I bet it's
pretty low.  The difference from just skipping these entirely might
not even be visible in a benchmark.  It'd be pretty cool to be able to
eliminate the queue merging interface altogether.
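
Something like the following (untested; the two counters are invented
just for instrumentation) would give that hit rate without changing
behaviour -- and if the hit count stays near zero, the elv_merge() call
could simply become a "goto get_rq":

/* made-up counters, only for measuring the elevator merge hit rate */
static atomic_t queue_merge_attempts = ATOMIC_INIT(0);
static atomic_t queue_merge_hits = ATOMIC_INIT(0);

        spin_lock_irq(q->queue_lock);

        if (elv_queue_empty(q))
                goto get_rq;

        /*
         * By this point the plugged-list merge above has already
         * missed (or wasn't attempted), so count how often the
         * elevator still finds a merge candidate.
         */
        atomic_inc(&queue_merge_attempts);

        el_ret = elv_merge(q, &req, bio);
        if (el_ret == ELEVATOR_BACK_MERGE || el_ret == ELEVATOR_FRONT_MERGE)
                atomic_inc(&queue_merge_hits);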

Thanks for doing this work, it looks really promising.

NATE