Re: cfq performance gap

On Fri, Dec 08 2006, Avantika Mathur wrote:
> On Fri, 2006-12-08 at 13:05 +0100, Jens Axboe wrote:
> > On Thu, Dec 07 2006, Avantika Mathur wrote:
> > > Hi Jens,
> > 
> > (you probably noticed now, but the [email protected] email is no longer
> > valid)
> 
> I saw that, thanks!
> 
> > > I've noticed a performance gap between the cfq scheduler and other io
> > > schedulers when running the rawio benchmark.
> > > Results from rawio on 2.6.19, cfq and noop schedulers:
> > >
> > > CFQ:
> > >
> > > procs           device    num read   KB/sec     I/O Ops/sec
> > > -----  ---------------  ----------  -------  --------------
> > >   16         /dev/sda       16412     8338            2084
> > > -----  ---------------  ----------  -------  --------------
> > >   16                        16412     8338            2084
> > >
> > > Total run time 0.492072 seconds
> > >
> > >
> > > NOOP:
> > >
> > > procs           device    num read   KB/sec     I/O Ops/sec
> > > -----  ---------------  ----------  -------  --------------
> > >   16         /dev/sda       16399    29224            7306
> > > -----  ---------------  ----------  -------  --------------
> > >   16                        16399    29224            7306
> > >
> > > Total run time 0.140284 seconds
> > >
> > > The benchmark workload is 16 processes running 4k random reads.
> > >
> > > Is this performance gap a known issue?
> > 
> > CFQ could be a little slower at this benchmark, but your results are
> > much worse than I would expect. What is the queueing depth of sda? How
> > are you invoking rawio?
> 
> I am running rawio with the following options:
> rawread -p 16 -m 1 -d 1 -x -z -t 0 -s 4096
>  
> The queue depth on sda is 4.
> 
> > 
> > Your runtime is very low, how does it look if you allow the test to run
> > for much longer? 30MiB/sec random read bandwidth seems very high, I'm
> > wondering what exactly is being tested here.
> > 
> 
> rawio is actually performing sequential reads, but I don't believe it is
> purely sequential with the multiple processes.
> I am currently running the test with longer runtimes and will post
> results once it is complete. 
> I've also attached the rawio source.

It's certainly the slice and idling hurting here. But at the same time,
I don't really think your test case is very interesting. The test area
is very small and you have 16 threads trying to read the same thing.
Optimizing for that would be silly, as I don't think it has much
real-world relevance.
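
As a quick sanity check on the figures quoted above, here is a small
sketch. The constants are copied from the rawio output, and the 4 KB
request size comes from the -s 4096 option. The variable names are made
up for illustration. It only compares the reported columns and does not
model the scheduler in any way.

```python
# Figures copied from the quoted rawio output (2.6.19, 16 processes).
cfq = {"kb_per_sec": 8338, "ops_per_sec": 2084, "runtime_s": 0.492072}
noop = {"kb_per_sec": 29224, "ops_per_sec": 7306, "runtime_s": 0.140284}

REQUEST_KB = 4  # assumption: -s 4096 means 4096-byte (4 KB) reads

# How much faster noop looks than cfq, by each reported measure.
bandwidth_ratio = noop["kb_per_sec"] / cfq["kb_per_sec"]
ops_ratio = noop["ops_per_sec"] / cfq["ops_per_sec"]
runtime_ratio = cfq["runtime_s"] / noop["runtime_s"]

for name, result in (("cfq", cfq), ("noop", noop)):
    # The KB/sec column should be about ops/sec times the request size.
    implied_kb = result["ops_per_sec"] * REQUEST_KB
    print(f"{name}: reported {result['kb_per_sec']} KB/sec, "
          f"implied {implied_kb} KB/sec")

print(f"noop/cfq bandwidth ratio: {bandwidth_ratio:.2f}")
print(f"noop/cfq ops ratio:       {ops_ratio:.2f}")
print(f"cfq/noop runtime ratio:   {runtime_ratio:.2f}")
```

All three ratios come out at roughly 3.5, so the reported columns are at
least consistent with each other.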

That said, I might add some logic to detect when we can cheaply switch
queues instead of waiting for a new request from the same queue.
Combined with that logic, averaging slice times over a period of time
instead of keeping them 1:1 should help cases like this while still
being fair.
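
To make the idea concrete, here is a toy sketch of what "cheaply switch
queues" and "averaging slice times" could mean. This is not CFQ code and
not a proposed patch. The Queue class, choose_action, fair_share_deficit
and the cheap_seek threshold are all hypothetical names invented for
illustration, and "cheap" is assumed to mean "close to the current head
position".

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Queue:
    """Hypothetical per-process queue: pending request sectors, FIFO order."""
    name: str
    pending: List[int] = field(default_factory=list)


def choose_action(active: Queue, others: List[Queue],
                  head: int, cheap_seek: int) -> str:
    """Return the name of the queue to dispatch from next, or "idle".

    Assumption: a switch is "cheap" when another queue's next request
    lies within cheap_seek sectors of the current head position.
    """
    if active.pending:
        return active.name  # the active queue still has work; keep serving it

    best: Optional[Tuple[int, str]] = None
    for q in others:
        if not q.pending:
            continue
        distance = abs(q.pending[0] - head)
        if distance <= cheap_seek and (best is None or distance < best[0]):
            best = (distance, q.name)

    # No nearby request anywhere: fall back to idling on the active queue.
    return best[1] if best else "idle"


def fair_share_deficit(used_ms: Dict[str, float]) -> Dict[str, float]:
    """Time each queue is owed relative to an equal share of the window.

    Positive means the queue got less than its share over the window,
    negative means it got more.
    """
    share = sum(used_ms.values()) / len(used_ms)
    return {name: share - used for name, used in used_ms.items()}


a = Queue("a")
b = Queue("b", [1008])
c = Queue("c", [900000])
print(choose_action(a, [b, c], head=1000, cheap_seek=64))
print(fair_share_deficit({"a": 30.0, "b": 10.0}))
```

In this sketch the early switch away from queue "a" is only acceptable
because the deficit is tracked over a window, so a queue that loses part
of its slice can be compensated later rather than within the same round.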

-- 
Jens Axboe
