Re: A quick fio test (was Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Feb 23, 2007 at 01:52:47PM +0100, Jens Axboe wrote:
> On Wed, Feb 21 2007, Ingo Molnar wrote:
> > this is the v3 release of the syslet/threadlet subsystem:
> > 
> >    http://redhat.com/~mingo/syslet-patches/
> 
> [snip]
> 
> Ingo, some testing of the experimental syslet queueing stuff, in the
> syslet-testing branch of fio.
> 
> Fio job file:
> 
> [global]
> bs=8k
> size=1g
> direct=0
> ioengine=syslet-rw
> iodepth=32
> rw=read
> 
> [file]
> filename=/ramfs/testfile
> 
> Only changes between runs was changing ioengine and iodepth as indicated
> in the table below.
> 
> Results:
> 
> Engine          Depth           Bw (MiB/sec)
> --------------------------------------------
> libaio            1             441
> syslet            1             574
> sync              1             589
> libaio           32             613
> syslet           32             681
> 
> Results are stable to within +/- 1MiB/sec. So you can see that syslet
> are still a bit slower than sync for depth 1, but beats the pants off
> libaio for equal depths. Note that this is buffered IO, I'll be out for
> the weekend, but I'll hack some direct IO testing up next week to
> compare "real" queuing.
> 
> Just a quick microbenchmark to gauge current overhead...

This is just ramfs, to gauge pure overheads, is that correct ? 

BTW, I'm not surprised at Ingo's initial results of syslet vs libaio
overheads, for aio-stress/fio type streaming io runs, because these cases
do not involve large numbers of outstanding ios. So the overhead of
thread creation with syslets is amortized across the entire run of io
submissions because of the reuse of already created async threads. While
in the libaio case there is a setup and teardown of kiocbs per request.

What I have been concerned about instead in the past when considering
thread based AIO implementations is the resource(memory) consumption impact
on overall system performance and adaptability to varying loads. It is nice
that we can avoid that for the cached cases, but for the general blocking
cases, it is still not clear to me whether we have addressed this well
enough yet. I used to think that even the kiocb was too heavyweight for its
purpose ... especially in terms of scaling to larger loads.

As a really crude (and not very realistic) example of the potential impact
of large numbers of outstanding IOs, I tried some quick direct IO comparisons
using fio:

[global]
ioengine=syslet-rw
buffered=0
rw=randread
bs=64k
size=1024m
iodepth=64

Engine		Depth		Bw (MiB/sec)

libaio		64		17.323
syslet		64		17.524
libaio		20000		15.226
syslet		20000		11.015


Regards
Suparna

-- 
Suparna Bhattacharya ([email protected])
Linux Technology Center
IBM Software Lab, India

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux