Re: Lockless page cache test results

On Wed, Apr 26 2006, Linus Torvalds wrote:
> 
> 
> On Wed, 26 Apr 2006, Andrew Morton wrote:
> 
> > Jens Axboe <[email protected]> wrote:
> > > 
> > > Once per page, it's basically exercising the generic_file_splice_read()
> > > path. Basically X number of "clients" open the same file, and fill those
> > > pages into a pipe using splice. The output end of the pipe is then
> > > spliced to /dev/null to toss it away again.
> > 
> > OK.  That doesn't sound like something which a real application is likely
> > to do ;)
> 
> True, but on the other hand, it does kind of "distill" one (small) part of 
> something that real apps _are_ likely to do.
> 
> The whole 'splice to /dev/null' part can be seen as totally irrelevant, 
> but at the same time a way to ignore all the other parts of normal page 
> cache usage (ie the other parts of page cache usage tend to be the "map it 
> into user space" or the actual "memcpy_to/from_user()" or the "TCP send" 
> part).
> 
> The question, of course, is whether the part that remains (the actual page 
> lookup) is important enough to matter, once it is part of a bigger chain 
> in a real application.
> 
> In other words, the splice() thing is just a way to isolate one part of a 
> chain that is usually much more involved, and micro-benchmark just that 
> one part.

Nick called it a find_get_page() micro-benchmark, which is pretty much
spot on. So naturally it shows the absolute best side of the lockless
page cache, but that is also very interesting. The /dev/null output can
just be seen as an "infinitely" fast output method, both from a
throughput and a lightweight POV.
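
For reference, one such client can be sketched roughly as below. This is
only an illustration of the file -> pipe -> /dev/null pattern described
above, not the actual test program: the helper name splice_file_to_null()
and the CHUNK size are made up for the example, and the real test runs
several of these clients concurrently against the same file.

```c
#define _GNU_SOURCE             /* exposes splice() in <fcntl.h> on glibc */
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

#define CHUNK (64 * 1024)       /* assumed request size per splice() call */

/*
 * Hypothetical helper: splice a whole file through a pipe into /dev/null.
 * Returns the number of bytes moved, or -1 on error.
 */
ssize_t splice_file_to_null(const char *path)
{
    int pfd[2] = { -1, -1 };
    ssize_t total = -1;
    int in = open(path, O_RDONLY);
    int out = open("/dev/null", O_WRONLY);

    if (in < 0 || out < 0 || pipe(pfd) < 0)
        goto done;

    total = 0;
    for (;;) {
        /* file -> pipe: this is where the page cache lookup happens */
        ssize_t n = splice(in, NULL, pfd[1], NULL, CHUNK, 0);
        if (n < 0) {
            total = -1;
            break;
        }
        if (n == 0)             /* end of file */
            break;
        total += n;

        /* pipe -> /dev/null: throw the pages away again */
        while (n > 0) {
            ssize_t m = splice(pfd[0], NULL, out, NULL, (size_t)n, 0);
            if (m <= 0) {
                total = -1;
                goto done;
            }
            n -= m;
        }
    }

done:
    if (pfd[0] >= 0)
        close(pfd[0]);
    if (pfd[1] >= 0)
        close(pfd[1]);
    if (in >= 0)
        close(in);
    if (out >= 0)
        close(out);
    return total;
}
```

Since no data is ever copied to user space and the sink costs next to
nothing, what is left to measure in a loop like this is mostly the lookup
and reference counting of the pages.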

> It would be interesting to see where doing gang-lookup moves the target, 
> but on the other hand, with smaller files (and small files are still 
> common), gang lookup isn't going to help as much.

With a 16-page gang lookup in splice, the top profile entries for the
4-client case (which is now at 4GiB/sec instead of 3) are:

samples  %        symbol name
30396    36.7217  __do_page_cache_readahead
25843    31.2212  find_get_pages_contig
9699     11.7174  default_idle

Even disregarding the readahead contender, which could probably be made a
little more clever, we are still spending an awful lot of time in the
page lookup. I didn't mention this before, but the get_page/put_page
overhead is also a lot smaller with the lockless patches.

-- 
Jens Axboe


