On Wed, 26 Apr 2006, Andrew Morton wrote:
> Jens Axboe <[email protected]> wrote:
> >
> > Once per page, it's basically exercising the generic_file_splice_read()
> > path. Basically X number of "clients" open the same file, and fill those
> > pages into a pipe using splice. The output end of the pipe is then
> > spliced to /dev/null to toss it away again.
>
> OK. That doesn't sound like something which a real application is likely
> to do ;)
True, but on the other hand, it does kind of "distill" one (small) part of
something that real apps _are_ likely to do.
The whole 'splice to /dev/null' part can be seen as totally irrelevant,
but at the same time it's a way to ignore all the other parts of normal
page cache usage (ie those other parts tend to be the "map it into user
space" part, the actual "memcpy_to/from_user()", or the "TCP send" part).
The question, of course, is whether the part that remains (the actual page
lookup) is important enough to matter, once it is part of a bigger chain
in a real application.
In other words, the splice() thing is just a way to isolate one part of a
chain that is usually much more involved, and micro-benchmark just that
one part.
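For concreteness, the userspace side of that microbenchmark looks roughly
like the following. This is just a sketch of the idea, not Jens' actual
test program, and it assumes a libc that already has the splice() wrapper;
the function name is made up here:

/*
 * Splice a file's page-cache pages into a pipe, then splice the pipe
 * into /dev/null so the data itself is never actually touched.
 * Error handling trimmed down to the bare minimum.
 */
#define _GNU_SOURCE
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>

static int splice_file_to_null(const char *path)
{
	int fd = open(path, O_RDONLY);
	int null = open("/dev/null", O_WRONLY);
	int pfd[2];
	loff_t off = 0;
	ssize_t n;

	if (fd < 0 || null < 0 || pipe(pfd) < 0)
		return -1;

	for (;;) {
		/* file -> pipe: this is the page cache lookup path */
		n = splice(fd, &off, pfd[1], NULL, 65536, SPLICE_F_MOVE);
		if (n <= 0)
			break;
		/* pipe -> /dev/null: throw the pages away again */
		splice(pfd[0], NULL, null, NULL, n, SPLICE_F_MOVE);
	}

	close(pfd[0]);
	close(pfd[1]);
	close(null);
	close(fd);
	return n < 0 ? -1 : 0;
}

The only real kernel work left in that loop is the page cache lookup and
the pipe buffer management - the data itself is never copied or touched.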
Splice itself can be optimized to do the lookup locking only once per N
pages (where N currently is on the order of ~16), but that may not be as
easy for some other paths (ie the normal read path).
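The rough shape of that batching is something like the below (a sketch of
the idea only, not the actual fs/splice.c code, and the helper names are
made up): use find_get_pages() to pin up to 16 pages with a single pass
over the radix tree, instead of one find_get_page() per page.

#include <linux/mm.h>
#include <linux/pagemap.h>

#define SPLICE_BATCH	16

/*
 * One gang lookup instead of SPLICE_BATCH separate lookups: the
 * per-lookup locking/refcounting overhead is paid once per batch.
 * Note that find_get_pages() skips holes, so the caller still has
 * to check page->index on the pages it got back.
 */
static unsigned splice_fill_batch(struct address_space *mapping,
				  pgoff_t index, struct page **pages)
{
	return find_get_pages(mapping, index, SPLICE_BATCH, pages);
}

static void splice_drop_batch(struct page **pages, unsigned nr)
{
	unsigned i;

	for (i = 0; i < nr; i++)
		page_cache_release(pages[i]);
}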
And the "reading from the same file in multiple threads" _is_ a real load.
It may sound stupid, but it would happen for any server that has a lot of
locality across clients (and that's very much true for web-servers, for
example).
That said, under most real loads, the page cache lookup is obviously
always going to be just a tiny tiny part (as shown by the fact that Jens
quotes 35GB/s throughput - possible only because splice to /dev/null
doesn't need to actually ever even _touch_ the data).
The fact that it drops to "just" 3GB/s for four clients is somewhat
interesting, though, since that does put a limit on how well we can serve
the same file. Of course, 3GB/s is still a lot faster than any modern
network will ever be able to push things around, but it's getting closer
to the possibilities for real hardware (ie IB over PCI-X should be able to
do about 1GB/s in "real life").
So the fact that basically just lookup/locking overhead can limit things
to 3GB/s is absolutely not uninteresting, even if in practice there are
other limits that would probably hit us much earlier.
It would be interesting to see where doing gang-lookup moves the target,
but on the other hand, with smaller files (and small files are still
common), gang lookup isn't going to help as much.
Of course, with small files, the actual filename lookup is likely to be
the real limiter.
Linus