Re: Lockless page cache test results

Linus Torvalds wrote:

> The question, of course, is whether the part that remains (the actual page lookup) is important enough to matter, once it is part of a bigger chain in a real application.

It can be. Applying a rough estimate: reading from the same file out of
pagecache at only 512MB/s per CPU simply cannot scale past about 16 CPUs
before it becomes tree_lock bound (depending on what sort of lock transfer
latency numbers one plugs into the model).
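
To put rough numbers on that model (all of them illustrative assumptions,
not measurements): at 512MB/s per CPU and 4K pages, each CPU does about
131072 lookups a second, and if each cross-CPU tree_lock hand-off costs on
the order of 500ns, the lock can only change hands about 2 million times a
second in total. A trivial user-space sketch of the arithmetic:

/* Back-of-the-envelope model of tree_lock saturation.
 * All numbers below are illustrative assumptions, not measurements. */
#include <stdio.h>

int main(void)
{
	double per_cpu_bw   = 512.0 * 1024 * 1024; /* 512MB/s streamed per CPU (assumed) */
	double page_size    = 4096.0;              /* one tree_lock acquisition per page */
	double lock_latency = 500e-9;              /* seconds per cross-CPU lock hand-off (assumed) */

	double lookups_per_cpu  = per_cpu_bw / page_size;  /* ~131072 lookups/s per CPU */
	double max_acquisitions = 1.0 / lock_latency;      /* ~2M lock hand-offs/s total */

	printf("scales to about %.0f CPUs before tree_lock saturates\n",
	       max_acquisitions / lookups_per_cpu);
	return 0;
}

Which comes out at roughly 15-16 CPUs; plug in different latency numbers
and the knee moves, but not by an order of magnitude.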

And the "reading from the same file in multiple threads" _is_ a real load. It may sound stupid, but it would happen for any server that has a lot of locality across clients (and that's very much true for web-servers, for example).

I think something like postgresql, which does its WAL through the pagecache,
is another real case, as are things that use shared memory (which is more
important now that fork has become lazier).


And it isn't even only reading from the same file in multiple threads that
hurts (although that's obviously my poster child).

If one has F files in cache, and N CPUs each accessing a random file, the
expected number of files whose tree_lock a given CPU touched most recently
is F/N. The chance of that CPU next hitting one of these is (F/N) / F = 1/N,
which goes to zero pretty quickly: with 16 CPUs, 15 out of 16 tree_lock
acquisitions pull the lock cacheline over from another CPU. The kernel's
include/ directory during a parallel build, perhaps.

Now this won't be as bad as everyone piling up on a single cacheline, but
it will still hurt, especially on systems with a shared front-side bus, or
with broadcast snooping (Opteron, POWER), etc.

Nor is it only the read side that is significant. Consider a single CPU
reading a very large file, with each node's kswapd running on other CPUs
to reclaim pagecache (which needs to take the write lock).
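
For reference, the locked lookup being discussed looks roughly like this -
a simplified sketch of the 2.6-era code from memory, not a verbatim copy.
Readers take mapping->tree_lock for read on every lookup, while reclaim has
to take it for write to remove a page, so streaming readers and kswapd end
up fighting over the same rwlock cacheline:

/* Simplified sketch (not verbatim) of the locked pagecache lookup:
 * the per-mapping radix tree is protected by mapping->tree_lock. */
struct page *find_get_page(struct address_space *mapping, unsigned long offset)
{
	struct page *page;

	read_lock_irq(&mapping->tree_lock);
	page = radix_tree_lookup(&mapping->page_tree, offset);
	if (page)
		page_cache_get(page);	/* pin the page while the tree is held stable */
	read_unlock_irq(&mapping->tree_lock);
	return page;
}

/* Reclaim needs the write side of the same lock to delete a page, roughly:
 *
 *	write_lock_irq(&mapping->tree_lock);
 *	__remove_from_page_cache(page);
 *	write_unlock_irq(&mapping->tree_lock);
 */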


> That said, under most real loads, the page cache lookup is obviously always going to be just a tiny, tiny part (as shown by the fact that Jens quotes 35 GB/s throughput - possible only because splice to /dev/null doesn't need to actually ever even _touch_ the data).
>
> The fact that it drops to "just" 3GB/s for four clients is somewhat interesting, though, since that does put a limit on how well we can serve the same file (of course, 3GB/s is still a lot faster than any modern network will ever be able to push things around, but it's getting closer to the possibilities for real hardware (ie IB over PCI-X should be able to do about 1GB/s in "real life")).
>
> So the fact that basically just lookup/locking overhead can limit things to 3GB/s is absolutely not totally uninteresting. Even if in practice there are other limits that would probably hit us much earlier.
>
> It would be interesting to see where doing gang-lookup moves the target, but on the other hand, with smaller files (and small files are still common), gang lookup isn't going to help as much.
>
> Of course, with small files, the actual filename lookup is likely to be the real limiter.

Although that (the dcache lookup) is lockless, so it scales; find_get_page
will overtake it as the bottleneck at some point.
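
On the gang-lookup point above: the idea is to amortise a single tree_lock
round trip over a batch of pages rather than paying it per page, roughly
what the find_get_pages() path does. Again a simplified sketch of the
2.6-era locked code (not the lockless version):

/* Simplified sketch (not verbatim): one lock acquisition returns up
 * to nr_pages pages, so the locking cost is paid per batch. */
unsigned find_get_pages(struct address_space *mapping, pgoff_t start,
			unsigned int nr_pages, struct page **pages)
{
	unsigned int i, ret;

	read_lock_irq(&mapping->tree_lock);
	ret = radix_tree_gang_lookup(&mapping->page_tree,
				     (void **)pages, start, nr_pages);
	for (i = 0; i < ret; i++)
		page_cache_get(pages[i]);
	read_unlock_irq(&mapping->tree_lock);
	return ret;
}

Which is also why it helps less for small files: with only a page or two
per file there is nothing to batch, and the per-file locking (and the name
lookup) dominates again.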

--
SUSE Labs, Novell Inc.