Eric St-Laurent wrote:
On Mon, 2007-23-07 at 19:00 +1000, Nick Piggin wrote:
I don't like this kind of conditional information going from something
like readahead into page reclaim. Unless it is for readahead _specific_
data such as "I got these all wrong, so you can reclaim them" (which
this isn't).
But I don't like it as a use-once thing. The VM should be able to get
that right.
Question: How does the use-once code work in the current kernel? Is there
any? It doesn't quite work for me...
What *I* think is supposed to happen is that newly read in pages get
put on the inactive list, and unless they get accessed again before
being reclaimed, they are allowed to fall off the end of the list
without disturbing active data too much.
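As a rough illustration of the behaviour described above, here is a toy model of the two lists. It is only a sketch and not kernel code: the class name TwoListCache, its methods, and the capacity of 8 pages are all made up for the example, and real reclaim is far more involved.

```python
from collections import OrderedDict


class TwoListCache:
    """Toy model of active/inactive page lists (hypothetical, not kernel code)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.inactive = OrderedDict()  # oldest entry first
        self.active = OrderedDict()    # oldest entry first

    def access(self, page):
        if page in self.active:
            # Already hot: refresh its position on the active list.
            self.active.move_to_end(page)
        elif page in self.inactive:
            # Touched a second time before being reclaimed: promote it.
            del self.inactive[page]
            self.active[page] = True
        else:
            # Newly read page: starts on the inactive list.
            self.inactive[page] = True
            self._reclaim()

    def _reclaim(self):
        # Evict from the inactive list first, so use-once pages fall off
        # the end without disturbing the active list.
        while len(self.active) + len(self.inactive) > self.capacity:
            if self.inactive:
                self.inactive.popitem(last=False)
            else:
                self.active.popitem(last=False)


cache = TwoListCache(capacity=8)
for page in ("a", "b", "a", "b"):   # small working set, each page touched twice
    cache.access(page)
for n in range(100):                # large streaming read, each page touched once
    cache.access(("stream", n))
```

In this model the streaming pages only ever cycle through the inactive list, so "a" and "b" are still cached after the stream has gone through.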
I think there is a missing piece here, that we used to ease the reclaim
pressure off the active list when the inactive list grows relatively
much larger than it (which could indicate a lot of use-once pages in
the system).
Andrew got rid of that logic for some reason which I don't know, but I
can't see that use-once would be terribly effective today (so your
results don't surprise me too much).
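The "missing piece" mentioned above, easing pressure off the active list when the inactive list is much larger, could look something like the sketch below. This is a guess at the general shape of such a heuristic, not the logic that was actually removed: the function name active_scan_target and the linear scaling rule are assumptions made for the example.

```python
def active_scan_target(nr_active, nr_inactive, nr_to_scan):
    """Hypothetical heuristic: how many pages to scan from the active list.

    Scanning of the active list is scaled down in proportion to how much
    larger the inactive list is, and is never more than nr_to_scan.
    """
    if nr_inactive == 0:
        # Nothing on the inactive list, so all pressure goes to the active list.
        return nr_to_scan
    return min(nr_to_scan, nr_to_scan * nr_active // nr_inactive)
```

With equally sized lists the active list gets the full scan count; with an inactive list ten times larger (say, full of use-once pages) it gets roughly a tenth of it.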
I think I've been banned from touching vmscan.c, but if you're keen to
try a patch, I might be convinced to come out of retirement :)
See my previous email today, I've done a small test case to demonstrate
the problem and the effectiveness of Peter's patch. The only piece
missing is the copy case (read once + write once).
Regardless of how it's implemented, I think a similar mechanism must be
added. This is a long standing issue.
In the end, I think it's a pagecache resource allocation problem. The
VM lacks fair-share limits between processes. The kernel doesn't have
enough information to make the right decisions.
You can refine or use more advanced page reclaim, but some fair-share
splitting (like the CPU scheduler) between the processes must be
present. Of course some processes should have large or unlimited VM
limits, like databases.
Maybe the "containers" patchset and memory controller can help. With
some specific configuration and/or a userspace daemon to adjust the
limits on the fly.
Independently, the basic large file streaming read (or copy) once cases
should not trash the pagecache. Can we agree on that?
One man's trash is another's treasure: some people will want the
files to remain in cache because they'll use them again (copy it
somewhere else, or start editing it after being copied or whatever).
But yeah, we can probably do better at the sequential read/write
case.
I say, let's add some code to fix the problem. If we hear about any
regression in some workloads, we can add a tunable to limit or disable
its effects, _if_ a better compromise solution cannot be found.
Sure, but let's figure out the workloads and look at all the
alternatives first.
--
SUSE Labs, Novell Inc.