Hugh Dickins wrote:
On Thu, 5 Jan 2006, Dave McCracken wrote:
Here's a new version of my shared page tables patch.
The primary purpose of sharing page tables is improved performance for
large applications that share big memory areas between multiple processes.
It eliminates the redundant page tables and significantly reduces the
number of minor page faults. Tests show significant performance
improvement for large database applications, including those using large
pages. There is no measurable performance degradation for small processes.
This version of the patch uses Hugh's new locking mechanism, extending it
up the page table tree as far as necessary for proper concurrency control.
The patch also includes the proper locking for following the vma chains.
Hugh, I believe I have all the lock points nailed down. I'd appreciate
your input on any I might have missed.
The architectures supported are i386 and x86_64. I'm working on 64 bit
ppc, but there are still some issues around proper segment handling that
need more testing. This will be available in a separate patch once it's
solid.
Dave McCracken
The locking looks much better now, and I like the way i_mmap_lock seems
to fall naturally into place where the pte lock doesn't work. But still
some raciness noted in comments on patch below.
The main thing I dislike is the
16 files changed, 937 insertions(+), 69 deletions(-)
(with just i386 and x86_64 included): it's adding more complexity than
I can welcome, and too many unavoidable "if (shared) ... else ..."s.
With significant further change needed, not just adding architectures.
Worthwhile additional complexity? I'm not the one to judge that.
Brian has posted dramatic improvements (25%, 49%) for the non-huge OLTP,
and yes, it's sickening the amount of memory we're wasting on pagetables
in that particular kind of workload. Less dramatic (3%, 4%) in the
hugetlb case: and as yet (since last summer even) no profiles to tell
where that improvement actually comes from.
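
For a rough sense of the pagetable memory being discussed, here is a back-of-the-envelope sketch. It assumes x86_64 with 4 KB pages and 8-byte PTEs, counts only the lowest-level page tables, and treats sharing as ideal (one copy for all processes). The region size and process count are hypothetical, not figures from the OLTP runs.

```python
PAGE_SIZE = 4096   # bytes per base page (x86_64, 4 KB pages)
PTE_SIZE = 8       # bytes per page-table entry on x86_64
GIB = 1 << 30
MIB = 1 << 20


def pte_bytes(mapped_bytes, page_size=PAGE_SIZE, pte_size=PTE_SIZE):
    """Bytes of lowest-level page tables needed to map mapped_bytes once."""
    pages = -(-mapped_bytes // page_size)      # ceiling division
    ptes_per_table = page_size // pte_size     # 512 entries per table page
    tables = -(-pages // ptes_per_table)       # tables come in whole pages
    return tables * page_size


def total_pte_bytes(mapped_bytes, nprocs, shared):
    """Total across nprocs processes, with or without (ideal) sharing."""
    per_copy = pte_bytes(mapped_bytes)
    return per_copy if shared else per_copy * nprocs


if __name__ == "__main__":
    region = 2 * GIB   # hypothetical shared memory segment
    nprocs = 500       # hypothetical number of attached processes
    print("private:", total_pte_bytes(region, nprocs, False) // MIB, "MiB")
    print("shared: ", total_pte_bytes(region, nprocs, True) // MIB, "MiB")
```

Under these assumptions each process needs 4 MiB of PTE pages to map the segment, so 500 processes spend about 2000 MiB on tables that all describe the same memory. The upper levels of the tree add a little more and are ignored here.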
Hi,
We collected more granular performance data for the ppc64/hugepage case.
CPI decreased by 3% when shared pagetables were used. Underlying this was a
7% decrease in the overall TLB miss rate. The TLB miss rate for hugepages
decreased 39%. TLB miss rates are calculated per instruction executed.
We didn't collect a profile per se, as we would expect a CPI improvement
of this nature to be spread over a significant number of functions,
mostly in user-space.
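
As a minimal sketch of how a per-instruction miss rate and its relative change can be computed from raw counters: the counter values below are hypothetical, chosen only so that the result matches the 7% overall decrease quoted above. They are not the measured data.

```python
def misses_per_instruction(tlb_misses, instructions):
    """Normalize a raw TLB miss count by instructions executed."""
    return tlb_misses / instructions


def relative_change(before, after):
    """Fractional change from before to after (negative means a decrease)."""
    return (after - before) / before


# Hypothetical counter readings, not taken from the actual runs.
base = misses_per_instruction(2_000_000, 1_000_000_000)
shared = misses_per_instruction(1_860_000, 1_000_000_000)

if __name__ == "__main__":
    print(f"TLB miss rate change: {relative_change(base, shared):+.1%}")
```

Normalizing by instructions executed keeps the comparison fair when the two runs do not retire the same number of instructions.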
Cheers,
Brian