This turned out to be a huge win on 32-bit i386 in PAE mode, but it is
likely not as significant on x86_64; I don't know, because I haven't
actually measured the cost. I don't have 64-bit hardware that I can
reboot right now, so this patch is untested at runtime, but if someone
wants to try it out, it might show a measurable win on fork/exit. I
lost my cycle count measurement diffs, but I don't think they would
apply cleanly to x86_64 anyway. The patch at least looks good and
compiles cleanly on 2.6.13-rc5-mm1, so it has been compile-tested.

Also, it might show reduced latency on preemptible kernels during heavy
fork/exit activity, possibly allowing ZAP_BLOCK_SIZE to be raised for
some architectures (with the analogous patch on i386 with
CONFIG_PREEMPT, I measured a ~30-50% reduction in cycle timings for
zap_pte_range).

Zach
Any architecture whose hardware-updated A/D bits require
synchronization against other processors during PTE operations
can benefit from doing non-atomic PTE updates during address space
destruction. Originally done on i386, now ported to x86_64.

Doing a read/write pair instead of an xchg() operation saves the
implicit lock, which turns out to be a big win on 32-bit (especially
with PAE).
Diffs-against: 2.6.13-rc5-mm1
Signed-off-by: Zachary Amsden <[email protected]>
Index: linux-2.6.13-rc5-mm1/include/asm-x86_64/pgtable.h
===================================================================
--- linux-2.6.13-rc5-mm1.orig/include/asm-x86_64/pgtable.h 2005-08-07 04:56:37.000000000 -0700
+++ linux-2.6.13-rc5-mm1/include/asm-x86_64/pgtable.h 2005-08-07 04:59:18.601856096 -0700
@@ -104,6 +104,19 @@
((unsigned long) __va(pud_val(pud) & PHYSICAL_PAGE_MASK))
#define ptep_get_and_clear(mm,addr,xp) __pte(xchg(&(xp)->pte, 0))
+
+static inline pte_t ptep_get_and_clear_full(struct mm_struct *mm, unsigned long addr, pte_t *ptep, int full)
+{
+	pte_t pte;
+	if (full) {
+		pte = *ptep;
+		*ptep = __pte(0);
+	} else {
+		pte = ptep_get_and_clear(mm, addr, ptep);
+	}
+	return pte;
+}
+
#define pte_same(a, b) ((a).pte == (b).pte)
#define PMD_SIZE (1UL << PMD_SHIFT)
@@ -433,6 +446,7 @@
#define __HAVE_ARCH_PTEP_TEST_AND_CLEAR_YOUNG
#define __HAVE_ARCH_PTEP_TEST_AND_CLEAR_DIRTY
#define __HAVE_ARCH_PTEP_GET_AND_CLEAR
+#define __HAVE_ARCH_PTEP_GET_AND_CLEAR_FULL
#define __HAVE_ARCH_PTEP_SET_WRPROTECT
#define __HAVE_ARCH_PTE_SAME
#include <asm-generic/pgtable.h>
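
For readers who want to poke at the idea outside the kernel, here is a minimal userspace sketch of the two paths in ptep_get_and_clear_full() above. It is not kernel code: sketch_pte_t and the sketch_* helpers are hypothetical names invented for illustration, and C11 atomic_exchange() stands in for the kernel's xchg(). The assumption behind the fast path is that during full address space teardown nothing else can be updating the entry, so a separate load and store cannot lose an A/D bit update.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for pte_t; not the kernel type. */
typedef struct { _Atomic uint64_t pte; } sketch_pte_t;

/* Atomic path: one exchange, so a concurrent update of the entry
 * cannot slip in between the read and the clear (analogous to xchg). */
static inline uint64_t sketch_get_and_clear(sketch_pte_t *ptep)
{
	return atomic_exchange(&ptep->pte, 0);
}

/* full != 0 models "the whole address space is going away": use a
 * plain read followed by a plain write instead of a locked exchange. */
static inline uint64_t sketch_get_and_clear_full(sketch_pte_t *ptep, int full)
{
	if (full) {
		uint64_t old = atomic_load_explicit(&ptep->pte,
						    memory_order_relaxed);
		atomic_store_explicit(&ptep->pte, 0, memory_order_relaxed);
		return old;
	}
	return sketch_get_and_clear(ptep);
}

/* Single-threaded check that both paths return the old value and
 * leave the entry cleared. Returns 0 on success. */
int sketch_demo(void)
{
	sketch_pte_t a, b;

	atomic_init(&a.pte, 0x1067);
	atomic_init(&b.pte, 0x1067);
	assert(sketch_get_and_clear_full(&a, 1) == 0x1067);
	assert(sketch_get_and_clear_full(&b, 0) == 0x1067);
	assert(atomic_load(&a.pte) == 0 && atomic_load(&b.pte) == 0);
	return 0;
}
```

Both paths return the same result when there is no concurrent writer; the difference is only in cost, which is what would need to be measured on real x86_64 hardware.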