Re: 2.6.16-rc1: 28ms latency when process with lots of swapped memory exits

On Tue, 14 Mar 2006, Lee Revell wrote:
> On Tue, 2006-03-14 at 22:01 +0100, Ingo Molnar wrote:
> > hm, where does the latency come from? We do have a lockbreaker in 
> > unmap_vmas():
> > 
> >                         if (need_resched() ||
> >                                 (i_mmap_lock &&
> > need_lockbreak(i_mmap_lock))) {
> >                                 if (i_mmap_lock) {
> >                                         *tlbp = NULL;
> >                                         goto out;
> >                                 }
> >                                 cond_resched();
> >                         }
> > 
> > 
> > why doesn't this break up the 28ms latency?

That block is actually for the PREEMPT=n case, and for truncation of a
mapped file (when i_mmap_lock is additionally held): all that Lee's
PREEMPT=y exit case should need is the tlb_finish_mmu and
tlb_gather_mmu around it, which let preemption back in - plus the
ZAP_BLOCK_SIZE of 8*PAGE_SIZE.
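
For reference, the shape of that loop in unmap_vmas() is roughly as
below - a paraphrase from memory of the 2.6.16-rc source, with the
hugetlb case and some bookkeeping elided, so don't take the details
as exact:

	while (start != end) {
		start = unmap_page_range(*tlbp, vma, start, end,
						&zap_work, details);
		if (zap_work > 0)	/* ran out of ptes, not out of budget */
			break;

		/* drops the per-cpu mmu_gather: preemption back on for PREEMPT=y */
		tlb_finish_mmu(*tlbp, tlb_start, start);

		if (need_resched() ||
		    (i_mmap_lock && need_lockbreak(i_mmap_lock))) {
			if (i_mmap_lock) {
				*tlbp = NULL;
				goto out;
			}
			cond_resched();	/* explicit resched point for PREEMPT=n */
		}

		/* re-disables preemption and starts a fresh gather */
		*tlbp = tlb_gather_mmu(vma->vm_mm, fullmm);
		zap_work = ZAP_BLOCK_SIZE;	/* 8*PAGE_SIZE with CONFIG_PREEMPT */
	}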

> But the preempt count is >= 2, doesn't that mean some other lock must be
> held also, or someone called preempt_disable?

Yes, as I read the trace (and let me admit, I'm not at all skilled at
reading those traces), and as your swap observation implies, this is
not a problem with ptes present, but with swap entries: and with the
radix tree lookup involved in finding whether they have an associated
struct page in core - all handled while holding the page table lock,
and while holding the per-cpu mmu_gather structure (each of which
disables preemption, hence your preempt count of 2).
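
To spell out the path I mean: in zap_pte_range() the non-none,
non-present case goes off to free_swap_and_cache(), which is where the
radix tree lookup in the swap cache happens.  Again a paraphrase from
memory of the 2.6.16-rc code, not the exact source:

	/* inside zap_pte_range(), under the page table lock,
	   with the per-cpu mmu_gather in use */
	ptent = *pte;
	if (pte_none(ptent)) {
		(*zap_work)--;			/* charged as next to no work */
		continue;
	}
	if (pte_present(ptent)) {
		(*zap_work) -= PAGE_SIZE;	/* charged as real work */
		/* ... unmap the page, hand it to the tlb gather ... */
		continue;
	}
	/*
	 * A swap entry: before the patch below, this case was not
	 * charged against *zap_work at all, yet free_swap_and_cache()
	 * may have to search the swap cache's radix tree and free the
	 * swap slot - plenty of work to do with preemption off.
	 */
	if (!pte_file(ptent))
		free_swap_and_cache(pte_to_swp_entry(ptent));
	pte_clear_full(mm, addr, pte, tlb->fullmm);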

Oh, thank you for forcing me to take another look: 2.6.15 did
introduce a regression there, and this one is very simply remedied.
Lee, please try the patch below (I've done it against 2.6.16-rc6
because that's what I have to hand, and it would anyway be a better
tree for you to test), and let us know whether it fixes your case as
I expect - thanks.

(Robin Holt observed how inefficient the small ZAP_BLOCK_SIZE, as
originally implemented, was on very sparse mmaps; so he and Nick
reworked it to count only the real work done; but swap entries got put
on the side of "no real work", whereas you've found they may involve
very significant work.  My patch below reverses that: yes, some other
cases now go the slow way when they needn't, but they're too rare to
be worth cluttering the code for.)
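
(To put numbers on it: with CONFIG_PREEMPT, ZAP_BLOCK_SIZE is
8*PAGE_SIZE, so once swap entries are charged a full PAGE_SIZE each,
unmap_vmas gets back to its tlb_finish_mmu/cond_resched point after at
most 8 of them; whereas before, a long run of swap entries was charged
nothing, and could all be processed in one non-preemptible stretch -
which is where I believe your 28ms came from.)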

Hugh

--- 2.6.16-rc6/mm/memory.c	2006-03-12 15:25:45.000000000 +0000
+++ linux/mm/memory.c	2006-03-15 07:32:36.000000000 +0000
@@ -623,11 +623,12 @@ static unsigned long zap_pte_range(struc
 			(*zap_work)--;
 			continue;
 		}
+
+		(*zap_work) -= PAGE_SIZE;
+
 		if (pte_present(ptent)) {
 			struct page *page;
 
-			(*zap_work) -= PAGE_SIZE;
-
 			page = vm_normal_page(vma, addr, ptent);
 			if (unlikely(details) && page) {
 				/*
-
