Re: [patch] mm: reduce pagetable-freeing latencies

On Wed, 2007-07-25 at 07:29 +1000, Benjamin Herrenschmidt wrote:
> On Tue, 2007-07-24 at 14:13 +0200, Andi Kleen wrote:
> > Benjamin Herrenschmidt <[email protected]> writes:
> > 
> > > > What a truly putrid patch.  I am suspecting that this was a quick
> > > > get-you-out-of-trouble thing, which then got forgotten about.
> > > > 
> > > > We have two months to do the "right fix".  Please?
> > > 
> > > Working on it... 
> > 
> > Ideally the patch would DTRT even on non preemptible kernels,
> > aka do cond_resched()s when needed.
> 
> First is to rework the batch structure to make it more manageable. That
> is, patch #1 will keep the page list in per-cpu (and thus non-preempt),
> but the batch "head" will be on the stack.
> 
> Now, there are two approaches regarding getting rid of the
> get_cpu/put_cpu:
> 
>  - One is to have a small number of entries for the page list in the
> batch structure on the stack, and attempt to gfp' a page for more. If
> that fails, we can still free, though with less batching, using only the
> few entries in the batch struct itself. That's Hugh's initial approach,
> iirc.
> 
>  - Another is to hook up with those folks who've been asking for a
> notifier that we are being preempted/scheduled out. In this case, I can
> happily access the per-cpu list, and just trigger a batch flush if we
> happen to be scheduled out.
> 
> I tend to prefer the former solution though, gfp should be fast, and
> there is no need to force a flush if we get scheduled out. It would be
> rare to hit the worst case scenario of falling back to the few page
> heads in the batch itself. On the other hand, that solution has the
> problem of bloating the stack a bit (with the few page pointers) even in
> the case where I plan to use the extended batch outside of zap_*, such
> as fork, mprotect, ....
> 
> So I'll first do patch #1, which will not fix the problem, but will make
> the fix easier to fit in. In the meantime, please provide feedback on
> which of the 2 solutions above you prefer for avoiding the get/put_cpu,
> unless you find a good 3rd one.

I too would prefer the former solution. I think preemption notifiers are
a particularly iffy hack.
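
To make the former approach concrete, here is a rough userspace sketch,
not kernel code. All names (struct batch, INLINE_SLOTS, EXT_SLOTS,
demo_flushes) are hypothetical, malloc() stands in for a gfp-style page
allocation, and the "flush" only counts calls instead of touching the
TLB. It only illustrates the fallback: if the extra page cannot be had,
we still free, just in smaller batches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define INLINE_SLOTS 8    /* hypothetical: the few entries kept on the stack */
#define EXT_SLOTS    512  /* hypothetical: entries that fit in the extra page */

struct batch {
	void   **slots;                     /* inline_slots[] or the extra page */
	size_t   nr, max;
	size_t   flushes;                   /* counted for illustration only */
	void    *inline_slots[INLINE_SLOTS];
};

static void batch_init(struct batch *b, int allow_alloc)
{
	/* Stand-in for an allocation that is allowed to fail. */
	void **ext = allow_alloc ? malloc(EXT_SLOTS * sizeof(void *)) : NULL;

	b->slots   = ext ? ext : b->inline_slots;
	b->max     = ext ? EXT_SLOTS : INLINE_SLOTS;
	b->nr      = 0;
	b->flushes = 0;
}

static void batch_flush(struct batch *b)
{
	if (!b->nr)
		return;
	/* Real code would flush the TLB and free the queued pages here. */
	b->nr = 0;
	b->flushes++;
}

static void batch_add(struct batch *b, void *page)
{
	assert(b->nr < b->max);
	b->slots[b->nr++] = page;
	if (b->nr == b->max)
		batch_flush(b);             /* batch full: drain it */
}

static void batch_finish(struct batch *b)
{
	batch_flush(b);
	if (b->slots != b->inline_slots)
		free(b->slots);
	b->slots = b->inline_slots;
	b->max   = INLINE_SLOTS;
}

/* Queue n dummy pages and report how many flushes that took. */
static size_t demo_flushes(size_t n, int allow_alloc)
{
	static char dummy;
	struct batch b;                     /* the batch head lives on the stack */

	batch_init(&b, allow_alloc);
	for (size_t i = 0; i < n; i++)
		batch_add(&b, &dummy);
	batch_finish(&b);
	return b.flushes;
}
```

With these made-up sizes, queuing 100 pages costs one flush when the
extra page is available and 13 when only the inline slots can be used,
which is the "less batching" worst case.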

You could perhaps use C99 variable-length arrays to avoid the stack
waste when it is not needed; however, Andi once told me that they
generate rather dubious code.
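
Roughly what I have in mind with the variable-length array, as a
userspace sketch only. The names free_with_vla() and drain() are made
up for illustration, and drain() just counts entries rather than
freeing anything. The point is that the caller picks the number of
on-stack slots at run time, so a caller that needs little batching pays
for only one pointer of stack.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for freeing a batch: just counts the queued entries. */
static size_t drain(void **slots, size_t nr)
{
	size_t freed = 0;

	for (size_t i = 0; i < nr; i++)
		if (slots[i])
			freed++;
	return freed;
}

/*
 * Queue npages dummy pages through an on-stack array whose size is
 * chosen by the caller. Returns the number of drains that were needed.
 */
static size_t free_with_vla(size_t nslots, size_t npages)
{
	size_t n = nslots ? nslots : 1;     /* a VLA must have a non-zero size */
	void *slots[n];                     /* C99 VLA: stack use scales with n */
	size_t nr = 0, flushes = 0, freed = 0;
	static char dummy;

	for (size_t i = 0; i < npages; i++) {
		slots[nr++] = &dummy;
		if (nr == n) {                  /* array full: drain it */
			freed += drain(slots, nr);
			nr = 0;
			flushes++;
		}
	}
	if (nr) {                           /* drain the partial last batch */
		freed += drain(slots, nr);
		flushes++;
	}
	assert(freed == npages);
	return flushes;
}
```

A caller passing 8 slots for 20 pages drains three times; a caller
passing 1 slot drains once per page but uses almost no stack.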



