On Wed, 2007-07-25 at 07:29 +1000, Benjamin Herrenschmidt wrote:
> On Tue, 2007-07-24 at 14:13 +0200, Andi Kleen wrote:
> > Benjamin Herrenschmidt <[email protected]> writes:
> >
> > > > What a truly putrid patch. I am suspecting that this was a quick
> > > > get-you-out-of-trouble thing, which then got forgotten about.
> > > >
> > > > We have two months to do the "right fix". Please?
> > >
> > > Working on it...
> >
> > Ideally the patch would DTRT even on non preemptible kernels,
> > aka do cond_resched()s when needed.
>
> First is to rework the batch structure to make it more manageable. That
> is, patch #1 will keep the page list in per-cpu (and thus non-preempt),
> but the batch "head" will be on the stack.
>
> Now, there are two approaches regarding getting rid of the
> get_cpu/put_cpu:
>
> - One is to have a small number of entries for the page list in the
> batch structure on the stack, and attempt to gfp' a page for more. If
> that fails, we can still free, though with less batching, using only the
> few entries in the batch struct itself. That's Hugh initial appraoch
> iirc.
>
> - Another is to hook up with those folks who've been asking for a
> notifier that we are being preempted/scheduled out. In this case, I can
> happily access the per-cpu list, and just trigger a batch flush if we
> happen to be scheduled out.
>
> I tend to prefer the former solution though, gfp should be fast, and
> there is no need to force a flush if we get scheduled out. It would be
> rare to hit the worst case scenario of falling back to the few page
> heads in the batch itself. On the other hand, that solution has the
> problem of bloating the stack a bit (with the few page pointers) even in
> the case where I plan to use the extended batch outside of zap_*, such
> as fork, mprotect, ....
>
> So I'll first do patch #1, which will not fix the problem, but will make
> the fix easier to fit in, in the meantime, please provide feedback of
> your preferred solution for avoiding the get/put_cpu of the 2 above,
> unless you find a good 3rd one.
I too would prefer the former solution. I think preemption notifiers are
a particular iffy hack.
You could perhaps use C99 variable length arrays to avoid the stack
waste when not needed, however Andi once told me that generates rather
dubious code.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[Index of Archives]
[Kernel Newbies]
[Netfilter]
[Bugtraq]
[Photo]
[Stuff]
[Gimp]
[Yosemite News]
[MIPS Linux]
[ARM Linux]
[Linux Security]
[Linux RAID]
[Video 4 Linux]
[Linux for the blind]
[Linux Resources]