On Thu, Mar 09, 2006 at 04:31:11AM -0800, Chen, Kenneth W wrote:
> David Gibson wrote on Thursday, March 09, 2006 4:07 AM
> > > Well, the reservation is already done at mmap time for shared mapping. Why
> > > does kernel need to do anything at fault time? Doing it at fault time is
> > > an indication of weakness (or brokenness) - you already promised at mmap
> > > time that there will be a page available for faulting. Why check them
> > > again at fault time?
> >
> > You can't know (or bound) at mmap() time how many pages a PRIVATE
> > mapping will take (because of fork()). So unless you have a test at
> > fault time (essentialy deciding whether to draw from "reserved" and
> > "unreserved" hugepage pool) a supposedly reserved SHARED mapping will
> > OOM later if there have been enough COW faults to use up all the
> > hugepages before it's instantiated.
>
> I see. But that is easy to fix. I just need to do exactly the same
> thing as what you did to alloc_huge_page. I will then need to change
> definition of 'reservation' to needs-in-the future (also an easy thing
> to change).
Well.. except that then you *do* need to traverse the page cache on
truncate(), just like I do. Note that in my latest revision,
hugetlb_extend_reservation() no longer walks the radix tree, only
hugetlb_truncate_reservation() does (extend *does* still take the
tree_lock, an oversight which I will send a patch for tomorrow).
(Oh, and you'll need to walk the reserved range list in
alloc_huge_page(), rather than one comparison like I have. Although
in practice I imagine there will never be more than one entry on the
list, so I guess that doesn't really matter)
> The real question or discussion I want to bring up is whether kernel
> should do it's own accounting or relying on traversing the page cache.
> My opinion is that kernel should do it's own accounting because it is
> simpler: you just need to do that at mmap and ftruncate time.
And as we've seen above, a little bit at fault time. Which would be
exactly the same three places that my patch adds accounting. I'm
quite willing to be convinced your patch is the better approach, but
this isn't an argument for it.
Incidentally, I've just realised that removing the dodgy heuristic and
allowing unconstrained overcommit for PRIVATE mappings (which both our
patches do) is potentially problematic. In particular it means my
hugepage malloc() implementation will always OOM rather than fallback
to normal pages :( (I believe currently it will usually fall back, and
only OOM if you get unlucky with the timing).
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[Index of Archives]
[Kernel Newbies]
[Netfilter]
[Bugtraq]
[Photo]
[Stuff]
[Gimp]
[Yosemite News]
[MIPS Linux]
[ARM Linux]
[Linux Security]
[Linux RAID]
[Video 4 Linux]
[Linux for the blind]
[Linux Resources]