Re: IA64 non-contiguous memory space bugs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Feb 22, 2006 at 04:35:34PM +0000, Hugh Dickins wrote:
> On Wed, 22 Feb 2006, 'David Gibson' wrote:
> > On Tue, Feb 21, 2006 at 05:51:52PM -0800, Chen, Kenneth W wrote:
> > > 
> > > free_pgtables() has partial crap that the check of is_hugepage_only_range()
> > > should be done on the entire vma range, not just the first hugetlb page.
> 
> We're testing whether this vma falls in a hugepage_only_range.  It's
> important to check the size (end) of the vma when setting it up; but
> when tearing down it is fair to assume that it was set up correctly,
> so unnecessary to check its size (end).
> 
> I used HPAGE_SIZE rather than 0 because one can imagine an implementation
> of is_hugepage_only_range which would go wrong with 0, or with a fraction
> of HPAGE_SIZE.  Perhaps you'd be happier to add an is_hugepage_only_addr
> macro which hides that size arg to is_hugepage_only_range.

(Aside: is_hugepage_only_range() isn't about telling where huge pages
can go, it's about telling where normal pages can't go.  As such it
must for it's primary callsite on the MAP_FIXED path in
get_unmapped_area() work with parameters that aren't HPAGE_SIZE
aligned).

> > > Though, it's not possible to have a hugetlb vma while having normal page
> > > instantiated inside that vma.
> 
> That is, if the setup does its checks correctly, you're not allowed
> to have a normal vma in (or spanning) a hugepage_only_range.
> 
> > > So the bug is mostly phantom.  For correctness, it should be fixed.
> 
> I believe the bug is non-existent, and therefore needs no fix.

The bug is real alright, I've watched it call hugetlb_free_pgd_range()
for a normal page VMA on powerpc.

> > Actually, from ppc64's point of view, the problem with the test is
> > that the whole vma could be *less* than HPAGE_SIZE - we don't test
> > that the address is aligned before checking is_hugepage_only_range().
> > We thus can call hugetlb_free_pgd_range() on normal page VMAs - which
> > we only get away with because the ppc64 hugetlb_free_pgd_range() is
> > (so far) an alias for the normal free_pgd_range().
> 
> Are you saying that on ppc64, you can put non-hugepage vmas into a
> hugepage_only_range?  If so, why is it called a hugepage_only_range?
> Or, are you saying that your hugepage_only_range is smaller than
> HPAGE_SIZE, so no hugepages can fit in it?  Or that you have
> hugepage vma start addresses not aligned to HPAGE_SIZE?
> None of that makes sense to me.

None of the above.  However, is_hugepage_only_range() does not need to
be called on a hugepage aligned range (and is not here), and returns
true (and must do so) if the given range intersects a hugepage only
area, not only if it lies entirely within a hugepage only area.

Consider a HPAGE_SIZE hugepage VMA starting at 4GB, and a normal page
VMA starting at (4GB-PAGE_SIZE).  This situation is possible on
powerpc, and is_hugepage_only_range(4GB-PAGE_SIZE, HPAGE_SIZE) will
(and must) return true.  Therefore the free_pgtables() logic will call
hugetlb_free_pgd_range() across the normal page VMA.

> > Your patch below is insufficient, because there's a second test of
> > is_hugepage_only_range() further down.  However, instead of tweaking
> > the tested ranges, I think what we really want to do is check for
> > is_vm_hugetlb_page() instead.
> 
> No, that looks cleaner, but it's wrong.  hugetlb_free_pgd_range does
> something useful on those architectures which have a not-always-false
> is_hugepage_only_range (ia64 and powerpc alone): it's paired with it.
> (Though as you've noticed, powerpc only does the usual free_pgd_range.)
> 
> hugetlb_free_pgd_range does nothing on most architectures, even those
> (i386 etc) which have a not-always-false is_vm_hugetlb_page: we do
> want to free_pgd_range on those.  So using is_vm_hugetlb_page instead
> of is_hugepage_only_range is wrong for them.  Though I guess you could
> change their hugetlb_free_pgd_range definitions to free_pgd_range, and
> then use is_vm_hugetlb_page as you did, that would work too (though
> with less combining of vmas in that loop, so not an improvement).

Yes, I realised that was wrong shortly after posting.  In fact it's
wrong in just the same way that is_hugepage_only_range() is wrong for
powerpc right now - which we work around becuse
hugetlb_free_pgd_range() is identical to free_pgd_range().

I can see two ways of fixing this.  The quick, hacky fix is to use
is_vm_hugetlb_page(), and work around the problems by having
hugetlb_free_pgd_range() be identical to free_pgd_range() in most
cases.  Fixing it more cleanly will need a new callback that actually
encodes this "need special pagetable freeing" concept.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux