On Mon, 21 Nov 2005, Takashi Iwai wrote:
>
> Sorry, I still don't figure out how __GFP_COMP solved this problem.
> Could you enlighten me a bit?

The sequence of problems was this.

Nick's core PageReserved removal patch in 2.6.15-rc1 (and -rc2)
changed VM_RESERVED vmas never to free their pages on unmapping (e.g.
on exit) - fine for remap_pfn_range areas, but a leak where others set
VM_RESERVED; and it changed PageReserved so that it no longer inhibits
decrementing the page count.

In -rc1-mm2 I tried to fix that leak by restoring VM_RESERVED to its
previous behaviour, and using a different flag VM_UNPAGED, set in
remap_pfn_range, for the don't-free-when-unmapping behaviour.

But there's then a problem when the underlying page was allocated
as a high-order page, but the separate individual 0-order constituent
pages are mapped into userspace by nopage: the page count of the first
0-order page is raised by allocation, but the following 0-order pages
are left with page count 0. nopage's get_page raises that to 1,
zap_pte_range (or whatever it uses to actually do the freeing) lowers
that to 0, and hence frees the page, even though it's a constituent of
the not-yet-freed high-order page. (This had not been a problem while
PageReserved was inhibiting decrementing the page count.)
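
To make the counting concrete, here is a minimal userspace sketch, not kernel code. The names toy_page, toy_get_page, toy_put_page and toy_map_unmap_frees are hypothetical stand-ins, and the model assumes only what is described above: a non-compound high-order allocation gives the first constituent a count of 1 and leaves the others at 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_ORDER  2
#define TOY_NPAGES (1 << TOY_ORDER)

/* Hypothetical stand-in for struct page: a count and a "freed" marker. */
struct toy_page {
    int count;
    bool freed;
};

/* Model of a non-compound high-order allocation: only the first
 * constituent holds a reference, the rest are left at count 0. */
static void toy_alloc_high_order(struct toy_page *pages, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        pages[i].count = (i == 0) ? 1 : 0;
        pages[i].freed = false;
    }
}

/* What nopage does to the constituent it hands to userspace. */
static void toy_get_page(struct toy_page *p)
{
    p->count++;
}

/* What unmapping does: drop the reference, free when it reaches 0. */
static void toy_put_page(struct toy_page *p)
{
    if (--p->count == 0)
        p->freed = true;
}

/* Map and then unmap constituent idx of a fresh high-order page.
 * Returns true if that constituent was freed while the high-order
 * page as a whole was still allocated. */
static bool toy_map_unmap_frees(size_t idx)
{
    struct toy_page pages[TOY_NPAGES];

    assert(idx < TOY_NPAGES);
    toy_alloc_high_order(pages, TOY_NPAGES);
    toy_get_page(&pages[idx]);   /* fault in via nopage */
    toy_put_page(&pages[idx]);   /* zap on unmap */
    return pages[idx].freed;
}
```

In this model toy_map_unmap_frees(1) reports a premature free, while toy_map_unmap_frees(0) does not, because the allocation's own reference keeps the first page's count above 0.
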

So another of my patches in -rc1-mm2 made the PageCompound technique
available always, no longer under #ifdef CONFIG_HUGETLB_PAGE: so that
get_page and put_page on the later constituents of the high-order
page get redirected to the first one, and it should work okay again.

Except that I'd missed that you actually have to choose to have your
high-order pages supplied as compound pages, by passing __GFP_COMP.
Since I wasn't passing that, they still weren't allocated as compound
pages, so were still being freed too soon - and the PG_reserved flag
found while freeing gave rise to the "Bad page state" messages seen.
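
The toy model below (userspace C, not the kernel's implementation) contrasts the two cases. The names cpage, cpage_get, cpage_put and cpage_freed_early are hypothetical, and the comp argument merely stands in for whether __GFP_COMP was passed: when it is set, references taken on a later constituent are redirected to the first page, as described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CPAGE_NPAGES 4

/* Hypothetical stand-in for struct page. */
struct cpage {
    int count;
    bool freed;
    struct cpage *head;   /* first page of the compound, or NULL */
};

/* Model of a high-order allocation. comp == true plays the role of
 * passing __GFP_COMP: every constituent points back at the first. */
static void cpage_alloc(struct cpage *pages, bool comp)
{
    assert(pages != NULL);
    for (size_t i = 0; i < CPAGE_NPAGES; i++) {
        pages[i].count = (i == 0) ? 1 : 0;
        pages[i].freed = false;
        pages[i].head = comp ? &pages[0] : NULL;
    }
}

/* The page whose count a get or put really operates on. */
static struct cpage *cpage_ref_target(struct cpage *p)
{
    return p->head ? p->head : p;
}

static void cpage_get(struct cpage *p)
{
    cpage_ref_target(p)->count++;
}

static void cpage_put(struct cpage *p)
{
    struct cpage *t = cpage_ref_target(p);

    if (--t->count == 0)
        t->freed = true;
}

/* Map and unmap the second constituent; report whether anything was
 * freed while the allocation's own reference was still held. */
static bool cpage_freed_early(bool comp)
{
    struct cpage pages[CPAGE_NPAGES];

    cpage_alloc(pages, comp);
    cpage_get(&pages[1]);
    cpage_put(&pages[1]);
    return pages[0].freed || pages[1].freed;
}
```

With comp false the second constituent goes 0, 1, 0 and is freed too soon; with comp true the first page goes 1, 2, 1 and nothing is freed until the owner drops its own reference.
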

> Isn't it needed for dma_alloc_coherent() (for i386, particularly),
> too? dma_alloc_coherent() also gets pages with __get_free_pages().

Didn't I deal with that by adding __GFP_COMP in snd_malloc_dev_pages?
And (in a separate patch run past davem and wli first, to be aggregated
with the sound/core/memalloc patch when I sign off and send to Andrew)
in the sparc and sparc64 sbus_alloc_consistent.

It's only an issue when the high-order page might be mapped into
userspace, then its constituents freed up by zap_pte_range; or
locked down with get_user_pages and later released: when constituents
of a high-order page pass through common code designed for 0-order pages.

> Also, I think we can remove all Set/ClearPageReserved() in memalloc.c
> now. It was there just for mmap...

That is so, but we'd prefer you to hold off for now. The way it's
proceeding is, for 2.6.15 we're not actually removing the Set/Clear
PageReserved from any of the drivers or from any of the architecture
initialization; but PageReserved is no longer serving any functional
purpose, except where PG_reserved acts as a double-check when the page
is freed, as to whether it's all working right. Which was useful in
this case, to identify where I'd forgotten all about __GFP_COMP; and
I fear may prove useful in some other cases too. Retaining this use
of PG_reserved for now gives greater confidence in our safety when we
later advance to removing all the Set/ClearPageReserved hocus-pocus.

Hugh