Re: [PATCH 10 of 20] ipath - support for userspace apps using core driver

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, 15 Mar 2006, Andrew Morton wrote:
> "Bryan O'Sullivan" <[email protected]> wrote:
> > 
> >         Bad page state at __free_pages_ok (in process 'mpi_hello', page ffff8100020e2f88)
> >         flags:0x0100000000000804 mapping:0000000000000000 mapcount:0 count:0 (Not tainted)
> 
> Someone left PG_private set on this page (!)

That's pretty sure to be from page_alloc.c's internal use of
PagePrivate, for the buddy system of free pages.

I think Bryan's problem is probably that he has allocated a
high-order page (via pci_alloc_consistent or dma_alloc_coherent),
and is then exposing the constituents of that high-order page to
userspace via nopage: when the final munmap or exit comes,
the constituent pages get freed without realizing they're
supposed to be part of one high-order page.

We had the same problem in sound/core/memalloc.c, and fixed it
by using a compound page (a high-order page in which each constituent
knows it's part of the larger whole): see the two uses of __GFP_COMP
in that file, and in particular the one where snd_malloc_dev_pages
calls dma_alloc_coherent.

Yes, as Roland suggested, it sounds sensible to use dma_alloc_coherent
throughout rather than pci_alloc_consistent, then you can pass __GFP_COMP
down with the other flags and it should work (without any PageReserved).

I haven't checked Bryan's get_pag'ing, but he seemed to have the right
idea of how it should work: was just being a bit diffident about it in
the face of the fact that it clearly wasn't working (through lack of
__GFP_COMP).

Of course, there is an alternative way of going about it, using
remap_pfn_range more (VM_PFNMAP keeping the VM away from refcounting
the pages) and nopage less.  But I think Bryan should probably keep on
with the division he already has, no strong reason to change it around.

[ snipping lots of helpful explanation and advice from Andrew ]

> If you have pages which are created by ->nopage() but which are supposed to
> be shared across forks (the VMA should use VM_DONTCOPY|VM_SHARED) then each
> time a new process starts accessing these already-existing pages it'll take
> a minor fault.

No.  That isn't how VM_DONTCOPY behaves, and I don't know what you were
thinking to bring it into the discussion there - if you have pages which
are supposed to be shared across forks, that happens without any special
flag, doesn't it?  We only teach VM_DONTCOPY in the second year,
I think Bryan would best forget it for now ;)

Oh well, I'd better say a bit on it.  VM_DONTCOPY excludes the whole vma
from being forked into the child.  So if the child tries to access one
of those addresses, the child'll get a SIGSEGV not a minor fault.  RDMA
is finding it a useful flag to get around a peculiarly frustrating issue
with get_user_pages: though that successfully pins pages, a fork followed
by parent writing to pinned private page will put a _copy_ of that page
into the parent's userspace (if child holds a reference to the original),
which likely defeats the purpose of pinning.

So far as Bryan is dealing with shared mappings, he won't be interested
in VM_DONTCOPY at all.  If he's dealing with private mappings, he might
want to consider VM_DONTCOPY if get_user_pages and fork can occur
concurrently; but it's an advanced topic, right now he wants to get
the basis working without "Bad page state"s.

He shouldn't set VM_SHM, it means nothing.  If he's dealing with real
allocated pages, he shouldn't set VM_IO (but remap_pfn_range will set
it for him if it's used).  He shouldn't set VM_RESERVED (won't do any
harm, but we're quite likely to remove it soon).  He can set VM_DONT-
EXPAND to avoid worrying about whether mremap to a larger extent,
followed by fault on that larger extent, would behave correctly.
He shouldn't set VM_LOCKED (it would make the locked_vm accounting
go wrong - unless he's careful to participate in that accounting,
as Infiniband's uverbs_mem.c creditably does).

Hugh
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux