Re: User switchable HW mappings & cie

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On Mon, 9 Oct 2006, Thomas Hellström wrote:
> 
> One problem that occurs is that the rule for ptes with non-backing struct
> pages
> Which I think was introduced in 2.6.16:
> 
>    pfn_of_page == vma->vm_pgoff + ((addr - vma->vm_start) >> PAGE_SHIFT)

No.

No such rule exists.

The above is true only fora _very_ special case, namely the case of 
allowing a COW-mapping of a non-backing-store set. And the _only_ case 
where that even makes sense is /dev/mem, and quite frankly, even there the 
only reason it exists is just one or two legacy programs that really did 
use COW mappings on something that really shouldn't be COW-mapped.

(I think the only program that did was actually something that tried to 
read the BIOS tables and mapped /dev/mem privately but read-write, and 
then edited the BIOS mappings in place).

So _especially_ with 2.6.16+, using a non-backing-store mapping is really 
really easy: you just use

	int vm_insert_page(struct vm_area_struct *vma, unsigned long addr, struct page *page)

and you're done (if a ref-counted page is what is wanted).

That one requires a "struct page *", because all current users had that as 
their standard setup (ie direct rendering etc), but the thing is, the VM 
these days really doesn't care.

However, if you don't want to have a ref-counted page, you just use 
VM_PFNMAP, and insert any random pte you damn well want.

Once you have set VM_PFNMAP, and the mapping is _not_ a cow mapping, the 
page something points to will never actually be used for anything but 
mapping (modulo bugs, of course, but the new VM is a hell of a lot more 
straightforward in this, and the whole "vm_normal_page()" thing is really 
trivial).

So you never have any "struct page *" associated with it at all, and no 
refcounting is taking place - you always act purely on a pfn and a page 
protection thing (ie effectively a "pte_t").

NOTE! The important part here is really to just understand the fact that 
COW mappings are special. So you need to make sure that you always map 
such a thing "shared" (and if you have security issues, it obviously needs 
to be just marked read-only - "shared" does _not_ imply that it might be 
read-write).

COW-mappings (and _only_ cow-mappings) have the added rule of how the page 
offsets need to be handled, since otherwise there is no way to distinguish 
between somethign that got mapped originally, and something that got 
COW'ed (where the latter _does_ need reference counting, of course, since 
it's a regular page that has been copied into the address space).

It's probably also a good idea to map such things as VM_DONTCOPY, since 
they probably don't make sense across forks, and it slows down forking to 
copy the page tables. And add a VM_INSERTPAGE, although right now that 
doesn't actually do anything special (but it was originally meant to be a 
"we've done single-page random things, and so you can't _assume_ the 
virtual-address/pg_off thing to hold", so setting it is a good idea just 
for consistency).

Anyway, so right now you can use "vm_insert_page()" and it will increment 
the page count and add things to the rmap lists, which is what current 
users want. But if you don't have a normal page, you should be able to 
basically avoid that part entirely, and just use

	set_pte_at(mm, addr, pte, make-up-a-pte-here);

and you're done (of course, you need to use all the appropriate magic to 
set up the pte, ie you'd normally have something like

	pte = get_locked_pte(mm, addr, &ptl);
	..
	pte_unmap_unlock(pte, ptl);

around it). Note that "vm_insert_page()" is _not_ for VM_PFNMAP mappings, 
exactly because it does actually increment page counts. It's for a 
"normal" mapping that just wants to insert a reference-counted page.

		Linus

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux