Re: More info for DSM w/r/t sunffb on 2.6.15-rc6

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



[ Hugh and Nick added to Cc, just in case they can see anything wrong with 
  this. It's actually very simple, but I hoped to avoid having to support
  the insane cases. I guess it was naive of me to think that there isn't 
  always _some_ insane user ;^]

David: please read the final note before the patch about software 
dirty/accessed bits. It may or may not be an issue on sparc64, I don't 
know where you do the dirty/accessed bit handling.

On Fri, 23 Dec 2005, David S. Miller wrote:
> 
> Linus, X.org is doing a MAP_PRIVATE mmap() of these discontiguous
> I/O mappings of the sparc frame buffer device it seems.  So the
> MAP_SHARED check in is_cow_mapping() doesn't pass.

Ok. I actually had a backup plan for that too, but was hoping that nobody 
would be quite that insane.

And doing a private mapping on a device and then expecting to do writes 
through that mapping is just totally insane. The fact that it happened to 
work before was arguably very much a bug (as it didn't actually create a 
private mapping).

But hey, here's the backup plan. It's entirely untested, but it is 
actually very simple, and has way more comments than actual code, so 
hopefully it's understandable and "obviously right" (yeah sure, famous 
last words ;).

NOTE! This very much means that an insane user that first does a private 
writable mapping of a device like this , and then a fork(), will _not_ see 
the mapping in the child. It will remain "private" and writable in the 
parent instead (writable only if the page protections that were passed in 
to remap_pfn_range() were writable, of course).

The reason? Doing a COW after the fork would be insane. Both from a VM 
complexity issue (it's what all the work has been trying to avoid), but 
also from a "what the hell does it mean?" kind of issue. 

So I'm pretty damn sure nobody depends on _that_ at least (old kernels 
would have done the insane and meaningless thing: it would basically be 
mapped shared until the fork() happened, and then it would be 
copy-on-write in _both_ processes after the fork).

NOTE NOTE NOTE! This will _not_ work if the pages need a software-dirty 
and/or software-accessed bit, and the low-level architecture page fault 
handler says "this is a write to a nonwritable area" and raises a SIGSEGV. 

So it may be that the insane sparc remap_pfn_range() users need to set the 
dirty/accessed bits in the page protection flags by hand before to avoid 
that. David?

Michael, can you test this patch (with the note that David may need to 
change something else too)?

So there are certainly some subtle issues left with this, and I'm not 
saying it's quite this simple, but they should be easy enough to handle.

			Linus

----
diff --git a/mm/memory.c b/mm/memory.c
index d8dde07..a93654d 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1326,9 +1326,32 @@ int remap_pfn_range(struct vm_area_struc
 	 * un-COW'ed pages by matching them up with "vma->vm_pgoff".
 	 */
 	if (is_cow_mapping(vma->vm_flags)) {
-		if (addr != vma->vm_start || end != vma->vm_end)
-			return -EINVAL;
-		vma->vm_pgoff = pfn;
+		/*
+		 * We can do a real COW mapping _if_ it covers the whole area,
+		 * at which point we do the magic "vm_pgoff" trick.
+		 *
+		 * Otherwise we will have to turn it into non-copyable shared
+		 * area which has VM_WRITE turned off.
+		 *
+		 * NOTE NOTE NOTE! If the remap_pfn_range() was called with
+		 * a writable page protection, this means that the pages will
+		 * still be writable, but we will refuse to ever take a
+		 * write fault on any pages that weren't so.
+		 *
+		 * Also, we refuse to do the SHARED conversion if we already
+		 * have taken a C-O-W fault on the area.
+		 *
+		 * This is all just for insane old X servers, which map the
+		 * video pages private.
+		 */
+		if (addr == vma->vm_start && end == vma->vm_end) {
+			vma->vm_pgoff = pfn;
+		} else {
+			if (vma->anon_vma)
+				return -EINVAL;
+			vma->vm_flags |= VM_DONTCOPY | VM_SHARED;
+			vma->vm_flags &= ~(VM_WRITE | VM_MAYWRITE);
+		}
 	}
 
 	vma->vm_flags |= VM_IO | VM_RESERVED | VM_PFNMAP;
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux