Re: Nick's core remove PageReserved broke vmware...

Quoting Hugh Dickins <[email protected]>:
> In the time since we discussed before, I've rather come full circle
> round to my original position: abandoning such ideas of trying to
> handle it from get_user_pages itself, appreciating the simplicity
> of the original PROT_DONTCOPY idea from you guys; but sticking to my
> initial reaction that this is better done by madvise(MADV_DONTCOPY),
> not by the mmap/mprotect route in Michael's patch.  (I never bought
> the "racy" argument advanced in favour of the mmap flag.)
> 
> One of the factors which has swayed me to the DONTCOPY approach, is
> Nick's 2.6.14 optimization in fork's copy_page_range, where areas
> which can be safely faulted later are not copied pte by pte.  But
> that doesn't apply to all areas, and in particular cannot apply to
> VM_NONLINEAR shared areas.  It should be of benefit to apps which
> use large such areas, and also do a lot of forking children who don't
> need those areas, to be able to mark them VM_DONTCOPY.  Or any other
> vmas the children won't need.  (But there's one big distinction between
> the optimization and VM_DONTCOPY: the optimization copies vma but
> doesn't fill in its ptes, VM_DONTCOPY doesn't even copy the vma.)
> 
> Two warnings if someone would like to post a MADV_DONTCOPY patch.
> It should include a matching MADV_DOCOPY to clear the condition, but
> that must not be allowed to clear VM_DONTCOPY set originally by driver:
> perhaps you'll end up with a VM_UDONTCOPY or something like that.
> 
> And Badari has a MADV_REMOVE patch in the works, taking the next
> slot (just after MADV_DONTNEED in most of the arches): probably
> best for you to base yours on top of his (though yours is simpler
> and might jump ahead).
> 
> Hugh
> 

Hugh, did you have something like the following in mind
(this is only boot-tested and only on x86-64)?
Hmm, maybe MADV_INHERIT and MADV_DONT_INHERIT would be better names,
since the copy is only done on write ...

Comments?
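
To make the intended semantics concrete, here is a minimal userspace
sketch of the flag handling, not kernel code. The constants are copied
from the patch in this message; apply_advice(), skipped_on_fork() and
self_check() are hypothetical helpers that only mirror the new cases in
madvise_behavior() and the test in dup_mmap(). The read-ahead hint
handling that madvise_behavior() also does is deliberately left out.

```c
#include <assert.h>

/* Values copied from the patch in this message. */
#define VM_DONTCOPY	0x00020000UL	/* set by a driver: do not copy this vma on fork */
#define VM_UDONTCOPY	0x02000000UL	/* set by the app through madvise() */
#define MADV_DONTCOPY	0x30
#define MADV_DOCOPY	0x31

/* Hypothetical helper: mirrors the two new cases in madvise_behavior(). */
static unsigned long apply_advice(unsigned long vm_flags, int behavior)
{
	switch (behavior) {
	case MADV_DONTCOPY:
		vm_flags |= VM_UDONTCOPY;
		break;
	case MADV_DOCOPY:
		vm_flags &= ~VM_UDONTCOPY;	/* leaves VM_DONTCOPY untouched */
		break;
	default:
		break;
	}
	return vm_flags;
}

/* Hypothetical helper: mirrors the check in dup_mmap(). */
static int skipped_on_fork(unsigned long vm_flags)
{
	return (vm_flags & (VM_DONTCOPY | VM_UDONTCOPY)) != 0;
}

/* Walks through the three cases discussed in the thread. */
static void self_check(void)
{
	unsigned long flags = 0;

	/* App marks the area: fork skips it. */
	flags = apply_advice(flags, MADV_DONTCOPY);
	assert(skipped_on_fork(flags));

	/* App clears its own mark: fork copies the area again. */
	flags = apply_advice(flags, MADV_DOCOPY);
	assert(!skipped_on_fork(flags));

	/* Driver-set VM_DONTCOPY survives MADV_DOCOPY. */
	flags = apply_advice(VM_DONTCOPY, MADV_DOCOPY);
	assert(flags == VM_DONTCOPY && skipped_on_fork(flags));
}
```

Calling self_check() from a small main() exercises all three cases. The
point of the separate bit is the last one: userspace can undo only what
userspace set.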

----

Signed-off-by: Michael S. Tsirkin <[email protected]>

Index: linux-2.6.14-dontcopy/kernel/fork.c
===================================================================
--- linux-2.6.14-dontcopy.orig/kernel/fork.c	2005-11-08 23:41:30.000000000 +0200
+++ linux-2.6.14-dontcopy/kernel/fork.c	2005-11-08 23:41:08.000000000 +0200
@@ -209,7 +209,7 @@ static inline int dup_mmap(struct mm_str
 	for (mpnt = current->mm->mmap ; mpnt ; mpnt = mpnt->vm_next) {
 		struct file *file;
 
-		if (mpnt->vm_flags & VM_DONTCOPY) {
+		if (mpnt->vm_flags & (VM_DONTCOPY | VM_UDONTCOPY)) {
 			long pages = vma_pages(mpnt);
 			mm->total_vm -= pages;
 			__vm_stat_account(mm, mpnt->vm_flags, mpnt->vm_file,
Index: linux-2.6.14-dontcopy/mm/mmap.c
===================================================================
--- linux-2.6.14-dontcopy.orig/mm/mmap.c	2005-11-08 23:42:01.000000000 +0200
+++ linux-2.6.14-dontcopy/mm/mmap.c	2005-11-08 23:41:48.000000000 +0200
@@ -840,7 +840,7 @@ void __vm_stat_account(struct mm_struct 
 
 #ifdef CONFIG_HUGETLB
 	if (flags & VM_HUGETLB) {
-		if (!(flags & VM_DONTCOPY))
+		if (!(flags & (VM_DONTCOPY|VM_UDONTCOPY)))
 			mm->shared_vm += pages;
 		return;
 	}
Index: linux-2.6.14-dontcopy/mm/madvise.c
===================================================================
--- linux-2.6.14-dontcopy.orig/mm/madvise.c	2005-10-28 02:02:08.000000000 +0200
+++ linux-2.6.14-dontcopy/mm/madvise.c	2005-11-08 23:28:56.000000000 +0200
@@ -31,6 +31,12 @@ static long madvise_behavior(struct vm_a
 	case MADV_RANDOM:
 		new_flags |= VM_RAND_READ;
 		break;
+	case MADV_DONTCOPY:
+		new_flags |= VM_UDONTCOPY;
+		break;
+	case MADV_DOCOPY:
+		new_flags &= ~VM_UDONTCOPY;
+		break;
 	default:
 		break;
 	}
@@ -150,6 +156,8 @@ madvise_vma(struct vm_area_struct *vma, 
 	case MADV_NORMAL:
 	case MADV_SEQUENTIAL:
 	case MADV_RANDOM:
+	case MADV_DONTCOPY:
+	case MADV_DOCOPY:
 		error = madvise_behavior(vma, prev, start, end, behavior);
 		break;
 
Index: linux-2.6.14-dontcopy/include/linux/mm.h
===================================================================
--- linux-2.6.14-dontcopy.orig/include/linux/mm.h	2005-11-08 23:24:58.000000000 +0200
+++ linux-2.6.14-dontcopy/include/linux/mm.h	2005-11-08 23:25:09.000000000 +0200
@@ -154,6 +154,7 @@ extern unsigned int kobjsize(const void 
 					/* Used by sys_madvise() */
 #define VM_SEQ_READ	0x00008000	/* App will access data sequentially */
 #define VM_RAND_READ	0x00010000	/* App will not benefit from clustered reads */
+#define VM_UDONTCOPY	0x02000000      /* App wants to set VM_DONTCOPY */
 
 #define VM_DONTCOPY	0x00020000      /* Do not copy this vma on fork */
 #define VM_DONTEXPAND	0x00040000	/* Cannot expand with mremap() */
Index: linux-2.6.14-dontcopy/include/asm-x86_64/mman.h
===================================================================
--- linux-2.6.14-dontcopy.orig/include/asm-x86_64/mman.h	2005-11-08 23:19:35.000000000 +0200
+++ linux-2.6.14-dontcopy/include/asm-x86_64/mman.h	2005-11-08 23:19:46.000000000 +0200
@@ -36,6 +36,8 @@
 #define MADV_SEQUENTIAL	0x2		/* read-ahead aggressively */
 #define MADV_WILLNEED	0x3		/* pre-fault pages */
 #define MADV_DONTNEED	0x4		/* discard these pages */
+#define MADV_DONTCOPY	0x30		/* dont inherit across fork */
+#define MADV_DOCOPY	0x31		/* do inherit across fork */
 
 /* compatibility flags */
 #define MAP_ANON	MAP_ANONYMOUS

-- 
MST