Quoting Hugh Dickins <[email protected]>:
> In the time since we discussed before, I've rather come full circle
> round to my original position: abandoning such ideas of trying to
> handle it from get_user_pages itself, appreciating the simplicity
> of the original PROT_DONTCOPY idea from you guys; but sticking to my
> initial reaction that this is better done by madvise(MADV_DONTCOPY),
> not by the mmap/mprotect route in Michael's patch. (I never bought
> the "racy" argument advanced in favour of the mmap flag.)
>
> One of the factors which has swayed me to the DONTCOPY approach, is
> Nick's 2.6.14 optimization in fork's copy_page_range, where areas
> which can be safely faulted later are not copied pte by pte. But
> that doesn't apply to all areas, and in particular cannot apply to
> VM_NONLINEAR shared areas. It should be of benefit to apps which
> use large such areas, and also do a lot of forking children who don't
> need those areas, to be able to mark them VM_DONTCOPY. Or any other
> vmas the children won't need. (But there's one big distinction between
> the optimization and VM_DONTCOPY: the optimization copies vma but
> doesn't fill in its ptes, VM_DONTCOPY doesn't even copy the vma.)
>
> Two warnings if someone would like to post a MADV_DONTCOPY patch.
> It should include a matching MADV_DOCOPY to clear the condition, but
> that must not be allowed to clear VM_DONTCOPY set originally by driver:
> perhaps you'll end up with a VM_UDONTCOPY or something like that.
>
> And Badari has a MADV_REMOVE patch in the works, taking the next
> slot (just after MADV_DONTNEED in most of the arches): probably
> best for you to base yours on top of his (though yours is simpler
> and might jump ahead).
>
> Hugh
>
Hugh, did you have something like the following in mind
(this is only boot-tested and only on x86-64)?
Hmm, maybe MADV_INHERIT and MADV_DONT_INHERIT would be better names,
since the copy is only dont one write ...
Comments?
----
Signed-off-by: Michael S. Tsirkin <[email protected]>
Index: linux-2.6.14-dontcopy/kernel/fork.c
===================================================================
--- linux-2.6.14-dontcopy.orig/kernel/fork.c 2005-11-08 23:41:30.000000000 +0200
+++ linux-2.6.14-dontcopy/kernel/fork.c 2005-11-08 23:41:08.000000000 +0200
@@ -209,7 +209,7 @@ static inline int dup_mmap(struct mm_str
for (mpnt = current->mm->mmap ; mpnt ; mpnt = mpnt->vm_next) {
struct file *file;
- if (mpnt->vm_flags & VM_DONTCOPY) {
+ if (mpnt->vm_flags & (VM_DONTCOPY | VM_UDONTCOPY)) {
long pages = vma_pages(mpnt);
mm->total_vm -= pages;
__vm_stat_account(mm, mpnt->vm_flags, mpnt->vm_file,
Index: linux-2.6.14-dontcopy/mm/mmap.c
===================================================================
--- linux-2.6.14-dontcopy.orig/mm/mmap.c 2005-11-08 23:42:01.000000000 +0200
+++ linux-2.6.14-dontcopy/mm/mmap.c 2005-11-08 23:41:48.000000000 +0200
@@ -840,7 +840,7 @@ void __vm_stat_account(struct mm_struct
#ifdef CONFIG_HUGETLB
if (flags & VM_HUGETLB) {
- if (!(flags & VM_DONTCOPY))
+ if (!(flags & (VM_DONTCOPY|VM_UDONTCOPY)))
mm->shared_vm += pages;
return;
}
Index: linux-2.6.14-dontcopy/mm/madvise.c
===================================================================
--- linux-2.6.14-dontcopy.orig/mm/madvise.c 2005-10-28 02:02:08.000000000 +0200
+++ linux-2.6.14-dontcopy/mm/madvise.c 2005-11-08 23:28:56.000000000 +0200
@@ -31,6 +31,12 @@ static long madvise_behavior(struct vm_a
case MADV_RANDOM:
new_flags |= VM_RAND_READ;
break;
+ case MADV_DONTCOPY:
+ new_flags |= VM_UDONTCOPY;
+ break;
+ case MADV_DOCOPY:
+ new_flags &= ~VM_UDONTCOPY;
+ break;
default:
break;
}
@@ -150,6 +156,8 @@ madvise_vma(struct vm_area_struct *vma,
case MADV_NORMAL:
case MADV_SEQUENTIAL:
case MADV_RANDOM:
+ case MADV_DONTCOPY:
+ case MADV_DOCOPY:
error = madvise_behavior(vma, prev, start, end, behavior);
break;
Index: linux-2.6.14-dontcopy/include/linux/mm.h
===================================================================
--- linux-2.6.14-dontcopy.orig/include/linux/mm.h 2005-11-08 23:24:58.000000000 +0200
+++ linux-2.6.14-dontcopy/include/linux/mm.h 2005-11-08 23:25:09.000000000 +0200
@@ -154,6 +154,7 @@ extern unsigned int kobjsize(const void
/* Used by sys_madvise() */
#define VM_SEQ_READ 0x00008000 /* App will access data sequentially */
#define VM_RAND_READ 0x00010000 /* App will not benefit from clustered reads */
+#define VM_UDONTCOPY 0x02000000 /* App wants to set VM_DONTCOPY */
#define VM_DONTCOPY 0x00020000 /* Do not copy this vma on fork */
#define VM_DONTEXPAND 0x00040000 /* Cannot expand with mremap() */
Index: linux-2.6.14-dontcopy/include/asm-x86_64/mman.h
===================================================================
--- linux-2.6.14-dontcopy.orig/include/asm-x86_64/mman.h 2005-11-08 23:19:35.000000000 +0200
+++ linux-2.6.14-dontcopy/include/asm-x86_64/mman.h 2005-11-08 23:19:46.000000000 +0200
@@ -36,6 +36,8 @@
#define MADV_SEQUENTIAL 0x2 /* read-ahead aggressively */
#define MADV_WILLNEED 0x3 /* pre-fault pages */
#define MADV_DONTNEED 0x4 /* discard these pages */
+#define MADV_DONTCOPY 0x30 /* dont inherit across fork */
+#define MADV_DOCOPY 0x31 /* do inherit across fork */
/* compatibility flags */
#define MAP_ANON MAP_ANONYMOUS
--
MST
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[Index of Archives]
[Kernel Newbies]
[Netfilter]
[Bugtraq]
[Photo]
[Stuff]
[Gimp]
[Yosemite News]
[MIPS Linux]
[ARM Linux]
[Linux Security]
[Linux RAID]
[Video 4 Linux]
[Linux for the blind]
[Linux Resources]