* Nick Piggin <[email protected]> wrote:
> Hi,
>
> Not sure if this should be fixed for 2.6.13. It can result in
> pagecache corruption: so I guess that answers my own question.
>
> This was tested by Robin and appears to solve the problem. Roland had
> a quick look and thought the basic idea was sound. I'd like to get a
> couple more acks before going forward, and in particular Robin was
> contemplating possible efficiency improvements (although efficiency
> can wait on correctness).
>
> Feedback please, anyone.
it looks good to me, but wouldnt it be simpler (in terms of patch and
architecture impact) to always retry the follow_page() in
get_user_pages(), in case of a minor fault? The sequence of minor faults
must always be finite so it's a safe solution, and should solve the race
too. The extra overhead of an additional follow_page() is small.
Especially with 2.6.13 being close this looks like the safest bet, any
improvement to this (i.e. your patch) can then be done in 2.6.14.
Ingo
When get_user_pages for write access races with another process in the page
fault handler that has established the pte for read access, handle_mm_fault
in get_user_pages will return VM_FAULT_MINOR even if it hasn't made the page
correctly writeable (for example, broken COW).
Thus the assumption that get_user_pages has a writeable page at the mapping
after handle_mm_fault returns is incorrect. Fix this by retrying the lookup
and fault in get_user_pages before making the assumption that we have a
writeable page.
Great work by Robin Holt <[email protected]> to debug the problem.
Originally-from: Nick Piggin <[email protected]>
simplified the patch into a two-liner:
Signed-off-by: Ingo Molnar <[email protected]>
mm/memory.c | 7 +++++++
1 files changed, 7 insertions(+)
Index: linux-sched-curr/mm/memory.c
===================================================================
--- linux-sched-curr.orig/mm/memory.c
+++ linux-sched-curr/mm/memory.c
@@ -961,6 +961,13 @@ int get_user_pages(struct task_struct *t
switch (handle_mm_fault(mm,vma,start,write)) {
case VM_FAULT_MINOR:
tsk->min_flt++;
+ /*
+ * We might have raced with a readonly
+ * pagefault, retry to make sure we
+ * got write access:
+ */
+ if (write)
+ continue;
break;
case VM_FAULT_MAJOR:
tsk->maj_flt++;
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[Index of Archives]
[Kernel Newbies]
[Netfilter]
[Bugtraq]
[Photo]
[Gimp]
[Yosemite News]
[MIPS Linux]
[ARM Linux]
[Linux Security]
[Linux RAID]
[Video 4 Linux]
[Linux for the blind]
|
|