Re: [patch 2.6.13-rc4] fix get_user_pages bug

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]


On Mon, 1 Aug 2005, Nick Piggin wrote:
> Not sure if this should be fixed for 2.6.13. It can result in
> pagecache corruption: so I guess that answers my own question.

Hell no.

This patch is clearly untested and must _not_ be applied:

+                               case VM_FAULT_RACE:
+                                       /*
+                                        * Someone else got there first.
+                                        * Must retry before we can assume
+                                        * that we have actually performed
+                                        * the write fault (below).
+                                        */
+                                       if (write)
+                                               continue;
+                                       break;

that "continue" will continue without the spinlock held, and now do 
follow_page() will run without page_table_lock, _and_ it will release the 
spinlock once more afterwards, so if somebody else is racing on this, we 
might remove the spinlock for them too.

Don't do it.

Instead, I'd suggest changing the logic for "lookup_write". Make it 
require that the page table entry is _dirty_ (not writable), and then 
remove the line that says:

	lookup_write = write && !force;

and you're now done. A successful mm fault for write _should_ always have 
marked the PTE dirty (and yes, part of testing this would be to verify 
that this is true - but since architectures that don't have HW dirty 
bits depend on this anyway, I'm pretty sure it _is_ true).

Ie something like the below (which is totally untested, obviously, but I 
think conceptually is a lot more correct, and obviously a lot simpler).


diff --git a/mm/memory.c b/mm/memory.c
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -811,18 +811,15 @@ static struct page *__follow_page(struct
 	pte = *ptep;
 	if (pte_present(pte)) {
-		if (write && !pte_write(pte))
+		if (write && !pte_dirty(pte))
 			goto out;
 		if (read && !pte_read(pte))
 			goto out;
 		pfn = pte_pfn(pte);
 		if (pfn_valid(pfn)) {
 			page = pfn_to_page(pfn);
-			if (accessed) {
-				if (write && !pte_dirty(pte) &&!PageDirty(page))
-					set_page_dirty(page);
+			if (accessed)
-			}
 			return page;
@@ -972,14 +969,6 @@ int get_user_pages(struct task_struct *t
-				/*
-				 * Now that we have performed a write fault
-				 * and surely no longer have a shared page we
-				 * shouldn't write, we shouldn't ignore an
-				 * unwritable page in the page table if
-				 * we are forcing write access.
-				 */
-				lookup_write = write && !force;
 			if (pages) {
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at
Please read the FAQ at

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux