Re: Ok, explained.. (was Re: [PATCH] mm: fix page_mkclean_one)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hey nice work Linus!

Linus Torvalds wrote:

On Fri, 29 Dec 2006, Linus Torvalds wrote:

Hmm? I'd love it if somebody else wrote the patch and tested it, because I'm getting sick and tired of this bug ;)


Who the hell am I kidding? I haven't been able to sleep right for the last few days over this bug. It was really getting to me.

And putting on the thinking cap, there's actually a fairly simple an nonintrusive patch.

Yeah *this* makes more sense. And in retrospect it was simple, we
can't just throw out pte dirtiness information if the page doesn't
have all buffers dirtied.

It still has a tiny tiny race (see the comment), but I bet nobody can really hit it in real life anyway, and I know several ways to fix it, so I'm not really _that_ worried about it.

Well the race isn't a data loss one, is it? Just a case where the
pte may be dirty but the page dirty state not accounted for.

Can we fix it by just putting the page_mkclean back inside the
TestClearPageDirty check, and re-clearing PG_dirty after redoing
the set_page_dirty?



The patch is mostly a comment. The "real" meat of it is actually just a few lines.

Can anybody get corruption with this thing applied? It goes on top of plain v2.6.20-rc2.

		Linus

----
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index b3a198c..ec01da1 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -862,17 +862,46 @@ int clear_page_dirty_for_io(struct page *page)
 {
 	struct address_space *mapping = page_mapping(page);
- if (!mapping)
-		return TestClearPageDirty(page);
-
-	if (TestClearPageDirty(page)) {
-		if (mapping_cap_account_dirty(mapping)) {
-			page_mkclean(page);
+	if (mapping && mapping_cap_account_dirty(mapping)) {
+		/*
+		 * Yes, Virginia, this is indeed insane.
+		 *
+		 * We use this sequence to make sure that
+		 *  (a) we account for dirty stats properly
+		 *  (b) we tell the low-level filesystem to
+		 *      mark the whole page dirty if it was
+		 *      dirty in a pagetable. Only to then
+		 *  (c) clean the page again and return 1 to
+		 *      cause the writeback.
+		 *
+		 * This way we avoid all nasty races with the
+		 * dirty bit in multiple places and clearing
+		 * them concurrently from different threads.
+		 *
+		 * Note! Normally the "set_page_dirty(page)"
+		 * has no effect on the actual dirty bit - since
+		 * that will already usually be set. But we
+		 * need the side effects, and it can help us
+		 * avoid races.
+		 *
+		 * We basically use the page "master dirty bit"
+		 * as a serialization point for all the different
+		 * threds doing their things.
+		 *
+		 * FIXME! We still have a race here: if somebody
+		 * adds the page back to the page tables in
+		 * between the "page_mkclean()" and the "TestClearPageDirty()",
+		 * we might have it mapped without the dirty bit set.
+		 */
+		if (page_mkclean(page))
+			set_page_dirty(page);
+		if (TestClearPageDirty(page)) {
 			dec_zone_page_state(page, NR_FILE_DIRTY);
+			return 1;
 		}
-		return 1;
+		return 0;
 	}
-	return 0;
+	return TestClearPageDirty(page);
 }
 EXPORT_SYMBOL(clear_page_dirty_for_io);


--
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com -
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux