Re: Possible memory ordering bug in page reclaim?

Herbert Xu wrote:
Nick Piggin <[email protected]> wrote:

Well yes, that's on the store side (1, above). However, can't a CPU
still speculatively (e.g. by guessing the branch) load the page->flags
cacheline, which might be satisfied from memory before the page->count
cacheline loads? I.e. you can still have the correct write ordering
but incorrect read ordering?

Because neither PageDirty nor page_count is a barrier, and there is
no read barrier between them.


Yes you're right.  A read barrier is required here.

I think Ben was actually agreeing with you.  He's just questioning
whether the corresponding write barrier existed on CPU 1 (the answer
to which is affirmative).

Ah, that clears up my misunderstanding.

Yes I agree the write side is OK.

Thanks Ben and Herbert. I guess I should do a proper patch then.

--
SUSE Labs, Novell Inc.

In mm/vmscan.c, page reclaim may run the following sequence 2
concurrently with sequence 1 on another CPU:

1                                2
find_get_page();
write to page                    write_lock(tree_lock);
SetPageDirty();                  if (page_count != 2
put_page();                          || PageDirty())
                                     /* page dirty or busy */
                                 else
                                     /* free it */

The comment indicates that PageDirty must be checked *after* page_count
indicates there are no users of this page. This prevents the dirty bit
from being lost in the case where sequence 2 sees the state of PageDirty()
from *before* SetPageDirty() in 1, but page_count from *after* put_page()
in 1.

However, there is no read memory barrier there, so nothing stops a CPU
from loading page_count before PageDirty (i.e. ->flags). Theoretically,
data corruption is possible.
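
For illustration only, here is a minimal user-space sketch of the ordering
the patch relies on, written with C11 atomics rather than the kernel's
primitives; fake_page, writer() and try_free() are invented names for the
example. The release fence stands in for the write-side ordering between
SetPageDirty() and put_page() (which the thread above agrees is OK), and
the acquire fence stands in for the smp_rmb() added below.

/*
 * User-space sketch (C11 atomics; not the kernel's primitives).
 * fake_page, writer() and try_free() are made-up names for illustration.
 */
#include <stdatomic.h>
#include <stdio.h>

struct fake_page {
	atomic_int flags;	/* stands in for page->flags (dirty bit) */
	atomic_int count;	/* stands in for page->count */
};

/* CPU 1: write to page, SetPageDirty(), put_page() */
static void writer(struct fake_page *p)
{
	atomic_store_explicit(&p->flags, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);	/* write-side ordering */
	atomic_fetch_sub_explicit(&p->count, 1, memory_order_relaxed);
}

/* CPU 2: the reclaim check, with a read barrier between the two loads */
static int try_free(struct fake_page *p)
{
	if (atomic_load_explicit(&p->count, memory_order_relaxed) != 2)
		return 0;				/* still in use */
	atomic_thread_fence(memory_order_acquire);	/* smp_rmb() analogue */
	if (atomic_load_explicit(&p->flags, memory_order_relaxed))
		return 0;				/* dirty, keep it */
	return 1;					/* safe to free */
}

int main(void)
{
	struct fake_page page;

	atomic_init(&page.flags, 0);
	atomic_init(&page.count, 3);	/* pagecache + reclaim + writer */

	writer(&page);
	printf("free it? %d\n", try_free(&page));	/* prints 0: dirty */
	return 0;
}

Without the acquire fence (the kernel's smp_rmb()), CPU 2's load of ->flags
could be satisfied before its load of ->count, so it could observe a clean
page together with a count of 2 even though CPU 1 dirtied the page before
dropping its reference.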

Signed-off-by: Nick Piggin <[email protected]>

Index: linux-2.6/mm/vmscan.c
===================================================================
--- linux-2.6.orig/mm/vmscan.c
+++ linux-2.6/mm/vmscan.c
@@ -511,7 +511,12 @@ static int shrink_list(struct list_head 
 		 * PageDirty _after_ making sure that the page is freeable and
 		 * not in use by anybody. 	(pagecache + us == 2)
 		 */
-		if (page_count(page) != 2 || PageDirty(page)) {
+		if (page_count(page) != 2) {
+			write_unlock_irq(&mapping->tree_lock);
+			goto keep_locked;
+		}
+		smp_rmb();
+		if (PageDirty(page)) {
 			write_unlock_irq(&mapping->tree_lock);
 			goto keep_locked;
 		}
