Re: msync(2) bug(?), returns AOP_WRITEPAGE_ACTIVATE to userland

Huge,

I took your advise regarding ~(__GFP_FS|__GFP_IO), AOP_WRITEPAGE_ACTIVATE,
and such.  I revised my unionfs_writepage and unionfs_sync_page, and tested
it under memory pressure: I have a couple of live CDs that use tmpfs and can
deterministically reproduce the conditions resulting in A_W_A.  I also went
back to using grab_cache_page but with the gfp_mask suggestions you made.

I'm happy to report that it all works great now!  Below is the entirety of
the new unionfs_mmap and unionfs_sync_page code.  I'd appreciate if you and
others can look it over and see if you find any problems.

Thanks,
Erez.


static int unionfs_writepage(struct page *page, struct writeback_control *wbc)
{
	int err = -EIO;
	struct inode *inode;
	struct inode *lower_inode;
	struct page *lower_page;
	char *kaddr, *lower_kaddr;
	struct address_space *mapping; /* lower inode mapping */
	gfp_t old_gfp_mask;

	inode = page->mapping->host;
	lower_inode = unionfs_lower_inode(inode);
	mapping = lower_inode->i_mapping;

	/*
	 * find lower page (returns a locked page)
	 *
	 * We turn off __GFP_IO|__GFP_FS so as to prevent a deadlock under
	 * memory pressure conditions.  This is similar to how the loop
	 * driver behaves (see loop_set_fd in drivers/block/loop.c).
	 * If we can't find the lower page, we redirty our page and return
	 * "success" so that the VM will call us again in the (hopefully
	 * near) future.
	 */
	old_gfp_mask = mapping_gfp_mask(mapping);
	mapping_set_gfp_mask(mapping, old_gfp_mask & ~(__GFP_IO|__GFP_FS));

	lower_page = grab_cache_page(mapping, page->index);
	mapping_set_gfp_mask(mapping, old_gfp_mask);

	if (!lower_page) {
		err = 0;
		set_page_dirty(page);
		goto out;
	}

	/* get page address, and encode it */
	kaddr = kmap(page);
	lower_kaddr = kmap(lower_page);

	memcpy(lower_kaddr, kaddr, PAGE_CACHE_SIZE);

	kunmap(page);
	kunmap(lower_page);

	BUG_ON(!mapping->a_ops->writepage);

	/* call lower writepage (expects locked page) */
	clear_page_dirty_for_io(lower_page); /* emulate VFS behavior */
	err = mapping->a_ops->writepage(lower_page, wbc);

	/* b/c grab_cache_page locked it and ->writepage unlocks on success */
	if (err)
		unlock_page(lower_page);
	/* b/c grab_cache_page increased refcnt */
	page_cache_release(lower_page);

	if (err < 0) {
		ClearPageUptodate(page);
		goto out;
	}
	/*
	 * Lower file systems such as ramfs and tmpfs, may return
	 * AOP_WRITEPAGE_ACTIVATE so that the VM won't try to (pointlessly)
	 * write the page again for a while.  But those lower file systems
	 * also set the page dirty bit back again.  Since we successfully
	 * copied our page data to the lower page, then the VM will come
	 * back to the lower page (directly) and try to flush it.  So we can
	 * save the VM the hassle of coming back to our page and trying to
	 * flush too.  Therefore, we don't re-dirty our own page, and we
	 * don't return AOP_WRITEPAGE_ACTIVATE back to the VM (we consider
	 * this a success).
	 */
	if (err == AOP_WRITEPAGE_ACTIVATE)
		err = 0;

	/* all is well */
	SetPageUptodate(page);
	/* lower mtimes has changed: update ours */
	unionfs_copy_attr_times(inode);

	unlock_page(page);

out:
	return err;
}


static void unionfs_sync_page(struct page *page)
{
	struct inode *inode;
	struct inode *lower_inode;
	struct page *lower_page;
	struct address_space *mapping; /* lower inode mapping */
	gfp_t old_gfp_mask;

	inode = page->mapping->host;
	lower_inode = unionfs_lower_inode(inode);
	mapping = lower_inode->i_mapping;

	/*
	 * Find lower page (returns a locked page).
	 *
	 * We turn off __GFP_IO|__GFP_FS so as to prevent a deadlock under
	 * memory pressure conditions.  This is similar to how the loop
	 * driver behaves (see loop_set_fd in drivers/block/loop.c).
	 * If we can't find the lower page, we redirty our page and return
	 * "success" so that the VM will call us again in the (hopefully
	 * near) future.
	 */
	old_gfp_mask = mapping_gfp_mask(mapping);
	mapping_set_gfp_mask(mapping, old_gfp_mask & ~(__GFP_IO|__GFP_FS));

	lower_page = grab_cache_page(mapping, page->index);
	mapping_set_gfp_mask(mapping, old_gfp_mask);

	if (!lower_page) {
		printk(KERN_ERR "unionfs: grab_cache_page failed\n");
		goto out;
	}

	/* do the actual sync */

	/*
	 * XXX: can we optimize ala RAIF and set the lower page to be
	 * discarded after a successful sync_page?
	 */
	if (mapping && mapping->a_ops && mapping->a_ops->sync_page)
		mapping->a_ops->sync_page(lower_page);

	/* b/c grab_cache_page locked it */
	unlock_page(lower_page);
	/* b/c grab_cache_page increased refcnt */
	page_cache_release(lower_page);

out:
	return;
}
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Follow-Ups:
- Re: msync(2) bug(?), returns AOP_WRITEPAGE_ACTIVATE to userland
  - From: Hugh Dickins <[email protected]>

References:
- Re: msync(2) bug(?), returns AOP_WRITEPAGE_ACTIVATE to userland
  - From: Hugh Dickins <[email protected]>

Prev by Date: [PATCH] Only show RESOURCES_64BIT on relevant architectures
Next by Date: Re: [RFC: 2.6 patch] remove the USB_STORAGE_ONETOUCH driver
Previous by thread: Re: msync(2) bug(?), returns AOP_WRITEPAGE_ACTIVATE to userland
Next by thread: Re: msync(2) bug(?), returns AOP_WRITEPAGE_ACTIVATE to userland
Index(es):
- Date
- Thread

[Index of Archives] [Kernel Newbies] [Netfilter] [Bugtraq] [Photo] [Stuff] [Gimp] [Yosemite News] [MIPS Linux] [ARM Linux] [Linux Security] [Linux RAID] [Video 4 Linux] [Linux for the blind] [Linux Resources]