On Mon, 3 Dec 2007 16:52:47 +0300
"Denis V. Lunev" <[email protected]> wrote:
> There is an AB-BA deadlock regarding the drop_caches sysctl. Here are the
> code paths:
>
> drop_pagecache
>   spin_lock(&inode_lock);
>   invalidate_mapping_pages
>     try_to_release_page
>       ext3_releasepage
>         journal_try_to_free_buffers
>           __journal_try_to_free_buffer
>             spin_lock(&journal->j_list_lock);
>
> __journal_temp_unlink_buffer (called under journal->j_list_lock by comments)
>   mark_buffer_dirty
>     __set_page_dirty
>       __mark_inode_dirty
>         spin_lock(&inode_lock);
>
> The patch tries to address the issue: it drops inode_lock before digging into
> invalidate_inode_pages. This seems sane, as a held inode should not disappear
> from the list or change its place in it.
>
> Signed-off-by: Denis V. Lunev <[email protected]>
> --
> diff --git a/fs/drop_caches.c b/fs/drop_caches.c
> index 59375ef..4ac80d8 100644
> --- a/fs/drop_caches.c
> +++ b/fs/drop_caches.c
> @@ -14,15 +14,27 @@ int sysctl_drop_caches;
>
>  static void drop_pagecache_sb(struct super_block *sb)
>  {
> -	struct inode *inode;
> +	struct inode *inode, *old;
>
> +	old = NULL;
>  	spin_lock(&inode_lock);
>  	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
>  		if (inode->i_state & (I_FREEING|I_WILL_FREE))
>  			continue;
> -		__invalidate_mapping_pages(inode->i_mapping, 0, -1, true);
> +		__iget(inode);
> +		spin_unlock(&inode_lock);
> +
> +		if (old != NULL)
> +			iput(old);
> +		invalidate_mapping_pages(inode->i_mapping, 0, -1);
> +		old = inode;
> +
> +		spin_lock(&inode_lock);
>  	}
>  	spin_unlock(&inode_lock);
> +
> +	if (old != NULL)
> +		iput(old);
>  }

We need to hold onto inode_lock while walking sb->s_inodes. Otherwise the
inode which we're currently looking at could get removed from i_sb_list and
bad things will happen (drop_pagecache_sb will loop forever, or will oops, I
guess).

drop_caches is bad this way: it has a couple of lock ranking errors. A
suitable fix would be to remove the drop_caches feature, but it seems to be
fairly popular as a developer thing. The approach thus far has been "yeah,
sorry about that, but drop_caches is only for development and it is
root-only anyway".
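To make "ranking error" concrete, here is a small user-space sketch, not kernel code. It assigns each lock a rank and flags any acquisition made while a lock of equal or higher rank is already held. The rank values, the tracker and all function names are hypothetical, and the choice to rank j_list_lock outside inode_lock is only an assumption that follows the journal path in the quoted trace.

```c
#include <stdbool.h>

/*
 * Illustrative ranks only (an assumption, not taken from the kernel):
 * a lock may be taken only while holding locks of strictly lower rank.
 */
enum lock_rank {
	RANK_J_LIST_LOCK = 1,	/* stands in for journal->j_list_lock */
	RANK_INODE_LOCK  = 2,	/* stands in for inode_lock */
};

#define MAX_HELD 8

struct rank_tracker {
	enum lock_rank held[MAX_HELD];
	int depth;
};

/*
 * Record an acquisition; return false if it breaks the ranking.
 * Only the most recently taken lock is compared, which is enough
 * for the two-lock paths replayed here.
 */
static bool rank_acquire(struct rank_tracker *t, enum lock_rank r)
{
	bool ok = t->depth == 0 || t->held[t->depth - 1] < r;

	if (t->depth < MAX_HELD)
		t->held[t->depth++] = r;
	return ok;
}

/* Forget the most recently acquired lock. */
static void rank_release(struct rank_tracker *t)
{
	if (t->depth > 0)
		t->depth--;
}

/* Replay a sequence of acquisitions; true if every one respects the ranking. */
static bool path_is_ranked(const enum lock_rank *order, int n)
{
	struct rank_tracker t = { .depth = 0 };
	bool ok = true;
	int i;

	for (i = 0; i < n; i++)
		ok = rank_acquire(&t, order[i]) && ok;
	while (t.depth > 0)
		rank_release(&t);
	return ok;
}

/* Journal path from the quoted trace: j_list_lock, then inode_lock. */
static bool journal_path_ok(void)
{
	const enum lock_rank order[] = { RANK_J_LIST_LOCK, RANK_INODE_LOCK };

	return path_is_ranked(order, 2);
}

/* drop_caches path from the quoted trace: inode_lock, then j_list_lock. */
static bool drop_caches_path_ok(void)
{
	const enum lock_rank order[] = { RANK_INODE_LOCK, RANK_J_LIST_LOCK };

	return path_is_ranked(order, 2);
}
```

Under these assumed ranks journal_path_ok() returns true and drop_caches_path_ok() returns false. Swapping the two rank values would blame the journal path instead; the point is only that the two paths cannot both be consistent with a single ordering.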

We could fix this particular issue by changing JBD to run
mark_inode_dirty() outside j_list_lock (which would be a good change
independent of the drop_caches issue) but other problems with drop_caches
will remain.

One way to fix jbd (and jbd2) would be:

static void __journal_temp_unlink_buffer(struct journal_head *jh,
				struct buffer_head **bh_to_dirty)
{
	*bh_to_dirty = NULL;
	...
	if (test_clear_buffer_jbddirty(bh))
		*bh_to_dirty = bh;
}

	{
		struct buffer_head *bh_to_dirty;	/* probably needs uninitialized_var() */
		...
		__journal_temp_unlink_buffer(jh, &bh_to_dirty);
		...
		jbd_mark_buffer_dirty(bh_to_dirty);
		brelse(bh_to_dirty);
		...
	}

static inline void jbd_mark_buffer_dirty(struct buffer_head *bh)
{
	if (bh)
		mark_buffer_dirty(bh);
}
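For anyone who wants to play with the idea outside the kernel, the same "note it under the lock, act on it after unlocking" shape can be sketched in user space with pthread mutexes. This is only an analogue under stated assumptions: struct buf, list_lock, inode_lock, refile() and the nested_acquisitions counter are all hypothetical stand-ins, and the list_lock_held flag is a single-threaded debugging aid, not a real held-lock check.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct buf {
	bool jbddirty;	/* stand-in for the jbddirty buffer flag */
	bool dirty;	/* stand-in for the ordinary dirty flag */
};

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;	/* plays j_list_lock */
static pthread_mutex_t inode_lock = PTHREAD_MUTEX_INITIALIZER;	/* plays inode_lock */
static bool list_lock_held;	/* demo aid, only meaningful single-threaded */
static int nested_acquisitions;	/* times inode_lock was taken inside list_lock */

/* Takes inode_lock, as the dirtying path in the quoted trace ends up doing. */
static void mark_dirty(struct buf *b)
{
	pthread_mutex_lock(&inode_lock);
	if (list_lock_held)
		nested_acquisitions++;
	b->dirty = true;
	pthread_mutex_unlock(&inode_lock);
}

/* Called with list_lock held: only reports which buffer needs dirtying. */
static void temp_unlink(struct buf *b, struct buf **to_dirty)
{
	*to_dirty = NULL;
	if (b->jbddirty) {
		b->jbddirty = false;
		*to_dirty = b;
	}
}

/* NULL-tolerant wrapper, mirroring jbd_mark_buffer_dirty() above. */
static void mark_dirty_if_set(struct buf *b)
{
	if (b)
		mark_dirty(b);
}

/* Caller: unlink under list_lock, dirty the buffer only after dropping it. */
static void refile(struct buf *b)
{
	struct buf *to_dirty;

	pthread_mutex_lock(&list_lock);
	list_lock_held = true;
	temp_unlink(b, &to_dirty);
	list_lock_held = false;
	pthread_mutex_unlock(&list_lock);

	mark_dirty_if_set(to_dirty);
}
```

Calling refile() on a buffer with jbddirty set leaves it dirty with jbddirty cleared, and nested_acquisitions stays at zero because inode_lock is never taken inside list_lock. Whether the real callers can tolerate the window between the unlock and the dirtying is a separate question that this sketch does not answer.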