[-mm patch] Fix inotify umount hangs.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Andrew, Robert,

The below patch against 2.6.13-rc1-mm1 fixes the umount hangs caused by 
inotify.

It excludes more inodes from being messed around with in 
inotify_unmount_inodes(): the ones with zero i_count as they cannot have 
any watches and the I_WILL_FREE ones which it is not allowed to do 
__iget() on.

Note that at this stage MS_ACTIVE is cleared on the superblock which means 
that iput() does not need shrink_icache_memory(), it evicts the inodes 
reaching zero i_count itself and immediately thus if inotify does __iget 
and then iput on such zero i_count inode it will evict it from icache and 
from the i_sb_list!  So the assumption that i_sb_list does not change 
whilts inotify_umount_inodes is running is bogus.  Even with this patch 
the list changes, e.g. when watches are removed i_count can hit zero and 
the iput at the bottom of the list_for_each_safe() loop _will_ modify 
i_sb_list by evicting the inode from icache there and then.

Further to this NTFS drops references to internal system inodes (also in 
icache) during its ->put_inode() method, e.g. this happens for directory 
inodes, their bitmap inode is iput when the directory has its final iput 
done on it which can be triggered by a removed inotify watch.  This can 
cause the "next_i" inode to also be evicted which leads to hangs and 
crashes since it no longer is on i_sb_list and is indeed freed memory 
which it is illegal to access.

This patch solves this by taking a reference on next_i for as long as it 
is needed, again with the same constraits as placed on taking a reference 
on the inode itself.

This may not be the prettiest but it works.  (-:

Signed-off-by: Anton Altaparmakov <[email protected]>

NB. A cleaner solution _might_ be (I am not sure, haven't tried to 
write it) to do-your-own for loop instead of list_for_each_safe() and 
protect the current inode with a reference and only drop the reference 
when the next inode has had a reference obtained on it but this pretty 
much amounts to what I do in this patch...

Best regards,

	Anton
-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/

--- linux-2.6.13-rc1-mm1-vanilla/fs/inotify.c	2005-07-01 14:51:09.000000000 +0100
+++ linux-2.6.13-rc1-mm1/fs/inotify.c	2005-07-04 20:04:31.000000000 +0100
@@ -560,43 +560,65 @@ EXPORT_SYMBOL_GPL(inotify_get_cookie);
  */
 void inotify_unmount_inodes(struct list_head *list)
 {
-	struct inode *inode, *next_i;
+	struct inode *inode, *next_i, *need_iput = NULL;
 
 	list_for_each_entry_safe(inode, next_i, list, i_sb_list) {
+		struct inode *need_iput_tmp;
 		struct inotify_watch *watch, *next_w;
 		struct list_head *watches;
 
 		/*
-		 * We cannot __iget() an inode in state I_CLEAR or I_FREEING,
-		 * which is fine becayse by that point the inode cannot have
-		 * any associated watches.
+		 * If i_count is zero, the inode cannot have any watches and
+		 * doing an __iget/iput with MS_ACTIVE clear would actually
+		 * evict all inodes with zero i_count from icache which is
+		 * unnecessarily violent and may in fact be illegal to do.
 		 */
-		if (inode->i_state & (I_CLEAR | I_FREEING))
+		if (!atomic_read(&inode->i_count))
 			continue;
-
-		/* In case the remove_watch() drops a reference */
-		__iget(inode);
-
 		/*
-		 * We can safely drop inode_lock here because the per-sb list
-		 * of inodes must not change during unmount and iprune_sem
-		 * keeps shrink_icache_memory() away.
+		 * We cannot __iget() an inode in state I_CLEAR, I_FREEING, or
+		 * I_WILL_FREE which is fine because by that point the inode
+		 * cannot have any associated watches.
+		 */
+		if (inode->i_state & (I_CLEAR | I_FREEING | I_WILL_FREE))
+			continue;
+		need_iput_tmp = need_iput;
+		need_iput = NULL;
+		/* In case the remove_watch() drops a reference. */
+		if (inode != need_iput_tmp)
+			__iget(inode);
+		else
+			need_iput_tmp = NULL;
+		/* In case the dropping of a reference would nuke next_i. */
+		if ((&next_i->i_sb_list != list) &&
+				atomic_read(&next_i->i_count) &&
+				!(next_i->i_state & (I_CLEAR | I_FREEING |
+				I_WILL_FREE))) {
+			__iget(next_i);
+			need_iput = next_i;
+		}
+		/*
+		 * We can safely drop inode_lock here because we hold
+		 * references on both inode and next_i.  Also no new inodes
+		 * will be added since the umount has begun.  Finally,
+		 * iprune_sem keeps shrink_icache_memory() away.
 		 */
 		spin_unlock(&inode_lock);
-
-		/* for each watch, send IN_UNMOUNT and then remove it */
+		if (need_iput_tmp)
+			iput(need_iput_tmp);
+		/* For each watch, send IN_UNMOUNT and then remove it. */
 		down(&inode->inotify_sem);
 		watches = &inode->inotify_watches;
 		list_for_each_entry_safe(watch, next_w, watches, i_list) {
 			struct inotify_device *dev = watch->dev;
 			down(&dev->sem);
-			inotify_dev_queue_event(dev, watch, IN_UNMOUNT,0,NULL);
+			inotify_dev_queue_event(dev, watch, IN_UNMOUNT, 0,
+					NULL);
 			remove_watch(watch, dev);
 			up(&dev->sem);
 		}
 		up(&inode->inotify_sem);
 		iput(inode);
-
 		spin_lock(&inode_lock);
 	}
 }
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]
  Powered by Linux