Re: 2.6.17-rc6-mm1 -- BUG: possible circular locking deadlock detected!

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



* Anton Altaparmakov <[email protected]> wrote:

> I will do that when I have some more spare time...  I don't think it 
> is particularly urgent given it has always been like this and if 
> anyone is trying to kill their file system even fixing this will not 
> stop them... They would just need to run the ntfsprogs provided 
> utilities to achieve anything they like... [...]

i agree that it's not urgent, but i think we have a small disagreement 
wrt. the philosophy behind that conclusion :-)

manipulating a device file (i.e. running ntfsprogs) is a known "hang 
yourself" thing that people are and must be aware of - but messing 
around within the VFS namespace of a filesystem itself ought to be safe, 
even if the file can only be accessed via a special mount option and if 
it's called $Bitmap. The mount option doesnt really say that it's 
unsafe:

       show_sys_files
              If  show_sys_files is specified, show the system files in direc-
              tory listings.  Otherwise the default behaviour is to  hide  the
              system  files.  Note that even when show_sys_files is specified,
              "$MFT" may will not  be  visible  due  to  bugs/mis-features  in
              glibc.   Further,  note that irrespective of show_sys_files, all
              files are accessible by name, i.e. you  can  always  do  "ls  -l
              '$UpCase'" for example to specifically show the system file con-
              taining the Unicode upcase table.

and the dangers of direct namespace access are apparently anticipated in 
the permissions of the $Mft itself:

  [root@neptune mnt2]# ls -l \$Mft
  ----------  1 root root 69632 Jan  1  1970 $Mft

and finally, lets call this a real bug in NTFS: we both worked hard to 
cover it via the lock validator =B-)

in any case, the message will only trigger if someone abuses one of the 
system files, so there is no urgency, and that will also serve as a 
reminder.

below i've attached a cleaned up version of my NTFS annotations so far. 
I think they are pretty straightforward and nonintrusive - for 
!CONFIG_LOCKDEP they add zero changes to ntfs.ko. Most of the linecount 
of the patch comes from comments - so i think we might even consider 
this a "document locking rules" patch :-)

	Ingo

---------------------------
Subject: lock validator: annotate NTFS locking rules
From: Ingo Molnar <[email protected]>

NTFS uses lots of type-opaque objects which acquire their true
identity runtime - so the lock validator needs to be helped in
a couple of places to figure out object types.

Many thanks to Anton Altaparmakov for giving lots of explanations
about NTFS locking rules.

Signed-off-by: Ingo Molnar <[email protected]>
---
 fs/ntfs/inode.c |   33 +++++++++++++++++++++++++++++++++
 fs/ntfs/mft.c   |    2 +-
 fs/ntfs/super.c |   31 +++++++++++++++++++++++++++++++
 3 files changed, 65 insertions(+), 1 deletion(-)

Index: linux/fs/ntfs/inode.c
===================================================================
--- linux.orig/fs/ntfs/inode.c
+++ linux/fs/ntfs/inode.c
@@ -367,6 +367,12 @@ static void ntfs_destroy_extent_inode(nt
 	kmem_cache_free(ntfs_inode_cache, ni);
 }
 
+/*
+ * The attribute runlist lock has separate locking rules from the
+ * normal runlist lock, so split the two lock-types:
+ */
+static struct lockdep_type_key attr_list_rl_lock_type;
+
 /**
  * __ntfs_init_inode - initialize ntfs specific part of an inode
  * @sb:		super block of mounted volume
@@ -394,6 +400,8 @@ void __ntfs_init_inode(struct super_bloc
 	ni->attr_list_size = 0;
 	ni->attr_list = NULL;
 	ntfs_init_runlist(&ni->attr_list_rl);
+	lockdep_reinit_key(&ni->attr_list_rl.lock,
+				&attr_list_rl_lock_type);
 	ni->itype.index.bmp_ino = NULL;
 	ni->itype.index.block_size = 0;
 	ni->itype.index.vcn_size = 0;
@@ -405,6 +413,13 @@ void __ntfs_init_inode(struct super_bloc
 	ni->ext.base_ntfs_ino = NULL;
 }
 
+/*
+ * Extent inodes get MFT-mapped in a nested way, while the base inode
+ * is still mapped. Teach this nesting to the lock validator by creating
+ * a separate type for nested inode's mrec_lock's:
+ */
+static struct lockdep_type_key extent_inode_mrec_lock_key;
+
 inline ntfs_inode *ntfs_new_extent_inode(struct super_block *sb,
 		unsigned long mft_no)
 {
@@ -413,6 +428,7 @@ inline ntfs_inode *ntfs_new_extent_inode
 	ntfs_debug("Entering.");
 	if (likely(ni != NULL)) {
 		__ntfs_init_inode(sb, ni);
+		lockdep_reinit_key(&ni->mrec_lock, &extent_inode_mrec_lock_key);
 		ni->mft_no = mft_no;
 		ni->type = AT_UNUSED;
 		ni->name = NULL;
@@ -1722,6 +1738,15 @@ err_out:
 	return err;
 }
 
+/*
+ * The MFT inode has special locking, so teach the lock validator
+ * about this by splitting off the locking rules of the MFT from
+ * the locking rules of other inodes. The MFT inode can never be
+ * accessed from the VFS side (or even internally), only by the
+ * map_mft functions.
+ */
+static struct lockdep_type_key mft_ni_runlist_lock_key, mft_ni_mrec_lock_key;
+
 /**
  * ntfs_read_inode_mount - special read_inode for mount time use only
  * @vi:		inode to read
@@ -2148,6 +2173,14 @@ int ntfs_read_inode_mount(struct inode *
 	ntfs_attr_put_search_ctx(ctx);
 	ntfs_debug("Done.");
 	ntfs_free(m);
+
+	/*
+	 * Split the locking rules of the MFT inode from the
+	 * locking rules of other inodes:
+	 */
+	lockdep_reinit_key(&ni->runlist.lock, &mft_ni_runlist_lock_key);
+	lockdep_reinit_key(&ni->mrec_lock, &mft_ni_mrec_lock_key);
+
 	return 0;
 
 em_put_err_out:
Index: linux/fs/ntfs/mft.c
===================================================================
--- linux.orig/fs/ntfs/mft.c
+++ linux/fs/ntfs/mft.c
@@ -348,7 +348,7 @@ map_err_out:
 		base_ni->ext.extent_ntfs_inos = tmp;
 	}
 	base_ni->ext.extent_ntfs_inos[base_ni->nr_extents++] = ni;
-	mutex_unlock(&base_ni->extent_lock);
+	mutex_unlock_non_nested(&base_ni->extent_lock);
 	atomic_dec(&base_ni->count);
 	ntfs_debug("Done 2.");
 	*ntfs_ino = ni;
Index: linux/fs/ntfs/super.c
===================================================================
--- linux.orig/fs/ntfs/super.c
+++ linux/fs/ntfs/super.c
@@ -1724,6 +1724,14 @@ upcase_failed:
 	return FALSE;
 }
 
+/*
+ * The lcn and mft bitmap inodes are NTFS-internal inodes with
+ * their own special locking rules:
+ */
+static struct lockdep_type_key
+	lcnbmp_runlist_lock_key, lcnbmp_mrec_lock_key,
+	mftbmp_runlist_lock_key, mftbmp_mrec_lock_key;
+
 /**
  * load_system_files - open the system files using normal functions
  * @vol:	ntfs super block describing device whose system files to load
@@ -1780,6 +1788,10 @@ static BOOL load_system_files(ntfs_volum
 		ntfs_error(sb, "Failed to load $MFT/$BITMAP attribute.");
 		goto iput_mirr_err_out;
 	}
+	lockdep_reinit_key(&NTFS_I(vol->mftbmp_ino)->runlist.lock,
+			   &mftbmp_runlist_lock_key);
+	lockdep_reinit_key(&NTFS_I(vol->mftbmp_ino)->mrec_lock,
+			   &mftbmp_mrec_lock_key);
 	/* Read upcase table and setup @vol->upcase and @vol->upcase_len. */
 	if (!load_and_init_upcase(vol))
 		goto iput_mftbmp_err_out;
@@ -1802,6 +1814,11 @@ static BOOL load_system_files(ntfs_volum
 			iput(vol->lcnbmp_ino);
 		goto bitmap_failed;
 	}
+	lockdep_reinit_key(&NTFS_I(vol->lcnbmp_ino)->runlist.lock,
+			   &lcnbmp_runlist_lock_key);
+	lockdep_reinit_key(&NTFS_I(vol->lcnbmp_ino)->mrec_lock,
+			   &lcnbmp_mrec_lock_key);
+
 	NInoSetSparseDisabled(NTFS_I(vol->lcnbmp_ino));
 	if ((vol->nr_clusters + 7) >> 3 > i_size_read(vol->lcnbmp_ino)) {
 		iput(vol->lcnbmp_ino);
@@ -2742,6 +2759,17 @@ static int ntfs_fill_super(struct super_
 	struct inode *tmp_ino;
 	int blocksize, result;
 
+	/*
+	 * We do a pretty difficult piece of bootstrap by reading the
+	 * MFT (and other metadata) from disk into memory. We'll only
+	 * release this metadata during umount, so the locking patterns
+	 * observed during bootstrap do not count. So turn off the
+	 * observation of locking patterns (strictly for this context
+	 * only) while mounting NTFS. [The validator is still active
+	 * otherwise, even for this context: it will for example record
+	 * lock type registrations.]
+	 */
+	lockdep_off();
 	ntfs_debug("Entering.");
 #ifndef NTFS_RW
 	sb->s_flags |= MS_RDONLY;
@@ -2753,6 +2781,7 @@ static int ntfs_fill_super(struct super_
 		if (!silent)
 			ntfs_error(sb, "Allocation of NTFS volume structure "
 					"failed. Aborting mount...");
+		lockdep_on();
 		return -ENOMEM;
 	}
 	/* Initialize ntfs_volume structure. */
@@ -2939,6 +2968,7 @@ static int ntfs_fill_super(struct super_
 		mutex_unlock(&ntfs_lock);
 		sb->s_export_op = &ntfs_export_ops;
 		lock_kernel();
+		lockdep_on();
 		return 0;
 	}
 	ntfs_error(sb, "Failed to allocate root directory.");
@@ -3058,6 +3088,7 @@ err_out_now:
 	sb->s_fs_info = NULL;
 	kfree(vol);
 	ntfs_debug("Failed, returning -EINVAL.");
+	lockdep_on();
 	return -EINVAL;
 }
 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux