Re: 2.6.17-rc6-mm1 -- BUG: possible circular locking deadlock detected!

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



hi Anton,

* Anton Altaparmakov <[email protected]> wrote:

> [...] It perhaps is getting confused by the special case for the table 
> of inodes ($MFT) which has the lock dependency reverse to all other 
> inodes but it is special because it can never take the lock 
> recursively (and hence deadlock) because we always keep the whole 
> runlist for $MFT in memory and should a bug or memory corruption cause 
> this not to be the case then ntfs will detect this and go BUG() so it 
> still will not deadlock...

Please help me understand NTFS locking a bit better. As far as i can see 
at the moment, the NTFS locking scenario that the lock validator flagged 
does not involve two inodes - it only involves the MFT inode itself, and 
two of its locks.

Firstly, here is a list of the NTFS terms, locks in question:

  ni              - NTFS inode structure

  mft_ni          - special "Master File Table" inode - one per fs. 
                    Consists of "MFT records", which describe an inode 
                    each. (mft_ni is also called the "big inode")

  &ni->mrec_lock  - a spinlock protecting a particular inode's MFT data. 
                    (finegrained lock for the MFT record) It is 
                    typically taken by map_mft_record() and released by 
                    unmap_mft_record().

  ni->runlist     - maps logical addresses to on-disk addresses. (There 
                    are (two) runlists, one for normal inode data, 
                    another for attribute space data.)

  &rl->lock       - ni->runlist.lock, a rw semaphore that protects the 
                    mapping data. Read-locked on access, write-locked on 
                    modification (extension of an inode, etc.).

The MFT is loaded in-memory permanently at mount time, and its runlist 
gives us access to NTFS inodes. Is its runlist loaded into memory 
permanently too? An NTFS inode's runlist gives access to the actual file 
data.

What the validator flagged is the following locking construct:

we first acquired the MFT's &ni->mrec_lock in map_mft_record(), at:

       [<c0340508>] mutex_lock+0x8/0x10
       [<c01d4d61>] map_mft_record+0x51/0x2c0
       [<c01c51e8>] ntfs_map_runlist_nolock+0x3d8/0x530
       [<c01c58b1>] ntfs_map_runlist+0x41/0x70
       [<c01c1621>] ntfs_readpage+0x8c1/0x9a0
       [<c0142e1c>] read_cache_page+0xac/0x150
       [<c01e23f2>] load_system_files+0x472/0x2250
       [<c01e4e26>] ntfs_fill_super+0xc56/0x1a50
       [<c016bdee>] get_sb_bdev+0xde/0x120
       [<c01e028b>] ntfs_get_sb+0x1b/0x30
       [<c016b413>] vfs_kern_mount+0x33/0xa0
       [<c016b4d6>] do_kern_mount+0x36/0x50
       [<c01818de>] do_mount+0x28e/0x640
       [<c0181cff>] sys_mount+0x6f/0xb0

then we read-locked &rl->lock [the MFT's runlist semaphore] later in 
map_mft_record() -> ntfs_readpage(), while still holding &ni->mrec_lock:

       [<c0134c4e>] down_read+0x2e/0x40
       [<c01c159c>] ntfs_readpage+0x83c/0x9a0
       [<c0142e1c>] read_cache_page+0xac/0x150
       [<c01d4e22>] map_mft_record+0x112/0x2c0
       [<c01d229d>] ntfs_read_locked_inode+0x8d/0x15d0
       [<c01d3c6b>] ntfs_read_inode_mount+0x48b/0xba0
       [<c01e4dcb>] ntfs_fill_super+0xbfb/0x1a50
       [<c016bdee>] get_sb_bdev+0xde/0x120
       [<c01e028b>] ntfs_get_sb+0x1b/0x30
       [<c016b413>] vfs_kern_mount+0x33/0xa0
       [<c016b4d6>] do_kern_mount+0x36/0x50
       [<c01818de>] do_mount+0x28e/0x640
       [<c0181cff>] sys_mount+0x6f/0xb0

so this is a "&ni->mrec_lock => &rl->lock" dependency for the MFT, which 
the validator recorded.

Then the validator also observed the reverse order. We first 
write-locked &rl->lock (of the MFT inode):

 [<c0134c8e>] down_write+0x2e/0x50
 [<c01c5910>] ntfs_map_runlist+0x20/0x70
 [<c01c16a1>] ntfs_readpage+0x8c1/0x9a0
 [<c0142e9c>] read_cache_page+0xac/0x150
 [<c01e2472>] load_system_files+0x472/0x2250
 [<c01e4ea6>] ntfs_fill_super+0xc56/0x1a50
 [<c016be6e>] get_sb_bdev+0xde/0x120
 [<c01e030b>] ntfs_get_sb+0x1b/0x30
 [<c016b493>] vfs_kern_mount+0x33/0xa0
 [<c016b556>] do_kern_mount+0x36/0x50
 [<c018195e>] do_mount+0x28e/0x640
 [<c0181d7f>] sys_mount+0x6f/0xb0

then we took &ni->mrec_lock [this is still the MFT inode's mrec_lock, 
and we have the &rl->lock still held]:

 [<c0340588>] mutex_lock+0x8/0x10
 [<c01d4de1>] map_mft_record+0x51/0x2c0
 [<c01c5268>] ntfs_map_runlist_nolock+0x3d8/0x530
 [<c01c5931>] ntfs_map_runlist+0x41/0x70
 [<c01c16a1>] ntfs_readpage+0x8c1/0x9a0
 [<c0142e9c>] read_cache_page+0xac/0x150
 [<c01e2472>] load_system_files+0x472/0x2250
 [<c01e4ea6>] ntfs_fill_super+0xc56/0x1a50
 [<c016be6e>] get_sb_bdev+0xde/0x120
 [<c01e030b>] ntfs_get_sb+0x1b/0x30
 [<c016b493>] vfs_kern_mount+0x33/0xa0
 [<c016b556>] do_kern_mount+0x36/0x50
 [<c018195e>] do_mount+0x28e/0x640
 [<c0181d7f>] sys_mount+0x6f/0xb0

this means a "&rl->lock => &ni->mrec_lock" dependency, which stands in 
contrast with the already observed "&ni->mrec_lock => &rl->lock" 
dependency.

The dependencies were observed for the same locks (the MFT's runlist 
lock and mrec_lock), i.e. this is not a confusion of normal inodes vs. 
the MFT inode.

First and foremost, are my observations and interpretations correct? 
Assuming that i made no mistake that invalidates my analysis, why are 
the two MFT inode locks apparently taken in opposite order?

	Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux