Re: 2.6.17-rc6-mm1 -- BUG: possible circular locking deadlock detected!

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sun, 11 Jun 2006, Ingo Molnar wrote:
> * Anton Altaparmakov <[email protected]> wrote:
> 
> > I think the lock validator has the problem of not knowing that there 
> > are two different types of runlist which is why it complains about it.
> 
> ah, ok! What happened is that the rwlock_init() 'lock type keys' 
> (inlined via ntfs_init_runlist()) for the two runlists were 'merged':
> 
>         ntfs_init_runlist(&ni->runlist);
>         ntfs_init_runlist(&ni->attr_list_rl);
> 
> i have annotated things by initializing the two locks separately (via a 
> simple oneliner change), and this has solved the problem.
> 
> The two types are now properly 'split', and the validator tracks them 
> separately and understands their separate roles. So there's no need to 
> touch attribute runlist locking in the NTFS code.

Great!

> Some background: the validator uses lock initialization as a hint about 

Thanks for explaining.

> The good news is that after this fix things went pretty well for 
> readonly stuff and i got no new complaints from the validator. Phew! :-) 

Great!

> It does not fully cover read-write mode yet. When extending an existing 
> file the validator did not understand the following locking construct:
> 
> =======================================================
> [ INFO: possible circular locking dependency detected ]
> -------------------------------------------------------
> cat/2802 is trying to acquire lock:
>  (&vol->lcnbmp_lock){----}, at: [<c01e80dd>] ntfs_cluster_alloc+0x10d/0x23a0
> 
> but task is already holding lock:
>  (&ni->mrec_lock){--..}, at: [<c01d5e53>] map_mft_record+0x53/0x2c0
> 
> which lock already depends on the new lock,
> which could lead to circular dependencies.
> 
> the existing dependency chain (in reverse order) is:
> 
> -> #2 (&ni->mrec_lock){--..}:
>        [<c01394df>] lock_acquire+0x6f/0x90
>        [<c0346193>] mutex_lock_nested+0x73/0x2a0
>        [<c01d5e53>] map_mft_record+0x53/0x2c0
>        [<c01c54f8>] ntfs_map_runlist_nolock+0x3d8/0x530
>        [<c01c5bc1>] ntfs_map_runlist+0x41/0x70
>        [<c01c1929>] ntfs_readpage+0x8c9/0x9b0
>        [<c0142ffc>] read_cache_page+0xac/0x150
>        [<c01e213d>] ntfs_statfs+0x41d/0x660
>        [<c0163254>] vfs_statfs+0x54/0x70
>        [<c0163288>] vfs_statfs64+0x18/0x30
>        [<c0163384>] sys_statfs64+0x64/0xa0
>        [<c0347ddd>] sysenter_past_esp+0x56/0x8d
> 
> -> #1 (&rl->lock){----}:
>        [<c01394df>] lock_acquire+0x6f/0x90
>        [<c0134c8a>] down_read_nested+0x2a/0x40
>        [<c01c18a4>] ntfs_readpage+0x844/0x9b0
>        [<c0142ffc>] read_cache_page+0xac/0x150
>        [<c01e213d>] ntfs_statfs+0x41d/0x660
>        [<c0163254>] vfs_statfs+0x54/0x70
>        [<c0163288>] vfs_statfs64+0x18/0x30
>        [<c0163384>] sys_statfs64+0x64/0xa0
>        [<c0347ddd>] sysenter_past_esp+0x56/0x8d
> 
> -> #0 (&vol->lcnbmp_lock){----}:
>        [<c01394df>] lock_acquire+0x6f/0x90
>        [<c0134ccc>] down_write+0x2c/0x50
>        [<c01e80dd>] ntfs_cluster_alloc+0x10d/0x23a0
>        [<c01c427d>] ntfs_attr_extend_allocation+0x5fd/0x14a0
>        [<c01caa38>] ntfs_file_buffered_write+0x188/0x3880
>        [<c01ce2a8>] ntfs_file_aio_write_nolock+0x178/0x210
>        [<c01ce3f1>] ntfs_file_writev+0xb1/0x150
>        [<c01ce4af>] ntfs_file_write+0x1f/0x30
>        [<c0164f09>] vfs_write+0x99/0x160
>        [<c016589d>] sys_write+0x3d/0x70
>        [<c0347ddd>] sysenter_past_esp+0x56/0x8d
> 
> other info that might help us debug this:
> 
> 3 locks held by cat/2802:
>  #0:  (&inode->i_mutex){--..}, at: [<c0346118>] mutex_lock+0x8/0x10
>  #1:  (&rl->lock){----}, at: [<c01c3dbe>] ntfs_attr_extend_allocation+0x13e/0x14a0
>  #2:  (&ni->mrec_lock){--..}, at: [<c01d5e53>] map_mft_record+0x53/0x2c0
> 
> stack backtrace:
>  [<c0104bf2>] show_trace+0x12/0x20
>  [<c0104c19>] dump_stack+0x19/0x20
>  [<c0136ef1>] print_circular_bug_tail+0x61/0x70
>  [<c01389ff>] __lock_acquire+0x74f/0xde0
>  [<c01394df>] lock_acquire+0x6f/0x90
>  [<c0134ccc>] down_write+0x2c/0x50
>  [<c01e80dd>] ntfs_cluster_alloc+0x10d/0x23a0
>  [<c01c427d>] ntfs_attr_extend_allocation+0x5fd/0x14a0
>  [<c01caa38>] ntfs_file_buffered_write+0x188/0x3880
>  [<c01ce2a8>] ntfs_file_aio_write_nolock+0x178/0x210
>  [<c01ce3f1>] ntfs_file_writev+0xb1/0x150
>  [<c01ce4af>] ntfs_file_write+0x1f/0x30
>  [<c0164f09>] vfs_write+0x99/0x160
>  [<c016589d>] sys_write+0x3d/0x70
>  [<c0347ddd>] sysenter_past_esp+0x56/0x8d
> 
> this seems to be a pretty complex 3-way dependency related to 
> &vol->lcnbmp_lock and &ni->mrec_lock. Should i send a full dependency 
> events trace perhaps?

First an explanation of two relevant locks that are both going to upset 
the validator.  I half expected this to happen given what has happened so 
far.  The two locks are lcnbmp_lock and mftbmp_lock (both are r/w 
semaphores).

Both those locks lock the two big bitmaps on an NTFS volume:

lcnbmp_lock locks accesses to the cluster bitmap (i.e. the physical blocks 
bitmap where each 0 bit means an unallocated block and a 1 bit means an 
allocated block).

mftbmp_lock locks accesses to the mft bitmap (i.e. the inodes bitmap where 
each 0 bit means an unused inode and a 1 bit means the inode is in use).

The locks are taken when a cluster is being allocated / an mft record is 
being allocated.

The reason the validator will likely get very confused is that they 
introduce reverse semantics depending on what you are locking, i.e.

When extending a file the locking order is:

Lock runlist (data one) of file to be extended for writing (as it will be 
modified by the file extension).

Lock lcnbmp_lock for writing (as the bitmap will be modified by the 
allocated clusters).

Lock runlist of $Bitmap system file (the contents of which contain the 
bitmap) for reading so the on-disk physical location can be determined.

If the above fails then re-lock the runlist of $Bitmap for writing and 
lock mft record of $Bitmap system file and load in the runlist of the 
bitmap, the unlock the mft record, unlock the runlist of $Bitmap.

Load the $Bitmap data and allocate clusters in the bitmap.

Unlock lcnbmp_lock.

Update the runlist of the file being extended with the newly allocated 
clusters.

Lock the mft record of the file being extended and update the mft record 
with the new allocations.

Unlock the mft record and the runlist of the file that was extended.

So yes lock dependencies are inverted the validator complains.  But they 
are inverted in a special way that is safe.

Is the above description sufficient for you to fix it?

The mftbmp_lock works simillarly as above but instead of allocating 
clusters we are allocating an mft record, the locking sequence is the 
same though just substitute s/lcnbmp_lock/mftbmp_lock/ and s/cluster 
allocation/mft record allocation/.

Extending the mft in turn can require the mft itself to be extended which 
will take the lcnbmp_lock, thus mftbmp_lock nests outside the lcnbmp_lock 
and that should never be violated so the validator should be at least 
happy with that bit.  (-;

Sorry ntfs is that locking evil!

Best regards,

	Anton
-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer, http://www.linux-ntfs.org/
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux