Re: [RFC] Badness in __mutex_unlock_slowpath with XFS stress tests

On Tue, 11 Jul 2006, Nathan Scott wrote:

On Mon, Jul 10, 2006 at 12:46:23PM -0400, Stephane Doyon wrote:

Hi,


Hi Stephane,

A few months back, a fix was introduced for a mutex double unlock warning
related to direct I/O in XFS. I believe that fix has a lock ordering
problem that can cause a deadlock.

The problem was that __blockdev_direct_IO() would unlock the i_mutex in
the READ and DIO_OWN_LOCKING case, and the mutex would be unlocked again
in xfs_read() soon after returning from __generic_file_aio_read(). Because
there are lots of execution paths down from __generic_file_aio_read() that
do not consistently release the i_mutex, the safest fix was deemed to be
to reacquire the i_mutex before returning from __blockdev_direct_IO(). At
this point however, the reader is holding an xfs ilock, and AFAICT the
i_mutex is usually taken BEFORE the xfs ilock.


That is correct, yes.  Hmm.  Subtle.  Painful.  Thanks for the detailed
report and your analysis.

We are seeing a deadlock between a process writing and another process
reading the same file, when the reader is using direct I/O. (The writer


Is that a direct writer or a buffered writer?


Whichever, both cases trigger the deadlock.

must apparently be growing the file, using either direct or buffered
I/O.) Something like this, on XFS:
(dd if=/dev/zero of=f bs=128K count=5000 & ) ; sleep 1 ; dd of=/dev/null
if=f iflag=direct bs=128K count=5000

Seen on kernels 2.6.16 and 2.6.17.

The deadlock scenario appears to be this:
-The reader goes into xfs_read(), takes the i_mutex and then an xfs_ilock
XFS_IOLOCK_SHARED, then calls down to __generic_file_aio_read() which
eventually goes down to __blockdev_direct_IO(). In there it drops the
i_mutex.
-The writer goes into xfs_write() and obtains the i_mutex. It then tries
to get an xfs_ilock XFS_ILOCK_EXCL and must wait on it since it's held by
the reader.
-The reader, still in __blockdev_direct_IO(), executes directio_worker()
and then tries to reacquire the i_mutex, and must wait on it because the
writer has it.

And so we have a deadlock.


*nod*.  This will require some thought, I'm not sure I like the sound of
your suggested workaround there a whole lot, unfortunately, but maybe it
is all we can do at this stage.  Let me ponder further and get back to you


Thank you.

(but if you want to prototype your fix further, that'd be most welcome, of
course).

Sure, well it's not very subtle. The below patch is what I'm using fornow. I haven't seen problems with it yet but it hasn't been seriouslytested.

I'm assuming that it's OK to keep holding the i_mutex during the call todirect_io_worker(), because in the DIO_LOCKING case, direct_io_worker()is called with the i_mutex held, and XFS touches i_mutex only inxfs_read() and xfs_write(), so opefully nothing will conflict.


Signed-off-by: Stephane Doyon <[email protected]>
--- linux/fs/direct-io.c.orig	2006-07-11 09:23:20.000000000 -0400
+++ linux/fs/direct-io.c	2006-07-11 09:27:54.000000000 -0400
@@ -1191,7 +1191,6 @@ __blockdev_direct_IO(int rw, struct kioc
 	loff_t end = offset;
 	struct dio *dio;
 	int release_i_mutex = 0;
-	int acquire_i_mutex = 0;

 	if (rw & WRITE)
 		current->flags |= PF_SYNCWRITE;
@@ -1254,11 +1253,6 @@ __blockdev_direct_IO(int rw, struct kioc
 				goto out;
 			}

-			if (dio_lock_type == DIO_OWN_LOCKING) {
-				mutex_unlock(&inode->i_mutex);
-				acquire_i_mutex = 1;
-			}
-		}

 		if (dio_lock_type == DIO_LOCKING)
 			down_read(&inode->i_alloc_sem);
@@ -1282,8 +1276,6 @@ __blockdev_direct_IO(int rw, struct kioc
 out:
 	if (release_i_mutex)
 		mutex_unlock(&inode->i_mutex);
-	else if (acquire_i_mutex)
-		mutex_lock(&inode->i_mutex);
 	if (rw & WRITE)
 		current->flags &= ~PF_SYNCWRITE;
 	return retval;
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

References:
- Re: [RFC] Badness in __mutex_unlock_slowpath with XFS stress tests
  - From: Stephane Doyon <[email protected]>
- Re: [RFC] Badness in __mutex_unlock_slowpath with XFS stress tests
  - From: Nathan Scott <[email protected]>

Prev by Date: Re: tty's use of file_list_lock and file_move
Next by Date: Re: [PATCH] Add try_to_freeze() to rt-test kthreads
Previous by thread: Re: [RFC] Badness in __mutex_unlock_slowpath with XFS stress tests
Next by thread: [ANNOUNCE] sdparm 0.99
Index(es):
- Date
- Thread

[Index of Archives] [Kernel Newbies] [Netfilter] [Bugtraq] [Photo] [Stuff] [Gimp] [Yosemite News] [MIPS Linux] [ARM Linux] [Linux Security] [Linux RAID] [Video 4 Linux] [Linux for the blind] [Linux Resources]