Re: [RFC][PATCH] Fix hang in posix_locks_deadlock()

On Fri, Oct 26, 2007 at 01:07:50PM -0400, bfields wrote:
> On Thu, Oct 18, 2007 at 02:57:59PM -0400, George G. Davis wrote:
> > On Wed, Oct 17, 2007 at 02:51:57PM -0400, George G. Davis wrote:
> > > ---
> > > Not sure if this is the correct fix but it does resolve the hangs we're
> > > observing in posix_locks_deadlock().
> > 
> > Please disregard the previous patch, it's not quite right - causes occasional
> > segfaults and clearly did not retain the posix_same_owner() checks implemented
> > in the original code.  Here's a new version which I believe retains the
> > intent of the original code:
> > 
> > diff --git a/fs/locks.c b/fs/locks.c
> > index 7f9a3ea..e012b27 100644
> > --- a/fs/locks.c
> > +++ b/fs/locks.c
> > @@ -702,14 +702,12 @@ static int posix_locks_deadlock(struct file_lock *caller_fl,
> >  {
> >  	struct file_lock *fl;
> >  
> > -next_task:
> >  	if (posix_same_owner(caller_fl, block_fl))
> >  		return 1;
> >  	list_for_each_entry(fl, &blocked_list, fl_link) {
> >  		if (posix_same_owner(fl, block_fl)) {
> > -			fl = fl->fl_next;
> > -			block_fl = fl;
> > -			goto next_task;
> > +			if (posix_same_owner(caller_fl, fl))
> > +				return 1;
> >  		}
> >  	}
> >  	return 0;
> 
> It may take multiple steps to identify a deadlock.  With the above
> you'll miss deadlocks like
> 
> 	process 1 is requesting a lock held by process 2
> 	process 2 is blocking on a lock held by process 3
> 	process 3 is blocking on a lock held by process 1.
> 
> Could you give more details about how you're causing
> posix_locks_deadlock to hang?  Is there a simple test-case you can post?

Hm.  After another look: assume we have four tasks, t1, t2, t3, and t4.
Assume t1 and t2 share the same current->files (so they're the same
"owner" for the purpose of posix_same_owner()).  Assume:

	t1 is waiting on a conflicting lock held by t3.
	t2 is waiting on a conflicting lock held by t4.

Now suppose t4 requests a lock that conflicts with a lock held by t1 and
t2.  The list_for_each_entry() above will search for a task with t1 or
t2 as owner, which is waiting on a lock.  If it finds t1 first, the loop
won't be noticed, so t4 will be put to sleep.  Now we have a loop; t3
can release its lock (it no longer matters), and we'll have

	t2 waiting on a conflicting lock held by t4, and
	t4 waiting on a conflicting lock held by t2.

If a new task t5 then requests a lock conflicting with the one held by
t2, then the above function will go into an infinite loop.  I think.

Consider the directed graph with each vertex representing the set of all
tasks sharing the same file table, and each edge representing the
relationship "a task at this vertex is waiting on a lock held by a task
on another vertex".  The existance of multiple tasks with the same file
table means that we can no longer assume that each vertex has outdegree
at most one, so we have to switch to an algorithm that works on an
arbitrary directed graph.  That sounds painful.

Am I right about that, and about the example above?  It'd be interesting
to code it up just to make sure.

If so, one can imagine various bandaids, but maybe we should just rip
out the deadlock detection completely.... It's hard to imagine it being
really useful anyway.

--b.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Follow-Ups:
- Re: [RFC][PATCH] Fix hang in posix_locks_deadlock()
  - From: "J. Bruce Fields" <[email protected]>
- [PATCH] locks: fix possible infinite loop in posix deadlock detection
  - From: "J. Bruce Fields" <[email protected]>

References:
- [RFC][PATCH] Fix hang in posix_locks_deadlock()
  - From: "George G. Davis" <[email protected]>
- Re: [RFC][PATCH] Fix hang in posix_locks_deadlock()
  - From: "George G. Davis" <[email protected]>
- Re: [RFC][PATCH] Fix hang in posix_locks_deadlock()
  - From: "J. Bruce Fields" <[email protected]>

Prev by Date: Re: [PATCH] Fix bad data from non-direct-io read after direct-io write
Next by Date: Re: [patch 1/1] Drop CAP_SYS_RAWIO requirement for FIBMAP
Previous by thread: Re: [RFC][PATCH] Fix hang in posix_locks_deadlock()
Next by thread: [PATCH] locks: fix possible infinite loop in posix deadlock detection
Index(es):
- Date
- Thread

[Index of Archives] [Kernel Newbies] [Netfilter] [Bugtraq] [Photo] [Stuff] [Gimp] [Yosemite News] [MIPS Linux] [ARM Linux] [Linux Security] [Linux RAID] [Video 4 Linux] [Linux for the blind] [Linux Resources]