Re: [PATCH] fix race in AF_UNIX

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



> > > Holding a global mutex over recvmsg() calls under AF_UNIX is pretty
> > > much a non-starter, this will kill performance for multi-threaded
> > > apps.
> > 
> > That's an rwsem held for read.  It's held for write in unix_gc() only
> > for a short duration, and unix_gc() should only rarely be called.  So
> > I don't think there's any performance problem here.
> 
> It pulls a non-local cacheline into the local thread, that's extremely
> expensive on SMP.

OK, here's an updated patch, that uses ->readlock, and passes my
testing.

----
From: Miklos Szeredi <[email protected]>

There are races involving the garbage collector, that can throw away
perfectly good packets with AF_UNIX sockets in them.

The problems arise when a socket goes from installed to in-flight or
vice versa during garbage collection.  Since gc is done with a
spinlock held, this only shows up on SMP.

Signed-off-by: Miklos Szeredi <[email protected]>
---

Index: linux-2.6.22-rc2/net/unix/garbage.c
===================================================================
--- linux-2.6.22-rc2.orig/net/unix/garbage.c	2007-06-03 23:58:11.000000000 +0200
+++ linux-2.6.22-rc2/net/unix/garbage.c	2007-06-06 09:48:36.000000000 +0200
@@ -186,7 +186,21 @@ void unix_gc(void)
 
 	forall_unix_sockets(i, s)
 	{
-		unix_sk(s)->gc_tree = GC_ORPHAN;
+		struct unix_sock *u = unix_sk(s);
+
+		u->gc_tree = GC_ORPHAN;
+
+		/*
+		 * Hold ->readlock to protect against sockets going from
+		 * in-flight to installed
+		 *
+		 * Can't sleep on this, because
+		 *   a) we are under spinlock
+		 *   b) skb_recv_datagram() could be waiting for a packet that
+		 *      is to be sent by this thread
+		 */
+		if (!mutex_trylock(&u->readlock))
+			goto lock_failed;
 	}
 	/*
 	 *	Everything is now marked
@@ -207,8 +221,6 @@ void unix_gc(void)
 
 	forall_unix_sockets(i, s)
 	{
-		int open_count = 0;
-
 		/*
 		 *	If all instances of the descriptor are not
 		 *	in flight we are in use.
@@ -218,10 +230,20 @@ void unix_gc(void)
 		 *	In this case (see unix_create1()) we set artificial
 		 *	negative inflight counter to close race window.
 		 *	It is trick of course and dirty one.
+		 *
+		 *	Get the inflight counter first, then the open
+		 *	counter.  This avoids problems if racing with
+		 *	sendmsg
+		 *
+		 *	If just created socket is not yet attached to
+		 *	a file descriptor, assume open_count of 1
 		 */
+		int inflight_count = atomic_read(&unix_sk(s)->inflight);
+		int open_count = 1;
+
 		if (s->sk_socket && s->sk_socket->file)
 			open_count = file_count(s->sk_socket->file);
-		if (open_count > atomic_read(&unix_sk(s)->inflight))
+		if (open_count > inflight_count)
 			maybe_unmark_and_push(s);
 	}
 
@@ -300,6 +322,7 @@ void unix_gc(void)
 			spin_unlock(&s->sk_receive_queue.lock);
 		}
 		u->gc_tree = GC_ORPHAN;
+		mutex_unlock(&u->readlock);
 	}
 	spin_unlock(&unix_table_lock);
 
@@ -309,4 +332,19 @@ void unix_gc(void)
 
 	__skb_queue_purge(&hitlist);
 	mutex_unlock(&unix_gc_sem);
+	return;
+
+ lock_failed:
+	{
+		struct sock *s1;
+		forall_unix_sockets(i, s1) {
+			if (s1 == s) {
+				spin_unlock(&unix_table_lock);
+				mutex_unlock(&unix_gc_sem);
+				return;
+			}
+			mutex_unlock(&unix_sk(s1)->readlock);
+		}
+		BUG();
+	}
 }

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux