Re: [Lse-tech] Re: [PATCH 0/7]: Fix for unsafe notifier chain

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sun, Nov 27, 2005 at 02:03:29PM -0800, Greg KH wrote:
> On Sun, Nov 27, 2005 at 11:56:40AM -0800, Andrew Morton wrote:
> > We're saying that kernel/sys.c:notifier_lock should be removed and that all
> > callers of notifier_chain_register(), notifier_chain_unregister() and
> > notifier_call_chain() should be changed to define and use their own lock.
> > 
> > So the _callers_ get to decide whether they're going to use down(),
> > spin_lock(), down_read(), read_lock(), preempt_disable(), local_irq_disable()
> > or whatever.
> 
> I completely agree.  I just watched in mute horror as Chandra and Alan
> spun off into the rcu blackhole trying to create one-size-fits-all
> notifiers.

RCU as blackhole?  I am impressed -- even -I- don't find RCU to have the
same limiting-case attraction that a black hole does.  ;-)

(My apologies, Greg, but you did leave yourself open for this!)

> Making the user do the locking it needs to do is simple and sane.
> 
> And the reason USB's notifiers are implemented correctly, is they don't
> use the notifier core, but rather, reimplemented their own, due to the
> locking mess.

The locking mess is due to the fact that the notifier chain itself
must be protected by something or another, right?  Here are the options
I have come across:

1.	The notifier chain is guarded by a separate notifier-chain lock.
	We then have the following deadlock situation:

	o	The subsystem requesting the notifier likely has
		to hold one of its own locks when registering and
		unregistering, which means that the notifier lock
		must be subordinate to the subsystem lock.

	o	But when invoking the notifiers, the notifier lock
		must be held while walking the chain.  Since the
		subsystem being notified likely has to acquire one
		of its own locks, the subsystem lock must be subordinate
		to the notifier lock.

	There are a number of ways of breaking this deadlock, some of
	which are listed below.  Note that this deadlock situation can
	arise both for spinlocks and for sleeplocks.

	I believe that this deadlock situation is the core reason why
	notifier locking is so difficult to get right.

	Even ignoring the deadlock, this does not work for NMIs.

2.	Each element of the notifier chain is guarded by reference
	counts, and the chain itself is guarded by a lock.  Each
	element of the chain holds a reference to the next element
	in the chain, and new elements are always added to the end
	of the chain.  The traversal code looks something like the
	following:

	spin_lock(&chain_lock);
	p = head;
	atomic_inc(&p->refcnt);
	while (p != NULL) {
		spin_unlock(&chain_lock);
		p->func(p->arg);
		spin_lock(&chain_lock);
		q = p->next;
		atomic_inc(&q->refcnt);
		if (atomic_dec_and_test(&p->refcnt)) {
			kfree(p);
		}
		p = q;
	}
	spin_unlock(&chain_lock);

	Note that all actual deletion is done under chain_lock.

	This works (I think...), and the subsystem code can use a
	single lock that the notifier lock (chain_lock in this case)
	is subordinate to.  But the notifier traversal loop is quite
	heavyweight.  And it cannot be invoked from NMI handlers without
	risk of self-deadlock.

3.	Like #1, but require that the subsystem have at least two locks,
	one of which is subordinate to the notifier lock (and is acquired
	from the notifier callback function), and the other of which is
	superior to the notifier lock, and is held when registering and
	deregistering callbacks.  This can work, but could significantly
	complicate the subsystem locking, and also will require that some
	operations acquire two locks instead of just one, since one must
	hold both to exclude both notifications and register/unregister
	operations.

	This approach also fails to handle notifications from NMIs.

4.	"Just say no" to a separate notifier-chain lock, and guard
	the hole thing with whatever mechanism the subsystem
	deems appropriate.  This can be made to work with NMI-based
	notification, since such subsystems can use RCU or whatever
	other lock-free mechanism they desire.

	I believe that Greg took this "just say no" approach in USB,
	but could easily be missing something.

	This works with minimal added complexity to the subsystem,
	but requires that each subsystem have its own notifier
	chain, since it does not make sense to have a single chain
	guarded by multiple locks.  But it means that notifier
	code must be replicated in a number of subsystems, which
	seems sub-optimal.

	This might be the best we can do, but in the interest of
	completeness...

5.	Use RCU to traverse the notifier chain and use simple locking
	to guard updates, similar to Alan's and Chandra's patch.
	This avoids the deadlock, and handles NMIs nicely, but does
	not tolerate synchronous notification callbacks that sleep.
	(Cases that can tolerate asynchronous notification can make use
	of workqueues or similar mechanisms to defer the sleeping code
	so that it is not executed in the scope of the rcu_read_lock()
	protecting the notifier-chain traversal.)

6.	Have separate mechanisms for the non-sleeping and the synchronous
	sleeping cases.  For example, #5 for non-sleeping and either #2,
	#3, or #4 for the synchronous sleeping case.

	This works in all cases, and achieves a high degree of code
	commonality, but does require two separate APIs.

7.	Use wait-free synchronization everywhere.  This has some issues
	with complexity, last I checked.

8.	Come up with a safe and sane RCU implementation that tolerates
	general blocking in read-side critical sections.  Note that
	although some realtime implementations of RCU do tolerate
	blocking, they do so only in the special case that priority
	inheritance applies to.  Might happen,
	but I would not recommend holding your breath.

Any options I missed?

							Thanx, Paul
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux