Re: RT Mutex patch and tester [PREEMPT_RT]

I have patched against 2.6.15-rt15 and I have found a hyperthreaded P4
machine. It works fine on that one.

Esben

On Mon, 23 Jan 2006, Esben Nielsen wrote:

> On Mon, 23 Jan 2006, Steven Rostedt wrote:
>
> > On Mon, 2006-01-23 at 10:33 +0100, Esben Nielsen wrote:
> > > On Sun, 22 Jan 2006, Bill Huey wrote:
> > >
> > > > On Mon, Jan 23, 2006 at 01:20:12AM +0100, Esben Nielsen wrote:
> > > > > Here is the problem:
> > > > >
> > > > > Task B (non-RT) takes the BKL. It then takes mutex 1. Then B
> > > > > tries to lock mutex 2, which is owned by task C. B blocks and releases
> > > > > the BKL. Our RT task A comes along and tries to get mutex 1. It boosts
> > > > > task B, which boosts task C, which releases mutex 2. Now B can continue?
> > > > > No, it has to reacquire the BKL! The net effect is that our RT task A
> > > > > waits for the BKL to be released without ever calling into a module that
> > > > > uses the BKL. But just because somebody in some non-RT code called into
> > > > > a module otherwise considered safe for RT usage with the BKL held, A
> > > > > must wait on the BKL!
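
In code, the sequence looks roughly like this (a minimal sketch; mutex1 and
mutex2 stand for any two semaphores, and the bodies are made up for
illustration):

	/* task B, non-RT */
	lock_kernel();		/* takes the BKL                          */
	down(&mutex1);
	down(&mutex2);		/* owned by C: B blocks here, and the BKL
				   is auto-released at schedule() time    */
	/* ... */
	up(&mutex2);
	up(&mutex1);
	unlock_kernel();

	/* RT task A */
	down(&mutex1);		/* boosts B (and through it C), but B must
				   reacquire the BKL before it can go on
				   to release mutex1 - so A effectively
				   waits on the BKL                       */
	/* ... */
	up(&mutex1);
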
> > > >
> > > > True, that's major suckage, but I can't name a single place in the kernel that
> > > > does that.
> > >
> > > Sounds good. But someone might put it in...
> >
> > Hmm, I wouldn't be surprised if this is done somewhere in the VFS layer.
> >
> > >
> > > > Remember, the BKL is now preemptible, so the place that it might sleep
> > > > similar to the above would be in spinlock_t definitions.
> > > I can't see that from how it works. It is explicitly made such that you
> > > are allowed to use semaphores with BKL held - and such that the BKL is
> > > released if you do.
> >
> > Correct.  I hope you didn't remove my comment in the rt.c about BKL
> > being a PITA :) (Ingo was nice enough to change my original patch to use
> > the acronym.)
>
> I left it there it seems :-)
>
> >
> > >
> > > > But the BKL is held across schedule()s
> > > > so that the BKL semantics are preserved.
> > > Only for spinlock_t (now rt_mutex) operations, not for semaphore/mutex
> > > operations.
> > > > Contending under a priority inheritance
> > > > operation isn't too much of a problem anyway, since the use of it
> > > > already makes that path indeterminate.
> > > The problem is that you might hit the BKL because of what some other
> > > low-priority task does, thus making your RT code non-deterministic.
> >
> > I disagree here.  The fact that you grab a semaphore that may also be
> > grabbed by a path while holding the BKL means that grabbing that
> > semaphore may be blocked on the BKL too.  So the length of grabbing a
> > semaphore that can be grabbed while also holding the BKL is the length
> > of the critical section of the semaphore + the length of the longest BKL
> > hold.
> Exactly. What is "the length of the longest BKL hold"? (See below.)
>
> >
> > Just don't let your RT tasks grab semaphores that can be grabbed while
> > also holding the BKL :)
>
> How are you to _know_ that? Even though your code, or any code you
> call, or any code called from code you call, hasn't changed, this
> situation can arise!
>
> >
> > But the main point is that it is still deterministic.  Just that it may
> > be longer than one thinks.
> >
> I don't consider "the length of the longest BKL hold" deterministic.
> People might traverse all kinds of weird lists and data structures while
> holding the BKL.
>
> > >
> > > > Even under contention, a higher-priority task above A can still
> > > > run, since the kernel is now preemptible even when manipulating the BKL.
> > >
> > > No, A waits for the BKL because it waits for B, which waits for the BKL.
> >
> > Right.
> >
> > -- Steve
> >
> > PS. I might actually get around to testing your patch today :)  That is,
> > if -rt12 passes all my tests.
> >
>
> Sounds nice :-) I cross my fingers...
>
> Esben
>
>
> >
> > -
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to [email protected]
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/
> >
>
>
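
The heart of the patch, for anyone skimming it (a rough outline only - the
real code below also handles the BKL, the debugging hooks and the save_state
variants): a task's effective priority is min(its normal prio, the prio of
the top waiter on its pi_waiters list), and instead of walking the whole lock
chain under nested locks, a boosted-but-blocked owner is simply woken so that
it re-runs its own down() loop and boosts the next owner in the chain:

	/* sketch of the reworked ____down() retry loop */
	for (;;) {
		_raw_spin_lock(&lock->wait_lock);
		if (allowed_to_take_lock(ti, task, lock_owner(lock), lock)) {
			set_new_owner(lock, lock_owner(lock), ti);
			_raw_spin_unlock(&lock->wait_lock);
			return;			/* got the lock */
		}
		task_blocks_on_lock(&waiter, ti, lock, TASK_UNINTERRUPTIBLE);
		_raw_spin_unlock(&lock->wait_lock);

		schedule();		/* woken either as (pending) owner or
					   just to re-boost the lock we are
					   blocked on                        */

		_raw_spin_lock(&lock->wait_lock);
		if (waiter.ti)
			remove_waiter(lock, &waiter, 1);
		_raw_spin_unlock(&lock->wait_lock);
		/* and retry from the top */
	}
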
diff -upr linux-2.6.15-rt15-orig/fs/proc/array.c linux-2.6.15-rt15-pipatch/fs/proc/array.c
--- linux-2.6.15-rt15-orig/fs/proc/array.c	2006-01-24 18:50:37.000000000 +0100
+++ linux-2.6.15-rt15-pipatch/fs/proc/array.c	2006-01-24 18:56:07.000000000 +0100
@@ -295,6 +295,14 @@ static inline char *task_cap(struct task
 			    cap_t(p->cap_effective));
 }
 
+
+static char *show_blocked_on(task_t *task, char *buffer)
+{
+  pid_t pid = get_blocked_on(task);
+  return buffer + sprintf(buffer,"BlckOn: %d\n",pid);
+}
+
+
 int proc_pid_status(struct task_struct *task, char * buffer)
 {
 	char * orig = buffer;
@@ -313,6 +321,7 @@ int proc_pid_status(struct task_struct *
 #if defined(CONFIG_ARCH_S390)
 	buffer = task_show_regs(task, buffer);
 #endif
+	buffer = show_blocked_on(task,buffer);
 	return buffer - orig;
 }
 
diff -upr linux-2.6.15-rt15-orig/include/linux/rt_lock.h linux-2.6.15-rt15-pipatch/include/linux/rt_lock.h
--- linux-2.6.15-rt15-orig/include/linux/rt_lock.h	2006-01-24 18:50:37.000000000 +0100
+++ linux-2.6.15-rt15-pipatch/include/linux/rt_lock.h	2006-01-24 18:56:07.000000000 +0100
@@ -36,6 +36,7 @@ struct rt_mutex {
 	unsigned long		acquire_eip;
 	char 			*name, *file;
 	int			line;
+	int                     verbose;
 # endif
 # ifdef CONFIG_DEBUG_PREEMPT
 	int			was_preempt_off;
@@ -67,7 +68,7 @@ struct rt_mutex_waiter {
 
 #ifdef CONFIG_DEBUG_DEADLOCKS
 # define __RT_MUTEX_DEADLOCK_DETECT_INITIALIZER(lockname) \
-	, .name = #lockname, .file = __FILE__, .line = __LINE__
+	, .name = #lockname, .file = __FILE__, .line = __LINE__, .verbose =0
 #else
 # define __RT_MUTEX_DEADLOCK_DETECT_INITIALIZER(lockname)
 #endif
diff -upr linux-2.6.15-rt15-orig/include/linux/sched.h linux-2.6.15-rt15-pipatch/include/linux/sched.h
--- linux-2.6.15-rt15-orig/include/linux/sched.h	2006-01-24 18:50:37.000000000 +0100
+++ linux-2.6.15-rt15-pipatch/include/linux/sched.h	2006-01-24 18:56:07.000000000 +0100
@@ -1652,6 +1652,8 @@ extern void recalc_sigpending(void);
 
 extern void signal_wake_up(struct task_struct *t, int resume_stopped);
 
+extern pid_t get_blocked_on(task_t *task);
+
 /*
  * Wrappers for p->thread_info->cpu access. No-op on UP.
  */
diff -upr linux-2.6.15-rt15-orig/init/main.c linux-2.6.15-rt15-pipatch/init/main.c
--- linux-2.6.15-rt15-orig/init/main.c	2006-01-24 18:50:37.000000000 +0100
+++ linux-2.6.15-rt15-pipatch/init/main.c	2006-01-24 18:56:07.000000000 +0100
@@ -616,6 +616,12 @@ static void __init do_initcalls(void)
 			printk(KERN_WARNING "error in initcall at 0x%p: "
 				"returned with %s\n", *call, msg);
 		}
+		if (initcall_debug) {
+			printk(KERN_DEBUG "Returned from initcall 0x%p", *call);
+			print_fn_descriptor_symbol(": %s()", (unsigned long) *call);
+			printk("\n");
+		}
+
 	}
 
 	/* Make sure there is no pending stuff from the initcall sequence */
diff -upr linux-2.6.15-rt15-orig/kernel/rt.c linux-2.6.15-rt15-pipatch/kernel/rt.c
--- linux-2.6.15-rt15-orig/kernel/rt.c	2006-01-24 18:50:37.000000000 +0100
+++ linux-2.6.15-rt15-pipatch/kernel/rt.c	2006-01-24 18:56:07.000000000 +0100
@@ -36,7 +36,10 @@
  *   (also by Steven Rostedt)
  *    - Converted single pi_lock to individual task locks.
  *
+ * By Esben Nielsen:
+ *    Doing priority inheritance with help of the scheduler.
  */
+
 #include <linux/config.h>
 #include <linux/rt_lock.h>
 #include <linux/sched.h>
@@ -58,18 +61,26 @@
  *  To keep from having a single lock for PI, each task and lock
  *  has their own locking. The order is as follows:
  *
+ *     lock->wait_lock   -> sometask->pi_lock
+ * You should only hold one wait_lock and one pi_lock
  * blocked task->pi_lock -> lock->wait_lock -> owner task->pi_lock.
  *
- * This is safe since a owner task should never block on a lock that
- * is owned by a blocking task.  Otherwise you would have a deadlock
- * in the normal system.
- * The same goes for the locks. A lock held by one task, should not be
- * taken by task that holds a lock that is blocking this lock's owner.
+ * lock->wait_lock protects everything inside the lock and all the waiters
+ * on lock->wait_list.
+ * sometask->pi_lock protects everything in that task related to the rt_mutex.
+ *
+ * Invariants - must be true when unlocking lock->wait_lock:
+ *   If lock->wait_list is non-empty:
+ *     1) lock_owner(lock) points to a valid thread.
+ *     2) The first and only the first waiter on the list must be on
+ *        lock_owner(lock)->task->pi_waiters.
+ * 
+ *  A waiter struct is on the lock->wait_list iff waiter->ti!=NULL.
  *
- * A task that is about to grab a lock is first considered to be a
- * blocking task, even if the task successfully acquires the lock.
- * This is because the taking of the locks happen before the
- * task becomes the owner.
+ *  Strategy for boosting a lock chain:
+ *   task A is blocked on lock 1 owned by task B, which is blocked on lock 2, etc.
+ *  A raises B's prio and wakes B. B tries to get lock 2 again and fails.
+ *  B therefore boosts C (the owner of lock 2), and so on.
  */
 
 /*
@@ -117,6 +128,7 @@
  * This flag is good for debugging the PI code - it makes all tasks
  * in the system fall under PI handling. Normally only SCHED_FIFO/RR
  * tasks are PI-handled:
+ *
  */
 #define ALL_TASKS_PI 0
 
@@ -132,6 +144,19 @@
 # define __CALLER0__
 #endif
 
+int rt_mutex_debug = 0;
+
+#ifdef CONFIG_PREEMPT_RT
+static int is_kernel_lock(struct rt_mutex *lock)
+{
+	return (lock == &kernel_sem.lock);
+
+}
+#else
+#define is_kernel_lock(lock) (0)
+#endif
+
+
 #ifdef CONFIG_DEBUG_DEADLOCKS
 /*
  * We need a global lock when we walk through the multi-process
@@ -311,7 +336,7 @@ void check_preempt_wakeup(struct task_st
 		}
 }
 
-static inline void
+static void
 account_mutex_owner_down(struct task_struct *task, struct rt_mutex *lock)
 {
 	if (task->lock_count >= MAX_LOCK_STACK) {
@@ -325,7 +350,7 @@ account_mutex_owner_down(struct task_str
 	task->lock_count++;
 }
 
-static inline void
+static void
 account_mutex_owner_up(struct task_struct *task)
 {
 	if (!task->lock_count) {
@@ -390,6 +415,21 @@ static void printk_lock(struct rt_mutex 
 	}
 }
 
+static void debug_lock(struct rt_mutex *lock, 
+		       const char *fmt,...)
+{ 
+	if(rt_mutex_debug && lock->verbose) { 
+		va_list args;
+		printk_task(current);
+
+		va_start(args, fmt);
+		vprintk(fmt, args);
+		va_end(args);
+		printk_lock(lock, 1);
+	} 
+}
+
+
 static void printk_waiter(struct rt_mutex_waiter *w)
 {
 	printk("-------------------------\n");
@@ -534,10 +574,9 @@ static int check_deadlock(struct rt_mute
 	 * Special-case: the BKL self-releases at schedule()
 	 * time so it can never deadlock:
 	 */
-#ifdef CONFIG_PREEMPT_RT
-	if (lock == &kernel_sem.lock)
+	if (is_kernel_lock(lock))
 		return 0;
-#endif
+
 	ti = lock_owner(lock);
 	if (!ti)
 		return 0;
@@ -562,13 +601,8 @@ static int check_deadlock(struct rt_mute
 		trace_local_irq_disable(ti);
 		return 0;
 	}
-#ifdef CONFIG_PREEMPT_RT
-	/*
-	 * Skip the BKL:
-	 */
-	if (lockblk == &kernel_sem.lock)
+	if(is_kernel_lock(lockblk))
 		return 0;
-#endif
 	/*
 	 * Ugh, something corrupted the lock data structure?
 	 */
@@ -656,7 +690,7 @@ restart:
 		list_del_init(curr);
 		trace_unlock_irqrestore(&trace_lock, flags, ti);
 
-		if (lock == &kernel_sem.lock) {
+		if (is_kernel_lock(lock)) {
 			printk("BUG: %s/%d, BKL held at task exit time!\n",
 				task->comm, task->pid);
 			printk("BKL acquired at: ");
@@ -724,28 +758,14 @@ restart:
 	return err;
 }
 
-#endif
-
-#if ALL_TASKS_PI && defined(CONFIG_DEBUG_DEADLOCKS)
-
-static void
-check_pi_list_present(struct rt_mutex *lock, struct rt_mutex_waiter *waiter,
-		      struct thread_info *old_owner)
+#else /* ifdef CONFIG_DEBUG_DEADLOCKS */
+static inline void debug_lock(struct rt_mutex *lock, 
+			      const char *fmt,...)
 {
-	struct rt_mutex_waiter *w;
-
-	_raw_spin_lock(&old_owner->task->pi_lock);
-	TRACE_WARN_ON_LOCKED(plist_node_empty(&waiter->pi_list));
-
-	plist_for_each_entry(w, &old_owner->task->pi_waiters, pi_list) {
-		if (w == waiter)
-			goto ok;
-	}
-	TRACE_WARN_ON_LOCKED(1);
-ok:
-	_raw_spin_unlock(&old_owner->task->pi_lock);
-	return;
 }
+#endif /* else CONFIG_DEBUG_DEADLOCKS */
+
+#if ALL_TASKS_PI && defined(CONFIG_DEBUG_DEADLOCKS)
 
 static void
 check_pi_list_empty(struct rt_mutex *lock, struct thread_info *old_owner)
@@ -781,274 +801,115 @@ check_pi_list_empty(struct rt_mutex *loc
 
 #endif
 
-/*
- * Move PI waiters of this lock to the new owner:
- */
-static void
-change_owner(struct rt_mutex *lock, struct thread_info *old_owner,
-	     struct thread_info *new_owner)
+static inline int boosting_waiter(struct  rt_mutex_waiter *waiter)
 {
-	struct rt_mutex_waiter *w, *tmp;
-	int requeued = 0, sum = 0;
-
-	if (old_owner == new_owner)
-		return;
-
-	SMP_TRACE_BUG_ON_LOCKED(!spin_is_locked(&old_owner->task->pi_lock));
-	SMP_TRACE_BUG_ON_LOCKED(!spin_is_locked(&new_owner->task->pi_lock));
-	plist_for_each_entry_safe(w, tmp, &old_owner->task->pi_waiters, pi_list) {
-		if (w->lock == lock) {
-			trace_special_pid(w->ti->task->pid, w->ti->task->prio, w->ti->task->normal_prio);
-			plist_del(&w->pi_list);
-			w->pi_list.prio = w->ti->task->prio;
-			plist_add(&w->pi_list, &new_owner->task->pi_waiters);
-			requeued++;
-		}
-		sum++;
-	}
-	trace_special(sum, requeued, 0);
+  return ALL_TASKS_PI || rt_prio(waiter->list.prio);
 }
 
-int pi_walk, pi_null, pi_prio, pi_initialized;
-
-/*
- * The lock->wait_lock and p->pi_lock must be held.
- */
-static void pi_setprio(struct rt_mutex *lock, struct task_struct *task, int prio)
+static int calc_pi_prio(task_t *task)
 {
-	struct rt_mutex *l = lock;
-	struct task_struct *p = task;
-	/*
-	 * We don't want to release the parameters locks.
-	 */
-
-	if (unlikely(!p->pid)) {
-		pi_null++;
-		return;
+	int prio = task->normal_prio;
+	if(!plist_head_empty(&task->pi_waiters)) {
+		struct  rt_mutex_waiter *waiter = 
+			plist_first_entry(&task->pi_waiters, struct rt_mutex_waiter, pi_list);
+		prio = min(waiter->pi_list.prio,prio);
 	}
 
-	SMP_TRACE_BUG_ON_LOCKED(!spin_is_locked(&lock->wait_lock));
-	SMP_TRACE_BUG_ON_LOCKED(!spin_is_locked(&p->pi_lock));
-#ifdef CONFIG_DEBUG_DEADLOCKS
-	pi_prio++;
-	if (p->policy != SCHED_NORMAL && prio > normal_prio(p)) {
-		TRACE_OFF();
-
-		printk("huh? (%d->%d??)\n", p->prio, prio);
-		printk("owner:\n");
-		printk_task(p);
-		printk("\ncurrent:\n");
-		printk_task(current);
-		printk("\nlock:\n");
-		printk_lock(lock, 1);
-		dump_stack();
-		trace_local_irq_disable(ti);
-	}
-#endif
-	/*
-	 * If the task is blocked on some other task then boost that
-	 * other task (or tasks) too:
-	 */
-	for (;;) {
-		struct rt_mutex_waiter *w = p->blocked_on;
-#ifdef CONFIG_DEBUG_DEADLOCKS
-		int was_rt = rt_task(p);
-#endif
-
-		mutex_setprio(p, prio);
-
-		/*
-		 * The BKL can really be a pain. It can happen where the
-		 * BKL is being held by one task that is just about to
-		 * block on another task that is waiting for the BKL.
-		 * This isn't a deadlock, since the BKL is released
-		 * when the task goes to sleep.  This also means that
-		 * all holders of the BKL are not blocked, or are just
-		 * about to be blocked.
-		 *
-		 * Another side-effect of this is that there's a small
-		 * window where the spinlocks are not held, and the blocked
-		 * process hasn't released the BKL.  So if we are going
-		 * to boost the owner of the BKL, stop after that,
-		 * since that owner is either running, or about to sleep
-		 * but don't go any further or we are in a loop.
-		 */
-		if (!w || unlikely(p->lock_depth >= 0))
-			break;
-		/*
-		 * If the task is blocked on a lock, and we just made
-		 * it RT, then register the task in the PI list and
-		 * requeue it to the wait list:
-		 */
-
-		/*
-		 * Don't unlock the original lock->wait_lock
-		 */
-		if (l != lock)
-			_raw_spin_unlock(&l->wait_lock);
-		l = w->lock;
-		TRACE_BUG_ON_LOCKED(!lock);
+	return prio;
 
-#ifdef CONFIG_PREEMPT_RT
-		/*
-		 * The current task that is blocking can also the one
-		 * holding the BKL, and blocking on a task that wants
-		 * it.  So if it were to get this far, we would deadlock.
-		 */
-		if (unlikely(l == &kernel_sem.lock) && lock_owner(l) == current_thread_info()) {
-			/*
-			 * No locks are held for locks, so fool the unlocking code
-			 * by thinking the last lock was the original.
-			 */
-			l = lock;
-			break;
-		}
-#endif
-
-		if (l != lock)
-			_raw_spin_lock(&l->wait_lock);
-
-		TRACE_BUG_ON_LOCKED(!lock_owner(l));
-
-		if (!plist_node_empty(&w->pi_list)) {
-			TRACE_BUG_ON_LOCKED(!was_rt && !ALL_TASKS_PI && !rt_task(p));
-			/*
-			 * If the task is blocked on a lock, and we just restored
-			 * it from RT to non-RT then unregister the task from
-			 * the PI list and requeue it to the wait list.
-			 *
-			 * (TODO: this can be unfair to SCHED_NORMAL tasks if they
-			 *        get PI handled.)
-			 */
-			plist_del(&w->pi_list);
-		} else
-			TRACE_BUG_ON_LOCKED((ALL_TASKS_PI || rt_task(p)) && was_rt);
-
-		if (ALL_TASKS_PI || rt_task(p)) {
-			w->pi_list.prio = prio;
-			plist_add(&w->pi_list, &lock_owner(l)->task->pi_waiters);
-		}
-
-		plist_del(&w->list);
-		w->list.prio = prio;
-		plist_add(&w->list, &l->wait_list);
-
-		pi_walk++;
-
-		if (p != task)
-			_raw_spin_unlock(&p->pi_lock);
-
-		p = lock_owner(l)->task;
-		TRACE_BUG_ON_LOCKED(!p);
-		_raw_spin_lock(&p->pi_lock);
-		/*
-		 * If the dependee is already higher-prio then
-		 * no need to boost it, and all further tasks down
-		 * the dependency chain are already boosted:
-		 */
-		if (p->prio <= prio)
-			break;
-	}
-	if (l != lock)
-		_raw_spin_unlock(&l->wait_lock);
-	if (p != task)
-		_raw_spin_unlock(&p->pi_lock);
 }
 
-/*
- * Change priority of a task pi aware
- *
- * There are several aspects to consider:
- * - task is priority boosted
- * - task is blocked on a mutex
- *
- */
-void pi_changeprio(struct task_struct *p, int prio)
+static void fix_prio(task_t *task)
 {
-	unsigned long flags;
-	int oldprio;
-
-	spin_lock_irqsave(&p->pi_lock,flags);
-	if (p->blocked_on)
-		spin_lock(&p->blocked_on->lock->wait_lock);
-
-	oldprio = p->normal_prio;
-	if (oldprio == prio)
-		goto out;
-
-	/* Set normal prio in any case */
-	p->normal_prio = prio;
-
-	/* Check, if we can safely lower the priority */
-	if (prio > p->prio && !plist_head_empty(&p->pi_waiters)) {
-		struct rt_mutex_waiter *w;
-		w = plist_first_entry(&p->pi_waiters,
-				      struct rt_mutex_waiter, pi_list);
-		if (w->ti->task->prio < prio)
-			prio = w->ti->task->prio;
+	int prio = calc_pi_prio(task);
+	if(task->prio > prio) {
+		/* Boost him */
+		mutex_setprio(task,prio);
+		if(task->blocked_on) {
+			/* Let it run to boost it's lock */
+			wake_up_process_mutex(task);
+		}
+	}
+	else if(task->prio < prio) {
+		/* Priority too high */
+		if(task->blocked_on) {
+			/* Let it run to unboost it's lock */
+			wake_up_process_mutex(task);
+		}
+		else {
+			mutex_setprio(task,prio);
+		}
 	}
-
-	if (prio == p->prio)
-		goto out;
-
-	/* Is task blocked on a mutex ? */
-	if (p->blocked_on)
-		pi_setprio(p->blocked_on->lock, p, prio);
-	else
-		mutex_setprio(p, prio);
- out:
-	if (p->blocked_on)
-		spin_unlock(&p->blocked_on->lock->wait_lock);
-
-	spin_unlock_irqrestore(&p->pi_lock, flags);
-
 }
 
+int pi_walk, pi_null, pi_prio, pi_initialized;
+
 /*
  * This is called with both the waiter->task->pi_lock and
  * lock->wait_lock held.
  */
 static void
 task_blocks_on_lock(struct rt_mutex_waiter *waiter, struct thread_info *ti,
-		    struct rt_mutex *lock __EIP_DECL__)
+                    struct rt_mutex *lock, int state __EIP_DECL__)
 {
+	struct rt_mutex_waiter *old_first;
 	struct task_struct *task = ti->task;
 #ifdef CONFIG_DEBUG_DEADLOCKS
 	check_deadlock(lock, 0, ti, eip);
 	/* mark the current thread as blocked on the lock */
 	waiter->eip = eip;
 #endif
+	SMP_TRACE_BUG_ON_LOCKED(!spin_is_locked(&lock->wait_lock));
+	SMP_TRACE_BUG_ON_LOCKED(spin_is_locked(&task->pi_lock));
+
+	if(plist_head_empty(&lock->wait_list)) {
+		old_first = NULL;
+	}
+	else {
+		old_first = plist_first_entry(&lock->wait_list, struct rt_mutex_waiter, list);
+		if(!boosting_waiter(old_first)) {
+			old_first = NULL;
+		}
+	}
+
+
+	_raw_spin_lock(&task->pi_lock);
 	task->blocked_on = waiter;
 	waiter->lock = lock;
 	waiter->ti = ti;
-	plist_node_init(&waiter->pi_list, task->prio);
-	/*
-	 * Add SCHED_NORMAL tasks to the end of the waitqueue (FIFO):
-	 */
-	SMP_TRACE_BUG_ON_LOCKED(!spin_is_locked(&task->pi_lock));
-	SMP_TRACE_BUG_ON_LOCKED(!spin_is_locked(&lock->wait_lock));
-#if !ALL_TASKS_PI
-	if ((!rt_task(task) &&
-		!(lock->mutex_attr & FUTEX_ATTR_PRIORITY_INHERITANCE))) {
-		plist_add(&waiter->list, &lock->wait_list);
-		set_lock_owner_pending(lock);
-		return;
+        
+	{
+		/* Fixup the prio of the (current) task here while we have the
+		   pi_lock */
+		int prio = calc_pi_prio(task);
+		if(prio!=task->prio) {
+			mutex_setprio(task,prio);
+		}
 	}
-#endif
-	_raw_spin_lock(&lock_owner(lock)->task->pi_lock);
-	plist_add(&waiter->pi_list, &lock_owner(lock)->task->pi_waiters);
-	/*
-	 * Add RT tasks to the head:
-	 */
+
+	plist_node_init(&waiter->list, task->prio);
 	plist_add(&waiter->list, &lock->wait_list);
-	set_lock_owner_pending(lock);
-	/*
-	 * If the waiter has higher priority than the owner
-	 * then temporarily boost the owner:
-	 */
-	if (task->prio < lock_owner(lock)->task->prio)
-		pi_setprio(lock, lock_owner(lock)->task, task->prio);
-	_raw_spin_unlock(&lock_owner(lock)->task->pi_lock);
+	set_task_state(task, state);
+	_raw_spin_unlock(&task->pi_lock);
+
+	set_lock_owner_pending(lock);   
+
+	if(waiter ==
+	   plist_first_entry(&lock->wait_list, struct rt_mutex_waiter, list)
+	    && boosting_waiter(waiter)) {
+		task_t *owner = lock_owner(lock)->task;
+
+		plist_node_init(&waiter->pi_list, task->prio);
+
+		_raw_spin_lock(&owner->pi_lock);
+		if(old_first) {
+			plist_del(&old_first->pi_list);
+		}
+		plist_add(&waiter->pi_list, &owner->pi_waiters);
+		fix_prio(owner);
+
+		_raw_spin_unlock(&owner->pi_lock);
+	}
 }
 
 /*
@@ -1068,6 +929,7 @@ static void __init_rt_mutex(struct rt_mu
 	lock->name = name;
 	lock->file = file;
 	lock->line = line;
+	lock->verbose = 0;
 #endif
 #ifdef CONFIG_DEBUG_PREEMPT
 	lock->was_preempt_off = 0;
@@ -1085,20 +947,48 @@ EXPORT_SYMBOL(__init_rwsem);
 #endif
 
 /*
- * This must be called with both the old_owner and new_owner pi_locks held.
- * As well as the lock->wait_lock.
+ * This must be called with the lock->wait_lock held.
+ * Must: new_owner!=NULL
+ * Likely: old_owner==NULL
  */
-static inline
+static 
 void set_new_owner(struct rt_mutex *lock, struct thread_info *old_owner,
 			struct thread_info *new_owner __EIP_DECL__)
 {
+	SMP_TRACE_BUG_ON_LOCKED(spin_is_locked(&old_owner->task->pi_lock));
+	SMP_TRACE_BUG_ON_LOCKED(spin_is_locked(&new_owner->task->pi_lock));
+	SMP_TRACE_BUG_ON_LOCKED(!spin_is_locked(&lock->wait_lock));
+
 	if (new_owner)
 		trace_special_pid(new_owner->task->pid, new_owner->task->prio, 0);
-	if (unlikely(old_owner))
-		change_owner(lock, old_owner, new_owner);
+	if(old_owner) {
+		account_mutex_owner_up(old_owner->task);
+	}
+#ifdef CONFIG_DEBUG_DEADLOCKS
+	if (trace_on && unlikely(old_owner)) {
+		TRACE_WARN_ON_LOCKED(list_empty(&lock->held_list));
+		list_del_init(&lock->held_list);
+	}
+#endif
 	lock->owner = new_owner;
-	if (!plist_head_empty(&lock->wait_list))
-		set_lock_owner_pending(lock);
+	if (!plist_head_empty(&lock->wait_list)) {
+		struct rt_mutex_waiter *next =
+			plist_first_entry(&lock->wait_list, 
+					  struct rt_mutex_waiter, list);
+		if(boosting_waiter(next)) {
+			if(old_owner) {
+				_raw_spin_lock(&old_owner->task->pi_lock);
+				plist_del(&next->pi_list);
+				_raw_spin_unlock(&old_owner->task->pi_lock);
+			}
+			_raw_spin_lock(&new_owner->task->pi_lock);
+			plist_add(&next->pi_list, 
+				  &new_owner->task->pi_waiters);
+			set_lock_owner_pending(lock);
+			_raw_spin_unlock(&new_owner->task->pi_lock);
+		}
+	}
+        
 #ifdef CONFIG_DEBUG_DEADLOCKS
 	if (trace_on) {
 		TRACE_WARN_ON_LOCKED(!list_empty(&lock->held_list));
@@ -1109,6 +999,36 @@ void set_new_owner(struct rt_mutex *lock
 	account_mutex_owner_down(new_owner->task, lock);
 }
 
+
+static void remove_waiter(struct rt_mutex *lock, 
+			  struct rt_mutex_waiter *waiter, 
+			  int fixprio)
+{
+	task_t *owner = lock_owner(lock) ? lock_owner(lock)->task : NULL;
+	int first = (waiter==plist_first_entry(&lock->wait_list, 
+					       struct rt_mutex_waiter, list));
+        
+	plist_del(&waiter->list);
+	if(first && owner) {
+		_raw_spin_lock(&owner->pi_lock);
+		if(boosting_waiter(waiter)) {
+			plist_del(&waiter->pi_list);
+		}
+		if(!plist_head_empty(&lock->wait_list)) {
+			struct rt_mutex_waiter *next =
+				plist_first_entry(&lock->wait_list, 
+						  struct rt_mutex_waiter, list);
+			if(boosting_waiter(next)) {
+				plist_add(&next->pi_list, &owner->pi_waiters);
+			}
+		}
+		if(fixprio) {
+			fix_prio(owner);
+		}
+		_raw_spin_unlock(&owner->pi_lock);
+	}
+}
+
 /*
  * handle the lock release when processes blocked on it that can now run
  * - the spinlock must be held by the caller
@@ -1123,70 +1043,36 @@ pick_new_owner(struct rt_mutex *lock, st
 	struct thread_info *new_owner;
 
 	SMP_TRACE_BUG_ON_LOCKED(!spin_is_locked(&lock->wait_lock));
+	SMP_TRACE_BUG_ON_LOCKED(spin_is_locked(&old_owner->task->pi_lock));
+
 	/*
 	 * Get the highest prio one:
 	 *
 	 * (same-prio RT tasks go FIFO)
 	 */
 	waiter = plist_first_entry(&lock->wait_list, struct rt_mutex_waiter, list);
-
-#ifdef CONFIG_SMP
- try_again:
-#endif
+	remove_waiter(lock,waiter,0);
 	trace_special_pid(waiter->ti->task->pid, waiter->ti->task->prio, 0);
 
-#if ALL_TASKS_PI
-	check_pi_list_present(lock, waiter, old_owner);
-#endif
 	new_owner = waiter->ti;
-	/*
-	 * The new owner is still blocked on this lock, so we
-	 * must release the lock->wait_lock before grabing
-	 * the new_owner lock.
-	 */
-	_raw_spin_unlock(&lock->wait_lock);
-	_raw_spin_lock(&new_owner->task->pi_lock);
-	_raw_spin_lock(&lock->wait_lock);
-	/*
-	 * In this split second of releasing the lock, a high priority
-	 * process could have come along and blocked as well.
-	 */
-#ifdef CONFIG_SMP
-	waiter = plist_first_entry(&lock->wait_list, struct rt_mutex_waiter, list);
-	if (unlikely(waiter->ti != new_owner)) {
-		_raw_spin_unlock(&new_owner->task->pi_lock);
-		goto try_again;
-	}
-#ifdef CONFIG_PREEMPT_RT
-	/*
-	 * Once again the BKL comes to play.  Since the BKL can be grabbed and released
-	 * out of the normal P1->L1->P2 order, there's a chance that someone has the
-	 * BKL owner's lock and is waiting on the new owner lock.
-	 */
-	if (unlikely(lock == &kernel_sem.lock)) {
-		if (!_raw_spin_trylock(&old_owner->task->pi_lock)) {
-			_raw_spin_unlock(&new_owner->task->pi_lock);
-			goto try_again;
-		}
-	} else
-#endif
-#endif
-		_raw_spin_lock(&old_owner->task->pi_lock);
-
-	plist_del(&waiter->list);
-	plist_del(&waiter->pi_list);
-	waiter->pi_list.prio = waiter->ti->task->prio;
 
 	set_new_owner(lock, old_owner, new_owner __W_EIP__(waiter));
+
+	_raw_spin_lock(&new_owner->task->pi_lock);
 	/* Don't touch waiter after ->task has been NULLed */
 	mb();
 	waiter->ti = NULL;
 	new_owner->task->blocked_on = NULL;
-	TRACE_WARN_ON(save_state != lock->save_state);
-
-	_raw_spin_unlock(&old_owner->task->pi_lock);
+#ifdef CAPTURE_LOCK
+	if (!is_kernel_lock(lock)) {
+		new_owner->task->rt_flags |= RT_PENDOWNER;
+		new_owner->task->pending_owner = lock;
+	}
+#endif
 	_raw_spin_unlock(&new_owner->task->pi_lock);
 
+	TRACE_WARN_ON(save_state != lock->save_state);
+
 	return new_owner;
 }
 
@@ -1217,11 +1103,41 @@ static inline void init_lists(struct rt_
 	}
 #endif
 #ifdef CONFIG_DEBUG_DEADLOCKS
-	if (!lock->held_list.prev && !lock->held_list.next)
+	if (!lock->held_list.prev && !lock->held_list.next) {
 		INIT_LIST_HEAD(&lock->held_list);
+		lock->verbose = 0;
+	}
 #endif
 }
 
+
+static void remove_pending_owner_nolock(task_t *owner)
+{
+	owner->rt_flags &= ~RT_PENDOWNER;
+	owner->pending_owner = NULL;
+}
+
+static void remove_pending_owner(task_t *owner)
+{
+	_raw_spin_lock(&owner->pi_lock);
+	remove_pending_owner_nolock(owner);
+	_raw_spin_unlock(&owner->pi_lock);
+}
+
+int task_is_pending_owner_nolock(struct thread_info  *owner, 
+                                 struct rt_mutex *lock)
+{
+	return (lock_owner(lock) == owner) &&
+		(owner->task->pending_owner == lock);
+}
+int task_is_pending_owner(struct thread_info  *owner, struct rt_mutex *lock)
+{
+	int res;
+	_raw_spin_lock(&owner->task->pi_lock);
+	res = task_is_pending_owner_nolock(owner,lock);
+	_raw_spin_unlock(&owner->task->pi_lock);
+	return res;
+}
 /*
  * Try to grab a lock, and if it is owned but the owner
  * hasn't woken up yet, see if we can steal it.
@@ -1233,6 +1149,8 @@ static int __grab_lock(struct rt_mutex *
 {
 #ifndef CAPTURE_LOCK
 	return 0;
+#else
+	int res = 0;
 #endif
 	/*
 	 * The lock is owned, but now test to see if the owner
@@ -1241,111 +1159,36 @@ static int __grab_lock(struct rt_mutex *
 
 	TRACE_BUG_ON_LOCKED(!owner);
 
+	_raw_spin_lock(&owner->pi_lock);
+
 	/* The owner is pending on a lock, but is it this lock? */
 	if (owner->pending_owner != lock)
-		return 0;
+		goto out_unlock;
 
 	/*
 	 * There's an owner, but it hasn't woken up to take the lock yet.
 	 * See if we should steal it from him.
 	 */
 	if (task->prio > owner->prio)
-		return 0;
-#ifdef CONFIG_PREEMPT_RT
+		goto out_unlock;
+
 	/*
 	 * The BKL is a PITA. Don't ever steal it
 	 */
-	if (lock == &kernel_sem.lock)
-		return 0;
-#endif
+	if (is_kernel_lock(lock))
+		goto out_unlock;
+
 	/*
 	 * This task is of higher priority than the current pending
 	 * owner, so we may steal it.
 	 */
-	owner->rt_flags &= ~RT_PENDOWNER;
-	owner->pending_owner = NULL;
-
-#ifdef CONFIG_DEBUG_DEADLOCKS
-	/*
-	 * This task will be taking the ownership away, and
-	 * when it does, the lock can't be on the held list.
-	 */
-	if (trace_on) {
-		TRACE_WARN_ON_LOCKED(list_empty(&lock->held_list));
-		list_del_init(&lock->held_list);
-	}
-#endif
-	account_mutex_owner_up(owner);
-
-	return 1;
-}
-
-/*
- * Bring a task from pending ownership to owning a lock.
- *
- * Return 0 if we secured it, otherwise non-zero if it was
- * stolen.
- */
-static int
-capture_lock(struct rt_mutex_waiter *waiter, struct thread_info *ti,
-	     struct task_struct *task)
-{
-	struct rt_mutex *lock = waiter->lock;
-	struct thread_info *old_owner;
-	unsigned long flags;
-	int ret = 0;
-
-#ifndef CAPTURE_LOCK
-	return 0;
-#endif
-#ifdef CONFIG_PREEMPT_RT
-	/*
-	 * The BKL is special, we always get it.
-	 */
-	if (lock == &kernel_sem.lock)
-		return 0;
-#endif
-
-	trace_lock_irqsave(&trace_lock, flags, ti);
-	/*
-	 * We are no longer blocked on the lock, so we are considered a
-	 * owner. So we must grab the lock->wait_lock first.
-	 */
-	_raw_spin_lock(&lock->wait_lock);
-	_raw_spin_lock(&task->pi_lock);
-
-	if (!(task->rt_flags & RT_PENDOWNER)) {
-		/*
-		 * Someone else stole it.
-		 */
-		old_owner = lock_owner(lock);
-		TRACE_BUG_ON_LOCKED(old_owner == ti);
-		if (likely(!old_owner) || __grab_lock(lock, task, old_owner->task)) {
-			/* we got it back! */
-			if (old_owner) {
-				_raw_spin_lock(&old_owner->task->pi_lock);
-				set_new_owner(lock, old_owner, ti __W_EIP__(waiter));
-				_raw_spin_unlock(&old_owner->task->pi_lock);
-			} else
-				set_new_owner(lock, old_owner, ti __W_EIP__(waiter));
-			ret = 0;
-		} else {
-			/* Add ourselves back to the list */
-			TRACE_BUG_ON_LOCKED(!plist_node_empty(&waiter->list));
-			plist_node_init(&waiter->list, task->prio);
-			task_blocks_on_lock(waiter, ti, lock __W_EIP__(waiter));
-			ret = 1;
-		}
-	} else {
-		task->rt_flags &= ~RT_PENDOWNER;
-		task->pending_owner = NULL;
-	}
+	remove_pending_owner_nolock(owner);
 
-	_raw_spin_unlock(&lock->wait_lock);
-	_raw_spin_unlock(&task->pi_lock);
-	trace_unlock_irqrestore(&trace_lock, flags, ti);
+	res = 1;
 
-	return ret;
+ out_unlock:
+	_raw_spin_unlock(&owner->pi_lock);
+	return res;
 }
 
 static inline void INIT_WAITER(struct rt_mutex_waiter *waiter)
@@ -1366,10 +1209,25 @@ static inline void FREE_WAITER(struct rt
 #endif
 }
 
+static int allowed_to_take_lock(struct thread_info *ti,
+                                task_t *task,
+                                struct thread_info *old_owner,
+                                struct rt_mutex *lock)
+{
+	SMP_TRACE_BUG_ON_LOCKED(!spin_is_locked(&lock->wait_lock));
+	SMP_TRACE_BUG_ON_LOCKED(spin_is_locked(&old_owner->task->pi_lock));
+	SMP_TRACE_BUG_ON_LOCKED(spin_is_locked(&task->pi_lock));
+
+	return !old_owner ||
+		(is_kernel_lock(lock)  && lock_owner(lock) == ti) ||
+		task_is_pending_owner(ti,lock) || 
+		__grab_lock(lock, task, old_owner->task);
+}
+
 /*
  * lock it semaphore-style: no worries about missed wakeups.
  */
-static inline void
+static void
 ____down(struct rt_mutex *lock __EIP_DECL__)
 {
 	struct thread_info *ti = current_thread_info(), *old_owner;
@@ -1379,65 +1237,66 @@ ____down(struct rt_mutex *lock __EIP_DEC
 
 	trace_lock_irqsave(&trace_lock, flags, ti);
 	TRACE_BUG_ON_LOCKED(!raw_irqs_disabled());
-	_raw_spin_lock(&task->pi_lock);
 	_raw_spin_lock(&lock->wait_lock);
 	INIT_WAITER(&waiter);
 
-	old_owner = lock_owner(lock);
 	init_lists(lock);
 
-	if (likely(!old_owner) || __grab_lock(lock, task, old_owner->task)) {
+	debug_lock(lock,"down");
+	/* wait to be given the lock */
+	for (;;) {
+		old_owner = lock_owner(lock);
+
+		if(allowed_to_take_lock(ti, task, old_owner,lock)) {
 		/* granted */
-		TRACE_WARN_ON_LOCKED(!plist_head_empty(&lock->wait_list) && !old_owner);
-		if (old_owner) {
-			_raw_spin_lock(&old_owner->task->pi_lock);
-			set_new_owner(lock, old_owner, ti __EIP__);
-			_raw_spin_unlock(&old_owner->task->pi_lock);
-		} else
+			TRACE_WARN_ON_LOCKED(!plist_head_empty(&lock->wait_list) && !old_owner);
 			set_new_owner(lock, old_owner, ti __EIP__);
-		_raw_spin_unlock(&lock->wait_lock);
-		_raw_spin_unlock(&task->pi_lock);
-		trace_unlock_irqrestore(&trace_lock, flags, ti);
-
-		FREE_WAITER(&waiter);
-		return;
-	}
-
-	set_task_state(task, TASK_UNINTERRUPTIBLE);
+			if (!is_kernel_lock(lock)) {
+				remove_pending_owner(task);
+			}
+		  	debug_lock(lock,"got lock");
 
-	plist_node_init(&waiter.list, task->prio);
-	task_blocks_on_lock(&waiter, ti, lock __EIP__);
+			_raw_spin_unlock(&lock->wait_lock);
+			trace_unlock_irqrestore(&trace_lock, flags, ti);
 
-	TRACE_BUG_ON_LOCKED(!raw_irqs_disabled());
-	/* we don't need to touch the lock struct anymore */
-	_raw_spin_unlock(&lock->wait_lock);
-	_raw_spin_unlock(&task->pi_lock);
-	trace_unlock_irqrestore(&trace_lock, flags, ti);
+			FREE_WAITER(&waiter);
+			return;
+		}
+		
+		task_blocks_on_lock(&waiter, ti, lock, TASK_UNINTERRUPTIBLE __EIP__);
 
-	might_sleep();
+		TRACE_BUG_ON_LOCKED(!raw_irqs_disabled());
+		/* we don't need to touch the lock struct anymore */
+		debug_lock(lock,"sleeping on");
+		_raw_spin_unlock(&lock->wait_lock);
+		trace_unlock_irqrestore(&trace_lock, flags, ti);
+		
+		might_sleep();
+		
+		nosched_flag = current->flags & PF_NOSCHED;
+		current->flags &= ~PF_NOSCHED;
 
-	nosched_flag = current->flags & PF_NOSCHED;
-	current->flags &= ~PF_NOSCHED;
+		if (waiter.ti)
+		{
+			schedule();
+		}
+		
+		current->flags |= nosched_flag;
+		task->state = TASK_RUNNING;
 
-wait_again:
-	/* wait to be given the lock */
-	for (;;) {
-		if (!waiter.ti)
-			break;
-		schedule();
-		set_task_state(task, TASK_UNINTERRUPTIBLE);
-	}
-	/*
-	 * Check to see if we didn't have ownership stolen.
-	 */
-	if (capture_lock(&waiter, ti, task)) {
-		set_task_state(task, TASK_UNINTERRUPTIBLE);
-		goto wait_again;
+		trace_lock_irqsave(&trace_lock, flags, ti);
+		_raw_spin_lock(&lock->wait_lock);
+		debug_lock(lock,"waking up on");
+		if(waiter.ti) {
+			remove_waiter(lock,&waiter,1);
+		}
+		_raw_spin_lock(&task->pi_lock);
+		task->blocked_on = NULL;
+		_raw_spin_unlock(&task->pi_lock);
 	}
 
-	current->flags |= nosched_flag;
-	task->state = TASK_RUNNING;
-	FREE_WAITER(&waiter);
+	/* Should not get here! */
+	BUG_ON(1);
 }
 
 /*
@@ -1450,131 +1309,116 @@ wait_again:
  * enables the seemless use of arbitrary (blocking) spinlocks within
  * sleep/wakeup event loops.
  */
-static inline void
+static void
 ____down_mutex(struct rt_mutex *lock __EIP_DECL__)
 {
 	struct thread_info *ti = current_thread_info(), *old_owner;
-	unsigned long state, saved_state, nosched_flag;
+	unsigned long state, saved_state;
 	struct task_struct *task = ti->task;
 	struct rt_mutex_waiter waiter;
 	unsigned long flags;
-	int got_wakeup = 0, saved_lock_depth;
+	int got_wakeup = 0;
+	
+	        
 
 	trace_lock_irqsave(&trace_lock, flags, ti);
 	TRACE_BUG_ON_LOCKED(!raw_irqs_disabled());
-	_raw_spin_lock(&task->pi_lock);
 	_raw_spin_lock(&lock->wait_lock);
-	INIT_WAITER(&waiter);
-
-	old_owner = lock_owner(lock);
-	init_lists(lock);
-
-	if (likely(!old_owner) || __grab_lock(lock, task, old_owner->task)) {
-		/* granted */
-		TRACE_WARN_ON_LOCKED(!plist_head_empty(&lock->wait_list) && !old_owner);
-		if (old_owner) {
-			_raw_spin_lock(&old_owner->task->pi_lock);
-			set_new_owner(lock, old_owner, ti __EIP__);
-			_raw_spin_unlock(&old_owner->task->pi_lock);
-		} else
-			set_new_owner(lock, old_owner, ti __EIP__);
-		_raw_spin_unlock(&lock->wait_lock);
-		_raw_spin_unlock(&task->pi_lock);
-		trace_unlock_irqrestore(&trace_lock, flags, ti);
-
-		FREE_WAITER(&waiter);
-		return;
-	}
-
-	plist_node_init(&waiter.list, task->prio);
-	task_blocks_on_lock(&waiter, ti, lock __EIP__);
-
-	TRACE_BUG_ON_LOCKED(!raw_irqs_disabled());
-	/*
+/*
 	 * Here we save whatever state the task was in originally,
 	 * we'll restore it at the end of the function and we'll
 	 * take any intermediate wakeup into account as well,
 	 * independently of the mutex sleep/wakeup mechanism:
 	 */
 	saved_state = xchg(&task->state, TASK_UNINTERRUPTIBLE);
+        
+	INIT_WAITER(&waiter);
 
-	/* we don't need to touch the lock struct anymore */
-	_raw_spin_unlock(&lock->wait_lock);
-	_raw_spin_unlock(&task->pi_lock);
-	trace_unlock(&trace_lock, ti);
-
-	/*
-	 * TODO: check 'flags' for the IRQ bit here - it is illegal to
-	 * call down() from an IRQs-off section that results in
-	 * an actual reschedule.
-	 */
-
-	nosched_flag = current->flags & PF_NOSCHED;
-	current->flags &= ~PF_NOSCHED;
-
-	/*
-	 * BKL users expect the BKL to be held across spinlock/rwlock-acquire.
-	 * Save and clear it, this will cause the scheduler to not drop the
-	 * BKL semaphore if we end up scheduling:
-	 */
-	saved_lock_depth = task->lock_depth;
-	task->lock_depth = -1;
+	init_lists(lock);
 
-wait_again:
 	/* wait to be given the lock */
 	for (;;) {
-		unsigned long saved_flags = current->flags & PF_NOSCHED;
-
-		if (!waiter.ti)
-			break;
-		trace_local_irq_enable(ti);
-		// no need to check for preemption here, we schedule().
-		current->flags &= ~PF_NOSCHED;
+		old_owner = lock_owner(lock);
+        
+		if (allowed_to_take_lock(ti,task,old_owner,lock)) {
+		/* granted */
+			TRACE_WARN_ON_LOCKED(!plist_head_empty(&lock->wait_list) && !old_owner);
+			set_new_owner(lock, old_owner, ti __EIP__);
+			remove_pending_owner(task);
+			_raw_spin_unlock(&lock->wait_lock);
+                        
+			/*
+			 * Only set the task's state to TASK_RUNNING if it got
+			 * a non-mutex wakeup. We keep the original state otherwise.
+			 * A mutex wakeup changes the task's state to TASK_RUNNING_MUTEX,
+			 * not TASK_RUNNING - hence we can differenciate between the two
+			 * cases:
+			 */
+			state = xchg(&task->state, saved_state);
+			if (state == TASK_RUNNING)
+				got_wakeup = 1;
+			if (got_wakeup)
+				task->state = TASK_RUNNING;
+			trace_unlock_irqrestore(&trace_lock, flags, ti);
+			preempt_check_resched();
 
-		schedule();
+			FREE_WAITER(&waiter);
+			return;
+		}
+		
+		task_blocks_on_lock(&waiter, ti, lock,
+				    TASK_UNINTERRUPTIBLE __EIP__);
 
-		current->flags |= saved_flags;
-		trace_local_irq_disable(ti);
-		state = xchg(&task->state, TASK_UNINTERRUPTIBLE);
-		if (state == TASK_RUNNING)
-			got_wakeup = 1;
-	}
-	/*
-	 * Check to see if we didn't have ownership stolen.
-	 */
-	if (capture_lock(&waiter, ti, task)) {
-		state = xchg(&task->state, TASK_UNINTERRUPTIBLE);
-		if (state == TASK_RUNNING)
-			got_wakeup = 1;
-		goto wait_again;
-	}
-	/*
-	 * Only set the task's state to TASK_RUNNING if it got
-	 * a non-mutex wakeup. We keep the original state otherwise.
-	 * A mutex wakeup changes the task's state to TASK_RUNNING_MUTEX,
-	 * not TASK_RUNNING - hence we can differenciate between the two
-	 * cases:
-	 */
-	state = xchg(&task->state, saved_state);
-	if (state == TASK_RUNNING)
-		got_wakeup = 1;
-	if (got_wakeup)
-		task->state = TASK_RUNNING;
-	trace_local_irq_enable(ti);
-	preempt_check_resched();
+		TRACE_BUG_ON_LOCKED(!raw_irqs_disabled());
+		/* we don't need to touch the lock struct anymore */
+		_raw_spin_unlock(&lock->wait_lock);
+		trace_unlock(&trace_lock, ti);
+                
+		if (waiter.ti) {
+			unsigned long saved_flags = 
+				current->flags & PF_NOSCHED;
+			/*
+			 * BKL users expect the BKL to be held across spinlock/rwlock-acquire.
+			 * Save and clear it, this will cause the scheduler to not drop the
+			 * BKL semaphore if we end up scheduling:
+			 */
 
-	task->lock_depth = saved_lock_depth;
-	current->flags |= nosched_flag;
-	FREE_WAITER(&waiter);
+			int saved_lock_depth = task->lock_depth;
+			task->lock_depth = -1;
+			
+
+			trace_local_irq_enable(ti);
+			// no need to check for preemption here, we schedule().
+                        
+			current->flags &= ~PF_NOSCHED;
+			
+			schedule();
+			
+			trace_local_irq_disable(ti);
+			task->flags |= saved_flags;
+			task->lock_depth = saved_lock_depth;
+			state = xchg(&task->state, TASK_RUNNING_MUTEX);
+			if (state == TASK_RUNNING)
+				got_wakeup = 1;
+		}
+		
+		trace_lock_irq(&trace_lock, ti);
+		_raw_spin_lock(&lock->wait_lock);
+		if(waiter.ti) {
+			remove_waiter(lock,&waiter,1);
+		}
+		_raw_spin_lock(&task->pi_lock);
+		task->blocked_on = NULL;
+		_raw_spin_unlock(&task->pi_lock);
+	}
 }
 
-static void __up_mutex_waiter_savestate(struct rt_mutex *lock __EIP_DECL__);
-static void __up_mutex_waiter_nosavestate(struct rt_mutex *lock __EIP_DECL__);
-
+static void __up_mutex_waiter(struct rt_mutex *lock, 
+			      int savestate __EIP_DECL__);
 /*
  * release the lock:
  */
-static inline void
+static void
 ____up_mutex(struct rt_mutex *lock, int save_state __EIP_DECL__)
 {
 	struct thread_info *ti = current_thread_info();
@@ -1585,30 +1429,31 @@ ____up_mutex(struct rt_mutex *lock, int 
 	trace_lock_irqsave(&trace_lock, flags, ti);
 	TRACE_BUG_ON_LOCKED(!raw_irqs_disabled());
 	_raw_spin_lock(&lock->wait_lock);
+	debug_lock(lock,"upping");
 	TRACE_BUG_ON_LOCKED(!lock->wait_list.prio_list.prev && !lock->wait_list.prio_list.next);
 
-#ifdef CONFIG_DEBUG_DEADLOCKS
-	if (trace_on) {
-		TRACE_WARN_ON_LOCKED(lock_owner(lock) != ti);
-		TRACE_WARN_ON_LOCKED(list_empty(&lock->held_list));
-		list_del_init(&lock->held_list);
-	}
-#endif
 
 #if ALL_TASKS_PI
 	if (plist_head_empty(&lock->wait_list))
 		check_pi_list_empty(lock, lock_owner(lock));
 #endif
 	if (unlikely(!plist_head_empty(&lock->wait_list))) {
-		if (save_state)
-			__up_mutex_waiter_savestate(lock __EIP__);
-		else
-			__up_mutex_waiter_nosavestate(lock __EIP__);
-	} else
+		__up_mutex_waiter(lock,save_state __EIP__);
+		debug_lock(lock,"woke up waiter");
+	} else {
+#ifdef CONFIG_DEBUG_DEADLOCKS
+		if (trace_on) {
+			TRACE_WARN_ON_LOCKED(lock_owner(lock) != ti);
+			TRACE_WARN_ON_LOCKED(list_empty(&lock->held_list));
+			list_del_init(&lock->held_list);
+		}
+#endif
 		lock->owner = NULL;
+		debug_lock(lock,"there was no waiters");
+		account_mutex_owner_up(ti->task);
+	}
 	_raw_spin_unlock(&lock->wait_lock);
 #if defined(CONFIG_DEBUG_PREEMPT) && defined(CONFIG_PREEMPT_RT)
-	account_mutex_owner_up(current);
 	if (!current->lock_count && !rt_prio(current->normal_prio) &&
 					rt_prio(current->prio)) {
 		static int once = 1;
@@ -1841,125 +1686,103 @@ static int __sched __down_interruptible(
 	struct rt_mutex_waiter waiter;
 	struct timer_list timer;
 	unsigned long expire = 0;
+	int timer_installed = 0;
 	int ret;
 
 	trace_lock_irqsave(&trace_lock, flags, ti);
 	TRACE_BUG_ON_LOCKED(!raw_irqs_disabled());
-	_raw_spin_lock(&task->pi_lock);
 	_raw_spin_lock(&lock->wait_lock);
 	INIT_WAITER(&waiter);
 
-	old_owner = lock_owner(lock);
 	init_lists(lock);
 
-	if (likely(!old_owner) || __grab_lock(lock, task, old_owner->task)) {
+	ret = 0;
+	/* wait to be given the lock */
+	for (;;) {
+		old_owner = lock_owner(lock);
+                
+		if (allowed_to_take_lock(ti,task,old_owner,lock)) {
 		/* granted */
-		TRACE_WARN_ON_LOCKED(!plist_head_empty(&lock->wait_list) && !old_owner);
-		if (old_owner) {
-			_raw_spin_lock(&old_owner->task->pi_lock);
-			set_new_owner(lock, old_owner, ti __EIP__);
-			_raw_spin_unlock(&old_owner->task->pi_lock);
-		} else
+			TRACE_WARN_ON_LOCKED(!plist_head_empty(&lock->wait_list) && !old_owner);
 			set_new_owner(lock, old_owner, ti __EIP__);
-		_raw_spin_unlock(&lock->wait_lock);
-		_raw_spin_unlock(&task->pi_lock);
-		trace_unlock_irqrestore(&trace_lock, flags, ti);
-
-		FREE_WAITER(&waiter);
-		return 0;
-	}
+			_raw_spin_unlock(&lock->wait_lock);
+			trace_unlock_irqrestore(&trace_lock, flags, ti);
 
-	set_task_state(task, TASK_INTERRUPTIBLE);
+			goto out_free_timer;
+		}
 
-	plist_node_init(&waiter.list, task->prio);
-	task_blocks_on_lock(&waiter, ti, lock __EIP__);
+		task_blocks_on_lock(&waiter, ti, lock, TASK_INTERRUPTIBLE __EIP__);
 
-	TRACE_BUG_ON_LOCKED(!raw_irqs_disabled());
-	/* we don't need to touch the lock struct anymore */
-	_raw_spin_unlock(&lock->wait_lock);
-	_raw_spin_unlock(&task->pi_lock);
-	trace_unlock_irqrestore(&trace_lock, flags, ti);
+		TRACE_BUG_ON_LOCKED(!raw_irqs_disabled());
+		/* we don't need to touch the lock struct anymore */
+		_raw_spin_unlock(&lock->wait_lock);
+		trace_unlock_irqrestore(&trace_lock, flags, ti);
+		
+		might_sleep();
+		
+		nosched_flag = current->flags & PF_NOSCHED;
+		current->flags &= ~PF_NOSCHED;
+		if (time && !timer_installed) {
+			expire = time + jiffies;
+			init_timer(&timer);
+			timer.expires = expire;
+			timer.data = (unsigned long)current;
+			timer.function = process_timeout;
+			add_timer(&timer);
+			timer_installed = 1;
+		}
 
-	might_sleep();
+                        
+		if (waiter.ti) {
+			schedule();
+		}
+		
+		current->flags |= nosched_flag;
+		task->state = TASK_RUNNING;
 
-	nosched_flag = current->flags & PF_NOSCHED;
-	current->flags &= ~PF_NOSCHED;
-	if (time) {
-		expire = time + jiffies;
-		init_timer(&timer);
-		timer.expires = expire;
-		timer.data = (unsigned long)current;
-		timer.function = process_timeout;
-		add_timer(&timer);
-	}
+		trace_lock_irqsave(&trace_lock, flags, ti);
+		_raw_spin_lock(&lock->wait_lock);
+		if(waiter.ti) {
+			remove_waiter(lock,&waiter,1);
+		}
+		_raw_spin_lock(&task->pi_lock);
+		task->blocked_on = NULL;
+		_raw_spin_unlock(&task->pi_lock);
 
-	ret = 0;
-wait_again:
-	/* wait to be given the lock */
-	for (;;) {
-		if (signal_pending(current) || (time && !timer_pending(&timer))) {
-			/*
-			 * Remove ourselves from the wait list if we
-			 * didnt get the lock - else return success:
-			 */
-			trace_lock_irq(&trace_lock, ti);
-			_raw_spin_lock(&task->pi_lock);
-			_raw_spin_lock(&lock->wait_lock);
-			if (waiter.ti || time) {
-				plist_del(&waiter.list);
-				/*
-				 * If we were the last waiter then clear
-				 * the pending bit:
-				 */
-				if (plist_head_empty(&lock->wait_list))
-					lock->owner = lock_owner(lock);
-				/*
-				 * Just remove ourselves from the PI list.
-				 * (No big problem if our PI effect lingers
-				 *  a bit - owner will restore prio.)
-				 */
-				TRACE_WARN_ON_LOCKED(waiter.ti != ti);
-				TRACE_WARN_ON_LOCKED(current->blocked_on != &waiter);
-				plist_del(&waiter.pi_list);
-				waiter.pi_list.prio = task->prio;
-				waiter.ti = NULL;
-				current->blocked_on = NULL;
-				if (time) {
-					ret = (int)(expire - jiffies);
-					if (!timer_pending(&timer)) {
-						del_singleshot_timer_sync(&timer);
-						ret = -ETIMEDOUT;
-					}
-				} else
-					ret = -EINTR;
+		if(signal_pending(current)) {
+			if (time) {
+				ret = (int)(expire - jiffies);
+				if (!timer_pending(&timer)) {
+					ret = -ETIMEDOUT;
+				}
 			}
-			_raw_spin_unlock(&lock->wait_lock);
-			_raw_spin_unlock(&task->pi_lock);
-			trace_unlock_irq(&trace_lock, ti);
-			break;
+			else
+				ret = -EINTR;
+			
+			goto out_unlock;
 		}
-		if (!waiter.ti)
-			break;
-		schedule();
-		set_task_state(task, TASK_INTERRUPTIBLE);
-	}
-
-	/*
-	 * Check to see if we didn't have ownership stolen.
-	 */
-	if (!ret) {
-		if (capture_lock(&waiter, ti, task)) {
-			set_task_state(task, TASK_INTERRUPTIBLE);
-			goto wait_again;
+		else if(timer_installed &&
+			!timer_pending(&timer)) {
+			ret = -ETIMEDOUT;
+			goto out_unlock;
 		}
 	}
 
-	task->state = TASK_RUNNING;
-	current->flags |= nosched_flag;
 
+ out_unlock:
+	_raw_spin_unlock(&lock->wait_lock);
+	trace_unlock_irqrestore(&trace_lock, flags, ti);
+
+ out_free_timer:
+	if (time && timer_installed) {
+		if (!timer_pending(&timer)) {
+			del_singleshot_timer_sync(&timer);
+		}
+	}
 	FREE_WAITER(&waiter);
 	return ret;
 }
+
 /*
  * trylock for writing -- returns 1 if successful, 0 if contention
  */
@@ -1972,7 +1795,6 @@ static int __down_trylock(struct rt_mute
 
 	trace_lock_irqsave(&trace_lock, flags, ti);
 	TRACE_BUG_ON_LOCKED(!raw_irqs_disabled());
-	_raw_spin_lock(&task->pi_lock);
 	/*
 	 * It is OK for the owner of the lock to do a trylock on
 	 * a lock it owns, so to prevent deadlocking, we must
@@ -1989,17 +1811,11 @@ static int __down_trylock(struct rt_mute
 	if (likely(!old_owner) || __grab_lock(lock, task, old_owner->task)) {
 		/* granted */
 		TRACE_WARN_ON_LOCKED(!plist_head_empty(&lock->wait_list) && !old_owner);
-		if (old_owner) {
-			_raw_spin_lock(&old_owner->task->pi_lock);
-			set_new_owner(lock, old_owner, ti __EIP__);
-			_raw_spin_unlock(&old_owner->task->pi_lock);
-		} else
-			set_new_owner(lock, old_owner, ti __EIP__);
+		set_new_owner(lock, old_owner, ti __EIP__);
 		ret = 1;
 	}
 	_raw_spin_unlock(&lock->wait_lock);
 failed:
-	_raw_spin_unlock(&task->pi_lock);
 	trace_unlock_irqrestore(&trace_lock, flags, ti);
 
 	return ret;
@@ -2046,16 +1862,16 @@ static int down_read_trylock_mutex(struc
 }
 #endif
 
-static void __up_mutex_waiter_nosavestate(struct rt_mutex *lock __EIP_DECL__)
+static void __up_mutex_waiter(struct rt_mutex *lock,
+			      int save_state __EIP_DECL__)
 {
 	struct thread_info *old_owner_ti, *new_owner_ti;
 	struct task_struct *old_owner, *new_owner;
-	struct rt_mutex_waiter *w;
 	int prio;
 
 	old_owner_ti = lock_owner(lock);
 	old_owner = old_owner_ti->task;
-	new_owner_ti = pick_new_owner(lock, old_owner_ti, 0 __EIP__);
+	new_owner_ti = pick_new_owner(lock, old_owner_ti, save_state __EIP__);
 	new_owner = new_owner_ti->task;
 
 	/*
@@ -2063,67 +1879,21 @@ static void __up_mutex_waiter_nosavestat
 	 * to the previous priority (or to the next highest prio
 	 * waiter's priority):
 	 */
-	_raw_spin_lock(&old_owner->pi_lock);
-	prio = old_owner->normal_prio;
-	if (unlikely(!plist_head_empty(&old_owner->pi_waiters))) {
-		w = plist_first_entry(&old_owner->pi_waiters, struct rt_mutex_waiter, pi_list);
-		if (w->ti->task->prio < prio)
-			prio = w->ti->task->prio;
-	}
-	if (unlikely(prio != old_owner->prio))
-		pi_setprio(lock, old_owner, prio);
-	_raw_spin_unlock(&old_owner->pi_lock);
-#ifdef CAPTURE_LOCK
-#ifdef CONFIG_PREEMPT_RT
-	if (lock != &kernel_sem.lock) {
-#endif
-		new_owner->rt_flags |= RT_PENDOWNER;
-		new_owner->pending_owner = lock;
-#ifdef CONFIG_PREEMPT_RT
-	}
-#endif
-#endif
-	wake_up_process(new_owner);
-}
-
-static void __up_mutex_waiter_savestate(struct rt_mutex *lock __EIP_DECL__)
-{
-	struct thread_info *old_owner_ti, *new_owner_ti;
-	struct task_struct *old_owner, *new_owner;
-	struct rt_mutex_waiter *w;
-	int prio;
+	if(ALL_TASKS_PI || rt_prio(old_owner->prio)) {
+		_raw_spin_lock(&old_owner->pi_lock);
 
-	old_owner_ti = lock_owner(lock);
-	old_owner = old_owner_ti->task;
-	new_owner_ti = pick_new_owner(lock, old_owner_ti, 1 __EIP__);
-	new_owner = new_owner_ti->task;
+		prio = calc_pi_prio(old_owner);
+		if (unlikely(prio != old_owner->prio))
+			mutex_setprio(old_owner, prio);
 
-	/*
-	 * If the owner got priority-boosted then restore it
-	 * to the previous priority (or to the next highest prio
-	 * waiter's priority):
-	 */
-	_raw_spin_lock(&old_owner->pi_lock);
-	prio = old_owner->normal_prio;
-	if (unlikely(!plist_head_empty(&old_owner->pi_waiters))) {
-		w = plist_first_entry(&old_owner->pi_waiters, struct rt_mutex_waiter, pi_list);
-		if (w->ti->task->prio < prio)
-			prio = w->ti->task->prio;
-	}
-	if (unlikely(prio != old_owner->prio))
-		pi_setprio(lock, old_owner, prio);
-	_raw_spin_unlock(&old_owner->pi_lock);
-#ifdef CAPTURE_LOCK
-#ifdef CONFIG_PREEMPT_RT
-	if (lock != &kernel_sem.lock) {
-#endif
-		new_owner->rt_flags |= RT_PENDOWNER;
-		new_owner->pending_owner = lock;
-#ifdef CONFIG_PREEMPT_RT
+		_raw_spin_unlock(&old_owner->pi_lock);
+	}
+	if(save_state) {
+		wake_up_process_mutex(new_owner);
+	}
+	else {
+		wake_up_process(new_owner);
 	}
-#endif
-#endif
-	wake_up_process_mutex(new_owner);
 }
 
 #ifdef CONFIG_PREEMPT_RT
@@ -2578,7 +2348,7 @@ int __lockfunc _read_trylock(rwlock_t *r
 {
 #ifdef CONFIG_DEBUG_RT_LOCKING_MODE
 	if (!preempt_locks)
-	return _raw_read_trylock(&rwlock->lock.lock.debug_rwlock);
+		return _raw_read_trylock(&rwlock->lock.lock.debug_rwlock);
 	else
 #endif
 		return down_read_trylock_mutex(&rwlock->lock);
@@ -2905,17 +2675,6 @@ notrace int irqs_disabled(void)
 EXPORT_SYMBOL(irqs_disabled);
 #endif
 
-/*
- * This routine changes the owner of a mutex. It's only
- * caller is the futex code which locks a futex on behalf
- * of another thread.
- */
-void fastcall rt_mutex_set_owner(struct rt_mutex *lock, struct thread_info *t)
-{
-	account_mutex_owner_up(current);
-	account_mutex_owner_down(t->task, lock);
-	lock->owner = t;
-}
 
 struct thread_info * fastcall rt_mutex_owner(struct rt_mutex *lock)
 {
@@ -2950,7 +2709,6 @@ down_try_futex(struct rt_mutex *lock, st
 
 	trace_lock_irqsave(&trace_lock, flags, proxy_owner);
 	TRACE_BUG_ON_LOCKED(!raw_irqs_disabled());
-	_raw_spin_lock(&task->pi_lock);
 	_raw_spin_lock(&lock->wait_lock);
 
 	old_owner = lock_owner(lock);
@@ -2959,16 +2717,10 @@ down_try_futex(struct rt_mutex *lock, st
 	if (likely(!old_owner) || __grab_lock(lock, task, old_owner->task)) {
 		/* granted */
 		TRACE_WARN_ON_LOCKED(!plist_head_empty(&lock->wait_list) && !old_owner);
-		if (old_owner) {
-			_raw_spin_lock(&old_owner->task->pi_lock);
-			set_new_owner(lock, old_owner, proxy_owner __EIP__);
-			_raw_spin_unlock(&old_owner->task->pi_lock);
-		} else
 			set_new_owner(lock, old_owner, proxy_owner __EIP__);
 		ret = 1;
 	}
 	_raw_spin_unlock(&lock->wait_lock);
-	_raw_spin_unlock(&task->pi_lock);
 	trace_unlock_irqrestore(&trace_lock, flags, proxy_owner);
 
 	return ret;
@@ -3064,3 +2816,33 @@ void fastcall init_rt_mutex(struct rt_mu
 	__init_rt_mutex(lock, save_state, name, file, line);
 }
 EXPORT_SYMBOL(init_rt_mutex);
+
+
+pid_t get_blocked_on(task_t *task)
+{
+	pid_t res = 0;
+	struct rt_mutex *lock;
+	struct thread_info *owner;
+ try_again:
+	_raw_spin_lock(&task->pi_lock);
+	if(!task->blocked_on) {
+		_raw_spin_unlock(&task->pi_lock);
+		goto out;
+	}
+	lock = task->blocked_on->lock;
+	if(!_raw_spin_trylock(&lock->wait_lock)) {
+		_raw_spin_unlock(&task->pi_lock);
+		goto try_again;
+	}       
+	owner = lock_owner(lock);
+	if(owner)
+		res = owner->task->pid;
+
+	_raw_spin_unlock(&task->pi_lock);
+	_raw_spin_unlock(&lock->wait_lock);
+        
+ out:
+	return res;
+                
+}
+EXPORT_SYMBOL(get_blocked_on);
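
For testing, the "BlckOn:" field added to /proc/<pid>/status above reports the
pid of the owner of the lock the task is currently blocked on (0 when it is
not blocked). A small userspace reader could look like this (a hypothetical
helper, not part of the patch):

	#include <stdio.h>
	#include <string.h>

	/*
	 * Print the "BlckOn:" line from /proc/<pid>/status, i.e. the pid of
	 * the owner of the lock that <pid> is blocked on (0 if not blocked).
	 */
	int main(int argc, char **argv)
	{
		char path[64], line[256];
		FILE *f;

		if (argc != 2) {
			fprintf(stderr, "usage: %s <pid>\n", argv[0]);
			return 1;
		}
		snprintf(path, sizeof(path), "/proc/%s/status", argv[1]);
		f = fopen(path, "r");
		if (!f) {
			perror(path);
			return 1;
		}
		while (fgets(line, sizeof(line), f))
			if (!strncmp(line, "BlckOn:", 7))
				fputs(line, stdout);
		fclose(f);
		return 0;
	}
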
