Re: Recursion bug in -rt

The rt deadlock check is also recursive, but it stops at a depth of 20,
deciding something must be corrupt to have a task blocked on more
than 20 locks.
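
For illustration, a depth-limited recursive check of the kind described
above might look like the sketch below. The struct layouts, the names
check_chain and MAX_LOCK_DEPTH, and the result codes are all hypothetical
and are not taken from the -rt code; only the limit of 20 comes from the
text.

```c
#include <stddef.h>

#define MAX_LOCK_DEPTH 20   /* assumption: mirrors the depth of 20 mentioned above */

struct task;

struct lock {
    struct task *owner;        /* task currently holding the lock, or NULL */
};

struct task {
    struct lock *blocked_on;   /* lock this task is waiting for, or NULL */
};

enum chain_result { CHAIN_OK = 0, CHAIN_DEADLOCK = 1, CHAIN_TOO_DEEP = 2 };

/*
 * Follow the chain lock -> owner -> lock the owner is blocked on -> ...
 * starting from the lock that `waiter` wants to take.
 * Returns CHAIN_DEADLOCK if the chain leads back to `waiter`,
 * CHAIN_TOO_DEEP if the depth limit is reached first, CHAIN_OK otherwise.
 */
static enum chain_result check_chain(const struct task *waiter,
                                     const struct lock *wanted, int depth)
{
    if (depth >= MAX_LOCK_DEPTH)
        return CHAIN_TOO_DEEP;      /* give up instead of recursing forever */
    if (wanted == NULL || wanted->owner == NULL)
        return CHAIN_OK;            /* chain ends at a free lock or a running task */
    if (wanted->owner == waiter)
        return CHAIN_DEADLOCK;      /* the chain leads back to the waiter */
    return check_chain(waiter, wanted->owner->blocked_on, depth + 1);
}
```

The depth limit matters when the chain runs into a cycle that does not
include the waiter itself: without it the walk would never terminate.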

We should also set a limit.  We can either just hang an ill-behaved
app on the waitqueue or return an error code to the application.
Any suggestions on which would be best: hang an ill-behaved app
or just return it an error?

By default, make the tasks block. You can always make the return code an
option on each futex, but it has to be off by default. Here is why I think
so:

Having a return code would require you to do the deadlock detection "up
front" in the down() operation. That takes a lot of CPU cycles. Of course,
if you return an error code, the application could use the info for
something constructive, but I am afraid most applications won't do anything
constructive about it anyway (i.e. such that the application continues to
run) - such error handling code would be hard to write.

Yes, I agree. I'll hang the app on a new waitqueue that will let the user
see with 'ps' that they are hung in futex_ill_behaved waitqueue or some
other name that makes it easy to see where the app is stopped.

thanks.


David

What is needed in most applications is that the stuff simply
deadlocks with the tasks blocked on the various locks. Then you can go in
and trace the locks "postmortem".

With the current setup, where the deadlock detection has to be performed
"up front" because the rt_mutex can make spinlock-deadlocks, the error
code will be a natural thing. But when rt_mutex is "fixed" or
replaced with something else, this feature will force the kernel to run
deadlock detection "up front" even though it isn't used for anything
useful.

Esben

David

I am working on a new way to do priority inheritance for nested locks
in rt_mutex such that you do not risk deadlocking the system on
raw spinlocks when you have an rt_mutex deadlock. But it won't have
deadlock detection without CONFIG_DEBUG_DEADLOCKS. On the other hand, it
would be possible to make a deadlock scanner finding deadlocks in the
system after they have happened.
With a suitable /proc interface it could even be done in userspace.
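
As a rough sketch of what such an after-the-fact scanner could do in
userspace: assume some interface (hypothetical, no such /proc file is
implied here) has produced a snapshot saying which task each blocked
task is waiting for. Since a task blocks on at most one lock and a lock
has one owner, finding the deadlocked tasks is a bounded walk along that
chain. The array layout and function names are assumptions made for the
example.

```c
#include <stddef.h>

#define NO_TASK (-1)

/*
 * waits_for[i] is the index of the task owning the lock that task i is
 * blocked on, or NO_TASK if task i is not blocked.
 * Returns 1 if task `start` is part of a wait cycle, 0 otherwise.
 * A task that merely waits on a deadlocked task is not counted as being
 * part of the cycle.
 */
static int task_is_deadlocked(const int *waits_for, size_t ntasks, size_t start)
{
    int cur = waits_for[start];
    size_t steps;

    /* A cycle through `start` has at most ntasks edges, so bound the walk. */
    for (steps = 0; steps < ntasks && cur != NO_TASK; steps++) {
        if ((size_t)cur == start)
            return 1;
        cur = waits_for[cur];
    }
    return 0;
}

/* Count how many tasks in the snapshot sit on a wait cycle. */
static size_t count_deadlocked(const int *waits_for, size_t ntasks)
{
    size_t i, n = 0;

    for (i = 0; i < ntasks; i++)
        n += (size_t)task_is_deadlocked(waits_for, ntasks, i);
    return n;
}
```

Because the scan runs on a snapshot taken after the tasks have blocked,
it costs nothing on the locking fast path, which is the point of doing
the detection "postmortem".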

My patch to the rt_mutex is far from finished. I haven't even compiled a
kernel with it yet. I spend the little time I have between my
family going to bed and when I simply have to go to bed myself writing a
unit-test framework for the rt_mutex and getting both the original and the
patched rt_mutex to pass all my tests. But I need more tests to hammer
out the details about task->state, for instance. If anyone is
interested I
would be happy to send what I have right now.

Esben

	It's also easier to see if a POSIX compliant app has deadlocked
itself.
The 'ps' command will show that the wait channel of a deadlocked
application is waiting at 'futex_deadlock'.

	Let me know if it passes all your tests.

David




On Dec 20, 2005, at 7:50 AM, Dinakar Guniguntala wrote:

On Tue, Dec 20, 2005 at 02:19:56PM +0100, Ingo Molnar wrote:

hm, i'm looking at -rf4 - these changes look fishy:

-       _raw_spin_lock(&lock_owner(lock)->task->pi_lock);
+       if (current != lock_owner(lock)->task)
+               _raw_spin_lock(&lock_owner(lock)->task->pi_lock);

why is this done?

Ingo, this is to prevent a kernel hang due to application error.

Basically when an application does a pthread_mutex_lock twice on a
_nonrecursive_ mutex with robust/PI attributes the whole system
hangs.
Of course the application clearly should not be doing anything like
that, but it should not end up hanging the system either.

	-Dinakar

-
To unsubscribe from this list: send the line "unsubscribe
linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

