Re: 2.6.13-rc6-rt9

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Deadlock finally found!!! 

I've been debugging this all week. And at 2:30 in the morning I finally
found where it is. It really sucks when you need to debug on something
that doesn't have a serial, and netconsole doesn't work that reliably.

This also explains why this only happened on my laptop. It is caused by
the dependent_sleeper, which is used quite a bit when you have SCHED_HT
turned on.  Although I wouldn't be surprised if this deadlock exists
elsewhere.

The dependent_sleeper (used by SCHED_HT) grabs all the run queue locks
for the physical CPU.  Then it calls trace_stop_sched_switched. So this
is the calling chain.

dependent_sleeper
    +==> grabs CPU0 rq and CPU1 rq lock (saying CPU 0 and 1 are on the
same physical CPU)
   -> trace_stop_sched_switched
      -> check_wakeup_timing
          -> down_trylock(max_mutex)
                -> rt_down_trylock
                    -> __down_trylock
                       +==> grabs trace_lock

Now lets look at something at the same time that is unlocking.

rt_up
  -> __up_mutex_nosavestate_inline
    ->  ___up_mutex_nosavestate
       -> ____up_mutex
          +==> grabs trace_lock
         -> __up_mutex_waiter_nosavestate
            -> wake_up_process
               -> try_to_wake_up
                 +==> grabs rq lock

Here we can see that there's a reverse order here and we have a
deadlock.  I actually witness this using my logger to show the traces.
All I needed was the last few lines, so the console was fine here.

Ingo, can't you get rt.c to be more confusing. I mean it is too simple.
We need to add a few more underscores here and there :-)  Seriously,
that rt.c is mind boggling. It was nice before, now it is just screaming
for a cleanup (come now, do we really need the four underscores?).  Same
with latency.c. 

Well, there's the deadlock, I'm too tired to figure out all the paths,
and what the heck is going on. So I'll give you the honour of writing
the patch. ;-)

Thanks,

-- Steve



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]
  Powered by Linux