Re: Problems with 2.6.17-rt8

Please don't trim CC lines.  LKML is too big to read all emails.

On Thu, 2006-08-03 at 04:48 -0700, Robert Crocombe wrote: 
> On 8/2/06, Steven Rostedt <[email protected]> wrote:
> > You mention problems but I don't see you listing what exactly the
> > problems are.  Just saying "the problems exist" doesn't tell us
> > anything.
> >
> > Don't assume that we will go to some web site to figure out what you're
> > talking about. Please list the problems you are facing.
> 
> The machine dies (no alt-sysrq, no keyboard LEDs of any kind: dead in
> the water).  I thought the log would provide more useful information
> without potentially erroneous editorialization by myself.  Here are
> some highlights:
> 
> kjournald/1105[CPU#3]: BUG in debug_rt_mutex_unlock at kernel/rtmutex-debug.c:47
> 1

Ouch, that looks like kjournald is unlocking a lock that it doesn't own?

> 
> Call Trace:
>        <ffffffff8047655a>{_raw_spin_lock_irqsave+24}
>        <ffffffff8022b272>{__WARN_ON+100}
>        <ffffffff802457e4>{debug_rt_mutex_unlock+199}
>        <ffffffff804757b7>{rt_lock_slowunlock+25}
>        <ffffffff80476301>{__lock_text_start+9}

hmm, here we are probably having trouble with the percpu slab locks,
that is somewhat of a hack to get slabs working on a per cpu basis.

>        <ffffffff80271e93>{kmem_cache_alloc+202}

It would also be nice to know exactly where ffffffff80271e93 is.

>        <ffffffff8025493b>{mempool_alloc_slab+17}
>        <ffffffff80254d07>{mempool_alloc+75}
>        <ffffffff802f2f8c>{generic_make_request+375}
>        <ffffffff8027b914>{bio_alloc_bioset+35}
>        <ffffffff8027ba2a>{bio_alloc+16}
>        <ffffffff802781d1>{submit_bh+137}
>        <ffffffff80279377>{ll_rw_block+122}
>        <ffffffff8027939e>{ll_rw_block+161}
>        <ffffffff802c85dc>{journal_commit_transaction+1011}
>        <ffffffff80476a5f>{_raw_spin_unlock_irqrestore+56}
>        <ffffffff804769ac>{_raw_spin_unlock+46}
>        <ffffffff804757df>{rt_lock_slowunlock+65}
>        <ffffffff80476301>{__lock_text_start+9}
>        <ffffffff802339b0>{try_to_del_timer_sync+85}
>        <ffffffff802cca63>{kjournald+202}
>        <ffffffff8023db60>{autoremove_wake_function+0}
>        <ffffffff802cc999>{kjournald+0}
>        <ffffffff8023d739>{keventd_create_kthread+0}
>        <ffffffff8023da2f>{kthread+219}
>        <ffffffff80225a23>{schedule_tail+188}
>        <ffffffff8020aaca>{child_rip+8}
>        <ffffffff8023d739>{keventd_create_kthread+0}
>        <ffffffff8023d954>{kthread+0}
>        <ffffffff8020aac2>{child_rip+0}
> ---------------------------
> | preempt count: 00000002 ]
> | 2-level deep critical section nesting:
> ----------------------------------------
> .. [<ffffffff80476499>] .... _raw_spin_lock+0x16/0x23
> .....[<ffffffff804757af>] ..   ( <= rt_lock_slowunlock+0x11/0x6b)
> .. [<ffffffff8047655a>] .... _raw_spin_lock_irqsave+0x18/0x29
> .....[<ffffffff8022b22d>] ..   ( <= __WARN_ON+0x1f/0x82)
> 
> 
> Somewhat later:
> 
> Kernel BUG at kernel/rtmutex.c:639

The rest was probably caused as a side effect from above.  The above is
already broken!

You have NUMA configured too, so this is also something to look at.

I still wouldn't ignore the first bug message you got:

----
BUG: scheduling while atomic: udev_run_devd/0x00000001/1568

Call Trace:
       <ffffffff8045c693>{__schedule+155}
       <ffffffff8045f156>{_raw_spin_unlock_irqrestore+53}
       <ffffffff80242241>{task_blocks_on_rt_mutex+518}
       <ffffffff80252da0>{free_pages_bulk+39}
       <ffffffff80252da0>{free_pages_bulk+39}
...
----

This could also have a side effect that messes things up.

Unfortunately, right now I'm assigned to other tasks and I cant spend
much more time on this at the moment.  So hopefully, Ingo, Thomas or
Bill, or someone else can help you find the reason for this problem.

-- Steve


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Follow-Ups:
- [Patch] restore the RCU callback to defer put_task_struct() Re: Problems with 2.6.17-rt8
  - From: Bill Huey (hui) <[email protected]>
- Re: Problems with 2.6.17-rt8
  - From: Bill Huey (hui) <[email protected]>
- Re: Problems with 2.6.17-rt8
  - From: "Robert Crocombe" <[email protected]>

References:
- Problems with 2.6.17-rt8
  - From: "Robert Crocombe" <[email protected]>
- Re: Problems with 2.6.17-rt8
  - From: Steven Rostedt <[email protected]>
- Re: Problems with 2.6.17-rt8
  - From: "Robert Crocombe" <[email protected]>

Prev by Date: Re: partial reiser4 review comments
Next by Date: Re: [PATCH 01/28] prepare for write access checks: collapse if()
Previous by thread: Re: Problems with 2.6.17-rt8
Next by thread: Re: Problems with 2.6.17-rt8
Index(es):
- Date
- Thread

[Index of Archives] [Kernel Newbies] [Netfilter] [Bugtraq] [Photo] [Stuff] [Gimp] [Yosemite News] [MIPS Linux] [ARM Linux] [Linux Security] [Linux RAID] [Video 4 Linux] [Linux for the blind] [Linux Resources]