Re: 2.6.13-rc6-rt6 — Linux Kernel

Ingo Molnar wrote:

* Steven Rostedt <rostedt@goodmis.org> wrote:
On Wed, 2005-08-17 at 10:24 -0400, Steven Rostedt wrote:
OK the output from netconsole still seems like netconsole itself is
causing some problems.  But I think it is also showing this lockup. I'll
recompile my kernel as UP and see if netconsole works fine.
Well, the UP kernel boots on my laptop, but netconsole gives strange warnings.

OK, what's the scoop with the illegal_API_call? What is it about, and what is the expected workaround?

this is a recent change: i've started flagging "naked" use of local_irq_disable(), because it's a problem on PREEMPT_RT and it's a potential SMP bug on upstream kernels. A local_irq_disable() is converted either to raw_local_irq_disable() when justified (it's mostly only justified for lowlevel arch code), or is eliminated totally (either by merging it into a nearby spin_lock API call, or by removing it altogether, making sure that code doesn't break).

Right now we print a warning on the first such API use, and then shut up about it. All local_irq_*() APIs map to NOPs. (we keep the PF_IRQSOFF flag for compatibility, but only to get irqs_off() right, which in turn shuts off a number of BUG_ON(!irqs_disabled()) warnings, and it doesn't have any other functional purpose.)

the desired end-result would be the total elimination of local_irq_*() API calls.
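
For readers following along, the warn-once behaviour described above can be modelled in plain userspace C. This is only a sketch under stated assumptions: every name below (flag_illegal_api_call, model_local_irq_disable, irqs_off_flag and so on) is hypothetical and is not taken from the -rt patch. It also assumes the warning is printed once globally rather than once per call site.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace model; none of these names come from the -rt patch. */
bool warned_once;      /* set after the first flagged call */
int warnings_printed;  /* number of warnings actually emitted */
bool irqs_off_flag;    /* stands in for the PF_IRQSOFF compatibility flag */

/* Print a warning for the first flagged API use, then stay quiet. */
void flag_illegal_api_call(const char *api)
{
    if (warned_once)
        return;
    warned_once = true;
    warnings_printed++;
    fprintf(stderr, "warning: naked use of %s\n", api);
}

/* The call is otherwise a NOP: only the bookkeeping flag changes. */
void model_local_irq_disable(void)
{
    flag_illegal_api_call("local_irq_disable()");
    irqs_off_flag = true;
}

void model_local_irq_enable(void)
{
    flag_illegal_api_call("local_irq_enable()");
    irqs_off_flag = false;
}

/* Lets BUG_ON(!irqs_disabled())-style checks keep passing. */
bool model_irqs_disabled(void)
{
    return irqs_off_flag;
}
```

Calling model_local_irq_disable() any number of times prints a single warning, while model_irqs_disabled() keeps reporting the state callers expect, which is the only job the flag has in this model.
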
I'm also getting the following output on shutdown:

NET: Registered protocol family 31
Bluetooth: HCI device and connection manager initialized
Bluetooth: HCI socket layer initialized
Bluetooth: L2CAP ver 2.7
Bluetooth: L2CAP socket layer initialized
Bluetooth: RFCOMM ver 1.5
Bluetooth: RFCOMM socket layer initialized
Bluetooth: RFCOMM TTY layer initialized
BUG: nonzero lock count 1 at exit time?
           nfsd: 4696 [f7183830, 115]
[<c0136922>] check_no_held_locks+0x62/0x330 (8)
[<c011df67>] do_exit+0x257/0x480 (32)
[<c013d052>] __module_put_and_exit+0x52/0x70 (40)
[<f8d54583>] nfsd+0x2b3/0x340 [nfsd] (12)
[<f8d542d0>] nfsd+0x0/0x340 [nfsd] (48)
[<c010140d>] kernel_thread_helper+0x5/0x18 (16)
---------------------------
| preempt count: 00000000 ]
| 0-level deep critical section nesting:
----------------------------------------

------------------------------
| showing all locks held by: |  (nfsd/4696 [f7183830, 116]):
------------------------------

#001:             [c038e184] {kernel_sem.lock}
... acquired at:               lock_kernel+0x21/0x40

BUG: nfsd/4696, BKL held at task exit time!

hm, it seems nfsd forgets to do an unlock_kernel() in some exit path? We are enforcing strict balanced lock use in PREEMPT_RT - the upstream kernel is more relaxed about it.
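
To illustrate what "strict balanced lock use" means here, a toy userspace model of an exit-time check might look like the following. It is a sketch only: struct toy_task and the toy_* functions are hypothetical names, and the real check is done by check_no_held_locks() in the trace above, whose internals are not shown in this thread.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical toy task; not the kernel's task_struct. */
struct toy_task {
    const char *comm;   /* task name */
    int pid;
    int lock_depth;     /* big-lock acquisitions not yet released */
};

void toy_lock_kernel(struct toy_task *t)
{
    t->lock_depth++;
}

void toy_unlock_kernel(struct toy_task *t)
{
    assert(t->lock_depth > 0);  /* unlocking without holding is a bug too */
    t->lock_depth--;
}

/* Returns the number of locks still held at exit; 0 means balanced. */
int toy_do_exit(const struct toy_task *t)
{
    if (t->lock_depth != 0)
        fprintf(stderr, "BUG: %s/%d, %d lock(s) held at task exit time!\n",
                t->comm, t->pid, t->lock_depth);
    return t->lock_depth;
}
```

A task that calls toy_lock_kernel() and then exits without the matching toy_unlock_kernel() is reported, which mirrors the nfsd case: one exit path skips the unlock, so the count is still 1 at exit.
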

This one has been biting me in the shorts since going to the 2.6.13-rc? RT series on my older SMP system at home. In every case the system hangs on shutdown and requires a hard reset. I just hadn't had the time to check into it. I was just in the process of building 2.6.13-rc6 without RT to check if it still happened. Guess I won't bother now. :-)

Aug 16 11:11:09 porky kernel: BUG: nonzero lock count 1 at exit time?
Aug 16 11:11:09 porky kernel:             nfsd: 4476 [dd1691a0, 115]
Aug 16 11:11:09 porky kernel:  [<c010418e>] dump_stack+0x1e/0x20 (20)
Aug 16 11:11:09 porky kernel:  [<c013b7ff>] check_no_held_locks+0x1af/0x370 (36)
Aug 16 11:11:09 porky kernel:  [<c0122e3f>] do_exit+0x26f/0x480 (44)
Aug 16 11:11:09 porky kernel:  [<c01413c1>] __module_put_and_exit+0x51/0x70 (16)
Aug 16 11:11:09 porky kernel:  [<e5a8558d>] nfsd+0x2bd/0x340 [nfsd] (68)
Aug 16 11:11:09 porky kernel:  [<c0101315>] kernel_thread_helper+0x5/0x10 (654852124)
Aug 16 11:11:09 porky kernel: ---------------------------
Aug 16 11:11:09 porky kernel: | preempt count: 00000000 ]
Aug 16 11:11:09 porky kernel: | 0-level deep critical section nesting:
Aug 16 11:11:09 porky kernel: ----------------------------------------
Aug 16 11:11:09 porky kernel:
Aug 16 11:11:09 porky kernel: ------------------------------
Aug 16 11:11:09 porky kernel: | showing all locks held by: | (nfsd/4476 [dd1691a0, 116]):
Aug 16 11:11:09 porky kernel: ------------------------------
Aug 16 11:11:09 porky kernel:
Aug 16 11:11:09 porky kernel: #001:             [c0390fe4] {kernel_sem.lock}
Aug 16 11:11:09 porky kernel: ... acquired at: lock_kernel+0x28/0x50
Aug 16 11:11:09 porky kernel:
Aug 16 11:11:09 porky kernel: BUG: nfsd/4476, BKL held at task exit time!
Aug 16 11:11:09 porky kernel: BKL acquired at: nfsd+0x273/0x340 [nfsd]
Aug 16 11:11:09 porky kernel:  [c0390fe4] {kernel_sem.lock}
Aug 16 11:11:09 porky kernel: .. held by: nfsd: 4476 [dd1691a0, 116]
Aug 16 11:11:09 porky kernel: ... acquired at: lock_kernel+0x28/0x50

Aug 16 11:46:44 porky syslogd 1.4.1: restart.

And it goes on and on. This happens every time. Without netconsole, I
only get the nonzero lock count error. Also, one of my lockups on SMP
had to do with the kernel_thread_helper:

Using IPI Shortcut mode
khelper/794[CPU#0]: BUG in set_new_owner at kernel/rt.c:916

this is a 'must not happen'. Somehow lock->held list got non-empty. Maybe some use-after-free thing? Haven't seen it myself.

	Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


--
   kr
