Re: [patch] increase spinlock-debug looping timeouts (write_lock and NMI)

Andrew Morton wrote:

On Mon, 19 Jun 2006 22:35:46 -0700 (PDT)
Dave Olson <[email protected]> wrote:
| I get that impression ;) If it takes 1-2 seconds to get this lock then it
| can take five seconds.  a) that's just gross and b) the NMI watchdog will
| nuke the box.
|
| Why is it taking so long to get the lock?
|
| Does it happen in non-debug mode?
|
| What do we do about it?

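Andrew's objection is that a debug loop which only spins longer before complaining does nothing about the wait itself. Below is a minimal user-space sketch of a poll-with-trylock loop that warns once after a budget of failed attempts and then keeps spinning. It assumes C11 `<stdatomic.h>`. `toy_spinlock`, `debug_lock`, `budget` and `release_after` are hypothetical names used for illustration; this is not the kernel's actual spinlock-debug code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy test-and-set lock standing in for a raw spinlock (hypothetical). */
typedef struct {
    atomic_flag held;
} toy_spinlock;

static bool toy_trylock(toy_spinlock *l)
{
    /* Succeeds only if the flag was previously clear. */
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

static void toy_unlock(toy_spinlock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/*
 * Debug-style acquire loop: poll with trylock. Once `budget` attempts in a
 * row have failed, print a one-time "lockup suspected" warning and keep
 * spinning. `release_after` (hypothetical) makes the simulated holder on
 * "another CPU" drop the lock after that many failed attempts, so the demo
 * is deterministic. Returns the total number of attempts made.
 */
static unsigned long debug_lock(toy_spinlock *l, unsigned long budget,
                                unsigned long release_after, bool *warned)
{
    unsigned long attempts = 0;

    assert(budget > 0 && release_after > 0);
    *warned = false;
    for (;;) {
        for (unsigned long i = 0; i < budget; i++) {
            attempts++;
            if (toy_trylock(l))
                return attempts;
            if (attempts == release_after)
                toy_unlock(l);          /* simulated holder lets go */
        }
        if (!*warned) {
            *warned = true;
            fprintf(stderr, "lockup suspected after %lu attempts\n", attempts);
        }
    }
}
```

Raising `budget` only postpones the one-time warning. The caller still waits for as long as the holder keeps the lock. In the kernel case, a CPU that spins long enough (presumably with interrupts disabled here) trips the NMI watchdog regardless of the debug timeout.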
It seems possible that this might be the cause of problems we've had
with our InfiniPath hardware/software, and also Mellanox/OpenIB hardware/software
on some quad-socket, dual-core Opteron systems (8 CPU cores).

We see very long delays when 8 MPI processes exit "simultaneously": sometimes
we get an NMI watchdog lockup, sometimes the system hangs, and sometimes it is
just hung for many seconds (and often, in that state, doing sysrq-P or sysrq-T
will make things happy again).

OK.  I assume these processes have done a mmap(MAP_SHARED) of a lot of
memory?

A typical trace looks like this (on an fc4 2.6.16 kernel):

fc4?  You seem to have an RH-FCx which doesn't enable
CONFIG_DEBUG_SPINLOCK.  Or maybe we didn't have all that debug code in
2.6.16.  Doesn't matter, really.

[root@quad-00 ~]# NMI Watchdog detected LOCKUP on CPU 0
CPU 0
Modules linked in: nfs nfsd exportfs lockd nfs_acl ipv6 autofs4 sunrpc ib_sdp(U)
ib_cm(U) ib_umad(U) ib_uverbs(U) ib_ipoib(U) ib_sa(U) ib_ipath(U) ib_mad(U)
ib_core(U) video button battery ac i2c_nforce2 i2c_core ipath_core(U) e1000
floppy sg dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod sata_nv libata aic79xx
scsi_transport_spi sd_mod scsi_mod
Pid: 4239, comm: mpi_multibw Not tainted 2.6.16-1.2096_FC4.rootsmp #1
RIP: 0010:[<ffffffff80213a30>] <ffffffff80213a30>{_raw_write_lock+161}
RSP: 0018:ffff810078e07c18 EFLAGS: 00000086
RAX: 000000008f100300 RBX: ffff81007b7bea58 RCX: 00000000002dc5a0
RDX: 0000000000927efd RSI: 0000000000000001 RDI: ffff81007b7bea58
RBP: ffff81007b7bea40 R08: ffff810002e3ae80 R09: 00000000fffffffa
R10: 0000000000000003 R11: ffffffff801644e2 R12: ffff81007b7bea58
R13: 00002aaaad800000 R14: ffff810002e3aec0 R15: 00002aaabba6f000
FS:  0000000040a00960(0000) GS:ffffffff80514000(0000) knlGS:00000000f7fc86c0
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000033f38bdaf0 CR3: 0000000000101000 CR4: 00000000000006e0
Process mpi_multibw (pid: 4239, threadinfo ffff810078e06000, task ffff810079d8a040)
Stack: ffff810002e3aec0 ffffffff8016452b 0000000078ebb067 00002aaaad757000
       ffff810078dccab8 ffffffff8016b840 0000000000000000 ffff810078e07d38
       ffffffffffffffff 0000000000000000
Call Trace: <ffffffff8016452b>{__set_page_dirty_nobuffers+73}
       <ffffffff8016b840>{unmap_vmas+1042} <ffffffff8016e638>{exit_mmap+124}
       <ffffffff80132b07>{mmput+37} <ffffffff80138373>{do_exit+584}
       <ffffffff801416dc>{__dequeue_signal+459} <ffffffff80138af0>{sys_exit_group+0}
       <ffffffff80142af3>{get_signal_to_deliver+1568}
       <ffffffff8010a14a>{do_signal+116}
       <ffffffff80195dc1>{__pollwait+0} <ffffffff80196b0c>{sys_select+934}
       <ffffffff8010aa87>{sysret_signal+28}
       <ffffffff8010ad73>{ptregscall_common+103}

Code: 84 c0 75 7f f0 81 03 00 00 00 01 f3 90 48 83 c1 01 48 8b 15
Kernel panic - not syncing: nmi watchdog


Any ideas what it might be waiting on?

blam, dead box, that's the one, thanks.

With our current rwlock semantics I don't know if this is fixable.

Probably we need to go back to a spinlock on tree_lock.
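The trace shows `__set_page_dirty_nobuffers` spinning in `_raw_write_lock`, presumably on the mapping's `tree_lock`. The `Code:` bytes in the oops appear to decode as `lock addl $0x1000000,(%rbx)` followed by `pause`. That looks like a failed writer restoring a bias counter before retrying. Below is a toy single-threaded model of such a bias-counter rwlock, assuming C11 atomics. `TOY_RW_BIAS` is assumed to mirror the x86 `RW_LOCK_BIAS` of the time. `toy_rwlock` and `starved_writer_attempts` are hypothetical names. The model shows that a spinning writer holds nothing between attempts, so overlapping readers keep getting in and the writer can keep failing.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define TOY_RW_BIAS 0x01000000   /* assumed to mirror x86 RW_LOCK_BIAS */

typedef struct {
    atomic_int count;            /* equals TOY_RW_BIAS when the lock is idle */
} toy_rwlock;

static bool toy_read_trylock(toy_rwlock *l)
{
    /* A reader takes one unit; a result >= 0 means no writer holds it. */
    if (atomic_fetch_sub(&l->count, 1) - 1 >= 0)
        return true;
    atomic_fetch_add(&l->count, 1);            /* writer present: undo */
    return false;
}

static void toy_read_unlock(toy_rwlock *l)
{
    atomic_fetch_add(&l->count, 1);
}

static bool toy_write_trylock(toy_rwlock *l)
{
    /* A writer takes the whole bias and wins only if the lock was idle. */
    if (atomic_fetch_sub(&l->count, TOY_RW_BIAS) == TOY_RW_BIAS)
        return true;
    atomic_fetch_add(&l->count, TOY_RW_BIAS);  /* readers present: undo */
    return false;
}

static void toy_write_unlock(toy_rwlock *l)
{
    atomic_fetch_add(&l->count, TOY_RW_BIAS);
}

/*
 * Contrived worst case: readers on "other CPUs" overlap, so each new reader
 * enters before the previous one leaves and the count never returns to
 * TOY_RW_BIAS. Returns how many writer attempts failed, or -1 on misuse.
 */
static int starved_writer_attempts(toy_rwlock *l, int rounds)
{
    int failed = 0;

    assert(rounds >= 0);
    if (!toy_read_trylock(l))                  /* first reader */
        return -1;
    for (int i = 0; i < rounds; i++) {
        if (toy_write_trylock(l)) {            /* unreachable while a reader holds it */
            toy_write_unlock(l);
            break;
        }
        failed++;
        if (!toy_read_trylock(l)) {            /* the next reader still gets in */
            toy_read_unlock(l);
            return -1;
        }
        toy_read_unlock(l);                    /* the previous reader leaves */
    }
    toy_read_unlock(l);                        /* the last reader leaves */
    return failed;
}
```

The interleaving is deliberately the worst case. On real hardware the writer eventually wins a race, but nothing in the lock guarantees when. That is presumably why the fix suggested here is changing the lock type rather than tuning the debug loop.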


Lockless pagecache makes most of the read-side locks go away, so I have
converted tree_lock back to a spinlock in my tree. I've just started
working on it again with a view to submitting it (or at least the
RCU radix tree, to start with)... been having fun with a userspace RCU
for rtth ;)

Otherwise, a straight rwlock->spinlock conversion will have a few more
scalability issues, since readers can no longer hold the lock
concurrently, but I'd guess it wouldn't be a problem at all for most
workloads on most systems.
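The argument here is that once lookups stop taking `tree_lock`, making it a spinlock only serializes writers against each other. Below is a minimal sketch of that split, assuming C11 atomics. `toy_mapping`, `toy_page`, `toy_find_page` and `toy_replace_page` are hypothetical stand-ins, and a fixed array replaces the radix tree. The sketch deliberately omits the RCU grace period that a real lockless pagecache needs before an old entry can be freed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define TOY_SLOTS 16

struct toy_page {
    unsigned long index;
};

struct toy_mapping {
    atomic_flag lock;                            /* serializes writers only */
    _Atomic(struct toy_page *) slot[TOY_SLOTS];  /* stand-in for the radix tree */
};

static void toy_spin_lock(atomic_flag *l)
{
    while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
        ;   /* busy-wait */
}

static void toy_spin_unlock(atomic_flag *l)
{
    atomic_flag_clear_explicit(l, memory_order_release);
}

/* Read side: takes no lock at all. The acquire load pairs with the
 * writer's release store, so a reader that sees the pointer sees the page. */
static struct toy_page *toy_find_page(struct toy_mapping *m, size_t idx)
{
    assert(idx < TOY_SLOTS);
    return atomic_load_explicit(&m->slot[idx], memory_order_acquire);
}

/* Write side: writers exclude each other with a plain spinlock.
 * Returns the old entry; freeing it safely would need an RCU-style
 * grace period, which this sketch does not model. */
static struct toy_page *toy_replace_page(struct toy_mapping *m, size_t idx,
                                         struct toy_page *p)
{
    struct toy_page *old;

    assert(idx < TOY_SLOTS);
    toy_spin_lock(&m->lock);
    old = atomic_load_explicit(&m->slot[idx], memory_order_relaxed);
    atomic_store_explicit(&m->slot[idx], p, memory_order_release);
    toy_spin_unlock(&m->lock);
    return old;
}
```

Because lookups never touch the lock, the lost read-side concurrency of a spinlock only matters for paths that still take it. In this sketch, that means writers.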

--
SUSE Labs, Novell Inc.


To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
