forcedeth kernel panic — Linux Kernel

Hi,

An ASUS M2N32 WS Pro (nVidia MCP55 chipset) based machine with on-board Gbit Ethernet hits a kernel panic under high network load.

The machine is to be a Samba server and has a minimal 64-bit Debian Etch installed. It first crashed with the stock Debian 2.6.18-amd64 kernel, so I upgraded to 2.6.21 and finally to 2.6.22-2-amd64 (source from Debian). The crashes varied from kernel to kernel but were always fatal (only a hard reset helped), so I decided to post here as well (in addition to Debian BTS #442877).

The crash occurs under high network load generated by tserv from the dbench package, within about 20 minutes of running the tserv test (from another machine) against this machine (which runs tserv_srv).

Before it crashes, it fills the kernel log with the following messages, which may or may not be related to the crash:


Sep 17 14:51:27 harapes kernel: eth0: too many iterations (6) in nv_nic_irq.
Sep 17 14:51:58 harapes last message repeated 1026 times
Sep 17 14:52:59 harapes last message repeated 2063 times
Sep 17 14:54:00 harapes last message repeated 2055 times
Sep 17 14:55:01 harapes last message repeated 2044 times

I say they may not be related because I also have an older nForce-based machine here, the one running tserv against the crashing server, and it fills its log with the same messages, but fortunately it does not crash...

After crashing the machine several times in a row, I googled a bit and found some suggestions, so now I am testing a different setup: the forcedeth driver loaded with the "optimization_mode=1" parameter. So far (95 minutes of tserv run) it has not crashed...
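For anyone wanting to reproduce this setup, a minimal sketch of making the parameter persistent, assuming the module-init-tools modprobe shipped with Debian Etch, which reads the files in /etc/modprobe.d/. The file name used here is a hypothetical, arbitrary choice:

```
# /etc/modprobe.d/forcedeth  (hypothetical file name)
# Pass optimization_mode=1 to the forcedeth module every time modprobe loads it.
options forcedeth optimization_mode=1
```

To try the value once without a config file, unload the driver with `modprobe -r forcedeth` and reload it with `modprobe forcedeth optimization_mode=1`. If forcedeth were built into the kernel instead of as a module, the equivalent boot parameter would be `forcedeth.optimization_mode=1`.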

More details about the hardware: AMD64 3600+ (2 GHz), 2 GB of DDR2, 6 SATA drives in RAID1 and RAID5 configurations on the on-board SATA controller, a PCI S3 graphics card, and that's it.


dmesg output related to networking:

forcedeth.c: Reverse Engineered nForce ethernet driver. Version 0.60.
forcedeth: using HIGHDMA
eth0: forcedeth.c: subsystem: 01043:81fb bound to 0000:00:10.0
eth0: no IPv6 routers present


lspci -vv:

00:10.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a2)
        Subsystem: ASUSTeK Computer Inc. Unknown device 81fb
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 0 (250ns min, 5000ns max)
        Interrupt: pin A routed to IRQ 1272
        Region 0: Memory at fe02a000 (32-bit, non-prefetchable) [size=4K]
        Region 1: I/O ports at b400 [size=8]
        Region 2: Memory at fe029000 (32-bit, non-prefetchable) [size=256]
        Region 3: Memory at fe028000 (32-bit, non-prefetchable) [size=16]
        Capabilities: [44] Power Management version 2
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
                Status: D0 PME-Enable+ DSel=0 DScale=0 PME-
        Capabilities: [70] MSI-X: Enable- Mask- TabSize=8
                Vector table: BAR=2 offset=00000000
                PBA: BAR=3 offset=00000000
        Capabilities: [50] Message Signalled Interrupts: Mask+ 64bit+ Queue=0/3 Enable+
                Address: 00000000fee0300c  Data: 4189
                Masking: 000000fe  Pending: 00000000
        Capabilities: [6c] HyperTransport: MSI Mapping


The incomplete kernel panic dump hand-copied from the stuck console:

Call Trace:
<IRQ> :forcedeth:nv_nic_irq_optimized+0x89/0x22c
 handle_IRQ_event+0x25/0x53
 __do_softirq+0x55/0xc3
 handle_edge_irq+0xe4/0x127
 do_IRQ+0x6c/0xd5
 default_idle+0x0/0x3d
 ret_from_intr+0x0/0xa
 <EOI> default_idle+0x29/0x3d
 cpu_idle+0x8b/0xae

Code: 8a 83 84 00 00 00 83 e0 f3 83 c8 04 88 83 84 00 00 00 83 7b
RIP :forcedeth:nv_rx_process_optimized+0xe6/0x380
Kernel panic - not syncing: Aiee, killing interrupt handler!

I may have to replace the on-board Ethernet with a PCI-based card, because I need a reliable server very soon, and once it is deployed I won't have a chance to experiment with it anymore. So if there is anything I could try now to make forcedeth fully stable, please let me know soon. Is "optimization_mode=1" the right solution? What negative impact does it have?


Thanks!

Petr