forcedeth kernel panic


 



Hi,

An ASUS M2N32 WS Pro (nVidia MCP55 chipset) based machine with on-board Gbit Ethernet leads to a kernel panic under high network load.

The machine is intended to be a Samba server and has a minimal 64-bit Debian Etch installed. It first crashed with the stock Debian 2.6.18-amd64 kernel, so I upgraded to 2.6.21 and finally to 2.6.22-2-amd64 (source from Debian). The crashes varied per kernel but were always fatal (only a hard reset helped), so I decided to post here as well (in addition to Debian's BTS #442877).

The crash occurs under high network load generated by tserv from the dbench package: within about 20 minutes of a tserv test (run from another machine) against this machine (which is running tserv_srv), the box panics.
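For reference, the load is generated roughly like this (a sketch only; the exact invocation and client count are illustrative and may differ between dbench versions):

```shell
# On the machine under test - run the server side of the benchmark:
tserv_srv &

# On the load-generating machine - hammer the server with e.g. 16
# client processes (hostname and client count are examples):
tserv 16 harapes
```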

Before it crashes, it fills the kernel log with the following messages, which may or may not be related to the crash:

Sep 17 14:51:27 harapes kernel: eth0: too many iterations (6) in nv_nic_irq.
Sep 17 14:51:58 harapes last message repeated 1026 times
Sep 17 14:52:59 harapes last message repeated 2063 times
Sep 17 14:54:00 harapes last message repeated 2055 times
Sep 17 14:55:01 harapes last message repeated 2044 times

I wrote that it may not be related because an older nForce-based machine here, which runs tserv against the crashing server, also fills its log with the same messages - but fortunately it does not crash...
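As I understand it (my reading of the driver source, so treat it as an assumption), forcedeth prints the "too many iterations" message when the interrupt handler still finds work after exhausting its per-interrupt loop budget, which is set by the max_interrupt_work module parameter. One thing that might be worth trying is raising that budget:

```shell
# Sketch: reload forcedeth with a larger per-interrupt work budget
# (assumes the max_interrupt_work parameter of the 2.6.x driver;
# the value 20 is an arbitrary example).
rmmod forcedeth
modprobe forcedeth max_interrupt_work=20
```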

After killing the machine several times in a row, I googled a bit and found some suggestions, so now I am testing a different setup: the forcedeth driver loaded with the "optimization_mode=1" parameter. So far (95 minutes of a tserv run) it has not crashed...
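In case it helps anyone reproducing this, the parameter can be made persistent across reboots (a sketch; assumes a Debian Etch style /etc/modprobe.d/ setup, and the file name is arbitrary):

```shell
# Record the option so it is applied every time the module loads:
echo "options forcedeth optimization_mode=1" > /etc/modprobe.d/forcedeth

# Reload the driver to pick it up now (drops the link briefly):
rmmod forcedeth && modprobe forcedeth
```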

More details about the hardware: AMD64 3600+ (=2GHz), 2GB of DDR2, 6 SATA drives in RAID1 and RAID5 configuration on the on-board SATA controller, a PCI S3 graphics card, and that's it.

dmesg output related to networking:

forcedeth.c: Reverse Engineered nForce ethernet driver. Version 0.60.
forcedeth: using HIGHDMA
eth0: forcedeth.c: subsystem: 01043:81fb bound to 0000:00:10.0
eth0: no IPv6 routers present


lspci -vv:

00:10.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a2)
        Subsystem: ASUSTeK Computer Inc. Unknown device 81fb
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 0 (250ns min, 5000ns max)
        Interrupt: pin A routed to IRQ 1272
        Region 0: Memory at fe02a000 (32-bit, non-prefetchable) [size=4K]
        Region 1: I/O ports at b400 [size=8]
        Region 2: Memory at fe029000 (32-bit, non-prefetchable) [size=256]
        Region 3: Memory at fe028000 (32-bit, non-prefetchable) [size=16]
        Capabilities: [44] Power Management version 2
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
                Status: D0 PME-Enable+ DSel=0 DScale=0 PME-
        Capabilities: [70] MSI-X: Enable- Mask- TabSize=8
                Vector table: BAR=2 offset=00000000
                PBA: BAR=3 offset=00000000
        Capabilities: [50] Message Signalled Interrupts: Mask+ 64bit+ Queue=0/3 Enable+
                Address: 00000000fee0300c  Data: 4189
                Masking: 000000fe  Pending: 00000000
        Capabilities: [6c] HyperTransport: MSI Mapping


The incomplete kernel panic dump, hand-copied from the stuck console:

Call Trace:
<IRQ> :forcedeth: nv_nic_irq_optimized+0x89/0x22c
 handle_IRQ_event+0x25/0x53
 __do_softirq+0x55/0xc3
 handle_edge_irq+0xe4/0x127
 do_IRQ+0x6c/0xd5
 default_idle+0x0/0x3d
 ret_from_intr+0x0/0xa
 <EOI> default_idle+0x29/0x3d
 cpu_idle+0x8b/0xae

Code: 8a 83 84 00 00 00 83 e0 f3 83 c8 04 88 83 84 00 00 00 83 7b
RIP :forcedeth:nv_rx_process_optimized+0xe6/0x380
Kernel panic - not syncing: Aiee, killing interrupt handler!



I may have to replace the on-board Ethernet with some PCI-based card because I need a reliable server very soon, and once it is deployed I won't have a chance to experiment with it anymore. So if there is anything I could try now for solid forcedeth stability, please let me know soon. Is "optimization_mode=1" the right solution? What kind of negative impact does it have?

Thanks!

Petr
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
