On Wed, Jan 27, 2010 at 09:01:53AM +0200, Gilboa Davara wrote:
> On Tue, 2010-01-26 at 19:07 -0500, Kelvin Ku wrote:
> > We recently purchased our first Nehalem-based system with a single Xeon E5530
> > CPU. We were unable to boot FC6 on it and are trying to upgrade our network to
> > F11/F12 anyway, so we installed F11 on it.
> >
> > Our existing hardware includes Xeon 5100- and 5400-series CPUs running mainly
> > FC6 (2.6.22), except for a single Xeon 5150 system running F11. Our target
> > application consumes multicast data during business hours and has been dropping
> > packets more frequently on the new hardware/OS combination than on our older
> > systems. I've tried using the on-board Intel 82574L dual-port NIC (e1000e
> > driver) and a discrete Intel 82576 dual-port NIC (igb driver). Counters for the
> > NIC, socket layer, and switch don't show any dropped packets.
> >
> > My question is this: has anyone experienced performance degradation running a
> > UDP-consuming application after moving to a Nehalem-based system? We have yet
> > to identify whether the culprit is the hardware, the OS, or the combination of
> > the two. However, note that our app works fine on the 5150 system running F11
> > that I mentioned above.
> >
> > Likewise, if you've migrated such an app to a Nehalem system and had to make
> > adjustments to get it to work as before, I'd like to hear from you too.
> >
> > Thanks,
> > Kelvin Ku
>
> Please post the output of:
> $ cat /proc/interrupts | grep eth

We rename our interfaces to lan:

$ grep lan /proc/interrupts
 61:          1          0          0          0   PCI-MSI-edge   lan0
 62:    7194004          0          0          0   PCI-MSI-edge   lan0-TxRx-0
 63:          0          1          0          0   PCI-MSI-edge   lan1
 64:          0          0   49842410          0   PCI-MSI-edge   lan1-TxRx-0

$ pgrep irqbalance
$

Note that irqbalance is disabled; I found that it wasn't balancing IRQs the
way it does on our older machines. The irqbalance docs say that NIC interrupts
should not be balanced, which is what we're seeing whether irqbalance is
running or not.

> $ ethtool -S ethX

lan0 (the LAN interface):

NIC statistics:
     rx_packets: 7429553
     tx_packets: 85327
     rx_bytes: 9752917197
     tx_bytes: 66766666
     rx_broadcast: 7386732
     tx_broadcast: 8610
     rx_multicast: 0
     tx_multicast: 42
     rx_errors: 0
     tx_errors: 0
     tx_dropped: 0
     multicast: 0
     collisions: 0
     rx_length_errors: 0
     rx_over_errors: 0
     rx_crc_errors: 0
     rx_frame_errors: 0
     rx_no_buffer_count: 0
     rx_missed_errors: 0
     tx_aborted_errors: 0
     tx_carrier_errors: 0
     tx_fifo_errors: 0
     tx_heartbeat_errors: 0
     tx_window_errors: 0
     tx_abort_late_coll: 0
     tx_deferred_ok: 0
     tx_single_coll_ok: 0
     tx_multi_coll_ok: 0
     tx_timeout_count: 0
     tx_restart_queue: 0
     rx_long_length_errors: 0
     rx_short_length_errors: 0
     rx_align_errors: 0
     tx_tcp_seg_good: 6893
     tx_tcp_seg_failed: 0
     rx_flow_control_xon: 0
     rx_flow_control_xoff: 0
     tx_flow_control_xon: 0
     tx_flow_control_xoff: 0
     rx_long_byte_count: 9752917197
     rx_csum_offload_good: 7429553
     rx_csum_offload_errors: 0
     tx_dma_out_of_sync: 0
     alloc_rx_buff_failed: 1487
     tx_smbus: 0
     rx_smbus: 0
     dropped_smbus: 0
     tx_queue_0_packets: 85327
     tx_queue_0_bytes: 65978674
     rx_queue_0_packets: 7429553
     rx_queue_0_bytes: 9693480773

lan1 (the multicast interface) is below. Note that rx_missed_errors is
non-zero. I previously encountered this with the e1000e NIC after disabling
cpuspeed, which had been throttling the CPUs to 1.6 GHz (from a maximum of
2.4 GHz). I attempted to remedy it by setting InterruptThrottleRate=0,0 in the
e1000e driver, after which we had one full day of testing with zero
rx_missed_errors, but the application still reported packet loss.
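In case anyone wants to reproduce the setup: the throttle option is set
through a modprobe config file along these lines (the file name is arbitrary,
and the same form works for the igb driver); the module has to be reloaded, or
the box rebooted, for the new value to take effect:

$ cat /etc/modprobe.d/e1000e.conf
# InterruptThrottleRate takes one value per port; 0 disables throttling
options e1000e InterruptThrottleRate=0,0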
Today is the first day of testing with the igb NIC since I disabled cpuspeed.
The igb driver is also running with InterruptThrottleRate=0,0.

NIC statistics:
     rx_packets: 54874782
     tx_packets: 161
     rx_bytes: 35581821239
     tx_bytes: 18479
     rx_broadcast: 10
     tx_broadcast: 25
     rx_multicast: 54874635
     tx_multicast: 16
     rx_errors: 0
     tx_errors: 0
     tx_dropped: 0
     multicast: 54874635
     collisions: 0
     rx_length_errors: 0
     rx_over_errors: 0
     rx_crc_errors: 0
     rx_frame_errors: 0
     rx_no_buffer_count: 1
     rx_missed_errors: 22192
     tx_aborted_errors: 0
     tx_carrier_errors: 0
     tx_fifo_errors: 0
     tx_heartbeat_errors: 0
     tx_window_errors: 0
     tx_abort_late_coll: 0
     tx_deferred_ok: 0
     tx_single_coll_ok: 0
     tx_multi_coll_ok: 0
     tx_timeout_count: 0
     tx_restart_queue: 0
     rx_long_length_errors: 0
     rx_short_length_errors: 0
     rx_align_errors: 0
     tx_tcp_seg_good: 0
     tx_tcp_seg_failed: 0
     rx_flow_control_xon: 0
     rx_flow_control_xoff: 0
     tx_flow_control_xon: 0
     tx_flow_control_xoff: 0
     rx_long_byte_count: 35581821239
     rx_csum_offload_good: 54874782
     rx_csum_offload_errors: 0
     tx_dma_out_of_sync: 0
     alloc_rx_buff_failed: 9598
     tx_smbus: 0
     rx_smbus: 0
     dropped_smbus: 0
     tx_queue_0_packets: 161
     tx_queue_0_bytes: 17013
     rx_queue_0_packets: 54874783
     rx_queue_0_bytes: 35362322772

> Which board are you using?

Supermicro X8DTL-iF

> Have you enabled hyper-threading?

It is currently disabled.

> Have you disabled IO vt-d?

This is disabled by default in the BIOS. I'll double-check the setting later
today.

> In which slot did you install the igb card?

The slot is PCIe x16. The NIC itself is x4.

> Have you tried enabling pci=msi in your kernel's command line?

No. Do I need to? MSI already seems to be enabled:

$ dmesg | grep -i msi
pcieport-driver 0000:00:01.0: irq 48 for MSI/MSI-X
pcieport-driver 0000:00:03.0: irq 49 for MSI/MSI-X
pcieport-driver 0000:00:07.0: irq 50 for MSI/MSI-X
pcieport-driver 0000:00:09.0: irq 51 for MSI/MSI-X
pcieport-driver 0000:00:1c.0: irq 52 for MSI/MSI-X
pcieport-driver 0000:00:1c.4: irq 53 for MSI/MSI-X
pcieport-driver 0000:00:1c.5: irq 54 for MSI/MSI-X
e1000e 0000:06:00.0: irq 55 for MSI/MSI-X
e1000e 0000:06:00.0: irq 56 for MSI/MSI-X
e1000e 0000:06:00.0: irq 57 for MSI/MSI-X
e1000e 0000:07:00.0: irq 58 for MSI/MSI-X
e1000e 0000:07:00.0: irq 59 for MSI/MSI-X
e1000e 0000:07:00.0: irq 60 for MSI/MSI-X
igb 0000:03:00.0: irq 61 for MSI/MSI-X
igb 0000:03:00.0: irq 62 for MSI/MSI-X
igb: eth2: igb_probe: Using MSI-X interrupts. 1 rx queue(s), 1 tx queue(s)
igb 0000:03:00.1: irq 63 for MSI/MSI-X
igb 0000:03:00.1: irq 64 for MSI/MSI-X
igb: eth3: igb_probe: Using MSI-X interrupts. 1 rx queue(s), 1 tx queue(s)

> Per your question, at least when dealing with packets from within the
> kernel, a Nehalem box is fully capable of handling >20Gbps (depending on
> the packet size) - so I doubt that this is a hardware issue.

Agreed. I ran a local netperf test and saw about 8 Gbps of throughput on a
single core, so the machine should be more than adequate for 1 Gbps of
traffic.

> - Gilboa

Thanks,
Kelvin
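P.S. If anyone wants to run the same loopback sanity check on their own
hardware, a netperf run along these lines is enough (start netserver first;
the test type and message size shown here are just illustrative):

$ netserver
$ netperf -H 127.0.0.1 -t UDP_STREAM -l 30 -- -m 1472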