On Fri, Jan 29, 2010 at 12:22:05PM +0200, Gilboa Davara wrote:
> > which
> > was throttling the CPUs to 1.6 GHz (from a maximum of 2.4 GHz). I attempted to
> > remedy this by setting InterruptThrottleRate=0,0 in the e1000e driver, after
> > which we had one full day of testing with zero rx_missed_errors, but the
> > application still reported packet loss.
>
> rx_missed_errors usually get triggered when the kernel is slow to handle
> incoming hardware interrupts.
> There's a trade-off here: increase the interrupt rate and you'll
> increase the kernel CPU usage in exchange for lower latency; decrease
> the interrupt rate, and you'll reduce the CPU usage at the expense of a
> higher chance of hitting the RX queue limit.
> I'd suggest you try setting the InterruptThrottleRate to 1000, while
> increasing the RX queues to 4096.
> (/sbin/ethtool -G DEVICE rx 4096)
>
> You could try enabling multi-queue by adding InterruptType=2,
> RSS=NUM_OF_QUEUE and MQ=1 to your modprobe.conf.d.

I'll try these suggestions later today. Note that I was able to disable
interrupt throttling on the on-board 82574L NICs without seeing any
rx_missed_errors.

>
> Can you post the output of $ mpstat -P ALL 1 during peak load?
>

We run "mpstat -P ALL 5" continuously; is this sufficient resolution? I've
attached the mpstat output from 09:30-10:30 yesterday, which is one of the
busiest hours of the day for multicast traffic.

Also, here is the top of the output from powertop. Are you running with
C-STATE enabled? It is somewhat troubling that more than half of the time is
spent in the most power-saving state (C3), but I think this is averaged across
all CPUs.

PowerTOP version 1.11      (C) 2007 Intel Corporation

Cn                Avg residency       P-states (frequencies)
C0 (cpu running)        (15.2%)
polling           5.5ms ( 4.1%)
C1 halt           0.2ms (23.0%)
C2 mwait          0.2ms ( 4.6%)
C3 mwait          0.4ms (53.1%)

Wakeups-from-idle per second : 2833.7   interval: 10.0s
no ACPI power usage estimate available

Top causes for wakeups:
  47.7% (8416.6)       <interrupt> : lan1-TxRx-0
  25.5% (4498.9)      <kernel IPI> : Rescheduling interrupts
  13.2% (2324.3)     <kernel core> : hrtimer_start_range_ns (tick_sched_timer)
   5.7% (1000.9)            kipmi0 : __mod_timer (process_timeout)
   4.1% ( 721.9)       <interrupt> : lan0-TxRx-0
   2.3% ( 413.0)       <interrupt> : extra timer interrupt
   0.6% (  99.8)   <kernel module> : __mod_timer (smi_timeout)
   0.5% (  93.1)       <interrupt> : ata_piix, ata_piix, uhci_hcd:usb5, uhci_hcd:
   0.1% (  17.2)     <kernel core> : __mod_timer (neigh_periodic_timer)
   0.1% (  11.1)     <kernel core> : hrtimer_start (tick_sched_timer)
   0.1% (  10.4)          vconfig : __mod_timer (garp_join_timer)
   0.1% (  10.0)   <kernel module> : __mod_timer (ipmi_timeout)
...

Thanks,
Kelvin
Attachment:
testhost.mpstat.20100128.bz2
Description: BZip2 compressed data
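
To put rough numbers on the interrupt-rate versus RX-ring trade-off discussed above, here is a back-of-the-envelope sketch in shell. It assumes a deliberately simple model that is not from the thread: the ring is drained completely once per interrupt and nothing is polled in between (NAPI polling is ignored). The function name packets_per_interrupt and the packet rates in the loop are hypothetical illustration values, not measurements from this host; only the 1000 interrupts/s and 4096-descriptor figures come from the suggestion quoted above.

```shell
#!/bin/sh
# Hypothetical helper: how many packets arrive between two interrupts.
# $1 = packet rate (packets/s), $2 = interrupt rate (interrupts/s)
# Integer division, rounded up.
packets_per_interrupt() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

ring_size=4096   # RX descriptors, as in "ethtool -G DEVICE rx 4096"
itr=1000         # InterruptThrottleRate=1000

# The packet rates below are made-up examples, not measured values.
for pps in 100000 1000000 5000000; do
    need=$(packets_per_interrupt "$pps" "$itr")
    if [ "$need" -le "$ring_size" ]; then
        verdict="fits"
    else
        verdict="overflows"
    fi
    echo "$pps pps at $itr int/s: $need descriptors per interval ($verdict in $ring_size)"
done
```

Under this model the ring absorbs 100 and 1000 packets per interval comfortably, while 5000 packets per interval would exceed 4096 descriptors and show up as missed packets. Real behaviour will differ, since the driver can keep polling under load, so treat this only as a way to reason about orders of magnitude.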