Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

Andrew Morton wrote:
> On Sat, 17 Jun 2006 16:23:34 -0700
> Harry Edmon <[email protected]> wrote:
>
> > Andrew Morton wrote:
> > > On Fri, 16 Jun 2006 09:01:23 -0700
> > > Harry Edmon <[email protected]> wrote:

> > > > I have a system with a strange network performance degradation
> > > > from 2.6.11.12 to more recent kernels, including 2.6.16.20 and
> > > > 2.6.17-rc6.  The system has dual single-core Xeons with
> > > > hyperthreading on.  The application is the LDM system from
> > > > UCAR/Unidata (http://www.unidata.ucar.edu/software/ldm).  This
> > > > system requests weather data from a variety of systems using RPC
> > > > calls over a reserved TCP port (388), puts the data into a
> > > > memory-mapped queue file, and then sends it out to a variety of
> > > > downstream requesting systems, again using RPC calls.
> > > >
> > > > When the load is heavy, the 2.6.16.20 kernel falls way behind with
> > > > the data ingestion; the 2.6.11.12 kernel does not.  I have tried an
> > > > experiment with a 2.6.17-rc6 system where it does only the
> > > > ingestion, not the downstream distribution, and it is able to keep
> > > > up.  I would really appreciate any pointers as to where the problem
> > > > may be and how to diagnose it.  I have attached the config files
> > > > from both kernels and the sysctl.conf file I am using, along with
> > > > the output from "netstat -s" on the 2.6.16.20 system during a time
> > > > when it was having problems.

> > > (added netdev)
> > >
> > > A quick grep indicates that it isn't using TCP_NODELAY - we've had
> > > problems with that in the past.
> > >
> > > Perhaps a tcpdump of the net traffic will help to determine what's
> > > going on.

> [ edit, edit - please don't top-post ]

> > I assume you are talking about using TCP_NODELAY as a socket option
> > within the LDM software.  I could give that a try.
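> >
> > (One way I might check at run time whether it is already being set -
> > "ldmd" below is only my guess at the server process name, substitute
> > whatever the LDM daemon is actually called:)
> >
> >    # watch setsockopt() on the daemon and its children for TCP_NODELAY;
> >    # "ldmd" is a guessed process name
> >    strace -f -e trace=setsockopt -p $(pidof -s ldmd) 2>&1 | grep TCP_NODELAY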

> The use of TCP_NODELAY caused problems with the JVM debugger.  I'm not
> suggesting that enabling it will fix anything here.

> > There is a lot of traffic on this node, on the order of 2000 packets
> > in and out per second, so the tcpdump output will grow pretty fast.
> > How long a tcpdump would be useful, and what options would you
> > suggest?

> I don't know, frankly - first one needs to develop some sort of theory,
> then use the diagnostic tools to prove or disprove that theory.  And I
> don't have a theory.
>
> I guess a simple one-second bare `tcpdump -i eth0' would be a starting
> point.  Perhaps compare the output of that with the output from a
> correctly-operating kernel, see if anything suggests itself.  That might
> also give us something which the networking developers can use.
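>
> Something like this, say (untested, and the output filename is only an
> example):
>
>    # full-sized capture of traffic on the LDM port, saved to a file;
>    # stop it with ^C after a second or so
>    tcpdump -i eth0 -s 0 -w ldm.pcap port 388
>
> Running the same capture on the 2.6.11.12 box would give us something
> to compare against.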

Does this fix it?
   # sysctl -w net.ipv4.tcp_abc=0
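
(In case it helps anyone trying this: tcp_abc turns on RFC 3465
Appropriate Byte Counting, which went into the kernel after 2.6.11 and
changes how the congestion window grows when the sender makes lots of
small writes - plausibly relevant to a stream of small RPC messages.
Roughly:)

   # see what the running kernel is using
   sysctl net.ipv4.tcp_abc
   # turn ABC off for this boot
   sysctl -w net.ipv4.tcp_abc=0
   # add "net.ipv4.tcp_abc = 0" to /etc/sysctl.conf to keep it across reboots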

