Re: Possible BUG in IPv4 TCP window handling, all recent 2.4.x/2.6.x kernels

Hi David,

On Thu, 1 Sep 2005, David S. Miller wrote:

> Thanks for the empty posting.  Please provide the content you
> intended to post, and furthermore please post it to the network
> developer mailing list, netdev@vger.kernel.org

First of all, thanks for the reply (even to an empty posting :).

The posting wasn't actually empty, it was probably too long (94K according to my sent-mail folder) and majordomo truncated it to zero. It has some tcpdump snippets, that's what made it so long... unfortunately, they're all necessary to understand the nature of the bug. I wasn't sure about netdev, that's why I posted it only to linux-kernel and linux-net.

I can provide the full tcpdump out-of-band to interested people, since I don't think I can get it past majordomo.

Here is the text of the message without the tcpdump inserts:

---------------------------------------------------------------------------
Hello,

I've been tracking down this bug for some time, and I'm fairly convinced at this point that it's a kernel bug.

Under certain conditions, the TCP stack starts shrinking the TCP window down to some ridiculously low values (hundreds of bytes, as low as 181) and never recovers. The conditions are not well understood at this point, but they include a long-lived connection with very one-sided, fluctuating traffic flowing through it.

So far I've been able to reproduce it on plain-vanilla 2.4.9, 2.4.11.9, and 2.4.12.2, as well as on the RHEL3 kernels 2.4.21-20 and 2.4.21-31. The hardware is dual Opteron 250, running both 32- and 64-bit SMP kernels (seems to make no difference). I've also seen the bug occur on a single Athlon XP running 2.6.11.9 UP.

The bug occurs with all sysctl settings at their default values. I've tried enabling and disabling pretty much all the tcp-related sysctls in /proc/sys/net/ipv4, to no visible improvement.
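For anyone who wants to compare settings between an affected box and a healthy one, here is a minimal sketch that dumps the tcp-related sysctls. It only assumes the usual /proc/sys/net/ipv4 layout with entries named tcp_*; the function name read_tcp_sysctls is made up for this example and is not an existing tool.

```python
import glob
import os


def read_tcp_sysctls(base="/proc/sys/net/ipv4"):
    """Return {name: value} for every readable tcp_* entry under base."""
    settings = {}
    for path in sorted(glob.glob(os.path.join(base, "tcp_*"))):
        try:
            with open(path) as f:
                settings[os.path.basename(path)] = f.read().strip()
        except OSError:
            # Some entries may not be readable without privileges; skip them.
            continue
    return settings


if __name__ == "__main__":
    # Prints nothing on systems that lack the directory (e.g. non-Linux).
    for name, value in read_tcp_sysctls().items():
        print(f"{name} = {value}")
```

Saving the output on two machines and diffing the files is a quick way to confirm that both really are running with the same values.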

Here are a few tcpdump snippets of a TCP connection exhibiting the bug (the complete tcpdump is available upon request, but it's very large). 10.2.20.246 is the data receiver and is the box exhibiting the bug (I'm not sure what 10.2.224.182 is running, I don't have access to it). The data being sent through is real-time financial data; the session begins by catching up (at line speed) to present time, then continues to receive real-time data as it is being generated. For what it's worth, we've never seen the bug occur while the session is still catching up (and receiving a few large packets at a time); it always seems to happen while receiving real-time data (many small packets, variably interspaced).

[I apologize for the amount of tcpdump data, but it's the only way to show the bug in action.]

[tcpdump output removed]

The connection is established and the receiver's TCP window quickly ramps up to 8192.

[tcpdump output removed]

Shortly thereafter the TCP window increases further to 16534. It remains around 16534 for the next 5 minutes or so.

[tcpdump output removed]

A few minutes later it has finally caught up to present time and it starts receiving smaller packets containing real-time data. The TCP window is still 16534 at this point.

[tcpdump output removed]

This is where things start going bad. The window starts shrinking from 15340 all the way down to 2355 over the course of 0.3 seconds. Notice the many duplicate acks that serve no purpose (there are no lost packets, and the tcpdump is taken on the receiver so there are no packets/acks crossing in flight).

[tcpdump output removed]

Five minutes later the TCP window is still at 2355, having never recovered. The window is so small that the available bandwidth for this connection can no longer keep up with the real-time data, so it is falling behind, hence large packets are again being used. The application processing the data (Java-based) is mostly idle at this point, and netstat shows its recv queue to be empty. There is no apparent reason why the kernel shouldn't enlarge the window.

In fact, if I let it continue, it eventually shrinks the window even further (by 18:19:29, the time I'm writing this email, it's gone all the way down to 1373). As I mentioned earlier, I've seen it go as low as 181.
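The empty recv queue mentioned above can also be checked without netstat. The sketch below reads the same information from /proc/net/tcp, assuming the usual layout in which the fifth whitespace-separated column is tx_queue:rx_queue in hex. Addresses are left in the raw hex form the kernel prints, and the function names are made up for this example.

```python
def parse_proc_net_tcp_line(line):
    """Extract addresses, state and queue sizes from one /proc/net/tcp row."""
    fields = line.split()
    tx_hex, rx_hex = fields[4].split(":")
    return {
        "local": fields[1],    # raw hex address:port as printed by the kernel
        "remote": fields[2],
        "state": fields[3],    # hex state code, "01" is ESTABLISHED
        "tx_queue": int(tx_hex, 16),
        "rx_queue": int(rx_hex, 16),
    }


def read_tcp_queues(path="/proc/net/tcp"):
    """Return one parsed dict per socket listed in path."""
    with open(path) as f:
        next(f, None)  # skip the column header line
        return [parse_proc_net_tcp_line(line) for line in f if line.strip()]


if __name__ == "__main__":
    try:
        for sock in read_tcp_queues():
            if sock["state"] == "01":
                print(sock["local"], sock["remote"], "rx_queue =", sock["rx_queue"])
    except OSError:
        print("/proc/net/tcp is not available on this system")
```

An rx_queue of 0 on the affected connection while the advertised window stays small would match what netstat shows here.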

We are kind of stumped at this point, and it's proving to be a show-stopping bug for our purposes, especially over WAN links that have higher latency (for obvious reasons). Any kind of assistance would be greatly appreciated.
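To spell out the latency point: a sender can have at most one advertised window of unacknowledged data in flight per round trip, so throughput is bounded by window / RTT. A rough sketch of that arithmetic follows, using the window sizes seen in the trace. The RTT values (1 ms for a LAN, 50 ms for a WAN) are illustrative assumptions only, not measurements from this connection.

```python
def max_throughput_bytes_per_sec(window_bytes, rtt_seconds):
    """Upper bound on TCP throughput: at most one window per round trip."""
    if rtt_seconds <= 0:
        raise ValueError("rtt_seconds must be positive")
    return window_bytes / rtt_seconds


# Window sizes are the ones reported in this message.
# The RTTs are assumed values chosen only to contrast LAN and WAN.
for window in (16534, 2355, 1373, 181):
    for rtt_ms in (1, 50):
        bound = max_throughput_bytes_per_sec(window, rtt_ms / 1000.0)
        print(f"window={window:5d} bytes  rtt={rtt_ms:2d} ms  "
              f"max ~ {bound / 1024:8.1f} KiB/s")
```

With these assumed numbers, a 2355-byte window at 50 ms RTT caps the connection at roughly 46 KiB/s, and a 181-byte window at under 4 KiB/s, which is why the problem hurts far more over a WAN.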

Thanks,
-Ion

Follow-Ups:
- Re: Possible BUG in IPv4 TCP window handling, all recent 2.4.x/2.6.x kernels
  - From: John Heffner <jheffner@psc.edu>
- Re: Possible BUG in IPv4 TCP window handling, all recent 2.4.x/2.6.x kernels
  - From: Jesper Juhl <jesper.juhl@gmail.com>

References:
- Possible BUG in IPv4 TCP window handling, all recent 2.4.x/2.6.x kernels
  - From: Ion Badulescu <lists@limebrokerage.com>
- Re: Possible BUG in IPv4 TCP window handling, all recent 2.4.x/2.6.x kernels
  - From: "David S. Miller" <davem@davemloft.net>
