Folks,
I continued my attempts to diagnose the source of my network stalls. Here is what I found out:
- The problem only occurs when downloading large files over a T1. The problem does not occur when transferring files between two computers on the same 100BaseT switched subnet. I successfully copied a 2.5GB file between the Fedora Core 3 (FC3) system and another Linux box.
We have a point-to-point T1 that connects our office to our co-location facility, where our production servers and our route to the Internet reside. The problem occurs when downloading large files (20MB or more) over this T1. It happens randomly, but I usually get the first 10MB before the network stalls.
Using Ethereal, I discovered a problem where I think the FC3 system is not recovering from receiving a bad TCP packet. When the FC3 system receives a bad TCP packet, it sends a response to the remote server requesting it to resend the packet. The remote server does, but then the FC3 system sends the request again for the same packet. It does this several times and then gives up. I don't think the FC3 OS is processing the resent packet properly, so it retries several times before eventually giving up on the download. The lower-level OS call doesn't return an error; it just stops responding, which is why it looks like a stall from the scp level.
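For anyone who wants to look for the same pattern in their own capture, here is a minimal sketch. It assumes the resend requests show up as repeated ACKs carrying the same acknowledgment number, and that the ACK numbers sent by the receiving host have already been exported from the capture into a plain list. The function name, the threshold of 3, and the sample numbers are all made up for illustration; they are not taken from my actual trace.

```python
from itertools import groupby


def repeated_ack_runs(acks, threshold=3):
    """Return (ack_number, count) for each run of identical consecutive
    ACK numbers that is at least `threshold` long."""
    runs = []
    for ack, group in groupby(acks):
        count = sum(1 for _ in group)
        if count >= threshold:
            runs.append((ack, count))
    return runs


# Hypothetical ACK numbers sent by the receiver, in capture order.
# Healthy case: one repeated ACK, then the number advances again.
healthy = [1000, 2448, 3896, 3896, 5344, 6792]
# Stalled case: the same ACK number repeats and never advances.
stalled = [1000, 2448, 3896, 3896, 3896, 3896, 3896]

print(repeated_ack_runs(healthy))  # []
print(repeated_ack_runs(stalled))  # [(3896, 5)]
```

A long run that is never followed by a higher ACK number would match what I am describing: the receiver keeps asking for the same data even though the server has resent it.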
Our RedHat Enterprise 3 WS systems are handling this problem correctly, i.e., when they receive a bad packet, they send the response requesting a resend. They receive the retransmitted packet and then continue the download.
Tonight, on the computer that was having the problem, I replaced the Fedora Core 3 OS with RedHat Enterprise 3 (RHE3) OS. The problem did not exist with RHE3. I was able to download 10 - 40MB files over the T1 without a problem. So I think the problem is with FC3.
Does anyone have any idea why the FC3 OS might be having this problem, what I can do about it, or who to report the problem to?
Thanks, Keith
Shawn Iverson wrote:
On Wednesday, April 06, 2005 6:25 PM Keith Fetterman wrote:
Folks,
I am encountering a random problem where network (100BaseT Ethernet) connections stall when downloading files. I first noticed the problem when updating my system using up2date. I discovered the update process would hang.
I am now seeing the network stall when I download a large file, roughly 41MB, using scp. The point at which it stalls is random and it doesn't always stall (but it usually does). When it stalls, it usually stalls after 20-30MB have been transferred.
Are you going through a SOHO router? If you are, temporarily bypass your router and connect directly to your service provider (after turning on the firewall, of course) and then check for stalling. This will tell you if your router is the source of the problem. I have similar symptoms with the Linksys BEFSR11. The only workaround I have found is to connect directly to the router with a patch cable and force the speed to 10Mbit with mii-tool, thereby also forcing the Linksys to 10Mbit.
--
Shawn