Strange network related data corruption

Hello,
I am encountering some strange data corruption when transferring
data from one of my PCs that I use as a file-server.

on the server:
FILE=<large file>; sha1sum $FILE | cut -d" " -f1 | nc -lp5000 -q0; \
  while nc -lp5000 -q0 < $FILE; do : ; done

on the client:
H=<server>; SUM=$(nc -q0 $H 5000);sleep 1s; while nc -q0 $H 5000 |
sha1sum | (grep -v $SUM || echo -n .); do sleep 1s ;done

(output looks somewhat like this:
..............6dd5fb1ce29d270acdfbb02d00921bf75d141773  -
...
)

I would expect the sha1sum to be the same in every pass (assuming the
source file does not change). But every few passes (with no apparent
pattern) there is a different sum returned. I first noticed this when
transferring large files (backups) with SMB and NFS (v3 and v4), but
to rule those protocols out I tried netcat in the way noted above.
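
The per-pass check done by the client loop above can be sketched as a
small shell function. This is only an illustration of the same
comparison, not the command that was actually run; the name
verify_stream is hypothetical.

```shell
# verify_stream EXPECTED_SUM   (hypothetical helper, not from the original post)
# Reads data on stdin and compares its SHA-1 with EXPECTED_SUM.
# Prints "." and returns 0 on a match; prints the differing sum on its
# own line and returns 1 on a mismatch.
verify_stream() {
    expected=$1
    # sha1sum prints "<hash>  -" for stdin; keep only the hash field
    actual=$(sha1sum | cut -d' ' -f1)
    if [ "$actual" = "$expected" ]; then
        printf '.'
        return 0
    fi
    printf '\n%s\n' "$actual"
    return 1
}
```

With H and SUM set as in the client command, one pass would then be:

    nc -q0 $H 5000 | verify_stream "$SUM"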

When I have the server compute the sha1sum of the file locally, the
problem is not reproducible. When I run the network test with a small
file that easily fits into the cache, the problem remains reproducible.

Another thing I did was to use dd to transfer data in 1GiB chunks from
/dev/zero and generate the sha1sum on the client. There I was not able
to reproduce the problem.
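
The /dev/zero test could look roughly like the sketch below. The exact
dd invocation is not given above, so the block size, the count and the
helper name zero_sum are assumptions made for illustration.

```shell
# zero_sum MIB   (hypothetical helper; block size and count are assumptions)
# Prints the SHA-1 of MIB mebibytes of zero bytes, computed locally,
# to serve as the reference value for the data received over the network.
zero_sum() {
    dd if=/dev/zero bs=1048576 count="$1" 2>/dev/null | sha1sum | cut -d' ' -f1
}
```

The server would then send one 1GiB chunk with

    dd if=/dev/zero bs=1048576 count=1024 | nc -lp5000 -q0

and the client would compare what it receives against the local
reference:

    [ "$(nc -q0 $H 5000 | sha1sum | cut -d' ' -f1)" = "$(zero_sum 1024)" ]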


The server is an Athlon64 3400+ (good old Clawhammer) with 1GiB RAM. I
use 4 SATA drives in a software RAID5 configuration, attached to a
Promise TX4 300 SATA-II controller. The filesystem is ext3 without
special mount options. The distribution is Debian/Sid for AMD64 with a
self-compiled kernel 2.6.23-rc9 (.config attached).

The clients I tried are a Core2Duo 6600 with 3GiB of RAM, also
Debian/Sid AMD64 (kernel 2.6.23-rc9) and a Centrino notebook with
Pentium M and 1GiB of RAM (Debian/Sid i386, kernel 2.6.23-rc7).

All PCs mentioned have gigabit ethernet and are connected via a gigabit
switch.

I tried these tests between the clients and could not reproduce the
problem there.

I had the server run memtest86+ for 20 passes without errors.

I tried several kernel versions on the server (from .18 to .23-rc9), all
showed the problem. I suspect a hardware problem, but I cannot isolate
the part responsible. I tried another ethernet adapter (the 3Com 905C in
the lspci output) and I also tried the onboard SATA controllers (2 ports
VIA and 2 ports Promise TX2).

I don't know if this is a kernel problem or just me and my setup, but
maybe someone on this list has an idea where I could look next.

Thanks and regards
Malte
-- 
---------------------------------------
Malte Schröder
[email protected]
ICQ# 68121508
---------------------------------------
