Fedora Users — Re: file-copy corruption

Motor wrote:
> On Wed, 28 Jun 2006 13:40:15 +0100, T. Horsnell wrote:
> 
>> I'm in the process of moving stuff from our Alpha fileserver onto a
>> Linux replacement. I've been using gnu-tar to copy filesystems from the
>> Alpha to the Linux NFS-exported disks over a 1Gbit LAN, followed by
>> diff -r to check that they have copied correctly (I wish diff had an
>> option to not follow symlinks..). I've so far transferred about 3 TiB of
>> data (spread over several weeks) and am concerned that during this
>> process, 3 files were mis-copied without any apparent hardware-errors
>> being flagged. There was nothing unusual about these files, and
>> re-copying them (with cp) fixed the problem.
>>
>> Are occasional undetected errors like this to be expected? I thought
>> there were sufficient stages of checksumming/parity (both boxes have ECC
>> memory) etc to render the probability of this to be vanishingly small.
> 
> I'd still consider running a good RAM test on both boxes.

Good advice.

It is also useful to keep the corrupted file rather than delete it; a
comparison with the good file will reveal the type of error.
For example, if a single bit is wrong, the culprit can be the RAM or
the CPU; if you find 4096 bytes set to zero (a typical page or
filesystem block size), it looks like a kernel or filesystem bug...
When multiple files are each corrupted by 1 bit, it is worth checking
whether it is always the same bit position.
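A comparison along these lines could be scripted roughly as follows. This is only a minimal sketch in Python: the function names compare_files and classify are made up for illustration, the classification is a crude heuristic, and it assumes both files have the same size (trailing bytes of a longer file are ignored).

```python
def compare_files(good_path, bad_path, chunk_size=1 << 20):
    """Return a list of (offset, good_byte, bad_byte) for each differing byte.

    Assumption: both files have the same length; if not, the extra
    bytes at the end of the longer file are not examined.
    """
    diffs = []
    offset = 0
    with open(good_path, "rb") as good, open(bad_path, "rb") as bad:
        while True:
            a = good.read(chunk_size)
            b = bad.read(chunk_size)
            if not a or not b:
                break
            for i, (x, y) in enumerate(zip(a, b)):
                if x != y:
                    diffs.append((offset + i, x, y))
            offset += len(a)
    return diffs


def classify(diffs):
    """Rough guess at the kind of corruption (heuristic only)."""
    if not diffs:
        return "identical"
    # XOR of good and bad byte has a 1 wherever a bit changed.
    flipped = [x ^ y for _, x, y in diffs]
    if all(bin(f).count("1") == 1 for f in flipped):
        # Every differing byte differs in exactly one bit.
        bits = sorted({f.bit_length() - 1 for f in flipped})
        return f"single-bit flip(s) at bit position(s) {bits}"
    if all(y == 0 for _, _, y in diffs):
        first, last = diffs[0][0], diffs[-1][0]
        return f"zeroed bytes between offsets {first} and {last}"
    return f"{len(diffs)} differing bytes, no obvious pattern"


if __name__ == "__main__":
    import sys

    # Usage: python compare.py GOOD_FILE CORRUPTED_FILE
    result = compare_files(sys.argv[1], sys.argv[2])
    for off, x, y in result[:20]:  # show only the first few differences
        print(f"offset {off}: good=0x{x:02x} bad=0x{y:02x} xor=0x{x ^ y:02x}")
    print(classify(result))
```

Running it over each of the corrupted files and comparing the reported bit positions and offsets would show whether the errors share a pattern. Note that a zeroed block only shows up as differences where the good file had non-zero bytes, so the reported range may be narrower than the block that was actually zeroed.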

Best regards.
-- 
   Roberto Ragusa    mail at robertoragusa.it