John Austin wrote, On Tue, 24 Nov 2009 12:21:58 +0000:
On Mon, 2009-11-23 at 15:00 -0800, Rick Stevens wrote:
On 11/21/2009 10:41 AM, John Austin wrote:
On Sat, 2009-11-21 at 11:11 -0700, Greg Woods wrote:
On Sat, 2009-11-21 at 10:09 +0000, John Austin wrote:

When copying a large file (2.7GB) from the server to the
F12 m/c a complete freeze of the F12 machine occurs.
I haven't seen freezes, but I have seen corruption when trying to copy
large files (e.g. like a DVD iso image) via NFS. In fact, this happened
to me when I was trying to install an F12 virtual machine on my F11 box
(so I could try it out before deciding whether or not to bite the bullet
and upgrade the host OS). I copied over the DVD iso image, then tried to
install a VM from it, and it failed the media test. Sure enough, it also
failed the sha256sum test. Copying the same DVD iso file via scp instead
worked fine. I do not trust NFS for large files.
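
For anyone who wants to repeat that check without sha256sum, here is a minimal Python sketch that hashes the original file and the NFS copy in chunks and compares the digests. The function names and the 1 MiB chunk size are my own choices for illustration, not anything from this thread. Note that reading the copy back on the same client may be served from the page cache rather than from the server.

```python
import hashlib


def file_sha256(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of a file, reading it chunk by chunk."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so a multi-GB ISO never has to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(original, copy):
    """True if both files have the same SHA-256 digest."""
    return file_sha256(original) == file_sha256(copy)
```

Calling `copies_match()` with the path on the server's local disk and the path of the copied file should return False for a corrupted transfer like the one described above.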


Hi Greg

That's interesting and very worrying - surely it can't/shouldn't happen!

I have been using NFS for years for all types/sizes of files and
never had a problem until the last couple of months.

1.  The CentOS/RHEL 5.3/5.4 kernel had a serious bug that has been fixed with the
	latest kernel update

2.  Now this F12 problem

Surely a very large worldwide community uses NFS ?

OK the F12 case could be my finger trouble or even a hardware problem

I will install F12 on a second machine and test again (against the same server)
Can you verify whether you run into the same issue when you run NFS over
TCP as opposed to NFS over UDP? It's an option in the mount command on the
client: use either "proto=tcp" or "proto=udp".
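
As a rough sketch of the two test mounts, the Python below only builds and prints the mount command lines, it does not run them. The export "server:/export" and the mount point "/mnt/nfs" are hypothetical placeholders, and the helper name is made up for this example.

```python
def build_nfs_mount_cmd(export, mount_point, proto):
    """Build the argv for an NFS mount that forces one transport protocol."""
    if proto not in ("tcp", "udp"):
        raise ValueError("proto must be 'tcp' or 'udp'")
    return ["mount", "-t", "nfs", "-o", "proto=" + proto, export, mount_point]


# Hypothetical export and mount point; substitute your own values.
for proto in ("tcp", "udp"):
    print(" ".join(build_nfs_mount_cmd("server:/export", "/mnt/nfs", proto)))
```

Unmount between the two runs so that each test really uses the transport you asked for.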

By default, the system queries the server and selects a protocol based
on what's being asked of it.  See the "TRANSPORT METHODS" section of
"man nfs".
Hi Rick

Many thanks for the reply - you have found a work-around !!

Just tested my machine with UDP and TCP
This was using md5sum for about 10GB over the NFS mount

1. The default for F12/CentOS 5.4 appears to be TCP - which freezes
2. Forcing UDP gives NO errors for 10GB transfer
3. Forcing TCP gives a freeze

Having briefly read the man pages this is the opposite of what I would
expect and of what you suggest !!

There must be a timing problem somewhere -
Please see the other thread "Sky2 NIC Problem? - Was F12 NFS Failures"
for other tests I have carried out



What are your other mount options?
Having seen the "Sky2 NIC Problem" message, I suspect your card/driver may be having issues, but some NFS options may help or hurt.
I am assuming that you only have 'hard' and not 'hard,intr' as options to the mount.
And for transferring large files over NFS, my experience says to stay away from 'soft' NFS mounts.
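
One way to answer the options question is to look at what the client reports in /proc/mounts. Below is a small Python sketch that pulls out the options for NFS mounts from text in that format. The sample line is invented for illustration (it is not output from John's machine), and the function name is my own.

```python
def nfs_mount_options(mounts_text):
    """Map each NFS mount point to a dict of its mount options.

    Expects text in /proc/mounts format:
    device mount_point fstype options dump pass
    """
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        opts = {}
        for item in fields[3].split(","):
            key, _, value = item.partition("=")
            # Flag-style options such as "hard" or "rw" have no value.
            opts[key] = value if value else True
        result[fields[1]] = opts
    return result


# Invented sample line; on a real client, read /proc/mounts instead.
sample = (
    "server:/export /mnt/nfs nfs "
    "rw,vers=3,rsize=32768,wsize=32768,hard,proto=tcp,timeo=600 0 0\n"
    "/dev/sda1 / ext4 rw 0 0\n"
)
for mount_point, opts in nfs_mount_options(sample).items():
    print(mount_point, "proto:", opts.get("proto"),
          "hard:", "hard" in opts, "soft:", "soft" in opts,
          "intr:", "intr" in opts)
```

On the real machine you would pass `open("/proc/mounts").read()` instead of the sample string.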

It is interesting that NFS over TCP locks up the machine and fails to copy the very large file, while UDP succeeds in copying the same file with the same device/driver.

BTW, when you say that UDP gave no errors, do you mean there were no errors from the user program's perspective (cp, and then sha256sum), or that there were no errors from both the user and syslog perspectives? I am wondering whether you have found a place where the UDP code deals with a bad packet correctly and the TCP version has not seen enough testing in a bad environment. You wouldn't happen to have a serial cable around so you can capture where the kernel goes bonkers, would you? (Note: I have never set up a serial console myself.)
