Trond Myklebust wrote:
On Tue, 2007-10-02 at 15:56 -0400, Bob Kryger wrote:
So, I have a relatively new system on which I am seeing strange NFS
behavior.
In short I am getting seemingly random errors in files written via NFS.
[snip details]
Anyone ever seen anything like this before?
Suggest where I might look next?
Additional tests?
Feel free to describe your test in a bit more detail. Without more
information, we obviously can't rule out the existence of an NFS bug,
I was trying to be thorough; I hope I succeeded.
Is there anything else that might be helpful? I certainly would not go
to a bug first, as I may very well have something misconfigured, but I
cannot seem to identify what that might be. I do have about 8 other
Linux NFS servers in production on different hardware, SATA mostly,
where I am not seeing any issues. I don't think it's a hardware issue
though, as I cannot reproduce the problem without the use of NFS. (Hmm,
maybe if I NFS mount to the server itself. Would that prove anything?)
however usually whenever people describe this sort of problem it is
because they have failed to understand the NFS caching model as
described in
http://nfs.sourceforge.net/#faq_a8
Excellent, thanks for the lead. I will test these items shortly.
After reading the FAQ, I'm not sure I see how the cache consistency
mechanisms apply to this problem. If I test the files after they are
closed shouldn't the data be consistent, written completely to the
server? If there were a data write error, should I not see it somewhere?
If so, where? On the client? The server? Would it be up to the client program to
catch it? I wonder if dd would see it. For the purpose of testing, I
have limited this server to serving to only a single client at a time,
so there will be no other variables/systems interfering.
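
On the question of where a write error would surface: with NFS the client
may cache writes, so an error such as ENOSPC or EIO can be reported at
fsync() or close() rather than at the write() call itself (see close(2)).
Below is a minimal sketch of a client program that checks all three,
assuming Python is available on the client. The helper name write_checked
is hypothetical and not part of the test described here. It only catches
errors the kernel actually reports; silent corruption would not raise
anything.

```python
import os


def write_checked(path, data):
    """Write data to path and let any deferred I/O error propagate.

    write_checked is a hypothetical helper for illustration only.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        # os.write may write fewer bytes than requested, so loop.
        while len(view) > 0:
            written = os.write(fd, view)
            view = view[written:]
        # Force cached data to the server; a deferred write error
        # would be raised here as OSError.
        os.fsync(fd)
    finally:
        # close() can also report a deferred error, so it is not ignored.
        os.close(fd)
    return len(data)
```

If any of write, fsync, or close fails, the OSError reaches the caller
with the errno intact, which would answer the "client or server?" question
for reported errors at least.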
So, to test this, I read back the data of a newly written 256M file
right from the client that wrote it, in this case with the nocto option.
This should take the client cache into account. I compared the results
from the server side as well. It had errors, the same errors in the same
locations on both the client and the server. So, this seems to indicate
that the issue is on the NFS client, not the server. (Hmmm.) But the
same client does not have a problem with any other server. At least one
has never been reported. I'll verify that rigorously.
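
For anyone wanting to repeat this kind of read-back test, here is a rough
sketch of one way to do it, assuming Python on the client. It is not the
exact procedure used above. It writes a deterministic pattern, then
re-reads the file and reports the byte offset of the first difference in
each bad block. The function names and the block layout are made up for
illustration. Running find_mismatches against the same file on the server
side as well would show whether both ends see the same bad offsets.

```python
import hashlib
import os


def expected_block(index, block_size):
    """Deterministic pseudo-random block derived from its index."""
    return hashlib.shake_256(index.to_bytes(8, "big")).digest(block_size)


def write_pattern(path, blocks, block_size=1 << 20):
    """Write `blocks` pattern blocks and fsync before closing."""
    with open(path, "wb") as f:
        for i in range(blocks):
            f.write(expected_block(i, block_size))
        f.flush()
        os.fsync(f.fileno())


def first_difference(got, want):
    """Index of the first differing byte, or None if identical."""
    for j in range(len(want)):
        if j >= len(got) or got[j] != want[j]:
            return j
    return None


def find_mismatches(path, blocks, block_size=1 << 20):
    """Return absolute offsets of the first bad byte in each bad block."""
    bad = []
    with open(path, "rb") as f:
        for i in range(blocks):
            got = f.read(block_size)
            diff = first_difference(got, expected_block(i, block_size))
            if diff is not None:
                bad.append(i * block_size + diff)
    return bad
```

With the default 1 MiB block size, write_pattern(path, 256) produces a
256M file, and an empty list from find_mismatches(path, 256) means the
file read back clean.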
I am not familiar with the mechanism that NFS uses to verify data
validity between the client and the server. I assume that there is some
sort of checksum. Did I mention that this is NFSv3? At least I have not
specified v4.
So please include a reproducible test for us.
Easily reproducible on this system. Short of providing access to this
system, not sure what more to do. Oh, wait, was that humor? Indicating
that I have provided significant detail? Dang, I've got to sharpen my
international tongue-in-cheek detector.
Cheers
Trond
Cool name
thanks
Bob