Cameron Simpson wrote:
On 26Mar2008 15:45, Bob Kinney <bc98kinney@xxxxxxxxx> wrote:
| --- Ian Chapman <packages@xxxxxxxxxxxxxxxxxx> wrote:
| > Neal Becker wrote:
| > > I used unix/linux for many years. In the past we've used nfs. But nfsv3
| > > has no (useful) authentication. Anyone can set up a rogue machine and
| > > pretend to be any uid/gid.
| >
| > What I'd like to see is a way to forcibly unmount broken hard NFS
| > mounts. umount -f seems to do squat.
|
| I thought that hard NFS mounts were a thing of the past--like the mid '90s.
Not if you want reliable batch behaviour in the face of NFS server
downtime. My previous workplace routinely ran jobs that took weeks.
With a hard mount the job just stalls until the server's back, then
continues. Which means you can do maintenance that requires downtime.
"hard,intr" is the common flag pair, allowing you to at least interrupt
a stalled IO to a down server, getting your job back.
Those were the options I used too, and the above were the reasons.
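For example, an fstab entry for a hard, interruptible mount might look
something like this (the server name and paths are made up):

    nfsserver:/export/data  /data  nfs  rw,hard,intr  0  0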
| Isn't it preferred to set them up with an automounter to prevent panic
| when communication falters?
| I've looked into it a little bit, and it seems like it can be done, but for
| the frequency that I use NFS, I took the quick-and-dirty route.
Autofs isn't enough. If you run it with a smallish idle timeout (to
umount when a remote fs is unused long enough) it reduces your exposure
to down servers, particularly handy when you want to reboot a client,
but also handy for those processes that walk the mount table to find
stuff out - avoiding a stall on a down mountpoint.
I actually increased the timeouts quite high; otherwise several jobs
starting at the same time caused mount storms (you need a lot of machines
trying to mount at the same time to get this), which resulted in some
mounts timing out and jobs failing to start.
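For illustration, an autofs setup with an idle timeout might look
something like this (the map name, mount point and timeout value are
only examples):

    # /etc/auto.master
    /nfs    /etc/auto.nfs    --timeout=3600

    # /etc/auto.nfs
    data    -rw,hard,intr    nfsserver:/export/data

A small timeout (say --timeout=60) unmounts idle filesystems quickly;
a large one avoids the mount storms described above.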
However, it only reduces the problem. There's no magic in autofs, and a
stalled mount point is still a stalled mount point. And of course autofs
introduces its own collection of issues (mostly rare and minor).
If it is mounted (through any method) and the server goes down, there is
trouble.
I don't have very much trouble at all with NFS. Most of the people I
have helped who had issues had read about some option suggested several
years ago and went on using it on a current NFS server, with less than
good results.
I know people who decided that the "soft" option was a good choice and
used it, and bitched about NFS being horrible until we told them never,
ever to use the soft option: it makes the application abort on a
timeout, where a "hard" mount would just get an annoying timeout warning
in the messages file and eventually go on when things were fixed.
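To make the difference concrete, here is roughly what the two mounts
look like (the option values are illustrative, not recommendations):

    # soft: after a few retries the kernel hands the application an
    # IO error, and most applications abort or mishandle it
    mount -o soft,timeo=30,retrans=3 nfsserver:/export/data /data

    # hard,intr: the client retries forever, logging timeout warnings
    # to the messages file, and the job carries on once the server
    # is back
    mount -o hard,intr nfsserver:/export/data /data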
The -l (lazy unmount) option has some bad behavior in certain cases and
is best avoided in any environment where reliability is important. It is
better to kill all of the jobs accessing the filesystem and do a proper
unmount (I believe a hung NFS mount will unmount with hard,intr once all
of the jobs accessing it have been killed and given time to receive the
signal and die), or take the easy way out and reboot. Removing the mount
from the table with -l while live processes are still accessing that
unmounted NFS filesystem *WILL* result in funny things happening when
the NFS server comes back up: those applications in the background
*WILL* continue even though the filesystem is not showing in the mount
table on the given client machine.
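In practice the kill-then-unmount sequence is something like this (the
mount point is a placeholder):

    fuser -km /data    # SIGKILL every process using the mount
    sleep 10           # give them time to receive the signal and die
    umount /data       # on a hard,intr mount this should now succeed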
Roger