I have a Fedora server with kernel 2.4.22-1-2163 SMP mounting a remote Solaris server (hence the choice of options):
rsize=32768,ro,hard,intr,tcp,nfsvers=3
When the remote is down or disconnected, a "df" hangs (as expected),
but I can't kill it, even as root or with kill -9. The docs for mount indicate that the INTR option should allow killing apps that are blocked on a filesystem mounted with HARD.
I also coded a test program that calls statvfs(2), and it hangs in the statvfs(2) call when run against a down NFS server. It too can't be interrupted or killed.
My questions are:
1) Is there a safe and reliable means to check for a down NFS server? For example, is showmount -e <server> safe enough? It is interruptible, so one could wrap it with a timer, and if it times out, the server would be considered down.
2) Is the non-interruptible operation (even with the INTR option) a bug or a feature?
3) Is there a simple kernel call, /proc entry, or similar that can be used for this purpose?
4) Is there a Perl module to accomplish this?
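The timer idea from question 1 could be sketched roughly as below. This is only an illustration in Python, not a tested answer to the question: the function names and the 10-second default are made up for the example, and it relies on showmount being interruptible as observed above. A successful reply only shows that the server's mount daemon answered, not that every NFS operation will succeed.

```python
import subprocess

def command_finishes(cmd, timeout_s=10.0):
    """Return True if cmd exits with status 0 within timeout_s seconds.

    Hypothetical helper for illustration. subprocess.run() kills the child
    on timeout and then waits for it, so this is only suitable for commands
    that can actually be killed (the post reports showmount can be).
    """
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False  # no answer in time: treat the server as down
    except OSError:
        return False  # command missing or not executable
    return result.returncode == 0

def nfs_server_responds(server, timeout_s=10.0):
    """Assumed check: ask the server for its export list, with a time limit."""
    return command_finishes(["showmount", "-e", server], timeout_s)
```

A caller would run nfs_server_responds("<server>") before touching the mount point and skip the df when it returns False.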
This would be very useful for network monitoring, e.g., when the
server goes down and stays down for >1 minute, generate an SNMP
trap and write to a log file. It would help when you can't put an SNMP
agent on the server, only on the client. It is also useful for writing
a highly reliable client application.
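The "down and stays down for >1 minute" rule could be tracked with a small state holder like the one below. It is a sketch under assumptions: the class name, the injectable clock, and the once-per-outage behaviour are illustrative choices, and the actual SNMP trap and log write are left to the caller.

```python
import time

class DownTracker:
    """Signals once per outage when a server has been down longer than threshold_s."""

    def __init__(self, threshold_s=60.0, clock=time.monotonic):
        self.threshold_s = threshold_s
        self.clock = clock          # injectable so the logic can be tested
        self.down_since = None      # time of the first failed probe, if any
        self.alerted = False        # True once this outage has been reported

    def update(self, is_up):
        """Feed one probe result; return True when an alert should be raised."""
        if is_up:
            # Server answered: forget the outage so the next one alerts again.
            self.down_since = None
            self.alerted = False
            return False
        now = self.clock()
        if self.down_since is None:
            self.down_since = now
        if not self.alerted and now - self.down_since > self.threshold_s:
            self.alerted = True
            return True
        return False
```

A monitoring loop would call update() with each probe result and, when it returns True, write the log entry and send the trap.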
As I have no control over the remote system, when it went down,
I had to do a hard reboot of my Linux box to stop the hung apps. This
is a Windows solution, not a Linux solution.
Note, I found this when writing some scripts for MRTG to check the disk utilization of partitions. My df's hung so I didn't even get the proper values for my local partitions. After a few days, I had LOTS of hung MRTG apps.
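One way to keep a monitoring script from hanging on a dead mount is to do each statvfs call in a throwaway child process and give up on it after a time limit. The sketch below assumes Python on the client. The function name and timeout are hypothetical, and it does not solve the unkillable-process problem: a child stuck on the dead mount may linger, but the script itself moves on and can still report the local partitions.

```python
import subprocess
import sys

# Code run in the child: print total and available bytes for one path.
_CHILD = (
    "import os, sys\n"
    "s = os.statvfs(sys.argv[1])\n"
    "print(s.f_frsize * s.f_blocks, s.f_frsize * s.f_bavail)\n"
)

def disk_usage_with_timeout(path, timeout_s=5.0):
    """Return (total_bytes, available_bytes) for path, or None on timeout/error."""
    proc = subprocess.Popen(
        [sys.executable, "-c", _CHILD, path],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
    )
    try:
        out, _ = proc.communicate(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # Request a kill, but deliberately do NOT wait for the child:
        # if it is blocked on a dead NFS mount, waiting would hang us too.
        proc.kill()
        return None
    if proc.returncode != 0:
        return None  # e.g. the path does not exist
    total, avail = out.split()
    return int(total), int(avail)
```

Calling this once per mount point, and treating None as "no data", lets one dead NFS mount be skipped without losing the values for the others.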
Thanks -- Wade Hampton