Re: Why must NFS access metadata in synchronous mode?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Jun 01, 2006  12:27 -0400, Xin Zhao wrote:
> > Question 3: Cache consistency requirements are _much_ more stringent
> > for asynchronous operation.
> I agree. But I am not sure how local file system like Ext3 handle this
> problem. I don't think Ext3 must synchronously write metadata (I will
> double check the ext3 code). If I remember correctly, when change
> metadata, Ext3 just change it in memory and mark this page to be
> dirty. The page will be flushed to disk afterward. If the server
> exports an Ext3 code, it should be able to do the same thing. When a
> client requests to change metadata, server writes to the mmaped
> metadata page and then return to client instead of having to sync the
> change to disk. With this mechanism, at least the client does not have
> to wait for the disk flush time. Does it make sense? To prevent
> interleave change on metadata before it is flushed to disk, the server
> can even mark the metadata page to be read-only before it is flushed
> to disk.

The difference between local filesystems and remote filesystems is that
if asynchronous operations on a local filesystem are lost due to node
failure, the application is also generally failed at the same time,
so when the node restarts the application it gets the old state from disk.
If applications care about on-disk consistency (say because the application
is itself sharing state with a remote system, like sendmail) then it will
fsync the file(s) before updating the remote state.

In the NFS case, a remote client keeps the "new" state, which is inconsistent
with what the server has on disk (if server is running asynchronously) so
there is no way to reconcile this.  In the case of Lustre the clients are
stateful and keep a record of all operations they do (until the server later
confirms that it is safe on disk).  In case of server failure the clients
replay uncommitted operations to the server after a failure.  This allows
the server filesystem to run asynchronously.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux