Re: [RFC] Union Mount: Readdir approaches

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



[email protected] wrote:
> Hello Bharata,
> I am developing a linux stackable/unification filesystem too.
>
> Bharata B Rao:
>   
>> Questions
>> ---------
>>     
> 	:::
>   
>> First of all, should we even expect a sane lseek(2) on union mounted
>> directories ? If not, will the Approach 2, which works uniformly for
>> all filesystem types be acceptable ?
>>
>> If lseek(2) needs to be supported, then how do we define the seek behaviour
>> when two different types of directories are union'ed ? For eg. how do we define
>> lseek(2) on a union of ext2 directory on top of a nfs directory ? Since both
>> of them use different encoding methods for filp->f_pos, how do we establish
>> a common lseek(2) behaviour here ?
>>
>> And finally, what is the use case for directory seek ? Would anybody walk
>> directory by directory by seeking into a directory file ?
>>     
>
> Although I don't remember exactly, NFS or smbfs seek for a
> directory. Additionally, any user process can call seekdir or
> something. So I believe any stackable/unification filesystem should
> support it.
>
> Here is my approach. While I don't think it is the best approach since
> it consumes much memory and cpu, I hope it help you. (or assist
> you? Sorry, I don't know correct English word)
>
> - the stackable fs has its own inode, file, dentry object, which has an
>   array for the underlying inode pointers. and the whiteout is a regular
>   file with a special name, instead of a flag in inode. this is the most
>   different architecture from your unification embeded in VFS.
> - the vritual dir inode object has a cache for its child entries. it is
>   called vdir. the cache has a version and a customizlable lifetime too.
> - all the existing underlying (same-named) dir are opened too. the file
>   objects are stored in the virtual file object as an array.
> - the virtual file object has a cache for its child entries too. it is a
>   copy of the one in the inode object.
>
> When the first readdir is issued:
> - call vfs_readdir for every underlying opened dir (file) object.
> - store every entry to either the hash table for the result or the
>   whiteout, when the same-named entry didn't exist in the tables.
> - to improvement the performance, the allocated memory for the hash
>   tables are managed in a pointer array. and the elements are
>   concatinated logically by the pointer.
> - the pointer for the result-table, the version, and the currect jiffies
>   are set to vdir, which is a cache in an inode.
> - all cache are copied to a member in a file object.
> - the index of the cache memory block and the offset in an array is
>   handled as the seek position.
>
> In the case of the application issued this sequence:
> - opendir()
> - readdir()
> - creat or unlink an entry under the dir
> - readdir()
>
> When an entry under the dir was removed or added, the inode version will
> be updated. Since readdir can compare it with the cached version or the
> lifetime (jiffies) in the file object, it can refresh the entries. But
> in this case, it doesn't, since the file position is not 0. If the
> application needs the latest entries, it has to call rewinddir.
> The cache in the file object will updated only the case of obsoleted AND
> the file position is 0.
>
> When a dir who has already its vdir is opened, the cache in the inode
> object will be used without calling vfs_readdir, after checking the
> version and the lifetime which are stored in the inode object. If it is
> obsoleted, vfs_readdir will be called again in order to update the cache
> in the inode.
>
>   
This sounds like a good approach. How does aufs handle low memory
situations? Union mounts seem to be quite common on low memory embedded
systems. Is there a way for the VM to signal to aufs/the union
filesystem to trim its cache? Also on the memory consumption front I
guess you could get the union fs to refer to a singleton name entry
directly instead of creating a new virtual inode et al. This may lead to
some unusualness though for mounts over different filesystems that have
different length directory files (eg vfat and ext3). This does run
counter to the model described above in some ways.

Matt

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux