Enclosed 8 patches that implement shared subtree functionality as
detailed in Al Viro's RFC found at http://lwn.net/Articles/119232/
I have incorporated all the comments received earlier in first round. Thanks
to Miklos and Pekka for the valuable comments. Also I have optimized lots of
code, especially in pnode.c . Code is unit tested. However the code in its
current form does not handle ENOMEM error gracefully. I am working on it.
The incremental patches provide the following functionality:
1) shared_private_slave.patch : Provides the ability to mark a subtree as
shared or private or slave.
2) unclone.patch : provides the ability to mark a subtree as unclonable. NOTE:
this feature is an addition to Al Viro's RFC, to solve the vfsmount explosion.
The problem is detailed here:
http://www.ussg.iu.edu/hypermail/linux/kernel/0502.0/0468.html
3) rbind.patch : this patch adds the ability to propagate binds/rbinds across
vfsmounts.
4) move.patch : this patch provides the ability to move a
shared/private/slave/unclonable subtree to some other mount-point. It also
provides the same feature to pivot_root()
5) umount.patch: this patch provides the ability to propagate unmounts.
6) namespace.patch: this patch provides ability to clone a namespace, with
propagation set to vfsmounts in the new namespace.
7) automount.patch: this patch provides the automatic propagation for
mounts/unmounts done through automounter.
8) pnode_opt.patch: this patch optimizes the redundant code in pnode.c .
Looking forward for comments,
RP
-----------------------------------------------------------------------
CHANGES DONE IN RESPONSE TO COMMENTS RECEIVED IN 1ST ROUND,
response to Pekka J Enberg Comments:
>>Inlining the patches to email would be greatly appreciated. Here are
>>some comments.
done
> +int
> +_do_make_mounted(struct nameidata *nd, struct vfsmount **mnt)
>>Use two underscores to follow naming conventions.
Yes done. In fact this function is renamed as make_mounted because
that seemed to make more sense. But in general throughout the patches
I have changed all newly introduced function that start with one
underscore to two underscores.
> Index: 2.6.12/fs/pnode.c
> ===================================================================
> --- /dev/null
> +++ 2.6.12/fs/pnode.c
> @@ -0,0 +1,362 @@
> +
> +#define PNODE_MEMBER_VFS 0x01
> +#define PNODE_SLAVE_VFS 0x02
>>Enums, please.
done
> +
> +static kmem_cache_t * pnode_cachep;
> +
> +/* spinlock for pnode related operations */
> + __cacheline_aligned_in_smp DEFINE_SPINLOCK(vfspnode_lock);
> +
> +
> +static void
> +pnode_init_fn(void *data, kmem_cache_t *cachep, unsigned long flags)
> +{
> + struct vfspnode *pnode = (struct vfspnode *)data;
>>Redundant cast.
yes. removed.
> + INIT_LIST_HEAD(&pnode->pnode_vfs);
> + INIT_LIST_HEAD(&pnode->pnode_slavevfs);
> + INIT_LIST_HEAD(&pnode->pnode_slavepnode);
> + INIT_LIST_HEAD(&pnode->pnode_peer_slave);
> + pnode->pnode_master = NULL;
> + pnode->pnode_flags = 0;
> + atomic_set(&pnode->pnode_count,0);
> +}
> +
> +void __init
> +pnode_init(unsigned long mempages)
> +{
> + pnode_cachep = kmem_cache_create("pnode_cache",
> + sizeof(struct vfspnode), 0,
> + SLAB_HWCACHE_ALIGN|SLAB_PANIC, pnode_init_fn, NULL);
> +}
> +
> +
> +struct vfspnode *
> +pnode_alloc(void)
> +{
> + struct vfspnode *pnode = (struct vfspnode *)kmem_cache_alloc(
> + pnode_cachep, GFP_KERNEL);
>>Redundant cast.
yes removed.
> +struct inoutdata {
>>Wants a better name.
This datastructure is gone after optimizing pnode.c
> + void *my_data; /* produced and consumed by me */
> + void *in_data; /* produced by master, consumed by slave */
> + void *out_data; /* produced by slave, comsume by master */
> +};
> +
> +struct pcontext {
> + struct vfspnode *start;
> + int flag;
> + int traversal;
> + int level;
> + struct vfspnode *master_pnode;
> + struct vfspnode *pnode;
> + struct vfspnode *slave_pnode;
> +};
> +
> +
> +#define PNODE_UP 1
> +#define PNODE_DOWN 2
> +#define PNODE_MID 3
>>Enums, please.
These #defines are gone after the optimizations and cleanup.
> +
> +/*
> + * Walk the pnode tree for each pnode encountered. A given pnode in the tree
> + * can be returned a minimum of 2 times. First time the pnode is encountered,
> + * it is returned with the flag PNODE_DOWN. Every time the pnode is encountered
> + * after having traversed through each of its children, it is returned with the
> + * flag PNODE_MID. And finally when the pnode is encountered after having
> + * walked all of its children, it is returned with the flag PNODE_UP.
> + *
> + * @context: provides context on the state of the last walk in the pnode
> + * tree.
> + */
> +static int inline
> +pnode_next(struct pcontext *context)
>>Rather large function to be an inline.
yes. done.
> +{
> + int traversal = context->traversal;
> + int ret=0;
> + struct vfspnode *pnode = context->pnode,
> + *slave_pnode=context->slave_pnode,
> + *master_pnode=context->master_pnode;
>>Add a separate declaration for each variable. The above is hard to read.
yes done.
> + struct list_head *next;
> +
> + spin_lock(&vfspnode_lock);
> + /*
> + * the first traversal will be on the root pnode
> + * with flag PNODE_DOWN
> + */
> + if (!pnode) {
> + context->pnode = get_pnode(context->start);
> + context->master_pnode = NULL;
> + context->traversal = PNODE_DOWN;
> + context->slave_pnode = NULL;
> + context->level = 0;
> + ret = 1;
> + goto out;
> + }
> +
> + /*
> + * if the last traversal was PNODE_UP, than the current
> + * traversal is PNODE_MID on the master pnode.
> + */
> + if (traversal == PNODE_UP) {
> + if (!master_pnode) {
> + /* DONE. return */
> + put_pnode(pnode);
> + ret = 0;
>> Using goto out and dropping the else branch would make this more readable.
This code is also gone after optimization and cleanups.
> + } else {
> + context->traversal = PNODE_MID;
> + context->level--;
> + context->pnode = master_pnode;
> + put_pnode(slave_pnode);
> + context->slave_pnode = pnode;
> + context->master_pnode = (master_pnode ?
> + master_pnode->pnode_master : NULL);
> + ret = 1;
> + }
> + } else {
> + if(traversal == PNODE_MID) {
>> Missing space before parenthesis.
Yes ensured that throughout the patches. Hope I have not missed any.
> + next = slave_pnode->pnode_peer_slave.next;
> + } else {
> + next = pnode->pnode_slavepnode.next;
> + }
>> Please drop the extra braces.
Yes done.
> + put_pnode(slave_pnode);
> + context->slave_pnode = NULL;
> + /*
> + * if the last traversal was PNODE_MID or PNODE_DOWN, and the
> + * master pnode has some slaves to traverse, the current
> + * traversal will be PNODE_DOWN on the slave pnode.
> + */
> + if ((next != &pnode->pnode_slavepnode) &&
> + (traversal == PNODE_DOWN || traversal == PNODE_MID)) {
> + context->traversal = PNODE_DOWN;
> + context->level++;
> + context->pnode = get_pnode(list_entry(next,
> + struct vfspnode, pnode_peer_slave));
> + context->master_pnode = pnode;
> + ret = 1;
> + } else {
> + /*
> + * since there are no more children, the current traversal
> + * is PNODE_UP on the same pnode
> + */
> + context->traversal = PNODE_UP;
> + ret = 1;
>> Would probably make more sense to check if
>> (next == &pnode->pnode_slavepnode && traversal == PNODE_UP) and use goto out to
>> get rid of the else branch.
Again after code-cleanup and optimization, this code is gone.
> + }
> + }
> +out:
> + spin_unlock(&vfspnode_lock);
> + return ret;
> +}
> +
> +
> +static void
> +_pnode_disassociate_mnt(struct vfsmount *mnt)
>> Two underscores, please.
Yes. Did it!
> +struct vfsmount *
> +pnode_make_mounted(struct vfspnode *pnode, struct vfsmount *mnt, struct dentry *dentry)
> +{
> + struct vfsmount *child_mnt;
> + int ret=0,traversal,level;
>> Spaces, please.
Yes done.
> + struct vfspnode *slave_pnode, *master_pnode, *child_pnode, *slave_child_pnode;
> + struct vfsmount *slave_mnt, *member_mnt, *t_m;
>> Formatting damage.
Yep. Done
> + while(pnode_next(&context)) {
>> Missing space before parenthesis.
Done
> + traversal = context.traversal;
> + level = context.level;
> + pnode = context.pnode;
> + slave_pnode = context.slave_pnode;
> + master_pnode = context.master_pnode;
> +
> + if (traversal == PNODE_DOWN ) {
>> Use switch statement here.
yep! did it.
> + if (master_pnode) {
> + child_pnode = (struct vfspnode *)p_array[level-1].in_data;
>> Redundant cast.
done.
> + } else {
> + child_pnode = NULL;
> + }
>> Extra braces.
done
> + while (!(child_pnode = pnode_alloc()))
> + schedule();
>> This looks dangerous. Why this must not fail and in other places you
>> return -ENOMEM?
I have changed it to return -ENOMEM. But its not perfect yet. The
code is not graceful returning. It leaves around some data-structures.
I am working on this last aspect currently.
> + p_array[level].my_data = (void *)child_pnode;
>> Redundant cast.
Yes. done.
> + child_pnode = (struct vfspnode *)p_array[level].my_data;
>> Redundant cast.
Yes. done.
> +#define MS_PRIVATE 262144
> +#define MS_SLAVE 524288
> +#define MS_SHARED 1048576
>> The expression (1<<bit) would make more sense here.
Yes. done.
> Index: 2.6.12/include/linux/pnode.h
> ===================================================================
> --- /dev/null
> +++ 2.6.12/include/linux/pnode.h
> @@ -0,0 +1,78 @@
> +/*
> + * linux/fs/pnode.c
> + *
> + * (C) Copyright IBM Corporation 2005.
> + * Released under GPL v2.
> + *
> + */
> +#ifndef _LINUX_PNODE_H
> +#define _LINUX_PNODE_H
> +#ifdef __KERNEL__
>> No need for the above. Kernel headers are not supposed to be included by
>> userspace.
Removed it.
> +
> +#include <linux/list.h>
> +#include <linux/mount.h>
> +#include <linux/spinlock.h>
> +#include <asm/atomic.h>
> +
> +struct vfspnode {
> + struct list_head pnode_vfs; /* list of vfsmounts anchored here */
> + struct list_head pnode_slavevfs; /* list of slave vfsmounts */
> + struct list_head pnode_slavepnode;/* list of slave pnode */
> + struct list_head pnode_peer_slave;/* going through master's slave pnode
> + list*/
> + struct vfspnode *pnode_master; /* master pnode */
> + int pnode_flags;
> + atomic_t pnode_count;
> +};
> +#define PNODE_MAX_SLAVE_LEVEL 10
> +#define PNODE_DELETE 0x01
> +#define PNODE_SLAVE 0x02
>> Enums, please.
I have left it as is for now. These defines can also go. I will fix
them.
> +
> +#define IS_PNODE_DELETE(pn) ((pn->pnode_flags&PNODE_DELETE)==PNODE_DELETE)
> +#define IS_PNODE_SLAVE(pn) ((pn->pnode_flags&PNODE_SLAVE)==PNODE_SLAVE)
> +#define SET_PNODE_DELETE(pn) pn->pnode_flags |= PNODE_DELETE
> +#define SET_PNODE_SLAVE(pn) pn->pnode_flags |= PNODE_SLAVE
>> Static inline functions are preferred over #define.
done
> +
> +extern spinlock_t vfspnode_lock;
> +extern void __put_pnode(struct vfspnode *);
> +
> +static inline struct vfspnode *
> +get_pnode(struct vfspnode *pnode)
> +{
> + if (!pnode)
> + return NULL;
>> Can pnode really be NULL here? Looking at the callers in this patch, it can't.
>> Please remember that you should do NULL checks like this only when it makes
>> sense from API point of view to call the function with NULL.
Yes there are cases where it they can be called with null. If I don't
have these checks, than the check has to be done in the caller. So I chose
it here.
> +//TOBEDONE WRITE BETTER MACROS. ..
>>Please use static inline functions instead.
yes done.
Response to Miklos comments follows:
> -struct vfsmount *lookup_mnt(struct vfsmount *mnt, struct dentry *dentry)
> +struct vfsmount *lookup_mnt(struct vfsmount *mnt, struct dentry *dentry, struct dentry *root)
> {
>> How about changing it to inline and calling it __lookup_mnt_root(),
>> and calling it from lookup_mnt() (which could keep the old signature)
>> and lookup_mnt_root(). That way the compiler can optimize away the
>> root check for the plain lookup_mnt() case, and there's no need to
>> modify callers of lookup_mnt().
Yes. DOne.
>
> +struct vfsmount *do_make_mounted(struct vfsmount *mnt, struct dentry *dentry)
> +{
>> What does this function do? Can we have a header comment?
Yes added a header.
> +int
> +_do_make_mounted(struct nameidata *nd, struct vfsmount **mnt)
> +{
>> Ditto.
Added a header.
> +/*
> + * recursively change the type of the mountpoint.
> + */
> +static int do_change_type(struct nameidata *nd, int flag)
> +{
> + struct vfsmount *m, *mnt;
> + struct vfspnode *old_pnode = NULL;
> + int err;
> +
> + if (!(flag & MS_SHARED) && !(flag & MS_PRIVATE)
> + && !(flag & MS_SLAVE))
> + return -EINVAL;
> +
> + if ((err = _do_make_mounted(nd, &mnt)))
> + return err;
>> Why does this opertation do any mounting? If it's a type change, it
>> should just change the type of something already mounted, no?
Ok. Now it returns -EINVAL.
> + case MS_SHARED:
> + SET_MNT_PRIVATE(m);
....
> + break;
> +
>> Can this be split into three functions?
Yes split them into 3 functions.
> +static int inline
> +pnode_next(struct pcontext *context)
> +{
>> Is such a generic traversal function really needed? Why?
As I said in my earlier mail, this function is equivalent of next_mnt
for traversing vfsmount tree.
I have optimized this function as well as pnode_traverse(). Hopefully you
may find the new code palatable.
> +struct vfsmount *
> +pnode_make_mounted(struct vfspnode *pnode, struct vfsmount *mnt, struct dentry *dentry)
> +{
>>Again a header comment would be nice, on what exactly this function
>>does. Also the implementation is really cryptic, but I can't even
>>start to decipher without knowing what it's supposed to do.
Yes. added a header comment.
> +static inline struct vfspnode *
> +get_pnode_n(struct vfspnode *pnode, size_t n)
> +{
>> Seems to be unused throughout the patch series
True removed.
>
> +
> static inline struct vfsmount *mntget(struct vfsmount *mnt)
>> Please don't add empty lines.
Yes done.
-----------------------------------------------------------------------
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[Index of Archives]
[Kernel Newbies]
[Netfilter]
[Bugtraq]
[Photo]
[Gimp]
[Yosemite News]
[MIPS Linux]
[ARM Linux]
[Linux Security]
[Linux RAID]
[Video 4 Linux]
[Linux for the blind]
|
|