Hi Andrew,
I have a full patch for this.
Please don't top-post. It makes things hard...
do you prefer separate mails with patch and with reference to original
report? will do so.
I don't remember the details yet, but lock was not god here, we used
semaphore. I pointed to this problem long ago when fixed error path in
proc with moduleget.
This patch protects proc_dir_entry tree with a proc_tree_sem semaphore.
I suppose lock_kernel() can be removed later after checking that no proc
handlers require it.
Also this patch remakes de refcounters a bit making it more clear and
more similar to dentry scheme - this is required to make sure that
everything works correctly.
Patch is against 2.6.15-rcX and was tested for about a week. Also works
half a year on 2.6.8 :)
[ patch which uses an rwsem for procfs and somewhat removes lock_kernel() ]
I worry about replacing a spinlock with a sleeping lock. In some
circumstances it can cause a complete scalability collapse and I suspect
this could happen with /proc. Although I guess the only fastpath here is
proc_readdir(), and as the lock is taken there for reading, we'll be OK..
The patch does leave some lock_kernel() calls behind. If we're going to do
this, I think they should all be removed?
Races in /proc have been plentiful and hard to find. The patch worries me,
frankly. I'd like to see quite a bit more description of the locking
schema and some demonstration that it's actually complete before taking the
plunge.
ok, here goes some more descriptions:
1.
proc_readdir is a sleeping ops:
sys_getdents
`- vfs_readdir
`- proc_readdir
`- filldir
`- put_user/copy_to_user
that's why we must use semaphore. of course spinlock can be used too,
but this will case another problem: it must be dropped to call filldir, but
beofre it will be retaken the dentry being filldir-ed may be removed and
we won't see it's siblings in output.
2.
lock_kernel() is needed to handle at least simultaneous sys_read vs
create_proc_entry with sequental de->proc_fops = XXX. this can be
handled by passing fops directly to create_proc_entry.
i.e. there is a 3rd problem I pointed to you before:
create_proc_entry() and setting of de->owner/de->proc_fops is inatomic.
lock_kernel() is not a best way to protect against this, sure...
I would prefer to fix it with a separate patch somehow...
3.
refcounting:
each proc_dir_entry's refcounter is the reference from inode plus
references from children. once this count reaches zero - dentry is freed.
so on each proc_register() parent is get-ed, on each remove_proc_entry
parent is put-ed.
Kirill
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[Index of Archives]
[Kernel Newbies]
[Netfilter]
[Bugtraq]
[Photo]
[Stuff]
[Gimp]
[Yosemite News]
[MIPS Linux]
[ARM Linux]
[Linux Security]
[Linux RAID]
[Video 4 Linux]
[Linux for the blind]
[Linux Resources]