[Resending, directly to a couple likely suspects this time, in the
hope of getting a reply... thanks]
I have a question about the synchronization of file_operations: is it
intended that the .release method of a file can be called while a
.read or .write method is still running for that file?
The reason I ask is that I've seen a crash in practice that appears to
be caused by this race, and I'm wondering whether the correct fix is
to add synchronization between .read/.write and .release to the
driver's methods (this is a character device), or whether the core
kernel is supposed to provide this synchronization.
I've written a quick test case that seems to show this happen in
practice, and reading the code in fs/open.c and fs/read_write.c I see
no reason why this race can't happen: sys_read() and sys_write() just
do fget_light(), which will not increment the file's reference count
on the fast path, so a racing sys_close() from another thread could do
the final fput() that ends up calling the file's .release method
before the read or write has finished.
I think the actual file data structure is OK, because it is freed with
RCU, so it won't be freed too soon. But a .release method may free
other driver-internal state that is still in use because of this race.
It seems that we wouldn't want to give up the fget_light()
optimization in read/write, so probably the right place to handle this
is in the driver. But on the other hand, this is kind of a subtle
booby trap that has been laid, and maybe there is a clever way to
handle this. So I would appreciate guidance from smarter people.
Thanks,
Roland
BTW, here's the test case -- it seems pretty conclusive to me, but
maybe I'm all wet and misunderstanding things. There's a kernel
module that just prints "Write raced with close?" if it sees the race,
and a userspace driver that just spawns two threads, one that does
open/close in a loop and one that does write in a loop. Running this
on my machine, I see several race messages get printed.
--- kernel module ---
#include <linux/module.h>
#include <linux/init.h>
#include <linux/errno.h>
#include <linux/miscdevice.h>
#include <linux/fs.h>
#include <linux/delay.h>
MODULE_LICENSE("GPL");
static unsigned long flag;
static int cw_open(struct inode *inode, struct file *filp)
{
return 0;
}
static int cw_close(struct inode *inode, struct file *filp)
{
clear_bit(1, &flag);
msleep(1);
if (test_and_set_bit(1, &flag))
printk(KERN_ERR "Write raced with close?\n");
return 0;
}
static ssize_t cw_write(struct file *filp, const char __user *buf,
size_t len, loff_t *pos)
{
set_bit(1, &flag);
return 0;
}
static const struct file_operations cw_fops = {
.owner = THIS_MODULE,
.open = cw_open,
.release = cw_close,
.write = cw_write,
};
static struct miscdevice cw_misc = {
.minor = MISC_DYNAMIC_MINOR,
.name = "cw",
.fops = &cw_fops,
};
static int __init cw_init(void)
{
return misc_register(&cw_misc);
}
static void __exit cw_cleanup(void)
{
misc_deregister(&cw_misc);
}
module_init(cw_init);
module_exit(cw_cleanup);
--- userspace code ---
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <pthread.h>
static int dfd = -1;
static void *write_loop(void *p)
{
char buf[1];
while (1)
write(dfd, buf, 1);
return NULL;
}
int main(int argc, char *argv[])
{
pthread_t thr;
if (pthread_create(&thr, NULL, write_loop, NULL))
return 1;
while (1) {
dfd = open("/dev/cw", O_RDWR);
close(dfd);
}
return 0;
}
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[Index of Archives]
[Kernel Newbies]
[Netfilter]
[Bugtraq]
[Photo]
[Stuff]
[Gimp]
[Yosemite News]
[MIPS Linux]
[ARM Linux]
[Linux Security]
[Linux RAID]
[Video 4 Linux]
[Linux for the blind]
[Linux Resources]