Improving read/write/close system call reliability when used with pthreads

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hello!

The attached patch attempts to improve read/write/close system call reliability when used with pthreads and pipes. At least two issues are handled:

1. Read/write system calls hanging indefinitely when the fd is closed by another thread.

2. Read/write fd (reuse) alias problems in combination with - ERESTARTSYS.

For (1), a simple scenario has two threads: a reader and a watchdog. The reader thread is stuck in a blocking read of an unreliable pipe. The watchdog thread is designed to cancel the blocking read using close on the reader's fd.

Unfortunately, with the current kernel, the reader thread does not wake up by the close call issued by the watchdog. Instead, the reader thread remains stuck unless some unrelated signal happens to arrive, in which case it wakes up and terminates with -EBADF. This is also the current workaround: have the process signal itself after the close call in the watchdog thread.

For (2), the scenario can be as (1) above, adding an open system call between the close and signal events. Since the read fd has been closed it's now reused by open for a completely unrelated file, and when read finally wakes up by the signal it starts reading it.

In both of these cases, I believe it would be desirable to have read terminate immediately with -EBADF. Similar scenarios can be made for the write system call (and perhaps others as well).

The attached sample programs (watchdog.c and aliasing.c) demonstrate the effects.

Please regard the attached patch as proof-of-concept to trying to solve these problems. The FIXME certainly needs more thought, as do locking schemes, performance impact, related fd issues, and the approach taken in general. I'd be delighted if it would be possible to sort out. :-)

Many thanks,
Fredrik



Notes regarding the patch:

The patch adds a required_fds list to task_struct. System calls such as read and write register their fd respectively in this list, as the fd:s are required for successful completion of the calls. When close is called, it scans the required_fds lists and marks closed fd:s with -EBADF, as well as notifies the file with the new file->f_op- >closing_fd file operation. This allows files to wake up waiting read/write calls. When read/write wakes up, it checks its required_fds list and cancels if its fd has been marked -EBADF.

fs/open.c | 63 +++++++++++++++++++++++++++++++++++++ +++++++++
fs/pipe.c                 |   28 ++++++++++++++++++++
fs/read_write.c           |    6 ++++
include/linux/fs.h        |    5 +++
include/linux/init_task.h |    1
include/linux/sched.h     |   13 ++++++++-
kernel/fork.c             |    1
7 files changed, 115 insertions(+), 2 deletions(-)



Notes regarding POSIX:

Whether or not close may cancel I/O operations appears to be implementation-defined if I interpret the following section correctly, unless anyone knows more specific details?

The Open Group Base Specifications Issue 6
IEEE Std 1003.1, 2004 Edition
Copyright 2001-2004 The IEEE and The Open Group, All Rights reserved.

When there is an outstanding cancelable asynchronous I/O operation against fildes when close() is called, that I/O operation may be canceled. An I/O operation that is not canceled completes as if the close() operation had not yet occurred. All operations that are not canceled shall complete as if the close() blocked until the operations completed. The close() operation itself need not block awaiting such I/O completion. Whether any I/O operation is canceled, and which I/O operation may be canceled upon close(), is implementation-defined.



watchdog.c:

#include <sys/types.h>
#include <sys/uio.h>
#include <pthread.h>
#include <unistd.h>
#include <signal.h>
#include <stdio.h>
#include <errno.h>

int inout[2] = { -1, -1 };

void *watchdog(void *arg)
{
	fprintf(stderr, "Sleeping...\n");
	sleep(1);
	fprintf(stderr, "Closing...\n");
	close(inout[0]);
	fprintf(stderr, "Closed.\n");
	return NULL;
}

void wakeup(int sig)
{
	fprintf(stderr, "Alarmed.\n");
}

int main(int argc, char *argv[])
{
	pthread_t th;
	char buf[1];
	ssize_t r;
	
	pipe(inout);
	pthread_create(&th, NULL, watchdog, NULL);
	signal(SIGALRM, wakeup);
	alarm(5);
	
	fprintf(stderr, "Reading...\n");
	r = read(inout[0], buf, sizeof(buf));
	if (r == -1)
		perror("read");
	fprintf(stderr, "Read done.\n");
	
	pthread_join(th, NULL);
	fprintf(stderr, "Exit.\n");
	
	return 0;
}



aliasing.c:

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/uio.h>
#include <pthread.h>
#include <unistd.h>
#include <signal.h>
#include <stdio.h>
#include <errno.h>

int inout[2] = { -1, -1 };

void *aliasing(void *arg)
{
	fprintf(stderr, "Sleeping...\n");
	sleep(1);
	fprintf(stderr, "Closing...\n");
	close(inout[0]);
	fprintf(stderr, "Closed.\n");
	fprintf(stderr, "Opening...\n");
	open(__FILE__, O_RDONLY);
	fprintf(stderr, "Opened.\n");
	return NULL;
}

void wakeup(int sig)
{
	fprintf(stderr, "Alarmed.\n");
}

int main(int argc, char *argv[])
{
	pthread_t th;
	char buf[1];
	ssize_t r;
	
	pipe(inout);
	pthread_create(&th, NULL, aliasing, NULL);
	signal(SIGALRM, wakeup);
	alarm(5);
	
	fprintf(stderr, "Reading...\n");
	r = read(inout[0], buf, 1);
	if (r == -1)
		perror("read");
	else
		fprintf(stderr, "Alias read!\n");
	fprintf(stderr, "Read done.\n");
	
	pthread_join(th, NULL);
	fprintf(stderr, "Exit.\n");
	
	return 0;
}



required-fds.patch (not sure Apple Mail can handle this properly though...):

--- linux-2.6.19-gentoo-r5/include/linux/init_task.h 2007-07-12 22:03:14.000000000 +0200 +++ linux-2.6.19-required-fds/include/linux/init_task.h 2007-08-12 03:51:34.000000000 +0200
@@ -126,6 +126,7 @@
	.thread		= INIT_THREAD,					\
	.fs		= &init_fs,					\
	.files		= &init_files,					\
+	.required_fds	= LIST_HEAD_INIT(tsk.required_fds),		\
	.signal		= &init_signals,				\
	.sighand	= &init_sighand,				\
	.nsproxy	= &init_nsproxy,				\
--- linux-2.6.19-gentoo-r5/include/linux/sched.h 2007-07-12 22:03:14.000000000 +0200 +++ linux-2.6.19-required-fds/include/linux/sched.h 2007-08-12 13:47:44.000000000 +0200
@@ -907,6 +907,8 @@
	struct fs_struct *fs;
/* open file information */
	struct files_struct *files;
+/* file descriptors required to complete current I/O operation successfully */
+	struct list_head required_fds;
/* namespaces */
	struct nsproxy *nsproxy;
/* signal handlers */
@@ -930,7 +932,7 @@
/* Thread group tracking */
    	u32 parent_exec_id;
    	u32 self_exec_id;
-/* Protection of (de-)allocation: mm, files, fs, tty, keyrings */
+/* Protection of (de-)allocation: mm, files, required_fds, fs, tty, keyrings */
	spinlock_t alloc_lock;
	/* Protection of the PI data structures: */
@@ -1025,6 +1027,13 @@
#endif
};
+#define REQUIRED_FD_INIT(fd) { .fd = fd }
+
+struct required_fd {
+	struct list_head list;
+	int fd;
+};
+
static inline pid_t process_group(struct task_struct *tsk)
{
	return tsk->signal->pgrp;
@@ -1429,7 +1438,7 @@
		(thread_group_leader(p) && !thread_group_empty(p))
/*
- * Protects ->fs, ->files, ->mm, ->group_info, ->comm, keyring
+ * Protects ->fs, ->files, ->mm, ->group_info, ->comm, - >required_fds, keyring * subscriptions and synchronises with wait4(). Also used in procfs. Also
  * pins the final release of task.io_context.  Also protects ->cpuset.
  *
--- linux-2.6.19-gentoo-r5/include/linux/fs.h 2007-07-12 22:03:14.000000000 +0200 +++ linux-2.6.19-required-fds/include/linux/fs.h 2007-08-12 13:27:02.000000000 +0200
@@ -1120,6 +1120,7 @@
	int (*mmap) (struct file *, struct vm_area_struct *);
	int (*open) (struct inode *, struct file *);
	int (*flush) (struct file *, fl_owner_t id);
+ void (*closing_fd) (struct inode *, struct file *, struct files_struct *files, unsigned int fd);
	int (*release) (struct inode *, struct file *);
	int (*fsync) (struct file *, struct dentry *, int datasync);
	int (*aio_fsync) (struct kiocb *, int datasync);
@@ -1497,6 +1498,10 @@
			int mode);
extern struct file *filp_open(const char *, int, int);
extern struct file * dentry_open(struct dentry *, struct vfsmount *, int);
+extern int required_fds_are_bad(struct task_struct *task);
+extern void add_required_fd(struct task_struct *task,
+			    struct required_fd *req_fd);
+extern void del_required_fds(struct task_struct *task);
extern int filp_close(struct file *, fl_owner_t id);
extern char * getname(const char __user *);
--- linux-2.6.19-gentoo-r5/kernel/fork.c 2007-07-12 22:03:14.000000000 +0200 +++ linux-2.6.19-required-fds/kernel/fork.c 2007-08-12 13:12:59.000000000 +0200
@@ -1156,6 +1156,7 @@
	retval = copy_thread(0, clone_flags, stack_start, stack_size, p, regs);
	if (retval)
		goto bad_fork_cleanup_namespaces;
+	INIT_LIST_HEAD(&p->required_fds);
p->set_child_tid = (clone_flags & CLONE_CHILD_SETTID) ? child_tidptr : NULL;
	/*
--- linux-2.6.19-gentoo-r5/fs/read_write.c 2007-07-12 22:03:14.000000000 +0200 +++ linux-2.6.19-required-fds/fs/read_write.c 2007-08-12 13:19:59.000000000 +0200
@@ -355,10 +355,12 @@
asmlinkage ssize_t sys_read(unsigned int fd, char __user * buf, size_t count)
{
+	struct required_fd req_fd = REQUIRED_FD_INIT(fd);
	struct file *file;
	ssize_t ret = -EBADF;
	int fput_needed;
+	add_required_fd(current, &req_fd);
	file = fget_light(fd, &fput_needed);
	if (file) {
		loff_t pos = file_pos_read(file);
@@ -366,6 +368,7 @@
		file_pos_write(file, pos);
		fput_light(file, fput_needed);
	}
+	del_required_fds(current);
	return ret;
}
@@ -373,10 +376,12 @@
asmlinkage ssize_t sys_write(unsigned int fd, const char __user * buf, size_t count)
{
+	struct required_fd req_fd = REQUIRED_FD_INIT(fd);
	struct file *file;
	ssize_t ret = -EBADF;
	int fput_needed;
+	add_required_fd(current, &req_fd);
	file = fget_light(fd, &fput_needed);
	if (file) {
		loff_t pos = file_pos_read(file);
@@ -384,6 +389,7 @@
		file_pos_write(file, pos);
		fput_light(file, fput_needed);
	}
+	del_required_fds(current);
	return ret;
}
--- linux-2.6.19-gentoo-r5/fs/open.c	2007-07-12 22:03:14.000000000 +0200
+++ linux-2.6.19-required-fds/fs/open.c 2007-08-12 16:30:03.000000000 +0200
@@ -2,6 +2,9 @@
  *  linux/fs/open.c
  *
  *  Copyright (C) 1991, 1992  Linus Torvalds
+ *
+ *  2007-08-12 Implemented required_fds functionality to close files
+ * reliably with pthreads, by Fredrik Noring <[email protected]>.
  */
#include <linux/string.h>
@@ -1015,6 +1018,53 @@
#endif
+void mark_required_fd_bad(struct task_struct *task, unsigned int fd)
+{
+	struct required_fd *req_fd;
+
+	task_lock(task);
+	list_for_each_entry(req_fd, &task->required_fds, list)
+		if (req_fd->fd == fd)
+			req_fd->fd = -EBADF;
+	task_unlock(task);
+}
+
+int required_fds_are_bad(struct task_struct *task)
+{
+	struct required_fd *req_fd;
+	int ret = 0;
+
+	task_lock(task);
+	list_for_each_entry(req_fd, &task->required_fds, list)
+		if (req_fd->fd == -EBADF) {
+			ret = 1;
+			break;
+		}
+	task_unlock(task);
+	
+	return ret;
+}
+
+EXPORT_SYMBOL(required_fds_are_bad);
+
+void add_required_fd(struct task_struct *task, struct required_fd *req_fd)
+{
+	task_lock(task);
+	list_add(&req_fd->list, &task->required_fds);
+	task_unlock(task);
+}
+
+EXPORT_SYMBOL(add_required_fd);
+
+void del_required_fds(struct task_struct *task)
+{
+	task_lock(task);
+	INIT_LIST_HEAD(&task->required_fds);
+	task_unlock(task);
+}
+
+EXPORT_SYMBOL(del_required_fds);
+
/*
  * "id" is the POSIX thread ID. We use the
  * files pointer for this..
@@ -1048,9 +1098,17 @@
{
	struct file * filp;
	struct files_struct *files = current->files;
+	struct task_struct *task;
	struct fdtable *fdt;
	int retval;
+	/* FIXME: Proof of concept but obviously inefficient. Mark
+	 * within f_op->closing_fd instead, where we know the tasks
+	 * that are really interested in learning this? */
+	for_each_process(task)
+		if(task->files == files)
+			mark_required_fd_bad(task, fd);
+	
	spin_lock(&files->file_lock);
	fdt = files_fdtable(files);
	if (fd >= fdt->max_fds)
@@ -1062,6 +1120,10 @@
	FD_CLR(fd, fdt->close_on_exec);
	__put_unused_fd(files, fd);
	spin_unlock(&files->file_lock);
+	if (filp->f_op && filp->f_op->closing_fd) {
+		struct inode *inode = filp->f_dentry->d_inode;
+		filp->f_op->closing_fd(inode, filp, files, fd);
+	}
	retval = filp_close(filp, files);
	/* can't restart close syscall because file table entry was cleared */
--- linux-2.6.19-gentoo-r5/fs/pipe.c	2007-07-12 22:03:14.000000000 +0200
+++ linux-2.6.19-required-fds/fs/pipe.c 2007-08-12 13:43:13.000000000 +0200
@@ -316,6 +316,11 @@
			wake_up_interruptible_sync(&pipe->wait);
  			kill_fasync(&pipe->fasync_writers, SIGIO, POLL_OUT);
		}
+		if (required_fds_are_bad(current)) {
+			if (!ret)
+				ret = -EBADF;
+			break;
+		}
		pipe_wait(pipe);
	}
	mutex_unlock(&inode->i_mutex);
@@ -488,6 +493,11 @@
			kill_fasync(&pipe->fasync_readers, SIGIO, POLL_IN);
			do_wakeup = 0;
		}
+		if (required_fds_are_bad(current)) {
+			if (!ret)
+				ret = -EBADF;
+			break;
+		}
		pipe->waiting_writers++;
		pipe_wait(pipe);
		pipe->waiting_writers--;
@@ -576,6 +586,18 @@
	return mask;
}
+static void
+pipe_closing_fd(struct inode *inode, struct file *file,
+		struct files_struct *files, unsigned int fd)
+{
+	struct pipe_inode_info *pipe;
+
+	mutex_lock(&inode->i_mutex);
+	pipe = inode->i_pipe;
+	wake_up_interruptible(&pipe->wait);
+	mutex_unlock(&inode->i_mutex);
+}
+
static int
pipe_release(struct inode *inode, int decr, int decw)
{
@@ -727,6 +749,7 @@
	.poll		= pipe_poll,
	.ioctl		= pipe_ioctl,
	.open		= pipe_read_open,
+	.closing_fd	= pipe_closing_fd,
	.release	= pipe_read_release,
	.fasync		= pipe_read_fasync,
};
@@ -739,6 +762,7 @@
	.poll		= pipe_poll,
	.ioctl		= pipe_ioctl,
	.open		= pipe_write_open,
+	.closing_fd	= pipe_closing_fd,
	.release	= pipe_write_release,
	.fasync		= pipe_write_fasync,
};
@@ -752,6 +776,7 @@
	.poll		= pipe_poll,
	.ioctl		= pipe_ioctl,
	.open		= pipe_rdwr_open,
+	.closing_fd	= pipe_closing_fd,
	.release	= pipe_rdwr_release,
	.fasync		= pipe_rdwr_fasync,
};
@@ -764,6 +789,7 @@
	.poll		= pipe_poll,
	.ioctl		= pipe_ioctl,
	.open		= pipe_read_open,
+	.closing_fd	= pipe_closing_fd,
	.release	= pipe_read_release,
	.fasync		= pipe_read_fasync,
};
@@ -776,6 +802,7 @@
	.poll		= pipe_poll,
	.ioctl		= pipe_ioctl,
	.open		= pipe_write_open,
+	.closing_fd	= pipe_closing_fd,
	.release	= pipe_write_release,
	.fasync		= pipe_write_fasync,
};
@@ -789,6 +816,7 @@
	.poll		= pipe_poll,
	.ioctl		= pipe_ioctl,
	.open		= pipe_rdwr_open,
+	.closing_fd	= pipe_closing_fd,
	.release	= pipe_rdwr_release,
	.fasync		= pipe_rdwr_fasync,
};

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux