Re: PROBLEM: pthread-safety bug in write(2) on Linux 2.6.x

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



At 09:46 PM 4/12/2006, Andrew Morton wrote:
Locking for file.f_pos is generally file->f_dentry->d_inode->i_mutex.  We
could use that if we were to restructure the code a lot.  Or we could add a
new lock to `struct file'.

Or we could do nothing, because a) the application is going to produce
inderterminate output anyway and b) because it only affects silly testcases
and not real-world apps.

OK, there _might_ be a real-world case: threads appending logging
information to a flat file.  Trivially workable-around with a userspace
lock, or by switching to stdio (same thing).

Yes, really we should fix it.  But it's not worth adding more overhead to
do so.  So the fix would involve widespread (but simple) change, to draw
that f_pos update inside i_mutex.

Hi Andrew - thanks for the detailed response.

I don't know enough about the kernel implementation to comment on your proposed fixes.

However, I should clarify that this problem definitely affects more than just "silly testcases", and the fact that a program generates non-deterministically ordered output does not necessarily make it erroneous, invalid or unuseful.

This problem arose in the parallel runtime system for a scientific language compiler (nearly a million lines of code total - definitely a "real-world" program) - the example code is merely a pared-down demonstration of the problem. In parallel scientific computing, it's very common for many threads to be writing to stdout (usually for monitoring purposes) and it's expected and normal for output from separate threads to be arbitrarily interleaved, but it's *not* ok for output to be lost entirely. This is essentially equivalent to the real-world example you gave of many threads logging to a file.

We've worked around the problem in Linux 2.6 by adding locking at user-level around our writes, as you suggest, although this of course penalizes our performance on kernels that already correctly implement the thread-safety required by the POSIX spec. In any case it seemed like a problem that we should report, to be good open-source citizens - especially given that it appears to be a regression with respect to the Linux 2.4 kernel. How you choose to handle the report is of course your decision.

Thanks for your time.
Dan

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux