Andrew Morton wrote:
> Sam Vilain wrote:
>> OGAWA Hirofumi wrote:
>>> For a block device's inode, we don't write the inode's metadata
>>> itself. But I think we should write the inode's metadata for fsync().
>>
>> Ouch... won't that halve performance of database transaction logs?
>
> Yes, it could well cause a lot more seeking to do atime and/or mtime
> writes. Which aren't terribly important, really.
>
> Unless I'm missing something, I suspect we'd be better off without this,
> even though it's a correctness fix :(

Maybe atime/mtime aren't important, but I would be unhappy if a file
size change wasn't written to disk on fsync.

Anyway, shouldn't databases be using a combination of fixed-size files
and fdatasync? fsync doesn't perform well by definition, and I guess the
only reason databases still use it is because the kernel failed to
implement the sucky part of the behaviour.
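
To make the fixed-size-file-plus-fdatasync idea concrete, here is a rough user-space sketch. The helper names log_open_fixed() and log_write_record() are made up for illustration and are not taken from any real database. It assumes the log is preallocated once, so later writes only overwrite existing blocks and never change the file size.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: open (or create) a log file and reserve its full
 * size up front, so that later record writes never change the file size.
 * The one-time fsync() makes the size and block allocation durable.
 */
int log_open_fixed(const char *path, off_t size)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;

    /* posix_fallocate() returns an error number, not -1 with errno. */
    if (posix_fallocate(fd, 0, size) != 0 || fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/*
 * Hypothetical helper: overwrite a record in place, then flush the data.
 * fdatasync() may skip metadata that is not needed to read the data back
 * (such as atime/mtime), which is the point of the suggestion above.
 */
int log_write_record(int fd, off_t off, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = pwrite(fd, p, len, off);
        if (n <= 0)
            return -1;
        p += n;
        off += n;
        len -= (size_t)n;
    }
    return fdatasync(fd);
}
```

Whether this really avoids the extra seeks depends on the filesystem and on how it implements fdatasync, so treat it as an illustration of the access pattern rather than a measured result.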

A complex but perhaps viable suggestion: as atime/mtime are stored on
disk with one-second granularity (on ext3 at least; I don't know about
other filesystems), wouldn't it be possible to regard atime/mtime changes
as real changes only when i_(a|c)time.tv_sec changes? This would let fsync
write the inode at most once per second instead of on every call. The
performance drop would be much less dramatic than writing the inode on
every fsync, and it would at least yield correct behaviour.
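
A rough model of the check I have in mind, written as plain user-space C rather than kernel code. struct ts_model and times_need_writeback() are hypothetical names, not existing kernel interfaces, and the sketch compares atime and mtime only.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for the timestamps kept in an inode. */
struct ts_model {
    struct timespec atime;
    struct timespec mtime;
};

/*
 * Return true only if the whole-second part of atime or mtime differs
 * between the copy last written to disk and the in-core copy. With
 * one-second on-disk granularity, a change in tv_nsec alone would not
 * alter what ends up on disk, so it is not treated as a real change.
 */
bool times_need_writeback(const struct ts_model *on_disk,
                          const struct ts_model *in_core)
{
    return on_disk->atime.tv_sec != in_core->atime.tv_sec ||
           on_disk->mtime.tv_sec != in_core->mtime.tv_sec;
}
```

With a check like this, many fsync calls within the same second would trigger at most one timestamp write, assuming nothing else in the inode (such as the size) has changed.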
Cheers,
Bart