Re: [PATCH] Write the inode itself in block_fsync()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Andrew Morton wrote:
Sam Vilain <[email protected]> wrote:
OGAWA Hirofumi wrote:
>>
 >For block device's inode, we don't write a inode's meta data
 >itself. But, I think we should write inode's meta data for fsync().

 Ouch... won't that halve performance of database transaction logs?

Yes, it could well cause a lot more seeking to do atime and/or mtime
writes.   Which aren't terribly important, really.

Unless I'm missing something, I suspect we'd be better off without this,
even though it's a correctness fix :(

Maybe atime/mtime aren't important, but I would be unhappy if a file size change wasn't written to disk on fsync.

Anyway, shouldn't databases be using a combination of fixed-size files and fdatasync? fsync doesn't perform well by definition, and I guess the only reason databases still use it is because the kernel failed to implement the sucky part of the behaviour.

A complex but perhaps viable suggestion: as atime/mtime are stored on disk in second granularity (on ext3 at least, don't know about other fss), wouldn't it somehow be possible to only regard atime/mtime changes as real changes when i_(a|c)time.tv_sec changes? This would enable fsync to write the inode once every second instead of on every fsync. The performance drop would be much less dramatic than writing the inode on every fsync, and it would at least yield correct behaviour.

Cheers,
Bart
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux