Re: msync() behaviour broken for MS_ASYNC, revert patch?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Linus Torvalds wrote:

On Sat, 11 Feb 2006, Nick Piggin wrote:

Your pattern would actually be

	.. dirty offset 100-200 ..
	fadvice(fd, 100, 200, FADV_WRITE_START);

	.. dirty offset 200-300 ..
	fadvice(fd, 200, 300, FADV_WRITE_START);

	.. dirty offset 300-400 ..
	fadvice(fd, 300, 400, FADV_WRITE_START);

	.. dirty offset 400-415 .. (for the next transaction)


- IOW if the app or OS crashed here it would be possible to see 400-415 on
the disk and none of the previous transactions (assuming we don't know
the page size).


If the app/OS crashed here, nothing would matter. We haven't committed anything at all yet. We've just started the IO. What is at 400-415 simply doesn't matter, because nobody would have any reason to look at it.

(Besides, it's not at all clear that 400-415 would or would not be on disk. It depends on entirely on timing and buffering of the IO system at that point - the fact that its dirty in memory doesn't mean that it ever made it into the IO buffer that was started).


	fadvice(fd, 100, 400, FADV_JUST_WAIT); (for the previous one)


This is the one that waits for it to finish, so _now_ we can update the pointers (elsewhere) to that log (and if the app/OS crashes before that, nobody will even know about it).

See?


Well in that case in your argument your FADV_WRITE_START is of
the "waits for writeout then starts writeout if dirty" type.

In which case you've just made 3 consecutive  write+wait cycles
to the same page, so it is hardly an optimal IO pattern.


I'm not convinced. You above example was bogus.


No, your understanding was incomplete. I'm talking about just parts of a much bigger transaction. A single write on its own is almost never a transaction unless your system is _purely_ log-based (which it could be, of course. Not in my example).


You were saying that your above sequence would be more efficient
if implemented with "always start IO, and just wait for IO", because
"write and wait" would do 2 write+wait cycles.

However "always start IO, and just wait for IO" does 3 write+wait cycles.

--
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com -
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux