Re: msync() behaviour broken for MS_ASYNC, revert patch?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On Sat, 11 Feb 2006, Nick Piggin wrote:
> 
> MS_INVALIDATE does that (in Linux),

I don't actually think it does.

In _current_ linux it does. In some other versions, it will have thrown 
the dirty data away. Also, it will make subsequent accesses much much more 
expensive - and it doesn't work on locked areas.

>					 the spec is poorly worded but the
> intention seems to be that it would push dirty state back into pagecache for
> implementations such as ours.

As an application writer, you'd be absolutely crazy to depend on that.

Using "msync( .. 0)" _may_ actually work reliably under any Linux version, 
but I wouldn't bet on it, and it's quite possible that it does strange 
things on other systems. Again, an application writer that uses it would 
have to be deranged (or very much a kernel person - I could imagine doing 
it myself, but I could _not_ imagine doing it as a non-kernel developer).

> [email protected] has an application (database or logging I think), which
> uses MS_SYNC to provide integrity guarantees, however it is possible to do
> useful work between the last write to memory and the commit point. MS_ASYNC
> is used to start the IO and pipeline work.

So you're saying that there is one application that knows it could use 
different semantics?

Now, please enumerate all the applications that use MS_ASYNC and prefer 
the current semantics.

When you know that, you have an argument. 

In the meantime, you have an example of an application that wants _new_ 
semantics.

> > The current MS_ASYNC behaviour is the sane one. It's the one that doesn't
> > cause the harddisk to start ticking senselessly. It's the one that allows a
> > person on a laptop to say "don't write dirty data every 5 seconds - do it
> > just every hour".
> 
> MS_INVALIDATE

Repeating something doesn't make it so.

> > In contrast, _your_ proposal is just inflexible and inconvenient.
> 
> Currently MS_ASYNC does the same as MS_INVALIDATE. But it used to start
> IO (before 2.5.something), and apparently it does in Solaris as well.

Actually, it did _not_ use to start IO.

Then, somebody made it do so, and people eventually screamed, and it was 
reverted again.

Go check Linux-2.0 or something. You'll also see the "MS_INVALIDATE means 
throw the dirty bit away" behaviour.

The _sane_ semantics are that if you say "MS_INVALIDATE" the dirty bit is 
just thrown away. If you say "MS_INVALIDATE | MS_ASYNC", the dirty bit is 
saved in the page cache and then the page is unmapped. And MS_SYNC 
obviously does the same thing, except it also waits for it.

Those are the the _logically consistent_ semantics. And it's what Linux 
historically did. The fact that we now think "MS_INVALIDATE" on its own 
should mean "save the dirty state" is because some other broken operating 
system does it, and it's sadly the _safer_ thing to do, even if it's 
clearly logically not sane. If you invalidate a mapping, you throw it 
away, you don't save it.

Gaah. 

I took the time to actually unpack 2.0.40. And yes, it does exactly what I 
remember it doing. If you pass in MS_INVALIDATE (with no *SYNC flags) it 
does:

                pte_clear(ptep);
		...
                if (!pte_dirty(pte) || flags == MS_INVALIDATE) {
                        free_page(page);
                        return 0;
                }

without ever marking anything dirty.

> > If somebody really really wants to "start flushing data now", then he can do
> > so, but that actually has absolutely zero to do with "msync()" any more. A
> > person who wants the flushing to start "now" might want to flush any random
> > dirty buffers.
> 
> I didn't quite understand what you're saying here.

I'm saying that "start flushing now" has _zero_ to do with an mmap.

It's a perfectly valid operation after a _write_ call too - even if you 
never mmaped the area at all.

So if somebody wants to start background IO, what has that got to do with 
msync()?

			Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux