Re: MADV_FREE functionality

On Mon, Apr 30, 2007 at 06:18:39PM -0700, Andrew Morton wrote:
> > In short:
> > - both MADV_FREE and MADV_DONTNEED only unmap file pages
> > - after MADV_DONTNEED the application will always get back
> >    fresh zero filled anonymous pages when accessing the
> >    memory
> > - after MADV_FREE the application can either get back the
> >    original data (without a page fault) or zero filled
> >    anonymous memory
> > 
> > The Linux MADV_DONTNEED behavior is not POSIX compliant.
> > POSIX says that with MADV_DONTNEED the application's data
> > will be preserved.
> > 
> > Currently glibc simply ignores POSIX_MADV_DONTNEED requests
> > from applications on Linux.  Changing the behaviour which
> > some Linux applications may rely on might not be the best
> > idea.
> 
> OK, thanks.  I stuck that in the changelog.
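To make the quoted summary concrete, here is a minimal Linux-only sketch of the MADV_DONTNEED behavior described for anonymous memory. It dirties one private anonymous page, applies madvise(MADV_DONTNEED), and reads the page back, which should find a fresh zero-filled page. The helper name dontneed_first_byte_after is hypothetical, and the sketch assumes glibc headers (hence _DEFAULT_SOURCE for MAP_ANONYMOUS and madvise()).

```c
#define _DEFAULT_SOURCE  /* MAP_ANONYMOUS, madvise() and MADV_* with glibc */
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: fill one private anonymous page with 'x', call
   madvise(MADV_DONTNEED) on it, and return the first byte read back
   afterwards (or -1 on error).  With the Linux semantics quoted above,
   the next access gets a zero-filled page, so this should return 0.  */
static int
dontneed_first_byte_after (void)
{
  long pagesz = sysconf (_SC_PAGESIZE);
  if (pagesz <= 0)
    return -1;
  size_t len = (size_t) pagesz;
  unsigned char *p = mmap (NULL, len, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED)
    return -1;
  memset (p, 'x', len);               /* dirty the page */
  if (madvise (p, len, MADV_DONTNEED) != 0)
    {
      munmap (p, len);
      return -1;
    }
  int first = p[0];                   /* refault: the old data is gone */
  munmap (p, len);
  return first;
}
```

Contrast this with the MADV_FREE description above, where the same read may return either the original data or zeroes.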

FYI, the Solaris man page on MADV_FREE says:

      MADV_FREE
            Tells  the  kernel  that  contents  in  the  specified
            address  range  are  no longer important and the range
            will be overwritten. When there is demand for  memory,
            the  system will free pages associated with the speci-
            fied address range. In this instance, the next time  a
            page  in the address range is referenced, it will con-
            tain all zeroes.  Otherwise, it will contain the  data
            that was there prior to the MADV_FREE call. References
            made to the address range will  not  make  the  system
            read from backing store (swap space) until the page is
            modified again.

            This value cannot be used on mappings that have under-
            lying file objects.

The last paragraph seems to be just about the operation being undefined:
madvise with MADV_FREE on a MAP_SHARED file mapping returns 0 rather than
flagging an error.

FreeBSD man page:

        MADV_FREE        Gives the VM system the freedom to free pages, and tells
                         the system that information in the specified page range
                         is no longer important.  This is an efficient way of
                         allowing malloc(3) to free pages anywhere in the address
                         space, while keeping the address space valid.  The next
                         time that the page is referenced, the page might be
                         demand zeroed, or might contain the data that was there
                         before the MADV_FREE call.  References made to that
                         address space range will not make the VM system page the
                         information back in from backing store until the page is
                         modified again.
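The FreeBSD text names the malloc(3) use case. As a minimal sketch of how an allocator might release the pages of a freed, page-aligned chunk while keeping the address range valid, the hypothetical release_chunk_pages below tries MADV_FREE when the headers define it and falls back to MADV_DONTNEED if MADV_FREE is missing or rejected with EINVAL (assumed here to be how an older kernel reports an unknown advice value). The fallback is acceptable for this purpose because either way the next read sees the old data or zeroes. release_demo is likewise a hypothetical usage check, and the sketch assumes glibc-style headers.

```c
#define _DEFAULT_SOURCE  /* MAP_ANONYMOUS, madvise() and MADV_* with glibc */
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical allocator hook: tell the kernel that the pages backing a
   freed chunk are no longer important while keeping the range mapped.
   ADDR must be page aligned.  Returns 0 on success, -1 on error.  */
static int
release_chunk_pages (void *addr, size_t len)
{
#ifdef MADV_FREE
  /* Lazy release: pages are reclaimed only under memory pressure.  */
  if (madvise (addr, len, MADV_FREE) == 0)
    return 0;
  if (errno != EINVAL)
    return -1;
#endif
  /* Fallback: eager release, the next access sees zero-filled pages.  */
  return madvise (addr, len, MADV_DONTNEED);
}

/* Usage demo: dirty one anonymous page, release it, then check that every
   byte is either the old data or zero (both outcomes are allowed) and that
   the address is still valid for new writes.  Returns 1 if all checks pass.  */
static int
release_demo (void)
{
  long ps = sysconf (_SC_PAGESIZE);
  if (ps <= 0)
    return 0;
  size_t len = (size_t) ps;
  unsigned char *p = mmap (NULL, len, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED)
    return 0;
  memset (p, 'x', len);
  int ok = release_chunk_pages (p, len) == 0;
  for (size_t i = 0; ok && i < len; i++)
    ok = p[i] == 'x' || p[i] == 0;
  p[0] = 'y';                         /* range is still mapped and writable */
  ok = ok && p[0] == 'y';
  munmap (p, len);
  return ok;
}
```

The byte check deliberately accepts both outcomes, since neither man page promises which one the application will see.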

> Also, where did we end up with the Solaris compatibility?
> 
> The patch I have at present retains MADV_FREE=0x05 for sparc and sparc64
> which should be good.
> 
> Did we decide that the Solaris and Linux implementations of MADV_FREE are
> compatible?

SPARC Solaris binary compatibility in Linux is in really bad shape.  madvise
in Solaris is implemented using the memcntl syscall (at least according to
truss(1)), and in the Linux Solaris emulation syscall table that syscall is
systbl.S:       .word solaris_unimplemented     /* memcntl              131     */
If anyone ever decides to put more effort into Solaris binary compatibility
(I'm quite doubtful anyone will), codes which don't match can simply be
translated into other codes, ignored, etc.; we can't use sys_madvise to
implement the memcntl syscall anyway.  While Solaris MADV_FREE is the same as
the Linux MADV_FREE proposed by Rik (except perhaps for the documented
undefined behavior with file mappings: when running
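#include <string.h>
#include <sys/mman.h>
#include <unistd.h>
#include <fcntl.h>

int
main (void)
{
  getpid ();
  /* "test" must already exist and be at least 8192 bytes long.  */
  int fd = open ("test", O_RDWR);
  if (fd < 0)
    return 1;
  void *p = mmap (0, 8192, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  if (p == MAP_FAILED)
    return 1;
  /* Dirty the shared file mapping, then declare its contents unimportant.  */
  memset (p, ' ', 8192);
  madvise (p, 8192, MADV_FREE);
  return 0;
}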
on Solaris the spaces actually made it into the file), MADV_DONTNEED is not,
but that doesn't really matter except for the arch/sparc*/solaris/ layer, if
anyone cares.  We certainly can't change the current MADV_DONTNEED behavior;
all we can do is implement a new MADV_* code with a different behavior and let
glibc's posix_madvise translate the POSIX_MADV_* codes to the Linux-specific
MADV_* codes.
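A minimal sketch of the translation layer described above, written as a hypothetical wrapper (sketch_posix_madvise is not glibc's actual code). POSIX hints with a matching Linux MADV_* code are passed through, while POSIX_MADV_DONTNEED is ignored, as glibc currently does, because the Linux MADV_DONTNEED would discard data that POSIX says must be preserved. A new, non-destructive MADV_* code could be substituted at the marked spot. Like posix_madvise, the wrapper returns an error number instead of setting errno.

```c
#define _DEFAULT_SOURCE  /* madvise(), MADV_* and POSIX_MADV_* with glibc */
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical posix_madvise-style wrapper: map each POSIX_MADV_* code to
   a Linux MADV_* code.  Returns 0 on success or an error number.  */
static int
sketch_posix_madvise (void *addr, size_t len, int advice)
{
  int linux_advice;
  switch (advice)
    {
    case POSIX_MADV_NORMAL:     linux_advice = MADV_NORMAL;     break;
    case POSIX_MADV_RANDOM:     linux_advice = MADV_RANDOM;     break;
    case POSIX_MADV_SEQUENTIAL: linux_advice = MADV_SEQUENTIAL; break;
    case POSIX_MADV_WILLNEED:   linux_advice = MADV_WILLNEED;   break;
    case POSIX_MADV_DONTNEED:
      /* Linux MADV_DONTNEED would drop the data, so just ignore the hint.
         A new non-destructive MADV_* code could be used here instead.  */
      return 0;
    default:
      return EINVAL;
    }
  return madvise (addr, len, linux_advice) == 0 ? 0 : errno;
}
```

Keeping the translation in the C library leaves the kernel's existing MADV_DONTNEED semantics untouched for applications that rely on them.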

	Jakub