Re: hugepage: Strict page reservation for hugepage inodes

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Mar 08, 2006 at 05:25:16PM +0800, Zhang, Yanmin wrote:
> >>-----Original Message-----
> >>From: David Gibson [mailto:[email protected]]
> >>Sent: 2006??3??1?? 7:36
> >>To: Zhang, Yanmin
> >>Cc: Andrew Morton; William Lee Irwin; [email protected]
> >>Subject: Re: hugepage: Strict page reservation for hugepage inodes
> >>
> >>hugepage: Strict page reservation for hugepage inodes
> >>
> >>These days, hugepages are demand-allocated at first fault time.
> >>There's a somewhat dubious (and racy) heuristic when making a new
> >>mmap() to check if there are enough available hugepages to fully
> >>satisfy that mapping.
> >>
> >>A particularly obvious case where the heuristic breaks down is where a
> >>process maps its hugepages not as a single chunk, but as a bunch of
> >>individually mmap()ed (or shmat()ed) blocks without touching and
> >>instantiating the pages in between allocations.  In this case the size
> >>of each block is compared against the total number of available
> >>hugepages.  It's thus easy for the process to become overcommitted,
> >>because each block mapping will succeed, although the total number of
> >>hugepages required by all blocks exceeds the number available.  In
> >>particular, this defeats such a program which will detect a mapping
> >>failure and adjust its hugepage usage downward accordingly.
> >>
> >>The patch below addresses this problem, by strictly reserving a number
> >>of physical hugepages for hugepage inodes which have been mapped, but
> >>not instatiated.
> The patch reserves a number of physical hugepages for hugepage inodes
> which have been mapped into address spaces, but it doesn't just reserve the
> pages what it needed. It reserves all huge pages from 0 to inode->i_size. For example,
> 
> 	fd = open("/mnt/hugepages/file1", O_CREAT|O_RDWR, 0755);
>  	*addr = mmap(NULL, HUGEPAGE_SIZE*3, PROT_NONE, MAP_SHARED, fd, HUGEPAGE_SIZE*5);
> 
> The patch would reserve 8 huge pages instead of 3 pages. I know that shmget/shmat
> have no such problem. But mmap has it and the patch looks not perfect.

Yes.  This is a simplifying assumption.  I know of no real application
that will waste pages because of this behaviour.  If you know one,
maybe we will need to reconsider.

> I have an idea. How about to record all the start/end address of huge page mmaping of the inode?
> Long long ago, there was a patch at http://marc.theaimsgroup.com/?l=lse-tech&m=108187931924134&w=2.
> Of course, we need port it to the latest kernel if this idea is better.

I know the patch - I was going to port it to the current kernel, but
came up with my patch instead, because it seemed like a simpler
approach.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux