RE: hugepage: Strict page reservation for hugepage inodes

>>-----Original Message-----
>>From: [email protected] [mailto:[email protected]] On Behalf Of David Gibson
>>Sent: 28 February 2006 15:12
>>To: Andrew Morton
>>Cc: William Lee Irwin; [email protected]
>>Subject: hugepage: Strict page reservation for hugepage inodes
>>
>>This applies on top of my patch
>>hugepage-serialize-hugepage-allocation-and-instantiation.patch from
>>-mm, and akpm's associated tidying patches.
>>Signed-off-by: David Gibson <[email protected]>
>>
>>Index: working-2.6/mm/hugetlb.c
>>===================================================================
>>--- working-2.6.orig/mm/hugetlb.c	2006-02-28 18:03:25.000000000 +1100
>>+++ working-2.6/mm/hugetlb.c	2006-02-28 18:03:37.000000000 +1100
>>@@ -21,7 +21,7 @@
>> #include <linux/hugetlb.h>
>>
>> const unsigned long hugetlb_zero = 0, hugetlb_infinity = ~0UL;
>>-static unsigned long nr_huge_pages, free_huge_pages;
>>+static unsigned long nr_huge_pages, free_huge_pages, reserved_huge_pages;
>> unsigned long max_huge_pages;
>> static struct list_head hugepage_freelists[MAX_NUMNODES];
>> static unsigned int nr_huge_pages_node[MAX_NUMNODES];
>>@@ -119,17 +119,146 @@ void free_huge_page(struct page *page)
>>
>> struct page *alloc_huge_page(struct vm_area_struct *vma, unsigned long addr)
>> {
>>+	struct inode *inode = vma->vm_file->f_dentry->d_inode;
>> 	struct page *page;
>>+	int use_reserve = 0;
>>+	unsigned long idx;
>>
>> 	spin_lock(&hugetlb_lock);
>>-	page = dequeue_huge_page(vma, addr);
>>-	if (!page) {
>>-		spin_unlock(&hugetlb_lock);
>>-		return NULL;
>>+
>>+	if (vma->vm_flags & VM_MAYSHARE) {
>>+
>>+		/* idx = radix tree index, i.e. offset into file in
>>+		 * HPAGE_SIZE units */
>>+		idx = ((addr - vma->vm_start) >> HPAGE_SHIFT)
>>+			+ (vma->vm_pgoff >> (HPAGE_SHIFT - PAGE_SHIFT));
>>+
>>+		/* The hugetlbfs specific inode info stores the number
>>+		 * of "guaranteed available" (huge) pages.  That is,
>>+		 * the first 'prereserved_hpages' pages of the inode
>>+		 * are either already instantiated, or have been
>>+		 * pre-reserved (by hugetlb_reserve_for_inode()). Here
>>+		 * we're in the process of instantiating the page, so
>>+		 * we use this to determine whether to draw from the
>>+		 * pre-reserved pool or the truly free pool. */
>>+		if (idx < HUGETLBFS_I(inode)->prereserved_hpages)
>>+			use_reserve = 1;
>>+	}
>>+
>>+	if (!use_reserve) {
>>+		if (free_huge_pages <= reserved_huge_pages)
>>+			goto fail;
>>+	} else {
>>+		BUG_ON(reserved_huge_pages == 0);
>>+		reserved_huge_pages--;
[YM] Consider this multi-threaded scenario:
A process has 2 threads. The process mmaps a hugetlb area covering 1 huge page,
and there is one free huge page. Later on, the 2 threads fault on the huge page
at the same time. The second thread's allocation fails, the WARN_ON check is
triggered, and the second thread is killed by hugetlb_no_page().
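For illustration, a minimal userspace sketch of that scenario (a hypothetical
reproducer, not part of the patch): it assumes a hugetlbfs mount at /mnt/huge
and a 4MB huge page size; adjust the path and size for your setup, and note
that whether the WARN_ON actually fires depends on timing.

/* Hypothetical reproducer for the scenario above.  Assumes a
 * hugetlbfs mount at /mnt/huge and HPAGE_SIZE of 4MB; build with
 * -lpthread.  Timing-dependent: the race may or may not fire. */
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <sys/mman.h>

#define HPAGE_SIZE	(4UL * 1024 * 1024)

static char *map;

static void *toucher(void *arg)
{
	map[0] = 1;	/* both threads fault on the same huge page */
	return NULL;
}

int main(void)
{
	pthread_t t1, t2;
	int fd = open("/mnt/huge/race", O_CREAT | O_RDWR, 0600);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	/* MAP_SHARED sets VM_MAYSHARE, so faults take the reserve path */
	map = mmap(NULL, HPAGE_SIZE, PROT_READ | PROT_WRITE,
		   MAP_SHARED, fd, 0);
	if (map == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	pthread_create(&t1, NULL, toucher, NULL);
	pthread_create(&t2, NULL, toucher, NULL);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	return 0;
}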



>> 	}
>>+
>>+	page = dequeue_huge_page(vma, addr);
>>+	if (!page)
>>+		goto fail;
>>+
>> 	spin_unlock(&hugetlb_lock);
>> 	set_page_count(page, 1);
>> 	return page;
>>+
>>+ fail:
>>+	WARN_ON(use_reserve); /* reserved allocations shouldn't fail */
>>+	spin_unlock(&hugetlb_lock);
>>+	return NULL;
>>+}
>>+
>>+/* hugetlb_extend_reservation()
>>+ *
>>+ * Ensure that at least 'atleast' hugepages are, and will remain,
>>+ * available to instantiate the first 'atleast' pages of the given
>>+ * inode.  If the inode doesn't already have this many pages reserved
>>+ * or instantiated, set aside some hugepages in the reserved pool to
>>+ * satisfy later faults (or fail now if there aren't enough, rather
>>+ * than getting the SIGBUS later).
>>+ */
>>+int hugetlb_extend_reservation(struct hugetlbfs_inode_info *info,
>>+			       unsigned long atleast)
>>+{
>>+	struct inode *inode = &info->vfs_inode;
>>+	struct address_space *mapping = inode->i_mapping;
>>+	unsigned long idx;
>>+	unsigned long change_in_reserve = 0;
>>+	struct page *page;
>>+	int ret = 0;
>>+
>>+	spin_lock(&hugetlb_lock);
>>+	read_lock_irq(&inode->i_mapping->tree_lock);
>>+
>>+	if (info->prereserved_hpages >= atleast)
>>+		goto out;
>>+
>>+	/* prereserved_hpages stores the number of pages already
>>+	 * guaranteed (reserved or instantiated) for this inode.
>>+	 * Count how many extra pages we need to reserve. */
>>+	for (idx = info->prereserved_hpages; idx < atleast; idx++) {
>>+		page = radix_tree_lookup(&mapping->page_tree, idx);
>>+		if (!page)
>>+			/* Pages which are already instantiated don't
>>+			 * need to be reserved */
>>+			change_in_reserve++;
>>+	}
[YM] Why walk the page cache at all here? prereserved_hpages and
reserved_huge_pages are already protected by hugetlb_lock.
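
For illustration, a counter-only variant along those lines (a sketch only;
the function name is made up and this is not the patch's code). It reserves
(atleast - prereserved_hpages) pages outright under hugetlb_lock and skips
the radix tree walk; the trade-off is that it over-reserves whenever pages
past prereserved_hpages were already instantiated, which is presumably why
the patch walks the page cache:

/* Hypothetical counter-only reservation, using only the fields the
 * patch introduces.  Over-reserves for pages beyond
 * prereserved_hpages that are already instantiated. */
static int extend_reservation_counters_only(struct hugetlbfs_inode_info *info,
					    unsigned long atleast)
{
	int ret = 0;

	spin_lock(&hugetlb_lock);
	if (info->prereserved_hpages < atleast) {
		unsigned long needed = atleast - info->prereserved_hpages;

		/* free_huge_pages >= reserved_huge_pages is an invariant,
		 * so the unsigned subtraction below is safe */
		if (needed > free_huge_pages - reserved_huge_pages) {
			ret = -ENOMEM;
		} else {
			reserved_huge_pages += needed;
			info->prereserved_hpages = atleast;
		}
	}
	spin_unlock(&hugetlb_lock);
	return ret;
}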
