RE: [PATCH] hugetlb_no_page might break hugetlb quota

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, 2006-03-06 at 17:06, Zhang, Yanmin wrote:
> On Mon, 2006-03-06 at 16:15, Chen, Kenneth W wrote:
> > Zhang, Yanmin wrote on Sunday, March 05, 2006 10:22 PM
> > > In function hugetlb_no_page, backout path always calls hugetlb_put_quota.
> > > It's incorrect when find_lock_page gets the page or the new page is added
> > > into page cache.
> > 
> > While I acknowledge the bug, this patch is not complete.  It makes file
> > system quota consistent with respect to page cache state. But such quota
> > (more severely, the page cache state) is still buggy, for example under
> > ftruncate case: if one ftrucate hugetlb file and then tries to fault a
> > page outside ftruncate area, a new hugetlb page is allocated and then
> > added into page cache along with file system quota; and at the end
> > returning VM_FAULT_SIGBUS.  In this case, kernel traps an unreachable
> > page until possibly next mmap that extends it.  That need to be fixed.
> I have another patch to fix it. The second patch is to delete checking
> (!(vma->vm_flags & VM_WRITE) && len > inode->i_size) in function
> hugetlbfs_file_mmap, and add a checking in hugetlb_no_page,
> to implement a capability for application to mmap
> an zero-length huge page area. It's useful for process to protect one area.
> As a side effect, the second patch also fixes the problem you said.
> 
Here is the second patch.

From: Zhang Yanmin <[email protected]>

Sometimes, applications need below call to be successful although
"/mnt/hugepages/file1" doesn't exist.

	fd = open("/mnt/hugepages/file1", O_CREAT|O_RDWR, 0755);
 	*addr = mmap(NULL, 0x1024*1024*256, PROT_NONE, MAP_SHARED, fd, 0);

As for regular pages (or files), above call does work, but as for huge pages,
above call would fail because hugetlbfs_file_mmap would fail if 
(!(vma->vm_flags & VM_WRITE) && len > inode->i_size). 

This capability on huge page is useful on ia64 when the process wants to 
protect one area on region 4, so other threads couldn't read/write this
area.

My patch against 2.6.16-rc5-mm3 implements it.

In addition, my patch fixes a bug caught by Ken. Consider below scenario:
Process A mmaps a hugetlb file with 5 huge pages, then process B truncates
the file with length of 2 huge pages. Later on, process A still could fault
on huge page 3-5, and these huge pages would be added into page cache.
Then, no processes could access these huge pages. My patch fixes it by
moving checking (idx >= size) to the begining of function hugetlb_no_page.

Thanks for Ken's good suggestions.

Signed-off-by: Zhang Yanmin <[email protected]>

---

diff -Nraup linux-2.6.16-rc5-mm3/fs/hugetlbfs/inode.c linux-2.6.16-rc5-mm3_huge_check/fs/hugetlbfs/inode.c
--- linux-2.6.16-rc5-mm3/fs/hugetlbfs/inode.c	2006-03-08 17:59:10.000000000 +0800
+++ linux-2.6.16-rc5-mm3_huge_check/fs/hugetlbfs/inode.c	2006-03-08 18:17:15.000000000 +0800
@@ -84,8 +84,6 @@ static int hugetlbfs_file_mmap(struct fi
 
 	ret = -ENOMEM;
 	len = vma_len + ((loff_t)vma->vm_pgoff << PAGE_SHIFT);
-	if (!(vma->vm_flags & VM_WRITE) && len > inode->i_size)
-		goto out;
 
 	if (vma->vm_flags & VM_MAYSHARE)
 		if (hugetlb_extend_reservation(info, len >> HPAGE_SHIFT) != 0)
diff -Nraup linux-2.6.16-rc5-mm3/mm/hugetlb.c linux-2.6.16-rc5-mm3_huge_check/mm/hugetlb.c
--- linux-2.6.16-rc5-mm3/mm/hugetlb.c	2006-03-08 17:59:11.000000000 +0800
+++ linux-2.6.16-rc5-mm3_huge_check/mm/hugetlb.c	2006-03-08 18:01:21.000000000 +0800
@@ -555,6 +555,10 @@ int hugetlb_no_page(struct mm_struct *mm
 	idx = ((address - vma->vm_start) >> HPAGE_SHIFT)
 		+ (vma->vm_pgoff >> (HPAGE_SHIFT - PAGE_SHIFT));
 
+	size = i_size_read(mapping->host) >> HPAGE_SHIFT;
+	if (idx >= size)
+		return ret;
+
 	/*
 	 * Use page lock to guard against racing truncation
 	 * before we get page_table_lock.
@@ -588,9 +592,6 @@ retry:
 	}
 
 	spin_lock(&mm->page_table_lock);
-	size = i_size_read(mapping->host) >> HPAGE_SHIFT;
-	if (idx >= size)
-		goto backout;
 
 	ret = VM_FAULT_MINOR;
 	if (!pte_none(*ptep))


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux