Re: [PATCH]: Handling spurious page fault for hugetlb region for 2.6.14-rc4-git5

"Seth, Rohit" <[email protected]> wrote:
>
> Linus,
> 
> [PATCH]: Handle spurious page fault for hugetlb region
> 
> The hugetlb pages are currently pre-faulted.  At the time of mmap of
> hugepages, we populate the new PTEs.  It is possible that HW has already cached
> some of the unused PTEs internally.

What's an "unused pte"?  One which maps a regular-sized page at the same
virtual address?  How can such a thing come about, and why isn't it already
a problem for regular-sized pages?  From where does the hardware prefetch
the pte contents?

IOW: please tell us more about this hardware pte-fetcher.

>  These stale entries never get a chance to
> be purged in existing control flow.

I'd have thought that invalidating those ptes at mmap()-time would be a
more consistent approach.

> This patch extends the check in page fault code for hugepages.  Check if
> a faulted address falls within the size of the hugetlb file backing it.  We
> return VM_FAULT_MINOR for these cases (assuming that the arch specific
> page-faulting code purges the stale entry for the archs that need it).

Do you have an example of the code which does this purging?

> --- linux-2.6.14-rc4-git5-x86/include/linux/hugetlb.h	2005-10-18 13:14:24.879947360 -0700
> +++ b/include/linux/hugetlb.h	2005-10-18 13:13:55.711381656 -0700
> @@ -155,11 +155,24 @@
>  {
>  	file->f_op = &hugetlbfs_file_operations;
>  }
> +
> +static inline int valid_hugetlb_file_off(struct vm_area_struct *vma, 
> +					  unsigned long address) 
> +{
> +	struct inode *inode = vma->vm_file->f_dentry->d_inode;
> +	loff_t file_off = address - vma->vm_start;
> +	
> +	file_off += (vma->vm_pgoff << PAGE_SHIFT);
> +	
> +	return (file_off < inode->i_size);
> +}

I suppose we should use i_size_read() here.
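
For illustration, the offset arithmetic in the helper can be modelled in
userspace as below.  This is only a sketch: struct model_vma and
valid_hugetlb_file_off_model() are hypothetical stand-ins, the file size is
passed in as a plain integer where the kernel version would call
i_size_read(inode), and PAGE_SHIFT is assumed to be 12.  Widening vm_pgoff
before the shift is an added precaution for 32-bit builds, not something
raised in the thread.

```c
#include <stdint.h>

#define PAGE_SHIFT 12	/* assumption: 4 KiB base pages */

/* Hypothetical userspace stand-in for the fields of vm_area_struct used here. */
struct model_vma {
	unsigned long vm_start;	/* first virtual address of the mapping */
	unsigned long vm_pgoff;	/* offset into the file, in base pages */
};

/*
 * Model of the check in the patch: translate the faulting address into a
 * file offset and compare it with the file size.  In the kernel, i_size
 * would come from i_size_read(inode) rather than being a parameter.
 */
static int valid_hugetlb_file_off_model(const struct model_vma *vma,
					unsigned long address,
					int64_t i_size)
{
	int64_t file_off = (int64_t)(address - vma->vm_start);

	/* Widen before shifting so the page offset cannot overflow. */
	file_off += (int64_t)vma->vm_pgoff << PAGE_SHIFT;

	return file_off < i_size;
}
```

With a 4 MB file mapped at offset zero, an address 4 MB past vm_start is
rejected while any address below that is accepted.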

> +		if (valid_hugetlb_file_off(vma, address))
> +			/* We get here only if there was a stale(zero) TLB entry 
> +			 * (because of  HW prefetching). 
> +			 * Low-level arch code (if needed) should have already
> +			 * purged the stale entry as part of this fault handling.  
> +			 * Here we just return.
> +			 */

If the low-level code has purged the stale pte then it knows what's
happening.  Perhaps it shouldn't call into handle_mm_fault() at all?