Re: RFC: Block reservation for hugetlbfs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, 2006-02-21 at 13:21 +1100, David Gibson wrote:
> The patch below is a draft attempt to address this problem, by
> strictly reserving a number of physical hugepages for hugepages inodes
> which have been mapped, but not instatiated.  MAP_SHARED mappings are
> thus "safe" - they will fail on mmap(), not later with a SIGBUS.
> MAP_PRIVATE mappings can still SIGBUS.
...
> +	read_lock_irq(&inode->i_mapping->tree_lock);
> +	for (pg = inode->i_blocks; pg < npages; pg++) {
> +		page = radix_tree_lookup(&mapping->page_tree, pg);
> +		if (! page)
> +			change_in_reserve++;
> +	}
> +
> +	for (pg = npages; pg < inode->i_blocks; pg++) {
> +		page = radix_tree_lookup(&mapping->page_tree, pg);
> +		if (! page)
> +			change_in_reserve--;
> +	}
> +	spin_lock(&hugetlb_lock);

I'm a bit confused by this.  The for loops goes through and looks for
pages that have indexes greater than the current i_blocks, but less than
the number of pages requested by this reservation.  With demand
faulting, can there be holes in the file?  Will this code account for
them?

I think this would also be a bit more clear if the two for loops were a
bit more explicit in what they are doing.  The first is growing the
reservation when "npages > inode->i_blocks" and the second is shrinking
the reservation when "npages < inode->i_blocks", right?  Is this
completely clear from the code? ;)

Also, since the operation you are performing is actually counting pages
in the radix tree at certain indexes, would it be sane to have a patch
such as the (completely untested) attached one to do just that, but in
the radix code?

-- Dave


---

 memhotplug-dave/lib/radix-tree.c |   19 +++++++++++++++++--
 1 files changed, 17 insertions(+), 2 deletions(-)

diff -puN lib/radix-tree.c~radix-tree-count lib/radix-tree.c
--- memhotplug/lib/radix-tree.c~radix-tree-count	2006-02-21 10:50:25.000000000 -0800
+++ memhotplug-dave/lib/radix-tree.c	2006-02-21 10:51:00.000000000 -0800
@@ -538,7 +538,9 @@ __lookup(struct radix_tree_root *root, v
 	for (i = index & RADIX_TREE_MAP_MASK; i < RADIX_TREE_MAP_SIZE; i++) {
 		index++;
 		if (slot->slots[i]) {
-			results[nr_found++] = slot->slots[i];
+			if (results)
+				results[nr_found] = slot->slots[i];
+			nr_found++;
 			if (nr_found == max_items)
 				goto out;
 		}
@@ -559,6 +561,9 @@ out:
  *	them at *@results and returns the number of items which were placed at
  *	*@results.
  *
+ *	If *@results is NULL, the function will return a count of pages found,
+ *	without any actual results.
+ *
  *	The implementation is naive.
  */
 unsigned int
@@ -572,10 +577,13 @@ radix_tree_gang_lookup(struct radix_tree
 	while (ret < max_items) {
 		unsigned int nr_found;
 		unsigned long next_index;	/* Index of next search */
+		void *cur_result = NULL;
 
 		if (cur_index > max_index)
 			break;
-		nr_found = __lookup(root, results + ret, cur_index,
+		if (results)
+			cur_result = results + ret;
+		nr_found = __lookup(root, cur_result, cur_index,
 					max_items - ret, &next_index);
 		ret += nr_found;
 		if (next_index == 0)
@@ -586,6 +594,13 @@ radix_tree_gang_lookup(struct radix_tree
 }
 EXPORT_SYMBOL(radix_tree_gang_lookup);
 
+unsigned int
+radix_tree_lookup_count(struct radix_tree_root *root
+			unsigned long first_index)
+{
+	return radix_tree_gang_lookup(root, NULL, first_index, ~0);
+}
+
 /*
  * FIXME: the two tag_get()s here should use find_next_bit() instead of
  * open-coding the search.
_

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux