Re: Generic B-tree implementation

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Folks

Following Andrea's suggestions I have implemented the Linux kernel
page cache using B-Trees. I am attaching the patch along with this
email. Note that the patch was developed against 2.6.16.2. Also the
patch offers the B-tree data structure as a library (like the radix
tree)

Also I haven't made any performance measurements to compare it with
the radix tree implementation. However ideas to do this are most
welcome.

Thanks

- Vishal

On 7/19/06, Vishal Patil <[email protected]> wrote:
Andrea

Thank you for your time and a very valuable input. I was thinking of
implementing the VM management using B-trees because I wanted to play
with something interesting in the kernel. However I will definately
look into your idea of page cache as well.

Will keep everyone informed about my progress.

- Vishal


On 7/19/06, Andrea Arcangeli <[email protected]> wrote:
> On Wed, Jul 19, 2006 at 09:34:43AM -0400, Vishal Patil wrote:
> > I can get rid of recursions using loops, will need to work a little more on
> > it.
>
> Before doing the above you may want to learn about all possible malloc
> retvals too and to make sure the interface has all needed oom failure
> paths that you're obviously missing.
>
> One of the advantages of rbtree vs b-trees (and vs radixtrees too) is
> the fact they require zero dynamic metadata allocations of ram. They
> use the same trick of list.h to avoid it while still being mostly
> generic and sharable library code. Imagine rbtrees like scalable
> lists. The kernel usage is quite optimized too, the mmap path for
> example does a single lookup and it stores the last "lookup" point
> before restarting with an insertion while keeping the mmap_sem (or
> mutex renaming of the day) on hold so to avoid the insertion operation
> to start over with a second (wasteful) lookup (again very similar to
> what you could do if you had list, and the rebalancing is a very
> immediate operation too involving only a limited number of pointers).
>
> > Also I will be working on developing a patch for VM management using
> > B-trees instead of RB-trees.
>
> Once you start changing those bits, you'll notice the further
> requirement of the btrees due to the oom failures in code paths that
> are already reasonably complex with vma oom failures.
>
> As speed of cache raises faster than speed of ram, memory seeks tends
> to cost more than they did in the past, but I doubt it worth it, most
> important especially in the common case of very few vmas. I like the
> common case of only a few dozen vmas to be so fast and low
> overhead. The corner cases like uml and oracle already use nonlinear,
> to also avoid the ram overhead of the vmas, with btree the lowmem
> overhead would be even higher (the only 4/8 bytes of overhead of the
> rbtrees would even be fixable with David's patch, but nobody
> considered it very important so far to eliminate those 4/8 bytes
> 32bit/64bit per vma, though we can do that in the future). So even if
> btree would be faster for those extreme corner cases, it would still
> not be a replacement for the nonlinear (I wish there was a decent
> replacement for nonlinear, whose only reason to exist seems to be uml
> on 64bit archs).
>
> If I would be in you, as a slightly more likely to succeed experiment,
> I would be looking into replacing the pagecache radix-tree with a
> btree, as long as you can leave intact the tagging properties we have
> in the radix-tree needed for finding only dirty elements in the tree
> etc... (we use that to avoid separate dirty lists for the pages). You
> should also size the order to automatically match the cache size of
> the arch (dunno if it's better at compile or run time). I'm no a
> radix-tree guru but the btree may save some ram if you've all
> pagecache pages scattered all over the place with random access. It
> also won't require all levels to be allocated. However it will require
> rebalancing, something the radix tree doesn't require, it seems a bit
> of a tradeoff, and I suspect the radix-tree will still win in all
> common cases. But at least all oom failure paths should already exists
> for you, so that should avoid you having to touch much code externally
> to your own btree files.
>
> I wish you to have fun with the btrees, I remember I had fun back then
> when I was playing with the rbtrees ;).
>


--
Motivation will almost always beat mere talent.



--
Motivation will almost always beat mere talent.
diff -urN linux-2.6.16.2/mm/filemap.c linux-2.6.16.2-btree/mm/filemap.c
--- linux-2.6.16.2/mm/filemap.c	2006-04-07 12:56:47.000000000 -0400
+++ linux-2.6.16.2-btree/mm/filemap.c	2006-08-06 21:37:55.000000000 -0400
@@ -112,10 +112,16 @@
  */
 void __remove_from_page_cache(struct page *page)
 {
+	struct page * bpage;
 	struct address_space *mapping = page->mapping;
+	unsigned long index = page->index;	
 
-	radix_tree_delete(&mapping->page_tree, page->index);
-	page->mapping = NULL;
+	bpage = btree_delete(&mapping->btree,mapping->btree.root,index).val;
+	if ( bpage != page) {
+		panic("DEBUG: Deleting wrong page in remove from page" 
+			"cache %d,0x%x\n",index,bpage);
+	} 
+     	page->mapping = NULL;
 	mapping->nrpages--;
 	pagecache_acct(-1);
 }
@@ -395,12 +401,14 @@
 int add_to_page_cache(struct page *page, struct address_space *mapping,
 		pgoff_t offset, gfp_t gfp_mask)
 {
-	int error = radix_tree_preload(gfp_mask & ~__GFP_HIGHMEM);
-
-	if (error == 0) {
+        int btree_error = btree_preload(gfp_mask & ~__GFP_HIGHMEM);
+        
+	if (btree_error == 0) {
 		write_lock_irq(&mapping->tree_lock);
-		error = radix_tree_insert(&mapping->page_tree, offset, page);
-		if (!error) {
+                btree_error = 
+			btree_insert(&mapping->btree,offset,page);
+                
+		if (!btree_error) {
 			page_cache_get(page);
 			SetPageLocked(page);
 			page->mapping = mapping;
@@ -409,9 +417,11 @@
 			pagecache_acct(1);
 		}
 		write_unlock_irq(&mapping->tree_lock);
-		radix_tree_preload_end();
+                if(btree_error == 0) {
+                        btree_preload_end();
+                }                        
 	}
-	return error;
+	return btree_error;
 }
 
 EXPORT_SYMBOL(add_to_page_cache);
@@ -522,7 +532,7 @@
 	struct page *page;
 
 	read_lock_irq(&mapping->tree_lock);
-	page = radix_tree_lookup(&mapping->page_tree, offset);
+	page = btree_lookup(&mapping->btree,offset);
 	if (page)
 		page_cache_get(page);
 	read_unlock_irq(&mapping->tree_lock);
@@ -539,7 +549,7 @@
 	struct page *page;
 
 	read_lock_irq(&mapping->tree_lock);
-	page = radix_tree_lookup(&mapping->page_tree, offset);
+	page = btree_lookup(&mapping->btree,offset);
 	if (page && TestSetPageLocked(page))
 		page = NULL;
 	read_unlock_irq(&mapping->tree_lock);
@@ -563,10 +573,9 @@
 				unsigned long offset)
 {
 	struct page *page;
-
 	read_lock_irq(&mapping->tree_lock);
 repeat:
-	page = radix_tree_lookup(&mapping->page_tree, offset);
+	page = btree_lookup(&mapping->btree,offset);
 	if (page) {
 		page_cache_get(page);
 		if (TestSetPageLocked(page)) {
@@ -658,8 +667,10 @@
 	unsigned int ret;
 
 	read_lock_irq(&mapping->tree_lock);
-	ret = radix_tree_gang_lookup(&mapping->page_tree,
-				(void **)pages, start, nr_pages);
+
+	ret = btree_gang_lookup(&mapping->btree,(void **)pages,
+				start,nr_pages);
+
 	for (i = 0; i < ret; i++)
 		page_cache_get(pages[i]);
 	read_unlock_irq(&mapping->tree_lock);
@@ -677,8 +688,9 @@
 	unsigned int ret;
 
 	read_lock_irq(&mapping->tree_lock);
-	ret = radix_tree_gang_lookup_tag(&mapping->page_tree,
+	ret = btree_gang_lookup_tag(&mapping->btree,
 				(void **)pages, *index, nr_pages, tag);
+		
 	for (i = 0; i < ret; i++)
 		page_cache_get(pages[i]);
 	if (ret)
diff -urN linux-2.6.16.2/mm/page-writeback.c linux-2.6.16.2-btree/mm/page-writeback.c
--- linux-2.6.16.2/mm/page-writeback.c	2006-04-07 12:56:47.000000000 -0400
+++ linux-2.6.16.2-btree/mm/page-writeback.c	2006-08-06 22:08:50.000000000 -0400
@@ -634,8 +634,15 @@
 				BUG_ON(mapping2 != mapping);
 				if (mapping_cap_account_dirty(mapping))
 					inc_page_state(nr_dirty);
-				radix_tree_tag_set(&mapping->page_tree,
-					page_index(page), PAGECACHE_TAG_DIRTY);
+				btree_tag_set(&mapping->btree,
+				page_index(page), PAGECACHE_TAG_DIRTY);
+
+				if (btree_tag_set(&mapping->btree,
+				page_index(page), PAGECACHE_TAG_DIRTY) !=0 ) {
+					panic("Problem setting the tag\n");
+				}					
+
+				
 			}
 			write_unlock_irq(&mapping->tree_lock);
 			if (mapping->host) {
@@ -714,9 +721,16 @@
 	if (mapping) {
 		write_lock_irqsave(&mapping->tree_lock, flags);
 		if (TestClearPageDirty(page)) {
-			radix_tree_tag_clear(&mapping->page_tree,
+			btree_tag_clear(&mapping->btree,
 						page_index(page),
 						PAGECACHE_TAG_DIRTY);
+
+			if (btree_tag_get(&mapping->btree,
+					page_index(page),
+					PAGECACHE_TAG_DIRTY)  == 1) {
+				panic("Problem clearing the tag\n");
+			}				
+
 			write_unlock_irqrestore(&mapping->tree_lock, flags);
 			if (mapping_cap_account_dirty(mapping))
 				dec_page_state(nr_dirty);
@@ -769,10 +783,17 @@
 
 		write_lock_irqsave(&mapping->tree_lock, flags);
 		ret = TestClearPageWriteback(page);
-		if (ret)
-			radix_tree_tag_clear(&mapping->page_tree,
+		if (ret) {
+			btree_tag_clear(&mapping->btree,
 						page_index(page),
 						PAGECACHE_TAG_WRITEBACK);
+			if (btree_tag_get(&mapping->btree,
+				page_index(page),
+				PAGECACHE_TAG_WRITEBACK) == 1) {
+				panic("Problem clearing the tag\n");
+			}				
+
+		}			
 		write_unlock_irqrestore(&mapping->tree_lock, flags);
 	} else {
 		ret = TestClearPageWriteback(page);
@@ -790,14 +811,28 @@
 
 		write_lock_irqsave(&mapping->tree_lock, flags);
 		ret = TestSetPageWriteback(page);
-		if (!ret)
-			radix_tree_tag_set(&mapping->page_tree,
+		if (!ret) {
+			btree_tag_set(&mapping->btree,
 						page_index(page),
 						PAGECACHE_TAG_WRITEBACK);
-		if (!PageDirty(page))
-			radix_tree_tag_clear(&mapping->page_tree,
+			if (btree_tag_get(&mapping->btree,
+				page_index(page),
+				PAGECACHE_TAG_WRITEBACK) != 1) {
+				panic("Problem setting the tag\n");
+			}				
+
+		}			
+		if (!PageDirty(page)) {
+			btree_tag_clear(&mapping->btree,
 						page_index(page),
 						PAGECACHE_TAG_DIRTY);
+			if (btree_tag_get(&mapping->btree,
+				page_index(page),
+				PAGECACHE_TAG_DIRTY) == 1) {
+				panic("Problem clearing the tag");
+			}				
+
+		}			
 		write_unlock_irqrestore(&mapping->tree_lock, flags);
 	} else {
 		ret = TestSetPageWriteback(page);
@@ -817,7 +852,7 @@
 	int ret;
 
 	read_lock_irqsave(&mapping->tree_lock, flags);
-	ret = radix_tree_tagged(&mapping->page_tree, tag);
+	ret = btree_tagged(&mapping->btree,tag);
 	read_unlock_irqrestore(&mapping->tree_lock, flags);
 	return ret;
 }
diff -urN linux-2.6.16.2/mm/readahead.c linux-2.6.16.2-btree/mm/readahead.c
--- linux-2.6.16.2/mm/readahead.c	2006-04-07 12:56:47.000000000 -0400
+++ linux-2.6.16.2-btree/mm/readahead.c	2006-08-06 22:04:38.000000000 -0400
@@ -282,7 +282,8 @@
 		if (page_offset > end_index)
 			break;
 
-		page = radix_tree_lookup(&mapping->page_tree, page_offset);
+		page = btree_lookup(&mapping->btree,page_offset);
+	
 		if (page)
 			continue;
 
diff -urN linux-2.6.16.2/mm/swap_state.c linux-2.6.16.2-btree/mm/swap_state.c
--- linux-2.6.16.2/mm/swap_state.c	2006-04-07 12:56:47.000000000 -0400
+++ linux-2.6.16.2-btree/mm/swap_state.c	2006-08-06 21:38:25.000000000 -0400
@@ -37,6 +37,7 @@
 
 struct address_space swapper_space = {
 	.page_tree	= RADIX_TREE_INIT(GFP_ATOMIC|__GFP_NOWARN),
+        .btree          = BTREE_INIT(GFP_ATOMIC|__GFP_NOWARN), 
 	.tree_lock	= RW_LOCK_UNLOCKED,
 	.a_ops		= &swap_aops,
 	.i_mmap_nonlinear = LIST_HEAD_INIT(swapper_space.i_mmap_nonlinear),
@@ -71,16 +72,21 @@
 static int __add_to_swap_cache(struct page *page, swp_entry_t entry,
 			       gfp_t gfp_mask)
 {
-	int error;
+	int btree_error;
 
 	BUG_ON(PageSwapCache(page));
 	BUG_ON(PagePrivate(page));
-	error = radix_tree_preload(gfp_mask);
-	if (!error) {
+
+        btree_error = btree_preload(gfp_mask);
+	
+	if (!btree_error) {
 		write_lock_irq(&swapper_space.tree_lock);
-		error = radix_tree_insert(&swapper_space.page_tree,
+                if (!btree_error) {
+                       btree_error = btree_insert(&swapper_space.btree,
 						entry.val, page);
-		if (!error) {
+                }
+                                               
+		if (!btree_error) {
 			page_cache_get(page);
 			SetPageLocked(page);
 			SetPageSwapCache(page);
@@ -89,9 +95,12 @@
 			pagecache_acct(1);
 		}
 		write_unlock_irq(&swapper_space.tree_lock);
-		radix_tree_preload_end();
+
+                if (!btree_error) {
+                        btree_preload_end();
+                }
 	}
-	return error;
+	return btree_error;
 }
 
 static int add_to_swap_cache(struct page *page, swp_entry_t entry)
@@ -127,7 +136,10 @@
 	BUG_ON(PageWriteback(page));
 	BUG_ON(PagePrivate(page));
 
-	radix_tree_delete(&swapper_space.page_tree, page_private(page));
+        if (btree_delete(&swapper_space.btree,swapper_space.btree.root,
+                                page_private(page)).val != page) {
+		panic("DEBUG: Deleting wrong page from swap cache\n");			
+	}
 	set_page_private(page, 0);
 	ClearPageSwapCache(page);
 	total_swapcache_pages--;
diff -urN linux-2.6.16.2/mm/vmscan.c linux-2.6.16.2-btree/mm/vmscan.c
--- linux-2.6.16.2/mm/vmscan.c	2006-04-07 12:56:47.000000000 -0400
+++ linux-2.6.16.2-btree/mm/vmscan.c	2006-08-06 15:08:34.000000000 -0400
@@ -692,7 +692,7 @@
 				struct page *page, int nr_refs)
 {
 	struct address_space *mapping = page_mapping(page);
-	struct page **radix_pointer;
+	struct page * btree_pointer;
 
 	/*
 	 * Avoid doing any of the following work if the page count
@@ -733,12 +733,12 @@
 
 	write_lock_irq(&mapping->tree_lock);
 
-	radix_pointer = (struct page **)radix_tree_lookup_slot(
-						&mapping->page_tree,
+	btree_pointer = (struct page *)btree_lookup_slot(
+						&mapping->btree,
 						page_index(page));
 
 	if (!page_mapping(page) || page_count(page) != nr_refs ||
-			*radix_pointer != page) {
+			btree_pointer != page) {
 		write_unlock_irq(&mapping->tree_lock);
 		return -EAGAIN;
 	}
@@ -759,7 +759,7 @@
 		set_page_private(newpage, page_private(page));
 	}
 
-	*radix_pointer = newpage;
+	btree_pointer = newpage;
 	__put_page(page);
 	write_unlock_irq(&mapping->tree_lock);
 
diff -urN linux-2.6.16.2/lib/btree.c linux-2.6.16.2-btree/lib/btree.c
--- linux-2.6.16.2/lib/btree.c	1969-12-31 19:00:00.000000000 -0500
+++ linux-2.6.16.2-btree/lib/btree.c	2006-08-06 21:57:32.000000000 -0400
@@ -0,0 +1,1273 @@
+/*
+ *  btree.c
+ *
+ *  Copyright (C) 2006 Vishal Patil (vishpat AT gmail DOT com) 
+ *
+*/
+
+#include <linux/btree.h>
+#include <linux/errno.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/radix-tree.h>
+#include <linux/percpu.h>
+#include <linux/slab.h>
+#include <linux/notifier.h>
+#include <linux/cpu.h>  
+#include <linux/gfp.h>
+#include <linux/string.h>
+#include <linux/bitops.h>
+#include <linux/mm.h>
+
+typedef enum {left = -1,right = 1} position_t;
+
+typedef struct {
+	btree_node * node;
+	unsigned int index;
+}node_pos;
+
+struct btree_node {
+	struct btree_node * next;		        // Pointer used for linked list 
+	bool leaf;			        // Used to indicate whether leaf or not
+        unsigned int nr_active;		        // Number of active keys
+	unsigned int level;		        // Level in the B-Tree
+        bt_key_val key_vals[2*BTREE_ORDER - 1];	// Array of keys and values
+        struct btree_node * children[2*BTREE_ORDER];	// Array of pointers to child nodes
+        unsigned long tags[BTREE_TAGS][BTREE_TAG_LONGS];
+                                                // Bitmap required by page
+                                                // cache
+	unsigned long padding[6];
+} ;
+
+/*
+*       B-Tree node and key val cache
+*/
+static kmem_cache_t *btree_node_cachep;
+
+/*
+ * Per-cpu pool of preloaded nodes
+ */
+struct btree_preload {
+        int nr;
+        struct btree_node *nodes[BTREE_MAX_PATH];
+};
+DEFINE_PER_CPU(struct btree_preload, btree_preloads) = { 0, };
+
+static btree_node * allocate_btree_node (btree * btree,unsigned int order);
+static node_pos get_btree_node(btree * btree,unsigned long key);
+static bt_key_val remove_key_from_leaf(btree * btree, node_pos * node_pos);
+static btree_node * merge_nodes(btree * btree, btree_node * n1, bt_key_val kv ,
+        btree_node * n2);
+static void move_key(btree * btree, btree_node * node, unsigned int index, 
+                                        position_t pos);
+static node_pos get_max_key_pos(btree * btree, btree_node * subtree);
+static node_pos get_min_key_pos(btree * btree, btree_node * subtree);
+static btree_node * merge_siblings(btree * btree, btree_node * parent,
+                      unsigned int index,position_t pos);
+static unsigned int __lookup(btree * btree, void ** results,
+        unsigned long first_index,unsigned int max_items, int tag);
+static void print_single_node(btree *btree, btree_node * node); 
+static void transfer_tags(btree_node * source_node, int source_index,
+                btree_node * dest_node,int dest_index);
+static void clear_tags(btree_node * node, int index);
+
+static void
+btree_node_ctor(void *node, kmem_cache_t *cachep, unsigned long flags) {
+        memset(node, 0, sizeof(struct btree_node));
+}
+
+void __init btree_init(void) {
+        btree_node_cachep = kmem_cache_create("btree_node",
+                sizeof(btree_node), 0,
+                SLAB_PANIC, btree_node_ctor, NULL);
+}
+
+unsigned long btree_key_fn (void * page) {
+        return (((struct page*)page)->index);
+}
+
+unsigned long btree_value_fn (unsigned long key) {
+        return *((unsigned long*)key);
+}
+
+void btree_print_fn (unsigned long key) {
+        printk("%lu ", *((unsigned long*)key));
+}
+
+static inline void tag_set(struct btree_node *node, int tag, int offset)
+{
+        __set_bit(offset, node->tags[tag]);
+}
+
+static inline void tag_clear(struct btree_node *node, int tag, int offset)
+{
+        __clear_bit(offset, node->tags[tag]);
+}
+
+static inline int tag_get(struct btree_node *node, int tag, int offset)
+{
+        return test_bit(offset, node->tags[tag]);
+}
+
+/*
+ * Load up this CPU's btree_node buffer with sufficient objects to
+ * ensure that the addition of a single element in the tree cannot fail.  On
+ * success, return zero, with preemption disabled.  On error, return -ENOMEM
+ * with preemption not disabled.
+ */
+int btree_preload(gfp_t gfp_mask) {
+        struct btree_preload *btp;
+        struct btree_node *node;
+        int ret = -ENOMEM;
+
+        preempt_disable();
+        btp = &__get_cpu_var(btree_preloads);
+        while (btp->nr < ARRAY_SIZE(btp->nodes)) {
+                preempt_enable();
+                node = kmem_cache_alloc(btree_node_cachep, gfp_mask);
+                if (node == NULL)
+                        goto out;
+                preempt_disable();
+                btp = &__get_cpu_var(btree_preloads);
+                if (btp->nr < ARRAY_SIZE(btp->nodes))
+                        btp->nodes[btp->nr++] = node;
+                else
+                        kmem_cache_free(btree_node_cachep, node);
+        }
+        ret = 0;
+out:
+        return ret;
+}
+
+/**
+*       Function used to allocate memory for the btree node
+*       @param btree The btree
+*       @param order Order of the B-Tree
+*       @return The allocated B-tree node
+*/
+static btree_node * allocate_btree_node (btree * btree,unsigned int order) {
+        btree_node * node;
+
+        // Allocate memory for the node
+        node = kmem_cache_alloc(btree_node_cachep,btree->gfp_mask);
+
+        // If this is NULL get preloaded node
+        if (node  == NULL && !(btree->gfp_mask & __GFP_WAIT)) {
+                struct btree_preload *btp;
+
+                btp = &__get_cpu_var(btree_preloads);
+                if (btp->nr) {
+                        node = btp->nodes[btp->nr - 1];
+                        btp->nodes[btp->nr - 1] = NULL;
+                        btp->nr--;
+                }
+        }
+       
+        // Initialize the number of active nodes
+        node->nr_active = 0;
+
+	// Use to determine whether it is a leaf
+	node->leaf = true;
+
+	// Use to determine the level in the tree
+	node->level = 0;
+
+	//Initialize the linked list pointer to NULL
+	node->next = NULL;
+
+        return node;
+}
+
+/**
+*       Function used to free the memory allocated to the b-tree 
+*       @param node The node to be freed
+*       @param order Order of the B-Tree
+*       @return The allocated B-tree node
+*/
+static int free_btree_node (btree_node * node) {
+        kmem_cache_free(btree_node_cachep,node);
+        return 0;
+}
+
+/**
+*	Used to split the child node and adjust the parent so that
+*	it has two children
+*	@param parent Parent Node
+*	@param index Index of the child node
+*	@param child  Full child node
+*	
+*/
+static void btree_split_child(btree * btree, btree_node * parent, 
+				unsigned int index,
+				btree_node * child) {
+	int i = 0;	
+	unsigned int order = btree->order;
+
+	btree_node * new_child = allocate_btree_node(btree,btree->order); 
+	new_child->leaf = child->leaf;
+	new_child->level = child->level;
+	new_child->nr_active = btree->order - 1;
+
+	// Copy the higher order keys to the new child	
+	for(i=0;i<order - 1;i++) {
+		new_child->key_vals[i] = child->key_vals[i + order];
+                transfer_tags(child,i + order,new_child,i); 
+                
+                if(!child->leaf) {
+			new_child->children[i] = 
+				child->children[i + order];
+		} 
+	}
+	
+	// Copy the last child pointer
+	if(!child->leaf) {
+		new_child->children[i] = 
+		child->children[i + order];
+	}
+
+	child->nr_active = order - 1;
+	
+	for(i = parent->nr_active + 1;i > index + 1;i--) {
+		parent->children[i] = parent->children[i - 1];
+	}
+	parent->children[index + 1] = new_child;
+
+	for(i = parent->nr_active;i > index;i--) {
+		parent->key_vals[i] = parent->key_vals[i - 1];
+                transfer_tags(parent,i - 1,parent,i);                
+	}
+
+	parent->key_vals[index] = child->key_vals[order - 1];	
+        transfer_tags(child,order - 1,parent,index);
+	parent->nr_active++;
+
+}
+
+/**
+*	Used to insert a key in the non-full node
+*	@param btree The btree
+*	@param node The node to which the key will be added
+*	@param the key value pair
+*	@return void
+*/
+
+static int btree_insert_nonfull (btree * btree, btree_node * parent_node,
+				bt_key_val * key_val) {
+	
+	unsigned long key = key_val->key;
+	int i,j;
+	btree_node * child;	
+	btree_node * node = parent_node;
+	
+insert:	i = node->nr_active - 1;
+	if(node->leaf) {
+		while(i >= 0 && key < node->key_vals[i].key) {
+			node->key_vals[i + 1] = node->key_vals[i];
+                        transfer_tags(node,i,node,i + 1);
+			i--;
+		}
+
+		node->key_vals[i + 1].key = key;
+                node->key_vals[i + 1].val = key_val->val;
+
+		// Identical key found
+          	if (i >= 0 && 
+			key == node->key_vals[i].key) {
+                        for (j = i + 1; j <node->nr_active; j++ ) {
+                                node->key_vals[j] =
+                                        node->key_vals[j + 1];
+				transfer_tags(node,j,node,j + 1);
+                        }
+			return -EEXIST;
+                } else
+		node->nr_active++; 
+	} else {
+		while (i >= 0 && key < node->key_vals[i].key) {
+			i--;
+		}
+
+                // Identical key found  
+                if (i >= 0 && key == node->key_vals[i].key) {
+                        return -EEXIST;
+                }
+
+		i++;
+		child = node->children[i]; 
+		
+		if(child->nr_active == 2*btree->order - 1) {
+
+ 			// Check for identical key here
+                        for (j=0;j<child->nr_active;j++) {
+                                if (key == child->key_vals[j].key) {
+                                        return -EEXIST;
+                                }
+
+                                if (key < child->key_vals[j].key) {
+                                        break;
+                                }
+                        }
+
+			btree_split_child(btree,node,i,child);
+			if(key > 
+				node->key_vals[i].key) {
+				i++;	
+			}	
+		}
+		node = node->children[i];
+		goto insert;	
+	}
+       return 0;	
+}
+
+
+/**
+*       Function used to insert node into a B-Tree
+*       @param root Root of the B-Tree
+*       @param node The node to be inserted
+*       @param compare Function used to compare the two nodes of the tree
+*       @return success or failure
+*/
+int btree_insert(btree * btree, unsigned long key, void * val) {
+	btree_node * rnode = NULL;
+        bt_key_val key_val;
+        int ret = -1;	
+       	
+        key_val.key = key;
+        key_val.val = val;
+
+        // During the B-tree initialization
+        if(unlikely(btree->root == NULL)) {
+                btree->root = allocate_btree_node(btree,btree->order);
+        }
+        rnode = btree->root;
+
+	if(rnode->nr_active == (2*btree->order - 1)) {
+		btree_node * new_root;
+		new_root = allocate_btree_node(btree,btree->order);
+		new_root->level = btree->root->level + 1;
+		btree->root = new_root;	
+		new_root->leaf = false;
+		new_root->nr_active = 0;
+		new_root->children[0]  = rnode;
+		btree_split_child(btree,new_root,0,rnode);
+		ret = btree_insert_nonfull(btree,new_root,&key_val);	
+	} else
+		ret = btree_insert_nonfull(btree,rnode,&key_val);		
+        return ret;
+}
+EXPORT_SYMBOL(btree_insert);        
+
+/**
+*	Used to get the position of the MAX key within the subtree
+*	@param btree The btree
+*	@param subtree The subtree to be searched
+*	@return The node containing the key and position of the key
+*/
+static node_pos get_max_key_pos(btree * btree, btree_node * subtree) {
+	node_pos node_pos;
+	btree_node * node = subtree; 
+	
+	node_pos.node = NULL;
+	node_pos.index = 0;
+
+	while(true) {
+		if(node == NULL) {
+			break;
+		}
+
+		if(node->leaf) {
+			node_pos.node 	= node;
+			node_pos.index 	= node->nr_active - 1;
+			return node_pos;	
+		} else {
+			node_pos.node 	= node;
+			node_pos.index 	= node->nr_active - 1;
+			node = node->children[node->nr_active];	
+		}
+	}
+	return node_pos;
+}
+
+/**
+*	Used to get the position of the MAX key within the subtree
+*	@param btree The btree
+*	@param subtree The subtree to be searched
+*	@return The node containing the key and position of the key
+*/
+static node_pos get_min_key_pos(btree * btree, btree_node * subtree) {
+	node_pos node_pos;
+	btree_node * node = subtree; 
+
+	node_pos.node = NULL;
+	node_pos.index = 0;
+
+	while(true) {
+		if(node == NULL) {
+			break;
+		}
+
+		if(node->leaf) {
+			node_pos.node 	= node;
+			node_pos.index 	= 0;
+			return node_pos;	
+		} else {
+			node_pos.node 	= node;
+			node_pos.index 	= 0;
+			node = node->children[0];	
+		}
+	}
+	return node_pos;
+}
+
+/**
+*	Merge nodes n1 and n2 (case 3b from Cormen)
+*	@param btree The btree 
+*	@param node The parent node 
+*	@param index of the child
+*	@param pos left or right
+*	@return none 
+*/
+static btree_node * merge_siblings(btree * btree, btree_node * parent, unsigned int index , 
+					position_t pos) {
+	unsigned int j;
+	btree_node * new_node;
+	btree_node * n1, * n2;
+        
+        if (index == (parent->nr_active)) {   
+               index--;
+	       n1 = parent->children[parent->nr_active - 1];
+	       n2 = parent->children[parent->nr_active];
+        } else {
+               n1 = parent->children[index];
+	       n2 = parent->children[index + 1];
+        }
+
+	//Merge the current node with the left node
+	new_node = allocate_btree_node(btree,btree->order);
+	new_node->level = n1->level;
+	new_node->leaf = n1->leaf;
+
+	for(j=0;j<btree->order - 1; j++) {
+		new_node->key_vals[j] =	n1->key_vals[j];
+                transfer_tags(n1,j,new_node,j);
+		new_node->children[j] =	n1->children[j];
+	}
+	
+	new_node->key_vals[btree->order - 1] =	parent->key_vals[index];
+        transfer_tags(parent,index,new_node,btree->order - 1);
+	new_node->children[btree->order - 1] =	n1->children[btree->order - 1];
+
+	for(j=0;j<btree->order - 1; j++) {
+		new_node->key_vals[j + btree->order] = 	n2->key_vals[j];
+                transfer_tags(n2,j,new_node,j + btree->order);
+		new_node->children[j + btree->order] = 	n2->children[j];
+	}
+	new_node->children[2*btree->order - 1] = n2->children[btree->order - 1];
+
+	parent->children[index] = new_node;
+
+	for(j = index;j<parent->nr_active;j++) {
+		parent->key_vals[j] = parent->key_vals[j + 1];
+                transfer_tags(parent,j + 1,parent,j);
+		parent->children[j + 1] = parent->children[j + 2];
+	}
+
+	new_node->nr_active = n1->nr_active + n2->nr_active + 1;
+	parent->nr_active--;
+
+	free_btree_node(n1);
+	free_btree_node(n2);
+
+	if (parent->nr_active == 0 && btree->root == parent) {
+		free_btree_node(parent);
+		btree->root = new_node;
+		if(new_node->level)
+			new_node->leaf = false;
+		else
+			new_node->leaf = true; 
+	}
+
+	return new_node;
+}
+
+/**
+*	Move the key from node to another	
+*	@param btree The B-Tree
+*	@param node The parent node
+*	@param index of the key to be moved done 
+*	@param pos the position of the child to receive the key 
+*	@return none
+*/
+static void move_key(btree * btree, btree_node * node, unsigned int index, position_t pos) {
+	btree_node * lchild;
+	btree_node * rchild;
+	unsigned int i;
+
+	if(pos == right) {
+		index--;
+	}
+	lchild = node->children[index];
+	rchild = node->children[index + 1];
+	
+	// Move the key from the parent to the left child
+	if(pos == left) {
+		lchild->key_vals[lchild->nr_active] = node->key_vals[index];
+                transfer_tags(node,index,lchild,lchild->nr_active);
+
+		lchild->children[lchild->nr_active + 1] = rchild->children[0];
+		rchild->children[0] = NULL;
+		lchild->nr_active++;
+
+		node->key_vals[index] = rchild->key_vals[0];
+                transfer_tags(rchild,0,node,index);
+                
+		for(i=0;i<rchild->nr_active - 1;i++) {
+			rchild->key_vals[i] = rchild->key_vals[i + 1];
+                        transfer_tags(rchild,i + 1,rchild,i);
+			rchild->children[i] = rchild->children[i + 1];
+		}
+		rchild->children[rchild->nr_active - 1] = 
+				rchild->children[rchild->nr_active];
+		rchild->nr_active--;
+	} else {
+		// Move the key from the parent to the right child
+		for(i=rchild->nr_active;i > 0 ; i--) {
+			rchild->key_vals[i] = rchild->key_vals[i - 1];
+                        transfer_tags(rchild,i - 1,rchild,i);
+			rchild->children[i + 1] = rchild->children[i];		
+		}
+		rchild->children[1] = rchild->children[0];		
+		rchild->children[0] = NULL;
+
+		rchild->key_vals[0] = node->key_vals[index];
+                transfer_tags(node,index,rchild,0);
+
+		rchild->children[0] = lchild->children[lchild->nr_active];	
+		lchild->children[lchild->nr_active] = NULL;
+
+		node->key_vals[index] = lchild->key_vals[lchild->nr_active - 1];
+		transfer_tags(lchild,lchild->nr_active - 1,node,index);
+                
+		lchild->nr_active--;
+		rchild->nr_active++;		
+	}				
+}
+
+/**
+*	Merge nodes n1 and n2
+*	@param n1 First node
+*	@param n2 Second node
+*	@return combined node
+*/
+static btree_node * merge_nodes(btree * btree, btree_node * n1, bt_key_val kv,
+                                                btree_node * n2) {
+	btree_node * new_node;
+	unsigned int i;	
+
+	new_node = allocate_btree_node(btree,btree->order);
+	new_node->leaf = true;
+	
+	for(i=0;i<n1->nr_active;i++) {
+		new_node->key_vals[i]   = n1->key_vals[i];
+                transfer_tags(n1,i,new_node,i);
+                new_node->children[i]   = n1->children[i];
+	}
+        new_node->children[n1->nr_active] = n1->children[n1->nr_active];
+        new_node->key_vals[n1->nr_active] = kv;
+
+	for(i=0;i<n2->nr_active;i++) {
+		new_node->key_vals[i + n1->nr_active + 1] = n2->key_vals[i];
+                transfer_tags(n2,i,new_node,i + n1->nr_active + 1);
+                new_node->children[i + n1->nr_active + 1] = n2->children[i];
+	}
+        new_node->children[2*btree->order - 1] = n2->children[n2->nr_active];
+	
+        new_node->nr_active = n1->nr_active + n2->nr_active + 1;
+        new_node->leaf = n1->leaf;
+        new_node->level = n1->level;
+
+	free_btree_node(n1);
+	free_btree_node(n2);
+
+	return new_node;
+}
+
+/**
+*	Used to remove a key from the B-tree node
+*	@param btree The btree
+*	@param node The node from which the key is to be removed 	
+*	@param key The key to be removed
+*	@return 0 on success -1 on error 
+*/
+
+bt_key_val remove_key_from_leaf(btree * btree, node_pos * node_pos) {
+	unsigned int keys_max = 2*btree->order - 1;
+	unsigned int i;
+	bt_key_val key_val;
+	btree_node * node = node_pos->node;
+
+        key_val.key = -1;
+        key_val.val = NULL;
+
+	if(node->leaf == false) {
+		return key_val;
+	}	
+	
+	key_val = node->key_vals[node_pos->index];
+
+	for(i=node_pos->index;i< keys_max - 1;i++) {
+		node->key_vals[i] = node->key_vals[i + 1];
+                transfer_tags(node,i + 1,node,i);
+	}
+	
+	node->nr_active--;
+
+	if(node->nr_active == 0 ) {
+                if (btree->root == node) {
+                        btree->root = NULL;
+                }
+		free_btree_node(node);
+	}
+	return key_val;
+}
+
+/**
+*       Function used to remove a node from a  B-Tree
+*       @param btree The B-Tree
+*       @param key Key of the node to be removed
+*       @param value function to map the key to an unique integer value        
+*       @param compare Function used to compare the two nodes of the tree
+*       @return The removed key value pair 
+*/
+
+bt_key_val btree_delete(btree * btree,btree_node * subtree,unsigned long key) {
+	unsigned int i,index;
+	btree_node * node = NULL, * rsibling, *lsibling;
+	btree_node * comb_node, * parent;
+	node_pos sub_node_pos;
+	node_pos node_pos;
+	bt_key_val key_val, removed_kv,to_remove_kv;
+	unsigned long kv = key;	
+
+	node = subtree;
+	parent = NULL;
+	removed_kv.key = -1;
+	removed_kv.val = NULL;	
+
+	// Empty subtree
+	if (node == NULL) {
+		return removed_kv;
+	}
+
+del_loop:for (i = 0;;i = 0) {
+             
+            //If there are no keys simply return
+            if(!node->nr_active)
+                return removed_kv;
+
+	    // Fix the index of the key greater than or equal
+	    // to the key that we would like to search
+		
+	    while (i < node->nr_active && kv > 
+			    node->key_vals[i].key) {
+		    i++;
+	    }
+	    index = i;
+
+	    // If we find such key break		    
+	    if(i < node->nr_active && 
+			kv == node->key_vals[i].key) {
+			break;
+	    }
+            if(node->leaf)
+                return removed_kv;
+                
+    	    //Store the parent node
+	    parent = node;
+
+	    // To get a child node 
+	    node = node->children[i];
+
+            //If NULL not found
+            if (node == NULL)
+                return removed_kv;
+
+            if (index == (parent->nr_active)) {   
+                lsibling =  parent->children[parent->nr_active - 1];
+	    	rsibling = NULL; 
+            } else if (index == 0) {
+                lsibling = NULL;
+                rsibling = parent->children[1];
+            } else {
+        	lsibling = parent->children[i - 1];
+	    	rsibling = parent->children[i + 1];
+            }
+
+	    if (node->nr_active == btree->order - 1 && parent) {
+		// The current node has (t - 1) keys but the right sibling has > (t - 1)
+		// keys
+		if (rsibling && (rsibling->nr_active > btree->order - 1)) {
+			move_key(btree,parent,i,left);
+		} else
+		// The current node has (t - 1) keys but the left sibling has (t - 1)
+		// keys
+		if (lsibling && (lsibling->nr_active > btree->order - 1)) {
+			move_key(btree,parent,i,right);
+		} else
+		// Left sibling has (t - 1) keys
+	        if(lsibling && (lsibling->nr_active == btree->order - 1)) {
+		        node = merge_siblings(btree,parent,i,left);
+		} else
+                // Right sibling has (t - 1) keys
+	        if(rsibling && (rsibling->nr_active == btree->order - 1)) {
+		        node = merge_siblings(btree,parent,i,right);
+		}
+	    }
+        }            
+	
+	//Case 1 : The node containing the key is found and is the leaf node.
+	//Also the leaf node has keys greater than the minimum required.
+	//Simply remove the key
+	if(node->leaf && (node->nr_active > btree->order - 1)) {
+		node_pos.node = node;
+		node_pos.index = index;
+		return remove_key_from_leaf(btree,&node_pos);
+	}
+
+	//If the leaf node is the root permit deletion even if the number of keys is
+	//less than (t - 1)
+	if(node->leaf && (node == btree->root)) {
+		node_pos.node = node;
+		node_pos.index = index;
+		return remove_key_from_leaf(btree,&node_pos);
+	}
+
+
+	//Case 2: The node containing the key is found and is an internal node
+	if(node->leaf == false) {
+		if(node->children[index]->nr_active > btree->order - 1 ) {
+			to_remove_kv = node->key_vals[index];
+
+			sub_node_pos = 
+				get_max_key_pos(btree,node->children[index]);
+                        key_val = 
+				sub_node_pos.node->key_vals[sub_node_pos.index];
+        		node->key_vals[index] = key_val;	
+                        transfer_tags(node,index,sub_node_pos.node,
+                                sub_node_pos.index);
+
+			sub_node_pos.node->key_vals[sub_node_pos.index] = 
+				to_remove_kv;
+                        node = node->children[index];
+                        goto del_loop;
+               
+		} else if ((node->children[index + 1]->nr_active > btree->order - 1) ) {
+			to_remove_kv = node->key_vals[index];
+
+			sub_node_pos = 
+                                get_min_key_pos(btree,node->children[index + 1]);
+                        key_val = sub_node_pos.node->key_vals[sub_node_pos.index];
+        		node->key_vals[index] = key_val;	
+                        transfer_tags(node,index,sub_node_pos.node,
+                                sub_node_pos.index);
+
+			sub_node_pos.node->key_vals[sub_node_pos.index] = 
+				to_remove_kv;
+
+                        node = node->children[index + 1];
+                        goto del_loop;
+              
+		} else if ( 
+			node->children[index]->nr_active == btree->order - 1 &&
+			node->children[index + 1]->nr_active == btree->order - 1) {
+
+			comb_node = merge_nodes(btree,node->children[index],
+                                node->key_vals[index],
+				node->children[index + 1]);
+			node->children[index] = comb_node;
+			
+			for(i=index + 1;i<node->nr_active;i++) {
+				node->children[i] = node->children[i + 1];
+				node->key_vals[i - 1] = node->key_vals[i];
+                                transfer_tags(node,i,node,i - 1);
+			}
+			node->nr_active--;
+                        if (node->nr_active == 0 && btree->root == node) {
+                                free_btree_node(node);
+                                btree->root = comb_node;        
+                        } 
+                        node = comb_node;
+                        goto del_loop;
+		}		
+          }
+	
+	// Case 3:
+	// In this case start from the top of the tree and continue
+	// moving to the leaf node making sure that each node that
+	// we encounter on the way has atleast 't' (order of the tree)
+	// keys 
+	if(node->leaf && (node->nr_active > btree->order - 1)) {
+	      node_pos.node = node;
+	      node_pos.index = index;
+	      return remove_key_from_leaf(btree,&node_pos);
+	}
+
+        return removed_kv;
+}
+
+/**
+*	Function used to get the node containing the given key
+*	@param btree The btree to be searched
+*	@param key The the key to be searched
+*	@return The node and position of the key within the node 
+*/
+node_pos  get_btree_node(btree * btree,unsigned long key) {
+	node_pos kp;
+	unsigned long key_val = key;	
+	btree_node * node;
+	unsigned int i = 0;
+
+	node = btree->root;
+	kp.node = NULL;
+	kp.index = 0;
+
+	// Empty tree
+	if (node == NULL) {
+		return kp;
+	}	
+	
+	for (;;i = 0) {	
+
+	    // Fix the index of the key greater than or equal
+	    // to the key that we would like to search
+		
+	    while (i < node->nr_active && key_val > 
+			    node->key_vals[i].key ) {
+		    i++;
+	    }
+
+	    // If we find such key return the key-value pair		    
+	    if(i < node->nr_active && 
+			key_val == node->key_vals[i].key) {
+		    kp.node = node;
+		    kp.index = i;	
+		    return kp;
+	    }
+	
+	    // If the node is leaf and if we did not find the key 
+	    // return NULL
+	    if(node->leaf) {
+		    return kp;
+	    }
+
+	    // To got a child node 
+	    node = node->children[i];
+	}
+      	return kp;
+
+}
+
+/**
+*       Used to destory btree
+*       @param btree The B-tree
+*       @return none
+*/
+void btree_destroy(btree * btree) {
+       int i = 0;
+       unsigned int current_level;
+
+       btree_node * head, * tail, * node;
+       btree_node * child, * del_node;
+
+       node = btree->root;
+       current_level = node->level;
+       head = node;
+       tail = node;
+
+       while(true) {
+               if(head == NULL) {
+                       break;
+               }
+               if (head->level < current_level) {
+                       current_level = head->level;
+               }
+
+               if(head->leaf == false) {
+                       for(i = 0 ; i < head->nr_active + 1; i++) {
+                               child = head->children[i];
+                               tail->next = child;
+                               tail = child;
+                               child->next = NULL;
+                       }
+               }
+               del_node = head;
+               head = head->next;
+               free_btree_node(del_node);
+       }
+
+}
+
+int btree_tagged(btree * btree, int tag) {
+        int i = 0;
+        btree_node * head, * tail;
+        btree_node * child;
+
+        head = btree->root;
+        tail = btree->root;
+
+        while(true) {
+                if(head == NULL) {
+                        break;
+                }
+
+                for (i = 0; i < head->nr_active; i++ ) {
+                  
+                  if (tag != -1 && tag_get(head,tag,i)) 
+                     return 1;
+                  
+                }     
+
+                if(head->leaf == false) {
+                        for(i = 0 ; i < head->nr_active + 1; i++) {
+                                child = head->children[i];
+                                tail->next = child;
+                                tail = child;
+                                child->next = NULL;
+                        }
+                }
+                head = head->next;
+        }
+        return 0;
+}
+EXPORT_SYMBOL(btree_tagged);
+
+/*
+ * Returns 1 if any slot in the node has this tag set.
+ * Otherwise returns 0.
+ */
+static inline int any_tag_set(struct btree_node *node, int tag)
+{
+        int idx;
+        for (idx = 0; idx < BTREE_TAG_LONGS; idx++) {
+                if (node->tags[tag][idx])
+                        return 1;
+        }
+        return 0;
+}
+
+static void clear_tags(btree_node * node, int index) {
+	 int tag;
+        for (tag = 0;tag< BTREE_TAGS; tag++) {
+		tag_clear(node,tag,index);
+        }
+}
+
+static void transfer_tags(btree_node * source_node, int source_index,
+                btree_node * dest_node,int dest_index) {
+
+        int tag;
+        for (tag = 0;tag< BTREE_TAGS; tag++) {
+                if (tag_get(source_node,tag,source_index)) {
+                        tag_set(dest_node,tag,dest_index);
+                        tag_clear(source_node,tag,source_index);
+                } else {
+                        tag_clear(dest_node,tag,dest_index);
+                }
+        }
+}
+
+static unsigned int __lookup(btree * btree, void ** results,
+        unsigned long first_index,unsigned int max_items, int tag) {
+
+        int i = 0, j = 0;
+        int ret = 0;
+        unsigned long key;
+        btree_node * head, * tail;
+        btree_node * child;
+
+        head = btree->root;
+        tail = btree->root;
+
+        while(true) {
+                if(head == NULL) {
+                        break;
+                }
+
+                if ((head->key_vals[head->nr_active - 1].key 
+                                >= first_index) && (ret < max_items)) {
+                   for (i = 0; i < head->nr_active; i++ ) {
+                     
+                     if (tag != -1 && !tag_get(head,tag,i)) 
+                        continue;
+                     
+                     if ((key = head->key_vals[i].key) 
+                                                >= first_index) {
+                                // Insertion sort                
+                                for (j = (ret - 1);j>=0;j--) {
+                                      if (btree->key(results[j]) > key) {
+                                                results[j + 1] =
+                                                    results[j];    
+                                      } else {
+                                                break;
+                                      }
+                                }
+                               
+                                results[++j] = head->key_vals[i].val; 
+                                ret++;
+
+                                if(ret == max_items) {
+                                        break;
+                                }
+                     }           
+                   }     
+                }
+
+                if(head->leaf == false) {
+                        for(i = 0 ; i < head->nr_active + 1; i++) {
+                                child = head->children[i];
+                                tail->next = child;
+                                tail = child;
+                                child->next = NULL;
+                        }
+                }
+                head = head->next;
+        }
+        return ret;
+}
+
+/**
+*       Perform multiple lookup on the B-Tree
+*       @param btree The btree
+*       @param results where the results will be placed
+*       @param first_index start the lookup from this tree
+*       @param max_items Place upto these many items in results
+*       @return The number of items found
+*/
+unsigned int btree_gang_lookup(btree * btree, void ** results,
+        unsigned long first_index,unsigned int max_items) {
+        return __lookup(btree,results,first_index,max_items, -1);
+}
+
+
+EXPORT_SYMBOL(btree_gang_lookup);
+
+/**
+*       Perform multiple lookup on the B-Tree
+*       @param btree The btree
+*       @param results where the results will be placed
+*       @param first_index start the lookup from this tree
+*       @param max_items Place upto these many items in results
+*       @return The number of items found
+*/
+unsigned int btree_gang_lookup_tag(btree * btree, void ** results,
+        unsigned long first_index,unsigned int max_items,int tag) {
+        return __lookup(btree,results,first_index,max_items, tag);
+}
+
+
+EXPORT_SYMBOL(btree_gang_lookup_tag);
+
+
+/**
+*       Function used to search a node in a B-Tree
+*       @param btree The B-tree to be searched
+*       @param key Key of the node to be search
+*       @return The key-value pair
+*/
+void * btree_lookup(btree * btree,unsigned long key) {
+
+	bt_key_val key_val;
+	node_pos kp;
+	
+	if (btree->root == NULL) {
+		return NULL;
+	}
+
+	kp = get_btree_node(btree,key);
+	key_val.val = NULL;
+
+	if(kp.node) {
+		key_val = kp.node->key_vals[kp.index];
+	} 
+	return key_val.val; 
+}
+EXPORT_SYMBOL(btree_lookup);
+
+/**
+*       Function used to get the slot in a node of a B-Tree
+*       @param btree The B-tree to be searched
+*       @param key Key of the node to be search
+*       @return The key-value pair
+*/
+bt_key_val * btree_lookup_slot(btree * btree,unsigned long key) {
+
+	bt_key_val * key_val = NULL;
+	node_pos kp;
+
+	if (btree->root == NULL) {
+		return NULL;
+	}
+
+	kp = get_btree_node(btree,key);
+
+	if(kp.node) {
+		key_val = &kp.node->key_vals[kp.index];
+	}
+	return key_val; 
+}
+EXPORT_SYMBOL(btree_lookup_slot);
+
+/**
+*       Used to set the tag for a node
+*       @param btree The btree
+*       @param key The key of the page for which the tag is to be set
+*       @param tag The tag to be set
+*       @return The btree key and val for which the tag is set  
+*/
+
+bt_key_val * btree_tag_set(btree * btree,unsigned long key, int tag) {
+	bt_key_val * key_val = NULL;
+	node_pos kp = get_btree_node(btree,key);
+
+	if(kp.node) {
+		key_val = &kp.node->key_vals[kp.index];
+                if(!tag_get(kp.node,tag,kp.index))
+                        tag_set(kp.node,tag,kp.index);
+	}
+        BUG_ON(key_val == NULL);
+	return key_val; 
+}
+EXPORT_SYMBOL(btree_tag_set);
+
+/**
+*       Used to clear the tag for a node
+*       @param btree The btree
+*       @param key The key of the page for which the tag is to be set
+*       @param tag The tag to be set
+*       @return The btree key and val for which the tag is cleared  
+*/
+
+bt_key_val * btree_tag_clear(btree * btree,unsigned long key, int tag) {
+         bt_key_val * key_val = NULL;
+	node_pos kp = get_btree_node(btree,key);
+
+	if(kp.node) {
+		key_val = &kp.node->key_vals[kp.index];
+                if(tag_get(kp.node,tag,kp.index))
+                        tag_clear(kp.node,tag,kp.index);
+	}
+        BUG_ON(key_val == NULL);
+	return key_val; 
+}
+EXPORT_SYMBOL(btree_tag_clear);
+
+/**
+*       Used to verify whether a tag is set
+*       @param btree The btree
+*       @param key The key
+*       @param tag The tag
+*       @return
+*       0: tag not present
+*       1: tag present, set
+*       -1: tag present, unset 
+*/
+int btree_tag_get(btree * btree,unsigned long key, int tag) {
+        int ret = 0;
+        bt_key_val * key_val = NULL;
+	node_pos kp = get_btree_node(btree,key);
+
+	if(kp.node) {
+		key_val = &kp.node->key_vals[kp.index];
+                ret = tag_get(kp.node,tag,kp.index);
+                return ret? 1 : -1;
+	}
+        BUG_ON(key_val == NULL);
+	return ret; 
+}
+EXPORT_SYMBOL(btree_tag_get);
+
+
+/**
+*	Get the max key in the btree
+*	@param btree The btree
+*	@return The max key 
+*/
+unsigned long btree_get_max_key(btree * btree) {
+	node_pos node_pos;
+	node_pos = get_max_key_pos(btree,btree->root);
+	return node_pos.node->key_vals[node_pos.index].key;
+}
+
+/**
+*	Get the min key in the btree
+*	@param btree The btree
+*	@return The max key 
+*/
+unsigned long btree_get_min_key(btree * btree) {
+	node_pos node_pos;
+	node_pos = get_min_key_pos(btree,btree->root);
+	return node_pos.node->key_vals[node_pos.index].key;
+}
+
+/**
+*	Used to print the keys of the btree_node
+*	@param node The node whose keys are to be printed
+*	@return none
+*/
+
+static void print_single_node(btree *btree, btree_node * node) {
+	
+	int i = 0;
+	
+	printk(" { ");	
+	while(i < node->nr_active) {
+		printk("%d(0x%x) ", node->key_vals[i].key,
+			node->key_vals[i].val);
+		i++;
+	}
+	printk("} (0x%x,%d,%d) ", node,node->nr_active,node->level);	
+}
+
+/**
+*       Function used to print the B-tree
+*       @param root Root of the B-Tree
+*       @param print_key Function used to print the key value
+*       @return none
+*/
+
+void print_subtree(btree *btree,btree_node * node) {
+	
+	int i = 0;
+	unsigned int current_level;
+
+	btree_node * head, * tail;
+
+	current_level = node->level;
+	head = node;
+	tail = node;
+	
+	printk("DEBUG: Printing subtree\n");
+	while(true) {
+		if(head == NULL) {
+			break;
+		}
+		if (head->level < current_level) {
+			current_level = head->level;
+			printk("\n");
+		}
+		print_single_node(btree,head);
+
+		if(head->leaf == false) {	
+			for(i = 0 ; i < head->nr_active + 1; i++) {
+				tail->next = head->children[i];
+				tail = head->children[i];
+				head->children[i]->next = NULL;
+			}
+		}
+		head = head->next;	
+	}
+	printk("\n");
+}
+EXPORT_SYMBOL(print_subtree);
+
diff -urN linux-2.6.16.2/lib/Makefile linux-2.6.16.2-btree/lib/Makefile
--- linux-2.6.16.2/lib/Makefile	2006-04-07 12:56:47.000000000 -0400
+++ linux-2.6.16.2-btree/lib/Makefile	2006-08-05 10:57:12.000000000 -0400
@@ -3,7 +3,7 @@
 #
 
 lib-y := errno.o ctype.o string.o vsprintf.o cmdline.o \
-	 bust_spinlocks.o rbtree.o radix-tree.o dump_stack.o \
+	 bust_spinlocks.o rbtree.o radix-tree.o dump_stack.o btree.o\
 	 idr.o div64.o int_sqrt.o bitmap.o extable.o prio_tree.o \
 	 sha1.o
 
diff -urN linux-2.6.16.2/init/main.c linux-2.6.16.2-btree/init/main.c
--- linux-2.6.16.2/init/main.c	2006-04-07 12:56:47.000000000 -0400
+++ linux-2.6.16.2-btree/init/main.c	2006-08-06 15:20:06.000000000 -0400
@@ -84,6 +84,7 @@
 extern void pidmap_init(void);
 extern void prio_tree_init(void);
 extern void radix_tree_init(void);
+extern void btree_init(void);
 extern void free_initmem(void);
 extern void populate_rootfs(void);
 extern void driver_init(void);
@@ -529,7 +530,8 @@
 	key_init();
 	security_init();
 	vfs_caches_init(num_physpages);
-	radix_tree_init();
+//	radix_tree_init();
+        btree_init();
 	signals_init();
 	/* rootfs populating might need page-writeback */
 	page_writeback_init();
diff -urN linux-2.6.16.2/fs/buffer.c linux-2.6.16.2-btree/fs/buffer.c
--- linux-2.6.16.2/fs/buffer.c	2006-04-07 12:56:47.000000000 -0400
+++ linux-2.6.16.2-btree/fs/buffer.c	2006-08-06 15:10:49.000000000 -0400
@@ -859,9 +859,15 @@
 		if (page->mapping) {	/* Race with truncate? */
 			if (mapping_cap_account_dirty(mapping))
 				inc_page_state(nr_dirty);
-			radix_tree_tag_set(&mapping->page_tree,
+			btree_tag_set(&mapping->btree,
 						page_index(page),
 						PAGECACHE_TAG_DIRTY);
+
+			if (btree_tag_get(&mapping->btree,
+			  page_index(page),PAGECACHE_TAG_DIRTY) != 1) {
+				panic("Problem setting the tag\n");
+			}				
+
 		}
 		write_unlock_irq(&mapping->tree_lock);
 		__mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
diff -urN linux-2.6.16.2/fs/inode.c linux-2.6.16.2-btree/fs/inode.c
--- linux-2.6.16.2/fs/inode.c	2006-04-07 12:56:47.000000000 -0400
+++ linux-2.6.16.2-btree/fs/inode.c	2006-08-06 21:36:22.000000000 -0400
@@ -16,6 +16,7 @@
 #include <linux/backing-dev.h>
 #include <linux/wait.h>
 #include <linux/hash.h>
+#include <linux/btree.h>
 #include <linux/swap.h>
 #include <linux/security.h>
 #include <linux/pagemap.h>
@@ -196,6 +197,7 @@
 	mutex_init(&inode->i_mutex);
 	init_rwsem(&inode->i_alloc_sem);
 	INIT_RADIX_TREE(&inode->i_data.page_tree, GFP_ATOMIC);
+        INIT_BTREE(&inode->i_data.btree, GFP_ATOMIC);
 	rwlock_init(&inode->i_data.tree_lock);
 	spin_lock_init(&inode->i_data.i_mmap_lock);
 	INIT_LIST_HEAD(&inode->i_data.private_list);
diff -urN linux-2.6.16.2/include/linux/btree.h linux-2.6.16.2-btree/include/linux/btree.h
--- linux-2.6.16.2/include/linux/btree.h	1969-12-31 19:00:00.000000000 -0500
+++ linux-2.6.16.2-btree/include/linux/btree.h	2006-08-06 21:41:02.000000000 -0400
@@ -0,0 +1,89 @@
+/*
+ *  btree.h
+ *
+ *  Copyright (C) 2006 Vishal Patil (vishpat AT gmail DOT com) 
+ *
+*/
+
+#ifndef _BTREE_H_
+#define _BTREE_H_
+
+#include <linux/sched.h>
+#include <linux/preempt.h>
+#include <linux/types.h>
+
+#define BTREE_ORDER 8 
+#define BTREE_TAGS 2
+#define BTREE_TAG_LONGS    \
+        ((BTREE_ORDER + BITS_PER_LONG - 1) / BITS_PER_LONG)
+#define BTREE_MAX_PATH 8
+
+
+typedef enum  {false,true} bool;
+typedef struct btree_node btree_node;
+
+typedef struct {
+        unsigned long key;
+	void * val;
+} bt_key_val;
+
+
+typedef struct btree {
+	unsigned int order;			// B-Tree order
+	btree_node * root;		        // Root of the B-Tree
+       	unsigned long (*value)(unsigned long key);	// Generate uint value for the key
+        unsigned long (*key)(void * data);      // Get the value of the key from
+                                                // from the data
+	void (*print_key)(unsigned long key);		// Print the key
+        gfp_t                   gfp_mask;
+} btree; 
+
+extern int btree_insert(btree * btree, unsigned long key, void * val);
+extern bt_key_val btree_delete(btree * btree,btree_node * subtree,
+                                unsigned long key);
+extern void * btree_lookup(btree * btree,  unsigned long key);
+extern unsigned int btree_gang_lookup(btree * btree, void ** results,
+        unsigned long first_index,unsigned int max_items);
+extern unsigned int btree_gang_lookup_tag(btree * btree, void ** results,
+        unsigned long first_index,unsigned int max_items,int tag);
+extern bt_key_val * btree_lookup_slot(btree * btree, unsigned long key);
+extern bt_key_val * btree_tag_set(btree * btree,unsigned long key, int tag);
+extern int btree_tag_get(btree * btree,unsigned long key, int tag);
+extern bt_key_val * btree_tag_clear(btree * btree,unsigned long key, int tag);
+extern void btree_destroy(btree * btree);
+extern unsigned long btree_get_max_key(btree * btree);
+extern unsigned long btree_get_min_key(btree * btree);
+extern unsigned long btree_value_fn(unsigned long key);
+extern unsigned long btree_key_fn(void * page);
+extern void btree_print_fn(unsigned long key);
+extern int btree_preload(gfp_t gfp_mask);
+extern int btree_tagged(btree * btree, int tag);
+
+static inline void btree_preload_end(void)
+{
+        preempt_enable();
+}
+
+#define BTREE_INIT(mask) { \
+        .order          = BTREE_ORDER,                            \
+        .root           = NULL,                                   \
+        .value          = btree_value_fn,                         \
+        .key            = btree_key_fn,                           \
+        .print_key      = btree_print_fn,                         \
+        .gfp_mask       = (mask)                                  \
+}
+
+#define INIT_BTREE(btree,mask)                            \
+do {                                                      \
+        (btree)->order = BTREE_ORDER;                     \
+        (btree)->root = NULL;                             \
+        (btree)->value            = btree_value_fn;       \
+        (btree)->key              = btree_key_fn;         \
+        (btree)->print_key        = btree_print_fn;       \
+        (btree)->gfp_mask         = (mask);               \
+} while (0)
+
+
+extern void print_subtree(btree * btree,btree_node * node);
+
+#endif
diff -urN linux-2.6.16.2/include/linux/fs.h linux-2.6.16.2-btree/include/linux/fs.h
--- linux-2.6.16.2/include/linux/fs.h	2006-04-07 12:56:47.000000000 -0400
+++ linux-2.6.16.2-btree/include/linux/fs.h	2006-08-05 10:57:12.000000000 -0400
@@ -214,6 +214,7 @@
 #include <linux/kobject.h>
 #include <linux/list.h>
 #include <linux/radix-tree.h>
+#include <linux/btree.h>
 #include <linux/prio_tree.h>
 #include <linux/init.h>
 #include <linux/sched.h>
@@ -372,6 +373,7 @@
 struct address_space {
 	struct inode		*host;		/* owner: inode, block_device */
 	struct radix_tree_root	page_tree;	/* radix tree of all pages */
+        struct btree            btree;          /* BTree of all pages*/
 	rwlock_t		tree_lock;	/* and rwlock protecting it */
 	unsigned int		i_mmap_writable;/* count VM_SHARED mappings */
 	struct prio_tree_root	i_mmap;		/* tree of private and shared mappings */

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux