Re: [POLL] SLAB : Are the 32 and 192 bytes caches really usefull on x86_64 machines ?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, 2005-12-28 at 18:57 +0100, Andreas Kleen wrote:
[...]
> 
> This whole discussion is pointless anyways because most kmallocs are
> constant
> sized and with a constant sized kmalloc the slab is selected at compile
> time.
> 
> What would be more interesting would be to redo the complete kmalloc
> slab list.
> 
> I remember the original slab paper from Bonwick actually mentioned that
> power of
> two slabs are the worst choice for a malloc - but for some reason Linux
> chose them
> anyways. That would require a lot of measurements in different workloads
> on the
> actual kmalloc sizes and then select a good list, but could ultimately
> safe
> a lot of memory (ok not that much anymore because the memory intensive
> allocations should all have their own caches, but at least some)
> 
> Most likely the best list is different for 32bit and 64bit too.
> 
> Note that just looking at slabinfo is not enough for this - you need the
> original
> sizes as passed to kmalloc, not the rounded values reported there.
> Should be probably not too hard to hack a simple monitoring script up
> for that
> in systemtap to generate the data.
> 

OK then, after reading this I figured there must be a way to dynamically
allocate slab sizes based on the kmalloc constants.  So I spent last
night and some of this morning coming up with the below patch.

Right now it only works with i386, but I'm sure it can be hacked to work
with all archs.  At compile time it creates a table of sizes for all
kmallocs (outside of slab.c and arch/i386/mm/init.c) that uses a
constant declaration.

This table is then initialized in arch/i386/mm/init.c to use a cache
that is either already created (like the mem_sizes array) or it creates
a new cache of that size (L1 cached aligned), and then updates the
pointers to use that cache.

Here's what was created on my test box:

cat /proc/slabinfo
[...]
dynamic_dma-1536       0      0   1536    5    2 : tunables   24   12    0 : slabdata      0      0      0
dynamic-1536           1      5   1536    5    2 : tunables   24   12    0 : slabdata      1      1      0
dynamic_dma-1280       0      0   1280    3    1 : tunables   24   12    0 : slabdata      0      0      0
dynamic-1280           6      6   1280    3    1 : tunables   24   12    0 : slabdata      2      2      0
dynamic_dma-2176       0      0   2176    3    2 : tunables   24   12    0 : slabdata      0      0      0
dynamic-2176           0      0   2176    3    2 : tunables   24   12    0 : slabdata      0      0      0
dynamic_dma-1152       0      0   1152    7    2 : tunables   24   12    0 : slabdata      0      0      0
dynamic-1152           0      0   1152    7    2 : tunables   24   12    0 : slabdata      0      0      0
dynamic_dma-1408       0      0   1408    5    2 : tunables   24   12    0 : slabdata      0      0      0
dynamic-1408           0      0   1408    5    2 : tunables   24   12    0 : slabdata      0      0      0
dynamic_dma-640        0      0    640    6    1 : tunables   54   27    0 : slabdata      0      0      0
dynamic-640            0      0    640    6    1 : tunables   54   27    0 : slabdata      0      0      0
dynamic_dma-768        0      0    768    5    1 : tunables   54   27    0 : slabdata      0      0      0
dynamic-768            0      0    768    5    1 : tunables   54   27    0 : slabdata      0      0      0
dynamic_dma-3200       0      0   3200    2    2 : tunables   24   12    0 : slabdata      0      0      0
dynamic-3200           8      8   3200    2    2 : tunables   24   12    0 : slabdata      4      4      0
dynamic_dma-896        0      0    896    4    1 : tunables   54   27    0 : slabdata      0      0      0
dynamic-896            9     12    896    4    1 : tunables   54   27    0 : slabdata      3      3      0
dynamic_dma-384        0      0    384   10    1 : tunables   54   27    0 : slabdata      0      0      0
dynamic-384           40     40    384   10    1 : tunables   54   27    0 : slabdata      4      4      0
size-131072(DMA)       0      0 131072    1   32 : tunables    8    4    0 : slabdata      0      0      0
size-131072            0      0 131072    1   32 : tunables    8    4    0 : slabdata      0      0      0
size-65536(DMA)        0      0  65536    1   16 : tunables    8    4    0 : slabdata      0      0      0
size-65536             1      1  65536    1   16 : tunables    8    4    0 : slabdata      1      1      0
size-32768(DMA)        0      0  32768    1    8 : tunables    8    4    0 : slabdata      0      0      0
size-32768             0      0  32768    1    8 : tunables    8    4    0 : slabdata      0      0      0
size-16384(DMA)        0      0  16384    1    4 : tunables    8    4    0 : slabdata      0      0      0
size-16384             0      0  16384    1    4 : tunables    8    4    0 : slabdata      0      0      0
size-8192(DMA)         0      0   8192    1    2 : tunables    8    4    0 : slabdata      0      0      0
size-8192             40     40   8192    1    2 : tunables    8    4    0 : slabdata     40     40      0
size-4096(DMA)         0      0   4096    1    1 : tunables   24   12    0 : slabdata      0      0      0
size-4096             34     34   4096    1    1 : tunables   24   12    0 : slabdata     34     34      0
size-2048(DMA)         0      0   2048    2    1 : tunables   24   12    0 : slabdata      0      0      0
size-2048            266    266   2048    2    1 : tunables   24   12    0 : slabdata    133    133      0
size-1024(DMA)         0      0   1024    4    1 : tunables   54   27    0 : slabdata      0      0      0
size-1024             24     24   1024    4    1 : tunables   54   27    0 : slabdata      6      6      0
size-512(DMA)          0      0    512    8    1 : tunables   54   27    0 : slabdata      0      0      0
size-512              90    112    512    8    1 : tunables   54   27    0 : slabdata     14     14      0
size-256(DMA)          0      0    256   15    1 : tunables  120   60    0 : slabdata      0      0      0
size-256             735    735    256   15    1 : tunables  120   60    0 : slabdata     49     49      0
size-128(DMA)          0      0    128   30    1 : tunables  120   60    0 : slabdata      0      0      0
size-128            2750   2760    128   30    1 : tunables  120   60    0 : slabdata     92     92      0
size-64(DMA)           0      0     64   59    1 : tunables  120   60    0 : slabdata      0      0      0
size-32(DMA)           0      0     32  113    1 : tunables  120   60    0 : slabdata      0      0      0
size-64              418    472     64   59    1 : tunables  120   60    0 : slabdata      8      8      0
size-32             1175   1243     32  113    1 : tunables  120   60    0 : slabdata     11     11      0
[...]

Not sure if this is worth looking further into, but it might actually be
a way to use less memory.  For example, the above 384 size with 40
objects cost only 4 4k pages, where as these same objects would be 40
512 objects (in size-512) costing 5 4k pages.  Plus the 384 probably has
ON_SLAB management where as the 512 does not.

Comments?

-- Steve

Index: linux-2.6.15-rc7/arch/i386/Kconfig
===================================================================
--- linux-2.6.15-rc7.orig/arch/i386/Kconfig	2005-12-29 09:09:29.000000000 -0500
+++ linux-2.6.15-rc7/arch/i386/Kconfig	2005-12-29 09:09:53.000000000 -0500
@@ -173,6 +173,14 @@
 	depends on HPET_TIMER && RTC=y
 	default y
 
+config DYNAMIC_SLABS
+	bool "Dynamically create slabs for constant kmalloc"
+	default y
+	help
+	  This enables the creation of SLABS using information created at
+	  compile time.  Then on boot up, the slabs are created to fit
+	  more with what was asked for.
+
 config SMP
 	bool "Symmetric multi-processing support"
 	---help---
Index: linux-2.6.15-rc7/arch/i386/kernel/vmlinux.lds.S
===================================================================
--- linux-2.6.15-rc7.orig/arch/i386/kernel/vmlinux.lds.S	2005-12-29 09:09:29.000000000 -0500
+++ linux-2.6.15-rc7/arch/i386/kernel/vmlinux.lds.S	2005-12-29 09:09:53.000000000 -0500
@@ -68,6 +68,13 @@
 	*(.data.init_task)
   }
 
+#ifdef CONFIG_DYNAMIC_SLABS
+  . = ALIGN(16);		/* dynamic slab table */
+  __start____slab_addresses = .;
+  __slab_addresses : AT(ADDR(__slab_addresses) - LOAD_OFFSET) { *(__slab_addresses) }
+  __stop____slab_addresses = .;
+#endif
+
   /* will be freed after init */
   . = ALIGN(4096);		/* Init code and data */
   __init_begin = .;
@@ -107,6 +114,14 @@
   .altinstr_replacement : AT(ADDR(.altinstr_replacement) - LOAD_OFFSET) {
 	*(.altinstr_replacement)
   }
+#ifdef CONFIG_DYNAMIC_SLABS
+  . = ALIGN(16);		/* dynamic slab table */
+  __start____slab_preprocess = .;
+  __slab_preprocess : AT(ADDR(__slab_preprocess) - LOAD_OFFSET) { *(__slab_preprocess) }
+  __slab_process_ret : AT(ADDR(__slab_process_ret) - LOAD_OFFSET) { *(__slab_process_ret) }
+  __stop____slab_preprocess = .;
+#endif
+
   /* .exit.text is discard at runtime, not link time, to deal with references
      from .altinstructions and .eh_frame */
   .exit.text : AT(ADDR(.exit.text) - LOAD_OFFSET) { *(.exit.text) }
@@ -119,7 +134,7 @@
   __per_cpu_start = .;
   .data.percpu  : AT(ADDR(.data.percpu) - LOAD_OFFSET) { *(.data.percpu) }
   __per_cpu_end = .;
-  . = ALIGN(4096);
+ . = ALIGN(4096);
   __init_end = .;
   /* freed after init ends here */
 	
Index: linux-2.6.15-rc7/arch/i386/mm/init.c
===================================================================
--- linux-2.6.15-rc7.orig/arch/i386/mm/init.c	2005-12-29 09:09:29.000000000 -0500
+++ linux-2.6.15-rc7/arch/i386/mm/init.c	2005-12-29 14:31:08.000000000 -0500
@@ -6,6 +6,7 @@
  *  Support of BIGMEM added by Gerhard Wichert, Siemens AG, July 1999
  */
 
+#define DYNAMIC_SLABS_BOOTSTRAP
 #include <linux/config.h>
 #include <linux/module.h>
 #include <linux/signal.h>
@@ -748,3 +749,187 @@
 	}
 }
 #endif
+
+#ifdef CONFIG_DYNAMIC_SLABS
+extern void __start____slab_preprocess(void);
+extern unsigned long __start____slab_addresses;
+extern unsigned long __stop____slab_addresses;
+
+static __initdata LIST_HEAD(slablist);
+
+struct slab_links {
+	struct cache_sizes *c;
+	struct list_head list;
+};
+
+static struct cache_sizes *find_slab_size(int size)
+{
+	struct list_head *curr;
+	struct slab_links *s;
+
+	list_for_each(curr, &slablist) {
+		s = list_entry(curr, struct slab_links, list);
+		if (s->c->cs_size == size)
+			return s->c;
+	}
+	return NULL;
+}
+
+static void free_slablist(void)
+{
+	struct list_head *curr, *next;
+	struct slab_links *s;
+
+	list_for_each_safe(curr, next, &slablist) {
+		s = list_entry(curr, struct slab_links, list);
+		list_del(&s->list);
+		kfree(s);
+	}
+}
+
+#ifndef ARCH_KMALLOC_MINALIGN
+#define ARCH_KMALLOC_MINALIGN L1_CACHE_BYTES
+#endif
+#ifndef ARCH_KMALLOC_FLAGS
+#define ARCH_KMALLOC_FLAGS SLAB_HWCACHE_ALIGN
+#endif
+#define	BYTES_PER_WORD		sizeof(void *)
+
+#ifdef DEBUG_ADDR
+static __init void print_slab_addresses(int hex)
+{
+	unsigned long *slab_addresses = &__start____slab_addresses;
+	unsigned long *end = &__stop____slab_addresses;
+
+
+	for (; slab_addresses < end; slab_addresses++) {
+		if (hex)
+			printk("slab %p = %lx\n",slab_addresses, *slab_addresses);
+		else
+			printk("slab %p = %ld\n",slab_addresses, *slab_addresses);
+	}
+}
+#else
+# define print_slab_addresses(x) do {} while(0)
+#endif
+
+int __init dynamic_slab_init(void)
+{
+	unsigned long *slab_addresses = &__start____slab_addresses;
+	unsigned long *end = &__stop____slab_addresses;
+	struct cache_sizes *c;
+	struct slab_links *s;
+	unsigned long sizes[] = {
+#define CACHE(C) C,
+#include <linux/kmalloc_sizes.h>
+#undef CACHE
+	};
+	int i;
+
+
+	asm (".section __slab_process_ret,\"ax\"\n"
+	     "ret\n"
+	     ".previous\n");
+
+	__start____slab_preprocess();
+
+	printk("Before update!\n");
+	print_slab_addresses(0);
+
+	/*
+	 * DYNAMIC_SLABS_BOOTSTRAP is defined, so we don't need
+	 * to worry about kmalloc hardcoded.
+	 */
+
+	/*
+	 * This is really bad, but I don't want to go monkey up the
+	 * slab.c to get to the cache_chain.  So right now I just
+	 * allocate a pointer list to search for slabs that are
+	 * of the right size, and then free it at the end.
+	 *
+	 * Hey, you find a better way, then fix this ;)
+	 */
+	for (i=0; i < sizeof(sizes)/sizeof(sizes[0]); i++) {
+		s = kmalloc(sizeof(*s), GFP_ATOMIC);
+		if (!s)
+			panic("Can't create link list for slabs\n");
+		s->c = &malloc_sizes[i];
+		list_add_tail(&s->list, &slablist);
+	}
+
+	for (; slab_addresses < end; slab_addresses++) {
+		char *name;
+		char *name_dma;
+		unsigned long size = *slab_addresses;
+		struct cache_sizes **ptr = (struct cache_sizes**)slab_addresses;
+
+		if (!size)
+			continue;
+
+		size = (size + (L1_CACHE_BYTES-1)) & ~(L1_CACHE_BYTES-1);
+		if (size < BYTES_PER_WORD)
+			size = BYTES_PER_WORD;
+		if (size < ARCH_KMALLOC_MINALIGN)
+			size = ARCH_KMALLOC_MINALIGN;
+
+		c = find_slab_size(size);
+		if (c) {
+			*ptr = c;
+			continue;
+		}
+
+		/*
+		 * Create a cache for this specific size.
+		 */
+		name = kmalloc(25, GFP_ATOMIC);
+		if (!name)
+			panic("Can't allocate name for dynamic slab\n");
+
+		snprintf(name, 25, "dynamic-%ld", size);
+		name_dma = kmalloc(25, GFP_ATOMIC);
+		if (!name_dma)
+			panic("Can't allocate name for dynamic slab\n");
+
+		snprintf(name_dma, 25, "dynamic_dma-%ld", size);
+
+		c = kmalloc(sizeof(*c), GFP_ATOMIC);
+
+		if (!c)
+			panic("Can't allocate cache_size descriptor\n");
+
+		c->cs_size = size;
+
+		/*
+		 * For performance, all the general caches are L1 aligned.
+		 * This should be particularly beneficial on SMP boxes, as it
+		 * eliminates "false sharing".
+		 * Note for systems short on memory removing the alignment will
+		 * allow tighter packing of the smaller caches.
+		 */
+		c->cs_cachep = kmem_cache_create(name,
+				c->cs_size, ARCH_KMALLOC_MINALIGN,
+				(ARCH_KMALLOC_FLAGS | SLAB_PANIC), NULL, NULL);
+
+		c->cs_dmacachep = kmem_cache_create(name_dma,
+			c->cs_size, ARCH_KMALLOC_MINALIGN,
+			(ARCH_KMALLOC_FLAGS | SLAB_CACHE_DMA | SLAB_PANIC),
+			NULL, NULL);
+
+		s = kmalloc(sizeof(*s), GFP_ATOMIC);
+		if (!s)
+			panic("Can't create link list for slabs\n");
+		s->c = c;
+		list_add_tail(&s->list, &slablist);
+
+		*ptr = c;
+
+	}
+
+	free_slablist();
+
+	printk("\nAfter update!\n");
+	print_slab_addresses(1);
+
+	return 0;
+}
+#endif
Index: linux-2.6.15-rc7/include/asm-i386/dynamic_slab.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.15-rc7/include/asm-i386/dynamic_slab.h	2005-12-29 09:09:53.000000000 -0500
@@ -0,0 +1,20 @@
+
+/*
+ * Included in slab.h
+ *
+ * @c    - cache pointer to return base on size
+ * @size - size of cache.
+ */
+__asm__ __volatile__ (
+	"jmp 2f\n"
+	".section __slab_preprocess,\"ax\"\n"
+	"movl %1,1f\n"
+	".previous\n"
+	".section __slab_addresses,\"aw\"\n"
+	".align 4\n"
+	"1:\n"
+	".long 0\n"
+	".previous\n"
+	"2:\n"
+	"movl 1b, %0\n"
+	: "=r"(c) : "i"(size));
Index: linux-2.6.15-rc7/include/linux/slab.h
===================================================================
--- linux-2.6.15-rc7.orig/include/linux/slab.h	2005-12-29 09:09:29.000000000 -0500
+++ linux-2.6.15-rc7/include/linux/slab.h	2005-12-29 09:23:44.000000000 -0500
@@ -80,6 +80,15 @@
 {
 	if (__builtin_constant_p(size)) {
 		int i = 0;
+#if defined(CONFIG_DYNAMIC_SLABS) && !defined(MODULE) && !defined(DYNAMIC_SLABS_BOOTSTRAP)
+		{
+			struct cache_sizes *c;
+# include <asm/dynamic_slab.h>
+		return kmem_cache_alloc((flags & GFP_DMA) ?
+			c->cs_dmacachep :
+			c->cs_cachep, flags);
+		}
+#endif
 #define CACHE(x) \
 		if (size <= x) \
 			goto found; \
Index: linux-2.6.15-rc7/mm/slab.c
===================================================================
--- linux-2.6.15-rc7.orig/mm/slab.c	2005-12-29 09:09:29.000000000 -0500
+++ linux-2.6.15-rc7/mm/slab.c	2005-12-29 14:04:44.000000000 -0500
@@ -86,6 +86,7 @@
  *	All object allocations for a node occur from node specific slab lists.
  */
 
+#define DYNAMIC_SLABS_BOOTSTRAP
 #include	<linux/config.h>
 #include	<linux/slab.h>
 #include	<linux/mm.h>
@@ -1165,6 +1166,19 @@
 	/* Done! */
 	g_cpucache_up = FULL;
 
+#ifdef CONFIG_DYNAMIC_SLABS
+	{
+		extern int dynamic_slab_init(void);
+		/*
+		 * Create the caches that will handle
+		 * kmallocs of constant sizes.
+		 */
+		dynamic_slab_init();
+	}
+#endif
+	/*
+	 */
+
 	/* Register a cpu startup notifier callback
 	 * that initializes ac_data for all new cpus
 	 */





-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux