Re: [Bug 8464] New: autoreconf: page allocation failure. order:2, mode:0x84020

On (11/05/07 13:51), Nicolas Mailhot didst pronounce:
> On Friday 11 May 2007 at 10:08 +0100, Mel Gorman wrote:
> 
> > > seems to have cured the system so far (need to charge it a bit longer to
> > > be sure)
> > > 
> > 
> > The longer it runs the better, particularly under load and after
> > updatedb has run. Thanks a lot for testing
> 
> After a few hours of load testing still nothing in the logs, so the
> revert was probably the right thing to do

Excellent. I am somewhat surprised by the result so I'd like to look at the
alternative option with kswapd as well. Could you put that patch back in again
please and try the following patch instead? The patch causes kswapd to reclaim
at higher orders if it's requested to.  Christoph, can you look at the patch
as well and make sure it's doing the right thing with respect to SLUB please?

Ultimately, it's probably still a bad plan for atomic allocations to
depend on high-order allocations being possible but it's interesting to
see how it behaves.

Thanks

=====

Subject: [RFC] Have kswapd keep a minimum order free other than order-0

kswapd normally reclaims at order 0 unless there is a higher-order allocation
under way. However, in some cases, it is known that there is a minimum
order size of general interest, such as the SLUB allocator requiring regular
high-order allocations. This patch allows a minimum order to be set so that
min_free_kbytes is kept at higher orders.

With a simple stress test, buddyinfo looks like this at the end with kswapd at its default:

Node 0, zone      DMA     10     12     11     10     10     10      6      5      4      1      0 
Node 0, zone   Normal     87    232    601    490    369    282    197    116     79     39     28 

With kswapd attempting to keep pages free at order-4, it looks like this:

Node 0, zone      DMA     35     37     29     28     22     18      8      6      0      1      0 
Node 0, zone   Normal     96    203    361    355    265    203    141     97     57     34     48 

---
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.21-mm2-clean/include/linux/mmzone.h linux-2.6.21-mm2-kswapd_minorder/include/linux/mmzone.h
--- linux-2.6.21-mm2-clean/include/linux/mmzone.h	2007-05-09 10:21:28.000000000 +0100
+++ linux-2.6.21-mm2-kswapd_minorder/include/linux/mmzone.h	2007-05-11 11:12:43.000000000 +0100
@@ -499,6 +499,8 @@ typedef struct pglist_data {
 void get_zone_counts(unsigned long *active, unsigned long *inactive,
 			unsigned long *free);
 void build_all_zonelists(void);
+int kswapd_order(unsigned int order);
+void set_kswapd_order(unsigned int order);
 void wakeup_kswapd(struct zone *zone, int order);
 int zone_watermark_ok(struct zone *z, int order, unsigned long mark,
 		int classzone_idx, int alloc_flags);
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.21-mm2-clean/mm/slub.c linux-2.6.21-mm2-kswapd_minorder/mm/slub.c
--- linux-2.6.21-mm2-clean/mm/slub.c	2007-05-09 10:21:28.000000000 +0100
+++ linux-2.6.21-mm2-kswapd_minorder/mm/slub.c	2007-05-11 11:10:08.000000000 +0100
@@ -2131,6 +2131,7 @@ static struct kmem_cache *kmalloc_caches
 static int __init setup_slub_min_order(char *str)
 {
 	get_option (&str, &slub_min_order);
+	set_kswapd_order(slub_min_order);
 	user_override = 1;
 	return 1;
 }
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.21-mm2-clean/mm/vmscan.c linux-2.6.21-mm2-kswapd_minorder/mm/vmscan.c
--- linux-2.6.21-mm2-clean/mm/vmscan.c	2007-05-09 10:21:28.000000000 +0100
+++ linux-2.6.21-mm2-kswapd_minorder/mm/vmscan.c	2007-05-11 11:55:45.000000000 +0100
@@ -1407,6 +1407,32 @@ out:
 	return nr_reclaimed;
 }
 
+static unsigned int kswapd_min_order __read_mostly;
+
+/**
+ * set_kswapd_order - Set the minimum order that kswapd reclaims at
+ * @order: The new minimum order
+ *
+ * kswapd normally reclaims at order 0 unless there is a higher-order
+ * allocation under way. However, in some cases, it is known that there
+ * is a minimum order size of general interest such as the SLUB allocator
+ * requiring regular high-order allocations. This allows a minimum order
+ * to be set so that min_free_kbytes is kept at higher orders
+ */
+void set_kswapd_order(unsigned int order)
+{
+	if (order >= MAX_ORDER)
+		return;
+
+	printk(KERN_INFO "kswapd reclaim order set to %u\n", order);
+	kswapd_min_order = order;
+}
+
+int kswapd_order(unsigned int order)
+{
+	return max(kswapd_min_order, order);
+}
+
 /*
  * The background pageout daemon, started as a kernel thread
  * from the init process. 
@@ -1450,13 +1476,13 @@ static int kswapd(void *p)
 	 */
 	tsk->flags |= PF_MEMALLOC | PF_SWAPWRITE | PF_KSWAPD;
 
-	order = 0;
+	order = kswapd_order(0);
 	for ( ; ; ) {
 		unsigned long new_order;
 
 		prepare_to_wait(&pgdat->kswapd_wait, &wait, TASK_INTERRUPTIBLE);
-		new_order = pgdat->kswapd_max_order;
-		pgdat->kswapd_max_order = 0;
+		new_order = kswapd_order(pgdat->kswapd_max_order);
+		pgdat->kswapd_max_order = kswapd_order(0);
 		if (order < new_order) {
 			/*
 			 * Don't sleep if someone wants a larger 'order'
@@ -1467,7 +1493,7 @@ static int kswapd(void *p)
 			if (!freezing(current))
 				schedule();
 
-			order = pgdat->kswapd_max_order;
+			order = kswapd_order(pgdat->kswapd_max_order);
 		}
 		finish_wait(&pgdat->kswapd_wait, &wait);
 
-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab