[PATCH 05/16] mm: balance zone aging in kswapd reclaim path

The vm subsystem is rather complex. System memory is divided into zones, and
lower zones act as fallbacks for higher zones in memory allocation. The page
reclaim algorithm should generally keep zone aging rates in sync. But if a
zone under its watermark has many unreclaimable pages, it has to be scanned
much more to free enough pages. While doing this,

- lower zones should also be scanned more, since their pages are also usable
  for higher zone allocations.
- higher zones should not be scanned just to keep the aging in sync, which
  can evict a large number of pages without solving the problem (and may
  well worsen it).

With that in mind, the patch does the rebalance in kswapd as follows:
1) reclaim from the lowest zone when
	- under pages_high
	- under pages_high+lowmem_reserve, and less/equal aged than highest
	  zone(or out of sync with it)
2) reclaim from higher zones when
	- under pages_high+lowmem_reserve, and less/equal aged than its
	  immediate lower neighbor(or out of sync with it)
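
The two rules above can be read as a single per-zone decision. The following
is only a userspace sketch of that decision, not code from the patch: the
struct and function names are hypothetical, and the three flags stand in for
the results of zone_watermark_ok() and the age comparison. The "reference
zone" is the highest zone when looking at the lowest zone, and the immediate
lower neighbor otherwise.

```c
#include <stdbool.h>

/*
 * Hypothetical per-zone snapshot (not a kernel structure). In the patch
 * these answers come from zone_watermark_ok() and the age helpers.
 */
struct zone_snapshot {
	bool high_ok;		/* free pages above pages_high */
	bool high_reserve_ok;	/* above pages_high + lowmem_reserve */
	bool behind_or_unsynced;	/* less/equal aged than the reference
					   zone, or out of sync with it */
};

/* Return true if kswapd should reclaim from this zone in this pass. */
static bool zone_needs_reclaim(const struct zone_snapshot *z, bool is_lowest)
{
	if (z->high_reserve_ok)
		return false;	/* enough free pages, no reclaim */
	if (is_lowest && !z->high_ok)
		return true;	/* have to scan for free pages */
	return z->behind_or_unsynced;	/* catch up if it has fallen behind */
}
```

Note that only the lowest zone is reclaimed unconditionally when it drops
under pages_high; every other decision depends on the age comparison.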

Note that the zone age is a normalized value in the range 0-4096 on i386/4G,
where 4096 corresponds to a full scan of one zone. A comparison of ages is
only deemed valid if the gap is less than 4096/8; otherwise the two zones
are regarded as out of sync.
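
To make the "in sync" window concrete, here is a small userspace sketch of
how such a comparison could behave. It is not the age_ge()/age_gt()
implementation used by this series (those helpers take struct zone pointers
and are defined elsewhere); the names below are hypothetical, and the sketch
assumes ages are plain integers that wrap modulo 4096.

```c
#include <stdbool.h>

#define AGE_FULL_SCAN	4096u			/* one full scan of a zone */
#define AGE_SYNC_GAP	(AGE_FULL_SCAN / 8)	/* max gap still "in sync" */

/* Cyclic distance by which age a leads age b, in [0, AGE_FULL_SCAN). */
static unsigned int age_lead(unsigned int a, unsigned int b)
{
	return (a - b) & (AGE_FULL_SCAN - 1);
}

/* True if a is strictly more aged than b and the two are in sync. */
static bool sketch_age_gt(unsigned int a, unsigned int b)
{
	unsigned int lead = age_lead(a, b);

	return lead > 0 && lead < AGE_SYNC_GAP;
}

/* True if a is at least as aged as b, or the two are out of sync. */
static bool sketch_age_ge(unsigned int a, unsigned int b)
{
	return !sketch_age_gt(b, a);
}
```

Under these assumptions an out-of-sync pair makes the "greater than" test
fail and the "greater or equal" test succeed, which matches the "or out of
sync with it" wording in the rules above.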

On exit, the code ensures:
1) the lowest zone will be pages_high ok
2) at least one zone will be pages_high+lowmem_reserve ok
3) a very strong force of rebalancing, except when
	- some lower zones are unreclaimable: we must let them go ahead
	  alone, leaving higher zones behind
	- shrink_zone() scans too much and creates a huge imbalance in one
	  run (Nick is working on this)

The logic can deal with known normal/abnormal situations gracefully:
1) Normal case
	- zone ages are cyclically tied together: overtaking each other, and
	  keeping close enough

2) A zone is unreclaimable, is scanned much more, and becomes out of sync
	- if a troublesome zone is being overscanned, the logic brings its
	  lower neighbors ahead together with it, leaving higher neighbors
	  behind.
	- the aging tie between the two groups is broken, and the relevant
	  zones are reclaimed when pages_high+lowmem_reserve is not ok, just
	  as before the patch.
	- at some point the zone ages meet again and things return to normal
	- a possibly better strategy, once the pressure has disappeared,
	  might be to be reluctant to reclaim from the already overscanned
	  lower group, and let the higher group slowly catch up.

3) A zone is truncated
	- it will not be reclaimed from until it falls under the watermark

With this patch, the meaning of zone->pages_high+lowmem_reserve changes from
the _required_ watermark to the _recommended_ watermark. Someone might want
to increase them somehow.

Signed-off-by: Wu Fengguang <[email protected]>
---

 mm/vmscan.c |   34 +++++++++++++++++++++++++++++-----
 1 files changed, 29 insertions(+), 5 deletions(-)

--- linux.orig/mm/vmscan.c
+++ linux/mm/vmscan.c
@@ -1364,6 +1364,7 @@ static int balance_pgdat(pg_data_t *pgda
 	int total_scanned, total_reclaimed;
 	struct reclaim_state *reclaim_state = current->reclaim_state;
 	struct scan_control sc;
+	struct zone *prev_zone = pgdat->node_zones;
 
 loop_again:
 	total_scanned = 0;
@@ -1379,6 +1380,9 @@ loop_again:
 		struct zone *zone = pgdat->node_zones + i;
 
 		zone->temp_priority = DEF_PRIORITY;
+
+		if (populated_zone(zone))
+			prev_zone = zone;
 	}
 
 	for (priority = DEF_PRIORITY; priority >= 0; priority--) {
@@ -1409,14 +1413,34 @@ loop_again:
 			if (!populated_zone(zone))
 				continue;
 
-			if (nr_pages == 0) {	/* Not software suspend */
-				if (zone_watermark_ok(zone, order,
-					zone->pages_high, 0, 0))
-					continue;
+			if (nr_pages) 	/* software suspend */
+				goto scan_swspd;
 
-				all_zones_ok = 0;
+			if (zone_watermark_ok(zone, order,
+						zone->pages_high,
+						pgdat->nr_zones - 1, 0)) {
+				/* free pages enough, no reclaim */
+			} else if (zone < prev_zone) {
+				if (!zone_watermark_ok(zone, order,
+						zone->pages_high, 0, 0)) {
+					/* have to scan for free pages */
+					goto scan;
+				}
+				if (age_ge(prev_zone, zone)) {
+					/* catch up if falls behind */
+					goto scan;
+				}
+			} else if (!age_gt(zone, prev_zone)) {
+				/* catch up if falls behind or out of sync */
+				goto scan;
 			}
 
+			prev_zone = zone;
+			continue;
+scan:
+			prev_zone = zone;
+			all_zones_ok = 0;
+scan_swspd:
 			if (zone->all_unreclaimable && priority != DEF_PRIORITY)
 				continue;
 
