The vm subsystem is rather complex. System memory is divided into zones, and
lower zones act as fallbacks for higher zones in memory allocation. The page
reclaim algorithm should generally keep zone aging rates in sync. But if a
zone under its watermark has many unreclaimable pages, it has to be scanned
much more to get enough free pages. While doing this,
- lower zones should also be scanned more, since their pages are also usable
for higher zone allocations.
- higher zones should not be scanned just to keep the aging in sync, which
can evict a large number of pages without solving the problem (and may well
worsen it).
With that in mind, the patch does the rebalance in kswapd as follows:
1) reclaim from the lowest zone when
- under pages_high
- under pages_high+lowmem_reserve, and less/equal aged than the highest
zone (or out of sync with it)
2) reclaim from higher zones when
- under pages_high+lowmem_reserve, and less/equal aged than its
immediate lower neighbor (or out of sync with it)
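The two rules above can be modeled in user space. This is a minimal sketch, not the kernel code: `struct zone_model`, `pick_zones()` and `above()` are hypothetical, every zone is assumed populated, the watermark test ignores allocation order, and `age_gt()` here is an assumed definition (ahead in age and less than 4096/8 ahead, with ages wrapping modulo 4096) rather than the helper the patch actually calls. It assumes the zones are walked from lowest to highest and, like the patch's `prev_zone`, compares the lowest zone against the highest zone and every other zone against its immediate lower neighbor.

```c
#include <assert.h>
#include <stdbool.h>

#define AGE_SIZE  4096u           /* assumed: one full scan of a zone */
#define AGE_SLACK (AGE_SIZE / 8)  /* gaps of 512 or more count as out of sync */

/* Hypothetical stand-in for struct zone; not the kernel's field layout. */
struct zone_model {
	long free;        /* free pages */
	long pages_high;  /* high watermark */
	long reserve;     /* lowmem_reserve[] entry for the highest zone */
	unsigned age;     /* normalized age, assumed to wrap modulo AGE_SIZE */
};

/* Assumed semantics: true only if a is ahead of b while still in sync. */
static bool age_gt(const struct zone_model *a, const struct zone_model *b)
{
	unsigned delta = (a->age - b->age) % AGE_SIZE;
	return delta > 0 && delta < AGE_SLACK;
}

/* Simplified watermark check (ignores allocation order). */
static bool above(const struct zone_model *z, long mark)
{
	return z->free > mark;
}

/*
 * Set reclaim[i] for each zone kswapd would reclaim from in one pass.
 * prev starts at the highest zone, so zone 0 is compared with the highest
 * zone and every later zone with its immediate lower neighbor.
 */
static void pick_zones(const struct zone_model *z, int n, bool *reclaim)
{
	assert(n > 0);
	int prev = n - 1;

	for (int i = 0; i < n; i++) {
		const struct zone_model *zone = &z[i];

		if (i < prev && !above(zone, zone->pages_high))
			reclaim[i] = true;   /* rule 1: lowest zone under pages_high */
		else if (!age_gt(zone, &z[prev]) &&
			 !above(zone, zone->pages_high + zone->reserve))
			reclaim[i] = true;   /* not ahead in age, under the reserve mark */
		else
			reclaim[i] = false;
		prev = i;
	}
}
```

In this model a lowest zone under pages_high is always reclaimed, whatever its age; a higher zone under pages_high+lowmem_reserve is skipped only while it is ahead of its lower neighbor and still in sync with it.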
Note that the zone age is a normalized value in the range 0-4096 on i386/4G;
4096 corresponds to a full scan of one zone. A comparison of two ages is only
deemed valid if the gap between them is less than 4096/8 (512); otherwise the
two zones are regarded as out of sync.
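As a worked illustration of the age scale, here is a sketch assuming the age advances linearly with the number of pages scanned and wraps modulo 4096. The source states only the range and that 4096 is one full scan, so `zone_age()` and `ages_in_sync()` are hypothetical helpers; the patch may smooth or compute the value differently.

```c
#include <assert.h>
#include <stdbool.h>

#define AGE_SIZE  4096u           /* age units per full scan of a zone */
#define AGE_SLACK (AGE_SIZE / 8)  /* 512: smallest gap treated as out of sync */

/*
 * Hypothetical normalization: scanning present_pages pages advances the
 * age by AGE_SIZE; the result wraps so it stays within 0..4095.
 */
static unsigned zone_age(unsigned long long scanned,
			 unsigned long long present_pages)
{
	assert(present_pages > 0);
	return (unsigned)((scanned * AGE_SIZE / present_pages) % AGE_SIZE);
}

/* Ages are comparable only if they are less than AGE_SLACK apart,
 * measured the short way around the wrapping 0..4095 range. */
static bool ages_in_sync(unsigned a, unsigned b)
{
	unsigned d = (a - b) % AGE_SIZE;

	if (d > AGE_SIZE / 2)
		d = AGE_SIZE - d;
	return d < AGE_SLACK;
}
```

For a zone with 4000 present pages, scanning 1000 pages gives an age of 1024 (a quarter scan). Ages 100 and 600 are still in sync (gap 500), while 0 and 512 are not.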
On exit, the code ensures:
1) the lowest zone will pass the pages_high watermark check
2) at least one zone will pass the pages_high+lowmem_reserve check
3) a very strong force of rebalancing, with the exceptions that
- some lower zones are unreclaimable: we must let them go ahead
alone, leaving higher zones behind
- shrink_zone() scans too much and creates a huge imbalance in one
run (Nick is working on this)
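Exit guarantees 1) and 2) can be written as a single predicate. This is a simplified sketch: `struct zone_marks`, `mark_ok()` and `exit_ok()` are hypothetical, index 0 is assumed to be the lowest zone, and unlike `zone_watermark_ok()` the check ignores allocation order.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical zone summary holding only what the exit checks need. */
struct zone_marks {
	long free;        /* free pages */
	long pages_high;  /* high watermark */
	long reserve;     /* lowmem_reserve against the highest zone */
};

/* Simplified watermark check: free pages must exceed the mark. */
static bool mark_ok(const struct zone_marks *z, long mark)
{
	return z->free > mark;
}

/* Guarantee 1): the lowest zone passes pages_high.
 * Guarantee 2): at least one zone passes pages_high + lowmem_reserve. */
static bool exit_ok(const struct zone_marks *z, int n)
{
	assert(n > 0);
	if (!mark_ok(&z[0], z[0].pages_high))
		return false;
	for (int i = 0; i < n; i++)
		if (mark_ok(&z[i], z[i].pages_high + z[i].reserve))
			return true;
	return false;
}
```

Guarantee 3) is a property of how the loop behaves over time rather than a state check, so it is not modeled here.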
The logic can deal with known normal/abnormal situations gracefully:
1) Normal case
- zone ages are cyclically tied together: they overtake each other while
staying close enough
2) A zone is unreclaimable, scanned much more, and becomes out of sync
- if a troublesome zone is being overscanned, the logic brings its
lower neighbors ahead along with it, leaving higher neighbors behind.
- the aging tie between the two groups is broken, and the relevant
zones are reclaimed when pages_high+lowmem_reserve is not ok, just as
before the patch.
- at some point the zone ages meet again and things return to normal
- a possibly better strategy, once the pressure has disappeared,
might be to be reluctant to reclaim from the already overscanned lower
group, and to let the higher group slowly catch up.
3) Zone is truncated
- will not reclaim from it until it is under the watermark
With this patch, the meaning of zone->pages_high+lowmem_reserve changes from
the _required_ watermark to the _recommended_ watermark. Someone may want to
increase these values.
Signed-off-by: Wu Fengguang <[email protected]>
---
mm/vmscan.c | 25 ++++++++++++++++++++-----
1 files changed, 20 insertions(+), 5 deletions(-)
--- linux-2.6.15-rc5-mm1.orig/mm/vmscan.c
+++ linux-2.6.15-rc5-mm1/mm/vmscan.c
@@ -1359,6 +1359,7 @@ static int balance_pgdat(pg_data_t *pgda
int total_scanned, total_reclaimed;
struct reclaim_state *reclaim_state = current->reclaim_state;
struct scan_control sc;
+ struct zone *prev_zone = pgdat->node_zones;
loop_again:
total_scanned = 0;
@@ -1374,6 +1375,9 @@ loop_again:
struct zone *zone = pgdat->node_zones + i;
zone->temp_priority = DEF_PRIORITY;
+
+ if (populated_zone(zone))
+ prev_zone = zone;
}
for (priority = DEF_PRIORITY; priority >= 0; priority--) {
@@ -1404,14 +1408,25 @@ loop_again:
if (!populated_zone(zone))
continue;
- if (nr_pages == 0) { /* Not software suspend */
- if (zone_watermark_ok(zone, order,
- zone->pages_high, 0, 0))
- continue;
+ if (nr_pages) /* software suspend */
+ goto scan_swspd;
- all_zones_ok = 0;
+ if (zone < prev_zone &&
+ !zone_watermark_ok(zone, order,
+ zone->pages_high, 0, 0)) {
+ } else if (!age_gt(zone, prev_zone) &&
+ !zone_watermark_ok(zone, order,
+ zone->pages_high,
+ pgdat->nr_zones - 1, 0)) {
+ } else {
+ prev_zone = zone;
+ continue;
}
+ prev_zone = zone;
+ all_zones_ok = 0;
+
+scan_swspd:
if (zone->all_unreclaimable && priority != DEF_PRIORITY)
continue;