On Mon, 4 Dec 2006, Andrew Morton wrote:
On Mon, 4 Dec 2006 14:07:47 +0000
[email protected] (Mel Gorman) wrote:
o copy_strings() and variants are no longer setting the flag as the pages
are not obviously movable when I took a much closer look.
o The arch function alloc_zeroed_user_highpage() is now called
__alloc_zeroed_user_highpage() and takes flags related to
movability that will be applied. alloc_zeroed_user_highpage()
calls __alloc_zeroed_user_highpage() with no additional flags to
preserve existing behavior of the API for out-of-tree users and
alloc_zeroed_user_highpage_movable() sets the __GFP_MOVABLE flag.
o new_inode() documents that it uses GFP_HIGH_MOVABLE and callers are expected
to call mapping_set_gfp_mask() if that is not suitable.
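
For readers following along, the wrapper arrangement described in the second item can be sketched in userspace C roughly as below. This is only an illustration under stated assumptions: the sketch_ names and flag values are hypothetical stand-ins, not the kernel's definitions, and the functions return the effective flag mask instead of allocating a page.

```c
#include <stdio.h>

/* Hypothetical stand-ins for illustration only; real kernel values differ. */
typedef unsigned int sketch_gfp_t;
#define SKETCH_GFP_HIGHUSER 0x01u
#define SKETCH_GFP_MOVABLE  0x02u

/*
 * Stand-in for the double-underscore arch helper: combines the base
 * flags with whatever movability flags the caller passes in.
 * Returns the resulting mask rather than a page.
 */
static sketch_gfp_t sketch_alloc_flags(sketch_gfp_t movableflags)
{
    return SKETCH_GFP_HIGHUSER | movableflags;
}

/* Existing entry point: no extra flags, so old callers see no change. */
static sketch_gfp_t sketch_alloc_zeroed_user_highpage(void)
{
    return sketch_alloc_flags(0);
}

/* New entry point: same path, but marks the allocation as movable. */
static sketch_gfp_t sketch_alloc_zeroed_user_highpage_movable(void)
{
    return sketch_alloc_flags(SKETCH_GFP_MOVABLE);
}
```

The point of the split is that out-of-tree callers of the old name keep their behavior, while in-tree callers that know the page is movable opt in explicitly.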
umm, OK. Could we please have some sort of statement pinning down the
exact semantics of __GFP_MOVABLE, and what its envisaged applications are?
"An allocation marked __GFP_MOVABLE may be moved using either page
migration or by paging out."
Right now, it's paging out. It isn't smart enough to use page migration.
Bottom line, if a __GFP_MOVABLE allocation is in an awkward place, it can
be got rid of somehow.
My concern is that __GFP_MOVABLE is useful for fragmentation-avoidance, but
useless for memory hot-unplug.
Anti-fragmentation did allow SPARSEMEM sections to be off-lined when it
was tested a long time ago so it's not useless. Where it could help
general hotplug remove is by keeping non-movable allocations at the lower
PFNs as much as possible.
So that if/when hot-unplug comes along
we'll add more gunk which is a somewhat-superset of the GFP_MOVABLE
infrastructure, hence we didn't need the GFP_MOVABLE code. Or something.
If/when hot-unplug comes along, it's going to need some way of identifying
pages that are safe to place in hot-unpluggable areas so you'll end up
with something like __GFP_MOVABLE.
That depends on how we do hot-unplug, if we do it. I continue to suspect
that it'll be done via memory zones: effectively by resurrecting
GFP_HIGHMEM. In which case there's little overlap with anti-frag.
And will introduce a zone that must be tuned at boot-time which is
undesirable but doable. With arch-independent zone-sizing in place, it's
considerably easier to create such a zone and then use __GFP_MOVABLE as a
zone modifier within the allocator. I have really old patches that do
something like this that I can bring up to date. However, that zone will
only be usable by __GFP_MOVABLE pages and will not help the e1000 case for
example.
On the other hand anti-frag (exists) + keeping non-movable pages at
lowest-possible-pfn (doesn't exist yet) would allow some DIMMs to be
unplugged without needing additional zones or tuning.
(btw, I
have a suspicion that the most important application of memory hot-unplug
will be power management: destructively turning off DIMMs).
You're probably right.
I'd also like to pin down the situation with lumpy-reclaim versus
anti-fragmentation. No offence
None taken.
, but I would of course prefer to avoid
merging the anti-frag patches simply based on their stupendous size.
It seems to me that lumpy-reclaim is suitable for the e1000 problem
, but perhaps not for the hugetlbpage problem.
I believe you'll hit similar problems even with lumpy-reclaim for the
e1000 (I've added Andy to the cc so he can comment more). Lumpy provides a
much smarter way of freeing higher-order contiguous blocks without having
to reclaim 95%+ of memory - this is good. However, if you are currently
seeing situations where the allocations fail even after you page out
everything possible, smarter reclaim that eventually pages out everything
anyway will not help you (chances are it's something like page tables that
are in your way).
This is where anti-frag comes in. It clusters pages together based on
their type - unmovable, reapable (inode caches, short-lived kernel
allocations, skbuffs etc) and movable. When kswapd kicks in, the slab
caches will be reaped. As reapable pages are clustered together, that will
free some contiguous areas - probably enough for the e1000 allocations to
succeed!
If that doesn't work, kswapd and direct reclaim will start reclaiming the
"movable" pages. Without lumpy reclaim, 95%+ of memory could be paged out
which is bad. Lumpy finds the contiguous pages faster and with less IO;
that's why it's important.
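
The clustering idea can be illustrated with a toy userspace model. It is a sketch under simplifying assumptions, not the anti-frag patches themselves: memory is a handful of fixed-size blocks, each block is claimed by one allocation type, and all names (sketch_mem, sketch_alloc, and so on) are hypothetical.

```c
#include <stddef.h>

enum sketch_type { SK_FREE = 0, SK_UNMOVABLE, SK_REAPABLE, SK_MOVABLE };

#define SK_BLOCKS 4
#define SK_PAGES_PER_BLOCK 4

struct sketch_mem {
    /* Type that owns each block; SK_FREE means unclaimed. */
    enum sketch_type block_type[SK_BLOCKS];
    /* Type stored in each page; SK_FREE means the page is unused. */
    enum sketch_type page[SK_BLOCKS][SK_PAGES_PER_BLOCK];
};

/* Allocate one page of type t (not SK_FREE); returns page index or -1. */
static int sketch_alloc(struct sketch_mem *m, enum sketch_type t)
{
    int b, p;

    /* Prefer a free page in a block already holding this type. */
    for (b = 0; b < SK_BLOCKS; b++) {
        if (m->block_type[b] != t)
            continue;
        for (p = 0; p < SK_PAGES_PER_BLOCK; p++) {
            if (m->page[b][p] == SK_FREE) {
                m->page[b][p] = t;
                return b * SK_PAGES_PER_BLOCK + p;
            }
        }
    }
    /* Otherwise claim an unclaimed block for this type. */
    for (b = 0; b < SK_BLOCKS; b++) {
        if (m->block_type[b] == SK_FREE) {
            m->block_type[b] = t;
            m->page[b][0] = t;
            return b * SK_PAGES_PER_BLOCK;
        }
    }
    return -1; /* this toy model has no fallback to other types' blocks */
}

/* Model reaping: free every page of type t and release its blocks. */
static void sketch_reap(struct sketch_mem *m, enum sketch_type t)
{
    int b, p;

    for (b = 0; b < SK_BLOCKS; b++) {
        if (m->block_type[b] != t)
            continue;
        for (p = 0; p < SK_PAGES_PER_BLOCK; p++)
            m->page[b][p] = SK_FREE;
        m->block_type[b] = SK_FREE;
    }
}

/* Count blocks that are entirely free, i.e. usable for a large allocation. */
static int sketch_free_blocks(const struct sketch_mem *m)
{
    int b, n = 0;

    for (b = 0; b < SK_BLOCKS; b++)
        if (m->block_type[b] == SK_FREE)
            n++;
    return n;
}
```

Even when requests of the three types arrive interleaved, each type fills its own block in this model, so reaping the reapable type hands back a whole contiguous block instead of scattered single pages.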
Tests I am aware of show that lumpy-reclaim on its own makes little or no
difference to the hugetlb page problem. However, with anti-frag,
hugetlb-sized allocations succeed much more often even when under memory
pressure.
Whereas anti-fragmentation adds
vastly more code, but can address both problems? Or something.
Anti-frag goes a long way to addressing both problems. Lumpy-reclaim
increases its success rates under memory pressure and reduces the amount
of reclaim that occurs.
IOW: big-picture where-do-we-go-from-here stuff.
Start with lumpy reclaim, then I'd like to merge page clustering piece by
piece, ideally with one of the people with e1000 problems testing to see
whether it makes a difference.
Assuming they are shown to help, where we'd go from there would be stuff
like:
1. Keep non-movable and reapable allocations at the lower PFNs as much as
possible. This is so DIMMs for higher PFNs can be removed (doesn't
exist)
2. Use page migration to compact memory rather than depending solely on
reclaim (doesn't exist)
3. Introduce a mechanism for marking a group of pages as being offlined so
that they are not reallocated (code that does something like this
exists)
4. Resurrect the hotplug-remove code (exists, but probably very stale)
5. Allow allocations for hugepages outside of the pool as long as the
process remains within its locked_vm limits (patches were posted to
libhugetlbfs last Friday; will post to linux-mm tomorrow).
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab