Re: [PATCH] Add __GFP_MOVABLE for callers to flag allocations that may be migrated

On Mon, 4 Dec 2006, Andrew Morton wrote:

On Mon, 4 Dec 2006 14:07:47 +0000
[email protected] (Mel Gorman) wrote:

o copy_strings() and variants are no longer setting the flag as the pages
  are not obviously movable when I took a much closer look.

o The arch function alloc_zeroed_user_highpage() is now called
  __alloc_zeroed_user_highpage and takes flags related to
  movability that will be applied.  alloc_zeroed_user_highpage()
  calls __alloc_zeroed_user_highpage() with no additional flags to
  preserve existing behavior of the API for out-of-tree users and
  alloc_zeroed_user_highpage_movable() sets the __GFP_MOVABLE flag.

o new_inode() documents that it uses GFP_HIGH_MOVABLE and callers are expected
  to call mapping_set_gfp_mask() if that is not suitable.
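
For readers following the second item, the wrapper arrangement can be modelled roughly as below. This is a userspace sketch only, not the patch itself: the demo_ names, the DEMO_ flag values and struct demo_page are all made up for illustration, and calloc() stands in for the real page allocator.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical flag bits for illustration; not the kernel's real values. */
#define DEMO_GFP_HIGHUSER 0x01u
#define DEMO_GFP_ZERO     0x02u
#define DEMO_GFP_MOVABLE  0x04u
#define DEMO_PAGE_SIZE    4096u

struct demo_page {
	unsigned int gfp;                   /* flags the page was allocated with */
	unsigned char data[DEMO_PAGE_SIZE]; /* zeroed payload */
};

/* Core helper: the caller passes any movability flags to be applied. */
static struct demo_page *demo__alloc_zeroed_user_highpage(unsigned int movableflags)
{
	struct demo_page *page;

	/* Only movability bits may come in through this parameter. */
	assert((movableflags & ~DEMO_GFP_MOVABLE) == 0);

	page = calloc(1, sizeof(*page));
	if (page)
		page->gfp = DEMO_GFP_HIGHUSER | DEMO_GFP_ZERO | movableflags;
	return page;
}

/* Existing entry point: no additional flags, so old callers see no change. */
static struct demo_page *demo_alloc_zeroed_user_highpage(void)
{
	return demo__alloc_zeroed_user_highpage(0);
}

/* New entry point for callers whose pages may be migrated or paged out. */
static struct demo_page *demo_alloc_zeroed_user_highpage_movable(void)
{
	return demo__alloc_zeroed_user_highpage(DEMO_GFP_MOVABLE);
}
```

The point of the split is that the old name keeps its old behaviour, while only callers that know their pages are movable opt in through the _movable variant.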


umm, OK.  Could we please have some sort of statement pinning down the
exact semantics of __GFP_MOVABLE, and what its envisaged applications are?

"An allocation marked __GFP_MOVABLE may be moved using either page
migration or by paging out."

Right now, it's paging out. It isn't smart enough to use page migration.
Bottom line, if a __GFP_MOVABLE allocation is in an awkward place, it can
be got rid of somehow.

My concern is that __GFP_MOVABLE is useful for fragmentation-avoidance, but
useless for memory hot-unplug.

Anti-fragmentation did allow SPARSEMEM sections to be off-lined when it
was tested a long time ago so it's not useless. Where it could help
general hotplug remove is by keeping non-movable allocations at the lower
PFNs as much as possible.

So that if/when hot-unplug comes along
we'll add more gunk which is a somewhat-superset of the GFP_MOVABLE
infrastructure, hence we didn't need the GFP_MOVABLE code.  Or something.

If/when hot-unplug comes along, it's going to need some way of identifying
pages that are safe to place in hot-unpluggable areas so you'll end up
with something like __GFP_MOVABLE.

That depends on how we do hot-unplug, if we do it.  I continue to suspect
that it'll be done via memory zones: effectively by resurrecting
GFP_HIGHMEM.  In which case there's little overlap with anti-frag.

And that will introduce a zone that must be tuned at boot-time, which is
undesirable but doable. With arch-independent zone-sizing in place, it's
considerably easier to create such a zone and then use __GFP_MOVABLE as a
zone modifier within the allocator. I have really old patches that do
something like this that I can bring up to date. However, that zone will
only be usable by __GFP_MOVABLE pages and will not help the e1000 case for
example.

On the other hand anti-frag (exists) + keeping non-movable pages at
lowest-possible-pfn (doesn't exist yet) would allow some DIMMs to be
unplugged without needing additional zones or tuning.
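
To illustrate the lowest-possible-pfn idea, here is a toy userspace model. It is a sketch under loud assumptions, since no such placement code exists yet: the demo_ names, the 16-page memory and the 4-pages-per-DIMM layout are invented for the example. Non-movable pages are taken from the bottom, movable pages from the top, and a DIMM counts as removable when it holds no non-movable page.

```c
#include <assert.h>

#define DEMO_NR_PAGES       16 /* hypothetical tiny memory */
#define DEMO_PAGES_PER_DIMM 4  /* hypothetical DIMM size in pages */

enum demo_use { DEMO_UNUSED = 0, DEMO_PINNED, DEMO_MOVABLE_PAGE };

/* Non-movable allocations search upwards from PFN 0. */
static int demo_alloc_pinned(enum demo_use *pfn_map)
{
	int pfn;

	for (pfn = 0; pfn < DEMO_NR_PAGES; pfn++) {
		if (pfn_map[pfn] == DEMO_UNUSED) {
			pfn_map[pfn] = DEMO_PINNED;
			return pfn;
		}
	}
	return -1;
}

/* Movable allocations search downwards from the highest PFN. */
static int demo_alloc_movable(enum demo_use *pfn_map)
{
	int pfn;

	for (pfn = DEMO_NR_PAGES - 1; pfn >= 0; pfn--) {
		if (pfn_map[pfn] == DEMO_UNUSED) {
			pfn_map[pfn] = DEMO_MOVABLE_PAGE;
			return pfn;
		}
	}
	return -1;
}

/* A DIMM can be emptied (by paging out or migrating) if nothing is pinned. */
static int demo_removable_dimms(const enum demo_use *pfn_map)
{
	int dimm, p, count = 0;

	assert(DEMO_NR_PAGES % DEMO_PAGES_PER_DIMM == 0);
	for (dimm = 0; dimm < DEMO_NR_PAGES / DEMO_PAGES_PER_DIMM; dimm++) {
		int pinned = 0;

		for (p = 0; p < DEMO_PAGES_PER_DIMM; p++)
			if (pfn_map[dimm * DEMO_PAGES_PER_DIMM + p] == DEMO_PINNED)
				pinned = 1;
		if (!pinned)
			count++;
	}
	return count;
}
```

In this model the number of removable DIMMs only drops when the non-movable pages spill past a DIMM boundary, which is the property being asked for, with no extra zone to size at boot.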

(btw, I
have a suspicion that the most important application of memory hot-unplug
will be power management: destructively turning off DIMMs).


You're probably right.

I'd also like to pin down the situation with lumpy-reclaim versus
anti-fragmentation.  No offence


None taken.

, but I would of course prefer to avoid
merging the anti-frag patches simply based on their stupendous size.
It seems to me that lumpy-reclaim is suitable for the e1000 problem,
but perhaps not for the hugetlbpage problem.

I believe you'll hit similar problems even with lumpy-reclaim for the
e1000 (I've added Andy to the cc so he can comment more). Lumpy provides a
much smarter way of freeing higher-order contiguous blocks without having
to reclaim 95%+ of memory - this is good. However, if you are currently
seeing situations where the allocations fail even after you page out
everything possible, smarter reclaim that eventually pages out everything
anyway will not help you (chances are it's something like page tables that
are in your way).

This is where anti-frag comes in. It clusters pages together based on
their type - unmovable, reapable (inode caches, short-lived kernel
allocations, skbuffs etc) and movable. When kswapd kicks in, the slab
caches will be reaped. As reapable pages are clustered together, that will
free some contiguous areas - probably enough for the e1000 allocations to
succeed!
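
The clustering effect can be shown with a toy userspace model. This is only a sketch of the idea, not the anti-frag patches: the demo_ names, the 8 blocks of 4 pages and the simple two-pass search are assumptions made for the example. Each block is claimed by the first type allocated from it, so freeing every page of one type hands back whole blocks.

```c
#include <assert.h>

enum demo_type { DEMO_FREE = 0, DEMO_UNMOVABLE, DEMO_REAPABLE, DEMO_MOVABLE };

#define DEMO_BLOCKS          8 /* hypothetical number of contiguous blocks */
#define DEMO_PAGES_PER_BLOCK 4 /* hypothetical pages per block */

struct demo_mem {
	enum demo_type block_type[DEMO_BLOCKS]; /* DEMO_FREE if unclaimed */
	enum demo_type page[DEMO_BLOCKS][DEMO_PAGES_PER_BLOCK];
};

/* Allocate one page of the given type; returns its page index or -1. */
static int demo_alloc_page(struct demo_mem *m, enum demo_type type)
{
	int pass, b, p;

	assert(type != DEMO_FREE);
	/* Pass 0: blocks already holding this type. Pass 1: unclaimed blocks. */
	for (pass = 0; pass < 2; pass++) {
		enum demo_type want = pass ? DEMO_FREE : type;

		for (b = 0; b < DEMO_BLOCKS; b++) {
			if (m->block_type[b] != want)
				continue;
			for (p = 0; p < DEMO_PAGES_PER_BLOCK; p++) {
				if (m->page[b][p] == DEMO_FREE) {
					m->block_type[b] = type;
					m->page[b][p] = type;
					return b * DEMO_PAGES_PER_BLOCK + p;
				}
			}
		}
	}
	return -1; /* a real allocator would fall back to other blocks here */
}

/* Free every page of one type, loosely like reaping the slab caches. */
static void demo_free_type(struct demo_mem *m, enum demo_type type)
{
	int b, p;

	for (b = 0; b < DEMO_BLOCKS; b++) {
		if (m->block_type[b] != type)
			continue;
		for (p = 0; p < DEMO_PAGES_PER_BLOCK; p++)
			m->page[b][p] = DEMO_FREE;
		m->block_type[b] = DEMO_FREE;
	}
}

/* Count blocks that are entirely free, i.e. usable for a large allocation. */
static int demo_free_blocks(const struct demo_mem *m)
{
	int b, count = 0;

	for (b = 0; b < DEMO_BLOCKS; b++)
		if (m->block_type[b] == DEMO_FREE)
			count++;
	return count;
}
```

Even when the three types are allocated in an interleaved order, the reapable pages end up sharing blocks, so reaping them returns complete contiguous blocks rather than scattered single pages.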

If that doesn't work, kswapd and direct reclaim will start reclaiming the
"movable" pages. Without lumpy reclaim, 95%+ of memory could be paged out
which is bad. Lumpy finds the contiguous pages faster and with less IO,
which is why it's important.

Tests I am aware of show that lumpy-reclaim on its own makes little or no
difference to the hugetlb page problem. However, with anti-frag,
hugetlb-sized allocations succeed much more often even when under memory
pressure.

Whereas anti-fragmentation adds
vastly more code, but can address both problems?  Or something.

Anti-frag goes a long way to addressing both problems. Lumpy-reclaim
increases its success rates under memory pressure and reduces the amount
of reclaim that occurs.

IOW: big-picture where-do-we-go-from-here stuff.

Start with lumpy reclaim, then I'd like to merge page clustering piece by
piece, ideally with one of the people with e1000 problems testing to see
does it make a difference.

Assuming they are shown to help, where we'd go from there would be stuff
like:


1. Keep non-movable and reapable allocations at the lower PFNs as much as
   possible. This is so DIMMs for higher PFNs can be removed (doesn't
   exist)
2. Use page migration to compact memory rather than depending solely on
   reclaim (doesn't exist)
3. Introduce a mechanism for marking a group of pages as being offlined so
   that they are not reallocated (code that does something like this
   exists)
4. Resurrect the hotplug-remove code (exists, but probably very stale)
5. Allow allocations for hugepages outside of the pool as long as the
   process remains within its locked_vm limits (patches were posted to
   libhugetlbfs last Friday. Will post to linux-mm tomorrow).

--
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
