On Thu, 3 Nov 2005, Linus Torvalds wrote:
>
>
> On Thu, 3 Nov 2005, Martin J. Bligh wrote:
> >
> > Ha. Just because I don't think I made you puke hard enough already with
> > foul approximations ... for order 2, I think it's
>
> Your basic fault is in believing that the free watermark would stay
> constant.
>
> That's insane.
>
> Would you keep 8MB free on a 64MB system?
>
> Would you keep 8MB free on a 8GB system?
>
> The point being, that if you start with insane assumptions, you'll get
> insane answers.
>
> The _correct_ assumption is that you aim to keep some fixed percentage of
> memory free. With that assumption and your math, finding higher-order
> pages is equally hard regardless of amount of memory.
>
> Now, your math then doesn't allow for the fact that buddy automatically
> coalesces for you, so in fact things get _easier_ with more memory, but
> hey, that needs more math than I can come up with (I never did it as math,
> only as simulations with allocation patterns - "smart people use math,
> plodding people just try to simulate an estimate" ;)
>
My math is not that great either, so here is a simulation.
Setup: Reboot the machine which is a quad xeon xSeries 350 with 1.5GiB of
RAM. Configure /proc/sys/vm/min_free_kbytes to try and keep 1/8th of
physical memory free. This is to keep in line with your suggestion that
fragmentation is low when there is a higher percentage of memory free.
Load: Run a load - 7 kernels compiling simultaneously at -j2 which gives
loads between 10-14. Try and get 50% worth of physical memory in 4MiB
pages (1024 contiguous pages) while compiling. When the test ends and the
system is quiet, try again. 4MiB in this case is a single HugeTLB page.
Here are the results;
2.6.14-rc5-mm1-clean (OOM killer disabled) Allocating Under Load
Order: 10
Allocation type: HighMem
Attempted allocations: 160
Success allocs: 24
Failed allocs: 136
DMA zone allocs: 0
Normal zone allocs: 16
HighMem zone allocs: 8
% Success: 15
2.6.14-rc5-mm1-mbuddy-v19 Allocating Under Load
Order: 10
Allocation type: HighMem
Attempted allocations: 160
Success allocs: 24
Failed allocs: 136
DMA zone allocs: 0
Normal zone allocs: 11
HighMem zone allocs: 13
% Success: 15
Not a lot of difference there and the success rate is not great.
mbuddy-v19 is a bit better at the normal zone and that's about it. These
results are not surprising as kswapd is making no effort to get contiguous
pages. Under a load of 7 kernel compiles, kswapd will not free pages fast
enough.
When the test ends and the system is quiet, try and get 80% of physical
memory in large pages. 4 attempts are made to satisfy the requests to give
kswapd lots of time.
2.6.14-rc5-mm1-clean (OOM killer disabled) Allocating while rested
Order: 10
Allocation type: HighMem
Attempted allocations: 300
Success allocs: 159
Failed allocs: 141
DMA zone allocs: 0
Normal zone allocs: 46
HighMem zone allocs: 113
% Success: 53
Mainly highmem there.
2.6.14-rc5-mm1-mbuddy-v19 Allocating while rested
Order: 10
Allocation type: HighMem
Attempted allocations: 300
Success allocs: 212
Failed allocs: 88
DMA zone allocs: 0
Normal zone allocs: 102
HighMem zone allocs: 110
% Success: 70
Look at the big difference in the number of successful allocations in
ZONE_NORMAL because the kernel allocations were kept together. Experience
has shown me that failure to get higher success rates depended on per-cpu
pages and the number of kernel pages that leaked to other areas (56 over
the course of this test). Kernel pages leaking was helped a lot by setting
min_free_kbytes higher than the default.
I then ported forward the linear scanner and ran the tests again. The
linear scanner does two things - finds linear reclaimable pages using
information provided by anti-defrag and drains the per-cpu caches. I'll
post the linear scanner code if people want to look at it but it's really
crap. It's slow, works too hard and doesn't try to hold on to the pages
for the process reclaiming the pages are just some of it's problems. I
need to rewrite it almost from scratch and avoid all the mistakes but it's
a path that is hit *only* if you are allocating high orders.
2.6.14-rc5-mm1-mbuddy-v19-lnscan Allocating under load
Order: 10
Allocation type: HighMem
Attempted allocations: 160
Success allocs: 155
Failed allocs: 0
DMA zone allocs: 0
Normal zone allocs: 12
HighMem zone allocs: 143
% Success: 96
Mainly got it's pages back from highmem which is always easier as long as
PTE pages are not in the way.
2.6.14-rc5-mm1-mbuddy-v19-lnscan Allocating while rested
Order: 10
Allocation type: HighMem
Attempted allocations: 300
Success allocs: 275
Failed allocs: 0
DMA zone allocs: 0
Normal zone allocs: 133
HighMem zone allocs: 142
% Success: 91
That is 71% of physical memory available in contiguous blocks with the
linear scanner but that code is not ready. anti-defrag on it's own as it
is today was able to get 55% of physical memory in 4MiB chunks.
This is provided without performance regressions in the normal case
everyone cares about. In my tests, there are minor improvements on aim9
which is artificial, and gained a few seconds on kernel build tests which
people do care about.
Does these patches still make no sense to you? Lower fragmentation that
does not impact the cases everyone cares about? If so, why?
To get the best possibly results, a zone approach could still be built on
top of this and it seems as if it's worth developing. At the cost of some
configuration, the zone would give *hard* guarantees on the available
number of large pages and anti-defrag would give best effort everywhere
else. By default without configuration, you would get best-effort.
--
Mel Gorman
Part-time Phd Student Java Applications Developer
University of Limerick IBM Dublin Software Lab
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[Index of Archives]
[Kernel Newbies]
[Netfilter]
[Bugtraq]
[Photo]
[Stuff]
[Gimp]
[Yosemite News]
[MIPS Linux]
[ARM Linux]
[Linux Security]
[Linux RAID]
[Video 4 Linux]
[Linux for the blind]
[Linux Resources]