Re: [Lhms-devel] [PATCH 0/7] Fragmentation Avoidance V19

* Andy Nelson <[email protected]> wrote:

> I think it was Martin Bligh who wrote that his customer gets 25% 
> speedups with big pages. That is peanuts compared to my factor 3.4 
> (search comp.arch for John Mashey's and my name at the University of 
> Edinburgh in Jan/Feb 2003 for a conversation that includes detailed 
> data about this), but proves the point that it is far more than just 
> me that wants big pages.

ok, this posting of yours seems to be it:

 http://groups.google.com/group/comp.sys.sgi.admin/browse_thread/thread/39884db861b7db15/e0332608c52a17e3?lnk=st&q=&rnum=35#e0332608c52a17e3

|  Timings for the tree traversal+gravity calculation were
|
|   CPUs   16MB pages   1MB pages   64kB pages
|      1       *            *         2361.8s
|      8     86.4s       198.7s        298.1s
|     16     43.5s        99.2s        148.9s
|     32     22.1s        50.1s         75.0s
|     64     11.2s        25.3s         37.9s
|     96      7.5s        17.1s         25.4s
|
|   (*) test not done.
|
|   As near as I can tell the numbers show perfect
|   linear speedup for the runs for each page size.
|
|   Across different page sizes there is degradation
|   as follows:
|
|   16m --> 64k   decreases by a factor 3.39 in speed
|   16m --> 1m    decreases by a factor 2.25 in speed
|   1m  --> 64k   decreases by a factor 1.49 in speed

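The factors quoted above can be re-derived from the timing table. The following is a minimal sketch, not part of the original posting: the names `TIMES`, `slowdowns` and `cpu_seconds` are made up for illustration, and the numbers are copied from the quoted table. It computes the per-row slowdown between page sizes and the CPU-seconds product (CPUs times wall-clock time), which should stay roughly constant if scaling is close to linear.

```python
# Wall-clock times in seconds from the quoted table, keyed by CPU count.
# Tuple order: (16MB pages, 1MB pages, 64kB pages).
TIMES = {
    8:  (86.4, 198.7, 298.1),
    16: (43.5,  99.2, 148.9),
    32: (22.1,  50.1,  75.0),
    64: (11.2,  25.3,  37.9),
    96: ( 7.5,  17.1,  25.4),
}


def slowdowns(t16m, t1m, t64k):
    """Return the slowdown factors (16M->64k, 16M->1M, 1M->64k)."""
    return (t64k / t16m, t1m / t16m, t64k / t1m)


def cpu_seconds(cpus, wall):
    """Total CPU time consumed; constant across rows means linear speedup."""
    return cpus * wall


if __name__ == "__main__":
    for cpus, row in TIMES.items():
        f_16m_64k, f_16m_1m, f_1m_64k = slowdowns(*row)
        print(f"{cpus:3d} cpus: 16M->64k {f_16m_64k:.2f}  "
              f"16M->1M {f_16m_1m:.2f}  1M->64k {f_1m_64k:.2f}  "
              f"cpu-seconds(16M) {cpu_seconds(cpus, row[0]):.1f}")
```

With these inputs the 96-CPU row gives roughly 3.39, 2.28 and 1.49, close to the quoted factors, and the CPU-seconds product for 16MB pages drifts only from about 691 to 720 between 8 and 96 CPUs.
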
[...]
|
|   Sum over cpus of TLB miss times for each test:
|
|   CPUs   16MB pages   1MB pages   64kB pages
|      1                               3489s
|      8     64.3s        1539s        3237s
|     16     64.5s        1540s        3241s
|     32     64.5s        1542s        3244s
|     64     64.9s        1545s        3246s
|     96     64.7s        1545s        3251s
|
|   Thus the 16MB pages rarely produced page misses,
|   while the 64kB pages used up 2.5x more time than
|   the floating point operations that we wanted to
|   have. I have at least some feeling that the 16MB pages
|   rarely caused misses because with a 128 entry
|   TLB (on the R12000 cpu) that gives about 1GB of
|   addressable memory before paging is required at all,
|   which I think is quite comparable to the size of
|   the memory actually used.

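The "TLB reach" argument in the quoted paragraph is just entries times page size. A minimal sketch follows, assuming the 128-entry figure given in the posting and a simple one-page-per-entry model; the names `tlb_reach` and `TLB_ENTRIES` are hypothetical and real hardware details may differ.

```python
KIB = 1024
MIB = 1024 * KIB

# Entry count as stated in the quoted posting (assumption: one page per entry).
TLB_ENTRIES = 128


def tlb_reach(entries, page_size):
    """Bytes of memory that can be mapped without taking a TLB miss."""
    return entries * page_size


if __name__ == "__main__":
    for label, size in (("64kB", 64 * KIB), ("1MB", MIB), ("16MB", 16 * MIB)):
        reach_mib = tlb_reach(TLB_ENTRIES, size) // MIB
        print(f"{label:>5} pages: reach {reach_mib} MiB")
```

Under this simple model 16MB pages cover 2048 MiB, the same order of magnitude as the "about 1GB" quoted, while 64kB pages cover only 8 MiB, so a working set of hundreds of megabytes would miss constantly.
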
to me it seems that this slowdown is due to some inefficiency in the 
R12000's TLB-miss handling - possibly very (very!) long TLB-miss 
latencies? On modern CPUs (x86/x64) the TLB-miss latency is rarely 
visible. Would it be possible to run some benchmarks of hugetlbs vs. 4K 
pages on x86/x64?

if my assumption is correct, then hugeTLBs are more of a workaround for 
bad TLB-miss properties of the CPUs you are using, not something that 
will inevitably happen in the future. Hence i think the 'factor 3x' 
slowdown is probably no longer realistic - or are you still running 
R12000 CPUs?

	Ingo