On Fri, 17 Nov 2006, dean gaudet wrote:

> another pointer chase arranged to fill the L1 (or L2) using many many 
> pages.  i.e. suppose i wanted to traverse 32KiB L1 with 64B cache lines 
> then i'd allocate 512 pages and put one line on each page (pages ordered 
> randomly), but colour them so they fill the L1.  this conveniently happens 
> to fit in a 2MiB huge page on x86, so you could even ameliorate the TLB 
> pressure from the microbenchmark.

btw, for L2-sized measurements you don't need to be so clever... you can 
get away with a random traversal of the L2 on 128B boundaries.  (128B 
rather than 64B, to avoid the "next-line prefetch" issues on 
p-m/core/core2 and on p4 model 3 and later.)  mapping the L2 simply takes 
far more pages than any reasonable prefetcher is going to track any time 
soon.
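
a rough sketch of what i mean by the random traversal, not tested on real 
hardware here.  the names (STRIDE, build_chase, chase) are made up for 
illustration, and the single-cycle shuffle (sattolo's algorithm) is just 
one way to make sure the chase touches every slot before it repeats:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define STRIDE 128	/* assumed: two 64B lines, to dodge next-line prefetch */

/*
 * Link the first word of every STRIDE-byte slot in buf into one random
 * cycle and return the starting node.  Returns NULL if the scratch
 * allocation fails.
 */
static void **build_chase(void *buf, size_t bytes, unsigned seed)
{
	size_t n = bytes / STRIDE, i;
	size_t *perm;
	char *base = buf;

	assert(n > 1);
	perm = malloc(n * sizeof *perm);
	if (!perm)
		return NULL;
	for (i = 0; i < n; i++)
		perm[i] = i;

	/* Sattolo's shuffle: picking j strictly below i yields a single cycle */
	srand(seed);
	for (i = n - 1; i > 0; i--) {
		size_t j = (size_t)rand() % i;
		size_t t = perm[i];

		perm[i] = perm[j];
		perm[j] = t;
	}

	/* slot i points at slot perm[i] */
	for (i = 0; i < n; i++)
		*(void **)(base + i * STRIDE) = base + perm[i] * STRIDE;

	free(perm);
	return (void **)base;
}

/* Follow the chain for 'steps' dependent loads; returns where we end up. */
static void **chase(void **p, size_t steps)
{
	while (steps--)
		p = (void **)*p;
	return p;
}
```

size the buffer to the L2 you're measuring and time a few laps of 
chase().  rand() is only there to keep the sketch short; the quality of 
the randomness doesn't affect the single-cycle property.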


> the benchmark i was considering would be like so:
> 	switch to cpu m
> 	scrub the cache
> 	switch to cpu n
> 	scrub the cache
> 	traverse the coloured list and modify each cache line as we go
> 	switch to cpu m
> 	start timing
> 	traverse the coloured list without modification
> 	stop timing
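
for concreteness, the steps above might look something like this on 
linux.  this is only a sketch under assumptions: pin_to_cpu, scrub_cache, 
walk, now_ns and measure are hypothetical helpers, the list is assumed to 
be circular with a spare word after each next pointer for the "modify" 
pass, and scrubbing by writing a big junk buffer is a crude stand-in for 
whatever scrub you prefer:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* for sched_setaffinity() and CPU_SET() */
#endif
#include <sched.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

static volatile size_t sink;	/* keeps the timed walk from being optimized out */

/* "switch to cpu": pin the calling thread; 0 on success, -1 on failure */
static int pin_to_cpu(int cpu)
{
#ifdef CPU_SET
	cpu_set_t set;

	CPU_ZERO(&set);
	CPU_SET(cpu, &set);
	return sched_setaffinity(0, sizeof set, &set);
#else
	(void)cpu;
	return -1;		/* no affinity API visible on this platform */
#endif
}

/* "scrub the cache": dirty one byte in every 64B of a large junk buffer */
static void scrub_cache(volatile char *junk, size_t bytes)
{
	for (size_t i = 0; i < bytes; i += 64)
		junk[i]++;
}

/* one lap of a circular list; node[0] is next, node[1] is scratch */
static size_t walk(void **head, int modify)
{
	void **p = head;
	size_t visited = 0;

	do {
		if (modify)
			p[1] = (void *)((uintptr_t)p[1] + 1);
		p = (void **)*p;
		visited++;
	} while (p != head);
	return visited;
}

static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* ns for the read-only lap on cpu m after cpu n dirtied the lines; 0 on error */
static uint64_t measure(void **head, int m, int n,
			volatile char *junk, size_t junk_bytes)
{
	uint64_t t0;

	if (pin_to_cpu(m))
		return 0;
	scrub_cache(junk, junk_bytes);
	if (pin_to_cpu(n))
		return 0;
	scrub_cache(junk, junk_bytes);
	walk(head, 1);		/* modify each cache line on cpu n */
	if (pin_to_cpu(m))
		return 0;
	t0 = now_ns();
	sink = walk(head, 0);	/* timed, read-only, on cpu m */
	return now_ns() - t0;
}
```

divide the result by the number of lines in the list to get a per-line 
transfer cost, and repeat over the (m, n) pairs you care about.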
