Andi Kleen wrote:
> "Bela Lubkin" <[email protected]> writes:
> >
> > /*
> > * Dirty a big buffer in a hard-to-predict (for the L2 cache) way. This
> > * is the operation that is timed, so we try to generate unpredictable
> > * cachemisses that still end up filling the L2 cache:
> > */
>
> The comment is misleading anyways. AFAIK several of the modern
> CPUs (at least K8, later P4s, Core2, POWER4+, PPC970) have prefetch
> predictors advanced enough to follow several streams forward and backwards
> in parallel.
>
> I hit this while doing NUMA benchmarking for example.
>
> Most likely to be really unpredictable you need to use a
> true RND and somehow make sure still the full cache range
> is covered.
The corrected code in <http://bugzilla.kernel.org/show_bug.cgi?id=7476#c4>
covers the full cache range. Granted that modern CPUs may be able to track
multiple simultaneous cache access streams: how many such streams are they
likely to be able to follow at once? It seems like going from 1 to 2 would
be a big win, 2 to 3 a small win, beyond that it wouldn't likely make much
incremental difference. So what do the actual implementations in the field
support?
The code (original and corrected) uses 6 simultaneous streams.
I have a modified version that takes a `ways' parameter to use an arbitrary
number of streams. I'll post that onto bugzilla.kernel.org.
>Bela<
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[Index of Archives]
[Kernel Newbies]
[Netfilter]
[Bugtraq]
[Photo]
[Stuff]
[Gimp]
[Yosemite News]
[MIPS Linux]
[ARM Linux]
[Linux Security]
[Linux RAID]
[Video 4 Linux]
[Linux for the blind]
[Linux Resources]