Arjan van de Ven wrote:
> On Wed, 24 Oct 2007 21:29:56 -0700
> "David Schwartz" <[email protected]> wrote:
>
>>> Well that's exactly right. For threaded programs (and maybe even
>>> real-world non-threaded ones in general), you don't want to be
>>> even _reading_ global variables if you don't need to. Cache misses
>>> and cacheline bouncing could easily cause performance to completely
>>> tank in some cases while only gaining a cycle or two in
>>> microbenchmarks for doing these funny x86 predication things.
>>
>> For some CPUs, replacing a conditional branch with a conditional
>> move is a *huge* win because it cannot be mispredicted.
>
> please name one...
> Hint: It's not one made by either Intel or AMD in the last 4 years...

It is a win if the branch cannot be effectively predicted, i.e. if the
outcome is essentially random, as may occur with data-dependent
conditionals. I've seen a doubling of performance on one workload using
a predicated instruction instead of a branch on newer Xeons in such a case.
I suspect that if branch prediction fails often, the data dependency
created by the cmov, etc. is less expensive than the pipeline flush
required by mispredicts.
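
To make the comparison concrete, here is a minimal sketch in C of the two
shapes being discussed. The function names and the threshold parameter are
made up for illustration and are not taken from any real workload. Whether
the compiler actually emits a conditional jump for the first loop, or a
cmov/setcc sequence for the second, depends on the compiler, flags and
target, so check the generated assembly before drawing conclusions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Branchy form: a data-dependent "if" in the loop body. If the values in
 * v[] are essentially random relative to threshold, a conditional jump
 * here is hard to predict. (The compiler is free to if-convert this.)
 */
static int64_t sum_above_branch(const int32_t *v, size_t n, int32_t threshold)
{
    int64_t sum = 0;

    assert(n == 0 || v != NULL);
    for (size_t i = 0; i < n; i++) {
        if (v[i] >= threshold)
            sum += v[i];
    }
    return sum;
}

/*
 * Branch-free form: the comparison result (0 or 1) is turned into a mask
 * of all zeros or all ones, so the add always executes. This trades the
 * possible mispredict for a data dependency on the comparison.
 */
static int64_t sum_above_select(const int32_t *v, size_t n, int32_t threshold)
{
    int64_t sum = 0;

    assert(n == 0 || v != NULL);
    for (size_t i = 0; i < n; i++) {
        int64_t mask = -(int64_t)(v[i] >= threshold); /* 0 or -1 */
        sum += (int64_t)v[i] & mask;
    }
    return sum;
}
```

Both functions return the same result for any input; only the control flow
differs. With sorted or otherwise predictable input the branchy form may
well be as fast or faster, which is the case the earlier replies are
pointing at.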
--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [email protected]
Home Page: http://www.roberthancock.com/