David Miller a écrit :
From: Eric Dumazet <[email protected]>
Date: Mon, 04 Dec 2006 22:34:29 +0100
On a 200 MHz sparcv9 machine, the division takes 64 cycles instead of 1 cycle
for a multiply.
For UltraSPARC I and II (which is what this 200mhz guy probably is),
it's 4 cycle latency for a multiply (32-bit or 64-bit) and 68 cycles
for a 64-bit divide (32-bit divide is 37 cycles).
I must have Ultra-2 (running Solaris :( )
for (ui = 0 ; ui < 100000000 ; ui++)
val += reciprocal_divide(ui, reciproc);
100000cb0: 83 31 20 00 srl %g4, 0, %g1
100000cb4: 82 48 40 12 mulx %g1, %l2, %g1
100000cb8: 83 30 70 20 srlx %g1, 0x20, %g1
100000cbc: 88 01 20 01 inc %g4
100000cc0: 80 a1 00 05 cmp %g4, %g5
100000cc4: 08 4f ff fb bleu %icc, 100000cb0
100000cc8: b0 06 00 01 add %i0, %g1, %i0
I confirm that this block uses 20 cycles/iteration,
while next one uses 72 cycles/iteration
for (ui = 0 ; ui < 100000000 ; ui++)
val += ui / value;
100000ca8: 83 31 20 00 srl %g4, 0, %g1
100000cac: 82 68 40 11 udivx %g1, %l1, %g1
100000cb0: 88 01 20 01 inc %g4
100000cb4: 80 a1 00 05 cmp %g4, %g5
100000cb8: 08 4f ff fc bleu %icc, 100000ca8
100000cbc: b0 06 00 01 add %i0, %g1, %i0
UltraSPARC-III and IV are worse, 6 cycles for multiply and 40/71
cycles (32/64-bit) for integer divides.
Niagara is even worse :-) 11 cycle integer multiply and a 72 cycle
integer divide (regardless of 32-bit or 64-bit).
(more details in gcc/config/sparc/sparc.c:{ultrasparc,ultrasparc3,niagara}_cost).
So this change has tons of merit for sparc64 chips at least :-)
Also, the multiply can parallelize with other operations but it
seems that integer divide stalls the pipe for most of the duration
of the calculation. So this makes the divide even worse.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[Index of Archives]
[Kernel Newbies]
[Netfilter]
[Bugtraq]
[Photo]
[Stuff]
[Gimp]
[Yosemite News]
[MIPS Linux]
[ARM Linux]
[Linux Security]
[Linux RAID]
[Video 4 Linux]
[Linux for the blind]
[Linux Resources]