Re: [PATCH 2/3] [CRYPTO] Add optimized SHA-1 implementation for i486+

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Jeff Garzik wrote:
Matt Mackall wrote:
Have you benchmarked this against lib/sha1.c? Please post the results.
Until then, I'm frankly skeptical that your unrolled version is faster
because when I introduced lib/sha1.c the rolled version therein won by
a significant margin and had 1/10th the cache footprint.

See the benchmark tables in patch 0 at the head of this thread. Performance improved by at least 25% in every test, and 40-60% was more common for the 32-bit version (on a Pentium IV).

It's not just the loop unrolling; it's the register allocation and spilling. For comparison, I built SHATransform() from the drivers/char/random.c in 2.6.11, using gcc 3.3.5 with -O2 and SHA_CODE_SIZE == 3 (i.e., fully unrolled); I'm guessing this is pretty close to what you tested back then. The resulting code is 49% MOV instructions, and 80% of *those* involve memory. gcc4 is somewhat better, but it still spills a whole lot, both for the 2.6.11 unrolled code and for the current lib/sha1.c.

In contrast, the assembly implementation in this patch only has to go to memory for data and workspace (with one small exception in the F3 rounds), and the workspace has a fifth of the cache footprint of the default implementation.

Yes. And it also depends on the CPU as well. Testing on a server-class x86 CPU (often with bigger L2, and perhaps even L1, cache) will produce different result than from popular but less-capable "value" CPUs.

Good point.  I benchmarked the 32-bit assembly code on a couple more boxes:

=== AMD Duron, average of 5 trials ===
Test#  Bytes/  Bytes/  Cyc/B  Cyc/B  Change
        block  update    (C)  (asm)
    0      16      16    104     72     31%
    1      64      16     52     36     31%
    2      64      64     45     29     36%
    3     256      16     33     23     30%
    4     256      64     27     17     37%
    5     256     256     24     14     42%
    6    1024      16     29     20     31%
    7    1024     256     20     11     45%
    8    1024    1024     19     11     42%
    9    2048      16     28     20     29%
   10    2048     256     19     11     42%
   11    2048    1024     18     10     44%
   12    2048    2048     18     10     44%
   13    4096      16     28     19     32%
   14    4096     256     18     10     44%
   15    4096    1024     18     10     44%
   16    4096    4096     18     10     44%
   17    8192      16     27     19     30%
   18    8192     256     18     10     44%
   19    8192    1024     18     10     44%
   20    8192    4096     17     10     41%
   21    8192    8192     17     10     41%

=== Classic Pentium, average of 5 trials ===
Test#  Bytes/  Bytes/  Cyc/B  Cyc/B  Change
        block  update    (C)  (asm)
    0      16      16    145    144      1%
    1      64      16     72     61     15%
    2      64      64     65     52     20%
    3     256      16     46     39     15%
    4     256      64     39     32     18%
    5     256     256     36     29     19%
    6    1024      16     40     33     18%
    7    1024     256     30     23     23%
    8    1024    1024     29     23     21%
    9    2048      16     39     32     18%
   10    2048     256     29     22     24%
   11    2048    1024     28     22     21%
   12    2048    2048     28     22     21%
   13    4096      16     38     32     16%
   14    4096     256     28     22     21%
   15    4096    1024     28     21     25%
   16    4096    4096     27     21     22%
   17    8192      16     38     32     16%
   18    8192     256     28     22     21%
   19    8192    1024     28     21     25%
   20    8192    4096     27     21     22%
   21    8192    8192     27     21     22%

The improvement isn't as good, but it's still noticeable.

--Benjamin Gilbert

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux