On Sunday 04 June 2006 21:10, Andi Kleen wrote:
> On Sunday 04 June 2006 15:16, Joachim Fritschi wrote:
> > This patch adds the twofish x86_64 assembler routine.
> >
> > Changes since last version:
> > - The keysetup is now handled by the twofish_common.c (see patch 1 )
> > - The last round of the encrypt/decrypt routines where optimized saving 5
> > instructions.
> >
> > Correctness was verified with the tcrypt module and automated test
> > scripts.
>
> Do you have some benchmark numbers that show that it's actually worth
> it?
Here are the outputs from the tcrypt speedtests. They haven't changed much
since the last patch:
http://homepages.tu-darmstadt.de/~fritschi/twofish/tcrypt-speed-c-i586.txt
http://homepages.tu-darmstadt.de/~fritschi/twofish/tcrypt-speed-asm-i586.txt
http://homepages.tu-darmstadt.de/~fritschi/twofish/tcrypt-speed-c-x86_64.txt
http://homepages.tu-darmstadt.de/~fritschi/twofish/tcrypt-speed-asm-x86_64.txt
Summary for cycles used for CBC encrypt decrypt (256bit / 8k blocks) assembler
vs. generic-c:
i586 encrypt: - 17%
i568 decrypt: -24%
x86_64 encrypt: -22%
x86_64 decrypt: -17%
The numbers vary a bit with different blocksizes / keylength and per test.
I also did some filesystem benchmarks (bonnie++) with various ciphers. Most
write tests maxed out my drives writing to disk. But at least for the read
speed you can see some notable performance improvements:
(Note: The x86 and x86_64 numbers are not comparable since the tests were done
on different machines)
http://homepages.tu-darmstadt.de/~fritschi/twofish/output_20060531_160442_x86.html
Summary:
Sequential read speed improved between 25-32%
Sequential write speed improved at least 15% but the disk maxed out
Twofish 256 is a little bit faster than AES 128
http://homepages.tu-darmstadt.de/~fritschi/twofish/output_20060601_113747_x86_64.html
Summary:
Sequential read speed improved 13%
Seqential write speed maxed out the drives
> > +/* Defining a few register aliases for better reading */
>
> Maybe you can read it now better, but for everybody else it is extremly
> confusing. It would be better if you just used the original register names.
The problem with the registers is that the original naming is really crappy in
the first place. The lower registers use the old x86 naming scheme plus some
extensions but the upper registers are totally different. Since i use a lot
of the lower / higher 8bit and 32 bit parts it would be virtually impossible
to write simple macros with this naming scheme because there would be no easy
way to switch from a (unknown) register to its 8bit subregister or the 32bit
in a macro. While there might be some way to do this, i have not found any
example in the kernel or any other source code.
As an explanation i can only add that i looked at the aes assembler
implementation and used it as a example. I know refering to bad coding style
in the kernel is no excuse for more bad coding but it seems to me that it is
the only way to deal with the insane original register naming.
Imho using the original names would only complicate the macros and complicate
understanding the algorithm itself. For the algorithm itself the the actual
registers are simply irrelevant . It calls no other functions or uses any
syscalls. It only uses them as storage. That way reffering to them in a
numbered order with a suffix for 8/16/32bit was an easy way to improve
readability and easier programming.
There might be some way to further improve readability but i have not found
any other way. I'm open to suggestions :)
Regards,
Joachim
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[Index of Archives]
[Kernel Newbies]
[Netfilter]
[Bugtraq]
[Photo]
[Stuff]
[Gimp]
[Yosemite News]
[MIPS Linux]
[ARM Linux]
[Linux Security]
[Linux RAID]
[Video 4 Linux]
[Linux for the blind]
[Linux Resources]