Re: [CRYPTO] is it really optimized ?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi

[Sorry for the late answer]

On 4/19/07, Francis Moreau <[email protected]> wrote:
On 4/17/07, Roland Dreier <[email protected]> wrote:
>  > > It seems trivial to keep the last key you were given and do a quick
>  > > memcmp in your setkey method to see if it's different from the last
>  > > key you pushed to hardware, and set a flag if it is.  Then only do
>  > > your set_key() if you have a new key to pass to hardware.
>  > >
>  > > I'm assuming the expense is in the aes_write() calls, and you could
>  > > avoid them if you know you're not writing something new.
>
>  > that's a wrong assumption. aes_write()/aes_read() are both used to
>  > access to the controller and are slow (no cache involved).
>
> Sorry, I wasn't clear.  I meant that the hardware access is what is
> slow, and that anything you do on the CPU is relatively cheap compared
> to that.
>
> So my suggestion is just to keep a cache (in CPU memory) of what you
> have already loaded into the HW, and before reloading the HW just
> check the cache and don't do the actual HW access if you're not going
> to change the HW contents.  So you avoid any extra aes_write and
> aes_read calls in the cache hit case.
>
> This would have the advantage of making anything that does lots of
> bulk encryption fast without special casing ecryptfs.
>

I'm not sure how "memcmp(key, cache, KEY_SIZE)" would impact AES
performance. I need to give it a test but can't today. I'll do
tomorrow and give you back the result.


OK, I gave it a test and it appears that the cache hit case is
slightly worse than unconditionnal key loading. So it means that
testing that hte key is cached is as long as loading the key into the
controller. Here is what I did in set_key() function:

static void set_key(const char *key)
{
	static u32 my_key[4] __cacheline_aligned;

	u32 key0 = *(const u32 *)(key + 12);
	u32 key1 = *(const u32 *)(key + 8);
	u32 key2 = *(const u32 *)(key + 4);
	u32 key3 = *(const u32 *)(key);
	int timeout = 100;
	u32 miss = 0;

	miss |= key0 ^ my_key[0];
	miss |= key1 ^ my_key[1];
	miss |= key2 ^ my_key[2];
	miss |= key3 ^ my_key[3];
	if (miss == 0)
		return;

	my_key[0] = key0;
	my_key[1] = key1;
	my_key[2] = key2;
	my_key[3] = key3;
	
	aes_write(be32_to_cpu(key0), AES_KEY0);
	aes_write(be32_to_cpu(key1), AES_KEY1);
	aes_write(be32_to_cpu(key2), AES_KEY2);
	aes_write(be32_to_cpu(key3), AES_KEY3);

	/* generate dkey: should take 11 cycles */
	aes_write(aes_read(AES_CR) | CR_DKEYGEN, AES_CR);

	while (aes_read(AES_CR) & CR_DKEYGEN) {
		if (--timeout == 0)
			break;
	}
}

So I was wrong, hardware access is not so expensive as I thought. But
it also means that all instructions executed in the drivers'
encrypt()/decrypt() methods have a real cost and skipping key loadings
is a win.

Using the driver exclusively doesn't seem to be the right solution,
but I don't see another way to do that...
--
Francis
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux