
vendor rust-crypto aes/soft implementation #2024

Open
wants to merge 4 commits into main

Conversation

rohhamh

@rohhamh rohhamh commented Apr 10, 2024

I've tried following the instructions at #1886 (comment)

I ran the tests for fixslice64 and they all seem to pass.
However, I'm not confident about some of the changes made here,
such as increasing rd_key's size to 240, adding the u32-u64 conversion functions,
using self.inner.rounds to select between the target functions,
and sending fewer than four actual blocks to the vendored code for encryption.


This should fail for fixslice32; I'd appreciate your opinion on how best to handle fixslice32, since it takes u32 parameters.

The code in ctr32_encrypt_within is essentially my attempt at replicating what aes_nohw_ctr32_encrypt_blocks in aes_nohw.c does; I believe it's accurate, since the tests pass.
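
For illustration, here is a self-contained sketch (not the ring or BoringSSL code; the names and the four-block batch size are stand-ins for the vendored fixslice64 calls) of the general CTR32 batching pattern that function follows:

const BLOCK: usize = 16;
const PAR: usize = 4; // the vendored fixslice64 code processes four blocks per call

// Encrypt up to PAR counter blocks per call, XOR the keystream into the
// data, and bump only the low 32-bit big-endian counter word.
fn ctr32_xor(encrypt_batch: impl Fn(&mut [[u8; BLOCK]; PAR]), iv: &[u8; BLOCK], data: &mut [u8]) {
    assert_eq!(data.len() % BLOCK, 0); // whole blocks only in this sketch
    let mut ctr = u32::from_be_bytes([iv[12], iv[13], iv[14], iv[15]]);
    for chunk in data.chunks_mut(PAR * BLOCK) {
        // Build counter blocks only for the blocks actually present; the
        // final batch may therefore carry fewer than PAR real blocks.
        let mut ks = [[0u8; BLOCK]; PAR];
        for block in ks.iter_mut().take(chunk.len() / BLOCK) {
            block.copy_from_slice(iv);
            block[12..16].copy_from_slice(&ctr.to_be_bytes());
            ctr = ctr.wrapping_add(1);
        }
        // One call produces keystream for the whole batch; this is where
        // the vendored encryption would be invoked.
        encrypt_batch(&mut ks);
        // XOR the keystream into the data in place.
        for (i, byte) in chunk.iter_mut().enumerate() {
            *byte ^= ks[i / BLOCK][i % BLOCK];
        }
    }
}

The "fewer than PAR real blocks" case in the final batch is exactly the situation mentioned above, where fewer than four actual blocks are handed to the vendored code.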

There's also the contribution guide:
I'm not sure whether the statement should be added to every commit message or one would suffice,
or how the license applies to the vendored code.

And finally, this is essentially the first time I've done anything in Rust or cryptography, so I imagine there are many ways to improve the code. I'd be very happy to hear about them.

rohhamh added 4 commits March 31, 2024 17:32
RustCrypto/block-ciphers commit: 6b263c07436a9861c8456bc73f81ff075673f7a5
allow deadcode for decrypt and 192bit functions
@rohhamh rohhamh changed the title from "vendor rust-crypto aes/soft implementation https://github.com/briansmith/ring/issues/1886" to "vendor rust-crypto aes/soft implementation" on Apr 10, 2024
Owner

@briansmith briansmith left a comment

Thank you very much for doing this. I am sorry to admit that I overlooked that the Rust Crypto key representation was so much larger when I originally suggested this plan. I think it's important for us to find a way to minimize (ideally eliminate) the size overhead, at least for targets that are likely to have 128-bit SIMD and/or hardware instructions available.

@@ -60,7 +60,7 @@
 // aes_key_st should be an opaque type, but EVP requires that the size be
 // known.
 struct aes_key_st {
-  uint32_t rd_key[4 * (AES_MAXNR + 1)];
+  uint32_t rd_key[240];
Owner

First, most fundamentally, we cannot change the AES_KEY structure because the assembly language code needs exactly this layout. So, we'd have to use an enum/union of AES_KEY and another key type, if we need to have more round keys for the new fallback implementation.

4 * (AES_MAXNR + 1) is 60, so this is making the keys 4x the size they were previously. We should spend effort to reduce this. When I originally suggested we use the rust-crypto implementation as the fallback, I did not realize that it had this huge space overhead.

IIRC, in their blog post, the Rust Crypto team said that this implementation is faster than bitslicing when SIMD isn't available, but slower than bitslicing when (128-bit) SIMD is available. Well, in practice almost all targets should be able to use a bitsliced SIMD implementation, and only fall back to this when SIMD isn't available. That means that any significant space overhead for this fallback implementation is very hard to accept.

What's your opinion on this?
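
For concreteness, the arithmetic behind the 4x figure, written as a standalone check rather than as code from this PR (the assumption being that rd_key[240] was sized to hold the [u64; 120] fixsliced AES-256 schedule):

const AES_MAXNR: usize = 14; // matches aes.h

fn main() {
    let current_bytes = 4 * (AES_MAXNR + 1) * 4; // 60 x u32 = 240 bytes today
    let proposed_bytes = 240 * 4;                // 240 x u32 = 960 bytes in this change
    let fixsliced_256_bytes = 120 * 8;           // [u64; 120] fixsliced AES-256 schedule
    assert_eq!(proposed_bytes, fixsliced_256_bytes);
    assert_eq!(proposed_bytes / current_bytes, 4);
    println!("{current_bytes} bytes -> {proposed_bytes} bytes");
}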

Author

I am sorry for the delayed response.
In response to your comment: I would personally enjoy helping with ring very much, and I'd be happy to try to fix the issues you find in the PR.

I imagine you mean something like the following by an enum/union of AES_KEY.
To my understanding, something like this would still carry the overhead, since one of the variants would still require allocating a [u64; 120]:

// Keep this in sync with `AES_MAXNR` in aes.h.
const MAX_ROUNDS: usize = 14;

// Keep this in sync with AES_KEY in aes.h.
#[repr(C)]
#[derive(Clone)]
pub(super) struct AES_KEY_HW {
    pub rd_key: [u32; 4 * (MAX_ROUNDS + 1)],
    pub rounds: c::uint,
}

#[derive(Clone)]
pub(super) enum AES_NOHW_ROUND_KEY_VARIANT {
    v128([u64; 88]),
    v256([u64; 120]),
}

#[repr(C)]
#[derive(Clone)]
pub(super) struct AES_KEY_NOHW {
    pub rd_key: AES_NOHW_ROUND_KEY_VARIANT,
    pub rounds: u8,
}

#[derive(Clone)]
pub(super) enum AES_KEY {
    HW(AES_KEY_HW),
    NOHW(AES_KEY_NOHW),
}

I'm not sure what other solutions are available.
Do you think using Vec or Box would be a valid option for us?
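
To make the Box option concrete, here is a minimal sketch, under the assumption that a heap allocation is acceptable; the names are illustrative, not ring's types:

use std::mem::size_of;

// Today's layout stays inline, while the larger fixsliced schedule is
// boxed so it doesn't inflate every key.
#[derive(Clone)]
enum RoundKeys {
    Hw([u32; 60]),                  // 4 * (AES_MAXNR + 1), as in aes.h today
    SoftFixsliced(Box<[u64; 120]>), // heap-allocated fixsliced schedule
}

fn main() {
    // The enum's size is dominated by the inline 240-byte Hw variant plus
    // a tag, not by the 960-byte fixsliced schedule.
    println!("size_of::<RoundKeys>() = {}", size_of::<RoundKeys>());
    let _ = RoundKeys::Hw([0u32; 60]);
    let _ = RoundKeys::SoftFixsliced(Box::new([0u64; 120]));
}

Whether an allocation is acceptable in ring's key-setup path is a separate question; this only shows the size trade-off.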

And about this being slower than bitslicing with SIMD: are you suggesting we keep the vendored code only as a fallback, for when SIMD instructions aren't available?

@@ -267,7 +313,76 @@ impl Key {
             }

             Implementation::NOHW => {
-                ctr32_encrypt_blocks!(aes_nohw_ctr32_encrypt_blocks, in_out, src, &self.inner, ctr)
+                let in_out_len = in_out[src.clone()].len();
Owner

If the logic is this large, then it should be split out into a separate standalone function that is called from here, much like bsaes_ctr32_encrypt_blocks_with_vpaes_key just was.

}

xor_columns(&mut rkeys, rk_off, 8, ror_distance(1, 3));
}
Owner

It looks like maybe we could split this into two functions, so that the above logic initializes an AES_KEY structure as it exists today, which would be stored in the aes::Key. Then, whenever we do an encryption/decryption operation, we would first execute the below logic to expand the round keys into a temporary buffer that contains the fixsliced representation. Of course, this would add some overhead to each encryption batch, so we'd need to measure whether it would still be faster than the non-SIMD bitsliced implementation.

WDYT?
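
A structural sketch of that split, with illustrative names only (the expansion and encryption bodies are placeholders standing in for the vendored fixslice code, not ring's API):

// Phase 1: the long-lived key keeps today's compact representation.
struct CompactKey {
    rd_key: [u32; 60], // same layout as AES_KEY today
    rounds: u32,
}

// Hypothetical fixsliced form, re-derived for each batch.
type Fixsliced128 = [u64; 88];

// Phase 2: each encryption batch expands into a temporary buffer and
// uses it immediately, so per-key storage never grows.
fn ctr32_encrypt_batch(key: &CompactKey, blocks: &mut [u8]) {
    let fixsliced: Fixsliced128 = expand_fixsliced(key);
    soft_encrypt_with(&fixsliced, blocks);
}

// Placeholders: in the PR these would be the vendored fixslice64 key
// schedule and block function; they are intentionally left empty here.
fn expand_fixsliced(_key: &CompactKey) -> Fixsliced128 {
    [0u64; 88]
}
fn soft_encrypt_with(_rkeys: &Fixsliced128, _blocks: &mut [u8]) {}

fn main() {
    let key = CompactKey { rd_key: [0; 60], rounds: 10 };
    let mut data = [0u8; 64];
    ctr32_encrypt_batch(&key, &mut data);
}

The cost is the one noted above: the expansion is repeated for every batch, so it would have to be benchmarked against the non-SIMD bitsliced fallback.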

Author

Is that so we won't change today's AES_KEY structure, and/or to keep the changes to the current codebase smaller?

I'm afraid I'm not well-versed enough in the subject to have a definitive answer. Normally I'd say that if we can avoid the overhead and have the expansion happen only once, then we should do just that.
But I imagine it wouldn't matter much if the performance difference is negligible compared to what rust-crypto provides at best (and it would still be faster than what's currently available).

Do you think this could be the best approach?


/// Fully bitsliced AES-192 key schedule to match the fully-fixsliced representation.
#[allow(dead_code)]
pub(crate) fn aes192_key_schedule(key: &[u8; 24]) -> FixsliceKeys192 {
Owner

I think we can add a commit to just delete the aes192-specific code.

inv_shift_rows_1(&mut rkeys[i..(i + 8)]);
}
}
#[cfg(not(aes_compact))]
Owner

And/or we should consider deleting all the not(aes_compact) code paths and reducing the key size accordingly.

inv_shift_rows_2(&mut state);
}

let mut rk_off = 72;
Owner

This makes me think that many of the round keys in rkeys are only needed for decryption and not encryption? If so, we could reduce the size of rkeys by half by just supporting only encryption?

Author

We might have to check more carefully, but I don't think that's the case, as the following is part of the encryption code:

    let mut rk_off = 8;
    loop {
        sub_bytes(&mut state);
        mix_columns_1(&mut state);
        add_round_key(&mut state, &rkeys[rk_off..(rk_off + 8)]);
        rk_off += 8;

        #[cfg(aes_compact)]
        {
            shift_rows_2(&mut state);
        }

        if rk_off == 80 {
            break;
        }

@rohhamh
Author

rohhamh commented Jul 8, 2024

Is the PR not relevant anymore? @briansmith

@briansmith
Owner

> Is the PR not relevant anymore? @briansmith

PR #2070 is somewhat of an alternative to this PR that is more conservative, because it doesn't increase the size of the key. I think that avoiding an increase in the key size is especially important for targets where a hardware implementation is the common case.

In the case where there are vector instructions available, my understanding is that the 128-bit SIMD version of the code in PR #2070 (to be done later) will be faster. So I think that it makes sense to merge PR #2070 and also to port the 128-bit SIMD version from BoringSSL to Rust.

I think this PR is still potentially useful for platforms where we're always going to use the fallback implementation and there are no vector instructions available.

And/or, like I mentioned before, if we can figure out a way to keep the key size the same as it is now and "expand" the key schedule on demand during each encryption operation, then this would be useful, even as the fallback implementation, in the case where vector instructions cannot be used.
