Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory optimization #4107

Merged
merged 10 commits into from
Sep 12, 2024
Merged

Memory optimization #4107

merged 10 commits into from
Sep 12, 2024

Conversation

onvej-sl
Copy link
Contributor

@onvej-sl onvej-sl commented Aug 16, 2024

Solves #4079.

In this pull request, I optimized the implementations of blak256, blake2s, blake2b, groestl and AES in terms of flash size.

Name Size before Size after Throughput before Throughput after Usage
blake256 2329 B 1938 B 1.5 MB/s 0.59 MB/s decred
blake2b 15704 B 2336 B 2.3 MB/s 1.2 MB/s tezos, zcash, cardano
blake2s 4984 B 1804 B 3.1 MB/s 0.96 MB/s firmware hash, bootloader check, vendor header check
groestl512 9404 B 6088 B 0.16 MB/s 0.13 MB/s groestlcoin
AES 39964 B 9538 B
AES128 ECB 0.94 MB/s 0.80 MB/s
AES128 CCM 0.46 MB/s 0.43 MB/s optiga secure channel
AES128 GCM 0.31 MB/s 0.30 MB/s probably in THP
ripemd160 5788 B 1792 B 5.5 MB/s 1.3 MB/s bitcoin, eos, ripple, nem

It's strange that it is possible to reduce the size of AES by half with only minimal impact on its speed.

There are still two places that would make sense to optimize, but I haven't addressed them yet:

  • groestl512 that uses precomputed tables of size 4 KB
  • ripemd160 that has 5 KB

EDITED: I fixed the "Size before" of AES, which should have been 20480 B higher.

@onvej-sl
Copy link
Contributor Author

My suggestion is

  • to use the memory-optimized versions of blake2b, AES and groestl512
  • keep the existing (speed-optimized) version of blake2s

I don't have a strong opinion on blake256, as it only saves a few hundred bytes. I'm not sure if it is worth diverging from the original implementation.

What do you think @andrewkozlik , @matejcik ?

Copy link

core UI changes device test click test persistence test
T2T1 Model T test(screens) main(screens) test(screens) main(screens) test(screens) main(screens)
T2B1 Safe 3 test(screens) main(screens) test(screens) main(screens) 2724
T3T1 test(screens) main(screens) test(screens) main(screens) test(screens) main(screens)
All main(screens)

@TychoVrahe
Copy link
Contributor

+1 for keeping blake2s speed optimized, at least on F4 based models.

@matejcik
Copy link
Contributor

My suggestion is

  • to use the memory-optimized versions of blake2b, AES and groestl512
  • keep the existing (speed-optimized) version of blake2s

agreed

I don't have a strong opinion on blake256, as it only saves a few hundred bytes. I'm not sure if it is worth diverging from the original implementation.

let's keep the original then.

please go ahead and modify the PR accordingly.

crypto/aes/aesopt.h Outdated Show resolved Hide resolved
@onvej-sl
Copy link
Contributor Author

There are two things I'm considering:

  1. In groestl.c and blake2b.c, I'm using #ifdef directives to switch between the original speed-optimized implementation and the new memory-optimized implementation. I'm considering whether to keep the #ifdef directives or to use the speed-optimized implementation directly.
  2. blake2b.c and groestl.c are currently excluded from style formatting. However, I don't see a reason for this. The upstream implementations haven't changed for more than eight years. On the contrary, we occasionally make modifications to these implementations. Therefore, I'm considering including them in style formatting.

Any opinion, @andrewkozlik ?

@andrewkozlik
Copy link
Contributor

andrewkozlik commented Aug 27, 2024

  1. I am inclined to keeping the #ifdef directives, including for blake2s, e.g. say we are running out of room in the boardloader and want to use the smaller variant of blake2s, then it will come in handy. However, in case of groestl.c I wouldn't mind getting rid of the speed-optimized version because it makes very little difference in speed and because its reuse outside of groestlcoin is unlikely.

  2. On the contrary, we occasionally make modifications to these implementations.

    Until now our modifications were only variable initialization and memzero(). However, the changes in this PR are significant, making it difficult to keep in sync with upstream anyway if there were changes. So I agree about applying formatting. I'd consider doing it as a single commit in Update clang-format style #4073 to keep the number of reformatting commits low.

@andrewkozlik
Copy link
Contributor

There are still two places that would make sense to optimize, but I haven't addressed them yet:

  • groestl512 that uses precomputed tables of size 4 KB
  • ripemd160 that has 5 KB

Any update on ripemd160?

@onvej-sl
Copy link
Contributor Author

Any update on ripemd160?

I didn't succeed in optimizing it myself. However a found this implementation which is 1792 B in size and has decent speed. I added all the values to the table above.

@andrewkozlik
Copy link
Contributor

I am slightly in favor of replacing the RIPEMD implementation too with the one you found.

@onvej-sl onvej-sl marked this pull request as ready for review September 1, 2024 19:31
crypto/ripemd160.c Outdated Show resolved Hide resolved
crypto/ripemd160.c Outdated Show resolved Hide resolved
@andrewkozlik
Copy link
Contributor

As I mentioned earlier, I think it would be good have the #ifdef for blake2s as well. Firstly, it would be good to keep the blake2b and blake2s implementations consistent. Secondly, in the bootloader we may need to use the smaller blake2s version in the future.

That means we will need to add OPTIMIZE_SIZE_BLAKE2S, OPTIMIZE_SIZE_BLAKE2B, OPTIMIZE_SIZE_ED25519, etc., which is a good thing, because it explicitly documents what we are doing, whereas right now it is not obvious that blake2s is not size-optimized. Furthermore it documents what speed/size tradeoffs are available without having to investigate the source code. The definition in options.h should default to OPTIMIZE_SIZE when undefined to make it easy to use:

#ifndef OPTIMIZE_SIZE_BLAKE2S
#define OPTIMIZE_SIZE_BLAKE2S OPTIMIZE_SIZE
#endif

#ifndef OPTIMIZE_SIZE_BLAKE2B
#define OPTIMIZE_SIZE_BLAKE2B OPTIMIZE_SIZE
#endif

// ...

crypto/groestl.c Outdated Show resolved Hide resolved
crypto/blake2b.c Outdated Show resolved Hide resolved
Copy link
Contributor

@andrewkozlik andrewkozlik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please set OPTIMIZE_SIZE_BLAKE2S to 0 somewhere for the FW build.

Other than that, high level ACK and blake2s/blake2b ACK from me.

I haven't reviewed the details the Groestl refactor and the new RIPEMD implementation. @M1nd3r please have a look at groestl.c and ripemd160.c.

Copy link
Contributor

@M1nd3r M1nd3r left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have reviewed groestl and ripemd160, looks good.

crypto/ripemd160.h Show resolved Hide resolved
@onvej-sl
Copy link
Contributor Author

onvej-sl commented Sep 9, 2024

@prusnak Your suggestion was addressed in d707e54. I think I cannot merge until you confirm it was addressed.

@onvej-sl onvej-sl merged commit 7a992a5 into main Sep 12, 2024
82 of 85 checks passed
@onvej-sl onvej-sl deleted the onvej-sl/memory-optimization branch September 12, 2024 12:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Approved
Development

Successfully merging this pull request may close these issues.

6 participants