Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

serde library should have hash functions #105

Closed
catgirlinspace opened this issue Sep 18, 2023 · 12 comments · Fixed by #193
Closed

serde library should have hash functions #105

catgirlinspace opened this issue Sep 18, 2023 · 12 comments · Fixed by #193
Labels
enhancement New feature or request

Comments

@catgirlinspace
Copy link
Contributor

The serde library should include some general hashing functions.

Some possible hash functions that could be implemented are

  • md5
  • SHA-1
  • SHA-256
  • SHA-512
  • PBKDF2
  • Argon2
  • HMAC

Ideally, there'd probably be more. This is just what I came up with. I'd imagine that all of these have Rust implementations available?

@filiptibell filiptibell added the enhancement New feature or request label Sep 18, 2023
@Dekkonot
Copy link
Contributor

Given several of these operate with 64 bit integers, it's non-trivial to implement them in pure Luau. It'd be easier and significantly faster to do on the Rust side.

For reference, here's a SHA-512 implementation I made. It's likely a significant portion of people don't have the knowledge or quite frankly will power to implement these on their own just due to the sheer amount of bit math involved to treat two 32-bit integers as one 64-bit one.

@CompeyDev
Copy link
Contributor

This would be fairly simple to implement in Rust, there's a bunch of well-vetted implementations for most most cryptographic algorithms.

@Dekkonot
Copy link
Contributor

If hashing support does get added, here's a list of my recommendations, along with why:

  • SHA-2 and its flavors - NIST standard and widespread use
  • SHA-3 and its flavors - NIST standard
  • Blake3 - Very fast
  • Argon2id - Winner of Password Hashing Competition
  • PBKDF2 - Standard password hashing algorithm

Despite widespread use, including SHA-1 and MD5 as built-in hashes is irresponsible IMO since they're broken and easily implemented in Luau at acceptable speeds.

In regards to stuff like HMAC, I am of two minds. HMAC is hilariously easy to implement (I have an old implementation that is 43 lines long) so it's probably fine to leave out. On the other hand, it's also a standard so expecting people to implement it themselves is maybe weird!

I'd also like to comment on potential non-cryptographic functions. I think we should err on the side of caution and not touch them. Something like xxHash3 has its uses but it's really not secure, and including it is asking for people to just use it because it's fast. Blake3 is a great alternative for most use cases anyway.

@filiptibell filiptibell added the pending / needs design Awaiting design and/or implementation discussion and decisions label Oct 12, 2023
@Dekkonot
Copy link
Contributor

I think before we go further with this, incidentally, we should settle two questions:

  1. Are we okay with putting this in the Serde library? It feels inappropriate to put it here but up to you @filiptibell.

  2. What does the API look like? The obvious API is just something like sha256(algo: "hash" | "algorithm" | "names", message: string) -> string but it's not very robust. I know that @0x5eal has been working on an implementation, but we should probably settle on an API before it gets too far.

@filiptibell
Copy link
Collaborator

I left some thoughts on the related PR (#122) and would like to leave a couple extra ones here.

  1. Hashing algorithms that are not cryptographic ones... I personally wouldn't mind having something like GxHash, but I definitely see where the concern is coming from. If we can make it painfully obvious that certain algos are not cryptographically secure I would be fine with including them, but would also like to know what everyone thinks
  2. HMAC would probably be one of the most common use cases (maybe the most common) and we could easily get some big optimization and friendliness wins by including it natively
  3. With mlua getting native modules support, it's probably not too far out for Lune to have it either. It's worth considering this and how it influences what should be prioritized in Lune's standard library / built-ins vs more niche use cases that could be implemented in native modules. We should still try to do the best we can to balance flexibility + familiarity + performance, though

All in all, I'm currently leaning towards an API that looks something like:

type Hasher = "hash" | "algorithm" | "names"

serde.hash(algo: Hasher, message: string): string

serde.hmac(algo: Hasher, secret: string, message: string): string

where if necessary for some specific hashing algorithm we could add additional arguments to the end after message, like we already do with stdio.prompt.

This does leave out the use case of updating a hasher multiple times for performance when there are several inputs. I don't think we should prioritize this use case, since it changes our entire API surface, and makes it much less friendly for every other use case. If we support the new Luau buffer type, performance will also be much less of an issue.

@Dekkonot
Copy link
Contributor

I have a rough implementation that uses basically the format above. It's straight forward, but still uses a macro to erase some boilerplate (quite frankly, nobody needs to see the same 3 lines repeated for every single hashing algorithm).

I'm not sure if buffer is actually supported by Mlua yet though, which might be a downer. Also, I have no idea what the impact on the binary size a bunch of hashing crates might have. I haven't compared the two yet. If it's too big, we may need to put it all behind a feature.

@CompeyDev
Copy link
Contributor

Buffer is actually supported by mlua, and lune 0.8.0 ships with buffer support.

@Dekkonot
Copy link
Contributor

Buffer is actually supported by mlua, and lune 0.8.0 ships with buffer support.

I'm not seeing references to it in the docs. The Value enum as an example doesn't seem to have a Buffer variant.

@CompeyDev
Copy link
Contributor

CompeyDev commented Feb 21, 2024

I'm not seeing references to it in the docs. The Value enum as an example doesn't seem to have a Buffer variant.

That's because buffers are of type Value::UserData(data), here's an example of how I deal with them for #148: buffer.rs.

@Dekkonot
Copy link
Contributor

Disappointing, but not surprising. Never mind then, thank you.

@filiptibell
Copy link
Collaborator

I have a rough implementation that uses basically the format above. It's straight forward, but still uses a macro to erase some boilerplate (quite frankly, nobody needs to see the same 3 lines repeated for every single hashing algorithm).

I'm not sure if buffer is actually supported by Mlua yet though, which might be a downer. Also, I have no idea what the impact on the binary size a bunch of hashing crates might have. I haven't compared the two yet. If it's too big, we may need to put it all behind a feature.

@Dekkonot thought I'd chime in with a small update here - you can accept buffers as rust function args very easily now in the latest mlua version. If you use BString as a function argument it will accept both luau strings and buffers and the bytes are easily accessible

@RuizuKun-Dev
Copy link

I'd greatly appreciate support for this

my use case is caching raw binary file data with hashes,

i have an automatic image compression program and i'd like to cache whatever images already compressed as a Hash

@filiptibell filiptibell removed the pending / needs design Awaiting design and/or implementation discussion and decisions label Jun 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
5 participants