Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simple secure random number generation #393

Closed
joshtriplett opened this issue Jun 11, 2024 · 54 comments
Closed

Simple secure random number generation #393

joshtriplett opened this issue Jun 11, 2024 · 54 comments
Labels
ACP-accepted API Change Proposal is accepted (seconded with no objections) api-change-proposal A proposal to add or alter unstable APIs in the standard libraries T-libs-api

Comments

@joshtriplett
Copy link
Member

joshtriplett commented Jun 11, 2024

API design partially based on discussions with @BartMassey. Revised based on feedback, in particular from @Amanieu.

Proposal

Problem statement

People regularly want to generate random numbers for a variety of use cases. Doing so currently requires using a crate from crates.io. rand is the 6th most downloaded crate on crates.io, and fastrand is quite popular as well. Many, many people over the years have expressed frustration that random number generation is not available in the standard library (despite the standard library using randomness internally for HashMap).

There are multiple reasons why we have not historically added this capability. Primarily, there are three capabilities people want, and those capabilities seem to present a "pick any two" constraint:

  • Secure random number generation
  • Seedable random number generation
  • Stability between versions of Rust when seeded

These constraints arise from the possibility of a secure random number generator potentially requiring updates for security reasons. Changing the random number generator would result in different sequences for the same seed.

In addition to that primary constraint, there have also been design difficulties: there are numerous pieces of additional functionality people may want surrounding random number generation, which makes any proposal for it subject to massive scope creep and bikeshed painting. Most notably: users of random numbers may want to represent the state of the RNG explicitly as something they can pass around, or implicitly as global state for simplicity.

This ACP proposes a solution that aims to be as simple as possible, satisfy all the stated constraints on the problem, and allow for future expansion if desired. This ACP proposes a generator that is secure, but not stable across versions of Rust; this allows us to update the secure RNG if any issue or potential improvement arises.

Separately, ACP 394 proposes an RNG that is seedable and guarantees identical seeding across Rust versions, but is not a secure RNG.

Motivating examples or use cases

  • Generating randomness for games (e.g. rolling dice).
  • Randomly shuffling a list
  • Randomly sampling elements from a set
  • Random test case generation / fuzz testing

Solution sketch

// In `core::random`:

/// A source of random data
pub trait RandomSource {
    /// Fill `buf` with random bytes.
    fn gen_bytes(&mut self, buf: &mut [u8]);
}

/// Trait for any type that can be generated via random number generation.
pub trait Random {
    fn random(random_source: &mut (impl RandomSource + ?Sized)) -> Self;
}

// In `std::random`:

/// Random source that produces cryptographically secure random numbers at a reasonable speed.
///
/// This generator does not encapsulate any state, and cannot be seeded or replayed. It may vary by target and by Rust version.
#[derive(Default, Copy, Clone, Debug)]
pub struct DefaultRng;
impl RandomSource for DefaultRng { /* ... */ }

/// Generate a random value, uniformly distributed over possible values.
///
/// This uses `DefaultRng`.
pub fn random<R: Random>() -> R {
    <R as Random>::random(&mut DefaultRng)
}

The trait Random will initially be implemented for all iN and uN integer types, isize/usize, and bool, as well as arrays and tuples of such values.

Notably, Random will not initially be implemented for floating-point values (to initially avoid questions of distribution), strings/paths/etc (to avoid questions of length and character selection), or char (to avoid questions of what subset of values to generate). We can consider such additions in the future; see the section on future work below.

The random number generator may use OS randomness directly, if available, or use a CSPRNG seeded from OS/hardware randomness.

Alternatives

We could do nothing, and continue to refer people to external crates like rand and fastrand.

We could eliminate the convenience functions that implicitly use DefaultRng, and require all users to pass around a RandomSource directly. However, this would add complexity to many simple use cases.

We could allow gen_bytes to fill an uninitialized buffer. This would be a more complex interface, however. We could potentially introduce such an interface later, when we've stabilized the types to make it simpler.

We could use Read to get data from random sources (e.g. trait RandomSource: Read). This would have the advantage of letting us use read_buf once we stabilize that. However, this would prevent us from putting RandomSource in core.

We could allow gen_bytes to fail and return a Result. However, it seems unlikely that most consumers of randomness would be able to handle not having access to it. The higher-level functions will almost certainly want to panic rather than returning a result, for convenience of invocation, and having callers who can deal with an absence of randomness call the low-level interface directly doesn't seem like a sufficiently useful interface to support. We also don't want to introduce a whole family of parallel try_random/try_random_range/etc functions. We could always introduce a try_gen_bytes interface later, if there's a need for one. (In addition, we would not want to use io::Result here, as that would prevent RandomSource from living in core.)

We could rename Random::random to Random::random_with or similar, and use random for a function that calls random_with(DefaultRng). This would make it convenient to call, for instance, u32::random(). This doesn't seem like an important convenience, however, as it's just as easy to call random::<u32>() (or in most contexts just random() with type inference).

Future work

We should support randomly shuffling arrays:

impl<T> [T] {
    /// Shuffle a slice into a random order, using `DefaultRng`.
    pub fn shuffle(&mut self) {
        self.shuffle_with(DefaultRng);
    }
    
    /// Shuffle a slice into a random order, using the specified random source
    pub fn shuffle_with(&mut self, rng: &mut (impl RandomSource + ?Sized));
}

If we need it for performance, we could add functions like gen_u64 or gen_u32 to RandomSource (with default implementations based on gen_bytes), to help RNGs that can implement those more efficiently than they can implement gen_bytes. This is inspired by Hasher doing something similar. I propose that we initially have just gen_bytes, and require benchmarks demonstrating a substantial speedup before we consider the additional complexity and interface surface area.

We should support random generation in ranges (e.g. random_range(1..6)). Providing a correct implementation will steer people away from the most common open-coded implementation (using %), which introduces bias.

We could support choosing a random item from an iterator. (This can be done more optimally when the iterator implements some additional traits.)

We can implement Random for many more types, such as NonZero<T> and Wrapping<T>.

We should support derive(Random) for types whose fields all implement Random. This is relatively straightforward for structs. However, supporting this for enums would require some additional care to handle discriminants: should a random Option<u8> be 50% None, or 1/257 None? The latter is much more difficult to implement correctly.

We could add additional RandomSource implementations to the standard library, including seeded sources. This would enable purposes such as testing, reproducing the creation of objects whose algorithms require randomness to generate, replaying processes such as fuzz testing, supporting game seeds or replays, and various others. Note that this would not be the same type as DefaultRng.

We should not allow seeding the DefaultRng state, as that would affect all users rather than only affecting those that are opting into supporting seeding, and would also preclude designs that don't involve a seed (e.g. obtaining random numbers directly from the OS).

We could consider, in the future, introducing random float generation, or random character generation, or generation of various other types. A careful design could allow using the same mechanisms for this as for random generation in ranges (e.g. a Distribution<T> trait to sample from). Alternatively, we could leave the full breadth of this to the ecosystem, and just support random ranges directly, as well as some common specific ways to generate floats and characters.

We could provide a trait to fill an existing value rather than creating a new one.

What happens now?

This issue contains an API change proposal (or ACP) and is part of the libs-api team feature lifecycle. Once this issue is filed, the libs-api team will review open proposals as capability becomes available. Current response times do not have a clear estimate, but may be up to several months.

Possible responses

The libs team may respond in various different ways. First, the team will consider the problem (this doesn't require any concrete solution or alternatives to have been proposed):

  • We think this problem seems worth solving, and the standard library might be the right place to solve it.
  • We think that this probably doesn't belong in the standard library.

Second, if there's a concrete solution:

  • We think this specific solution looks roughly right, approved, you or someone else should implement this. (Further review will still happen on the subsequent implementation PR.)
  • We're not sure this is the right solution, and the alternatives or other materials don't give us enough information to be sure about that. Here are some questions we have that aren't answered, or rough ideas about alternatives we'd want to see discussed.
@joshtriplett joshtriplett added T-libs-api api-change-proposal A proposal to add or alter unstable APIs in the standard libraries labels Jun 11, 2024
@programmerjake
Copy link
Member

impl Send for Rng;
impl !Sync for Rng;

why can't Rng be Sync? if all methods that modify state take &mut self then being Sync should be fine

@the8472
Copy link
Member

the8472 commented Jun 11, 2024

So the Rng type is only meant for the insecure generation and the free function is meant for secure generation?
That means there's no common abstraction for them if one wants to use a seedable RNG in tests and a real one in production.

@joshtriplett
Copy link
Member Author

joshtriplett commented Jun 11, 2024

So the Rng type is only meant for the insecure generation and the free function is meant for secure generation? That means there's no common abstraction for them if one wants to use a seedable RNG in tests and a real one in production.

I put that in future work: it's possible we might want to have a common type, or alternatively a common trait and two different types (so that the type system ensures you don't use an insecure RNG where you wanted a secure one). But that is likely to require further design, and I would like to leave that out of the initial design to keep the initial design simple. Past efforts to add an RNG have run into endless design issues and tradeoffs. The smaller the surface area, the fewer of those issues come up.

@joshtriplett
Copy link
Member Author

impl Send for Rng;
impl !Sync for Rng;

why can't Rng be Sync? if all methods that modify state take &mut self then being Sync should be fine

You're entirely right; fixed.

@ChrisDenton
Copy link
Member

Do we need a more descriptive name than Rng? I'm nervous about claiming such a generic name for the insecure version. And if this is to be stable for all time then I think that the rng algorithm becomes an API question.

Also shouldn't this have a way to fill a buffer using the system crng? It's less convenient but it's the most basic building block that platforms provide so it may be worth having a cross-platform version exposed.

@joshtriplett
Copy link
Member Author

joshtriplett commented Jun 11, 2024

Also shouldn't this have a way to fill a buffer using the system crng? It's less convenient but it's the most basic building block that platforms provide so it may be worth having a cross-platform version exposed.

I don't think we should make any guarantees that something is using the "system" RNG, just that it's using some cryptographically secure RNG.

I do agree that "fill this buffer with randomness" is a useful building block, but the goal of this ACP was the simple end-user-targeted interface, since existing libraries already have those building blocks.

@joshtriplett
Copy link
Member Author

And if this is to be stable for all time then I think that the rng algorithm becomes an API question.

The intent is that the seeded, insecure version is an API guarantee. The secure version makes no guarantees about algorithm, but the seeded version does.

@ChrisDenton
Copy link
Member

And if this is to be stable for all time then I think that the rng algorithm becomes an API question.

The intent is that the seeded, insecure version is an API guarantee. The secure version makes no guarantees about algorithm, but the seeded version does.

Sure, picking an algorithm for the insecure version is what I mean. That would need to be part of the proposal as it's part of the API, no?

@BurntSushi
Copy link
Member

I am in favor of the general idea here.

A domain question for folks using secure RNG: how do you write tests for the behavior of routines that make use of the secure RNG? Do you do it by providing it with a seed? I think the API as proposed would be insufficient for that because there's no way to create a secure RNG with a seed.

I have some initial thoughts:

(This would require deciding whether to have type-level separation between secure and seeded Rng states.)

I kinda lean towards this in the affirmative. That is, that we should have type-level separation. The use cases are different enough that being able to say "this type is always going to do 'secure' rng" is likely quite valuable. I'm not sure we need to commit to this in the initial design, but I think we should leave room for it. For example, perhaps by renaming Rng to InsecureRng or whatever.

Stability between versions of Rust when seeded

This is a little concerning to me because it feels like a very strong guarantee that will lock us into something for eternity. But, I confess, I am a bit ignorant here, and perhaps this isn't as big of a deal as I think it is. Do other language ecosystems guarantee stability of seeded insecure RNGs?

@the8472
Copy link
Member

the8472 commented Jun 11, 2024

how do you write tests for the behavior of routines that make use of the secure RNG? Do you do it by providing it with a seed?

With a pluggable RNG implementing a trait. In rand Rng is a trait

I've asked this in #393 (comment) and here the reply #393 (comment)

@joshtriplett
Copy link
Member Author

@ChrisDenton wrote:

And if this is to be stable for all time then I think that the rng algorithm becomes an API question.

The intent is that the seeded, insecure version is an API guarantee. The secure version makes no guarantees about algorithm, but the seeded version does.

Sure, picking an algorithm for the insecure version is what I mean. That would need to be part of the proposal as it's part of the API, no?

You're right, I should spell that out more explicitly. My intention was that we initially use the same RNG implementation for both, but if any security issue ever arises that requires us to change the RNG implementation, we'll change the secure one and leave the insecure one.

@joshtriplett
Copy link
Member Author

@BurntSushi wrote:

A domain question for folks using secure RNG: how do you write tests for the behavior of routines that make use of the secure RNG? Do you do it by providing it with a seed? I think the API as proposed would be insufficient for that because there's no way to create a secure RNG with a seed.

As @the8472 wrote, likely by using a trait.

I have some initial thoughts:

(This would require deciding whether to have type-level separation between secure and seeded Rng states.)

I kinda lean towards this in the affirmative. That is, that we should have type-level separation. The use cases are different enough that being able to say "this type is always going to do 'secure' rng" is likely quite valuable. I'm not sure we need to commit to this in the initial design, but I think we should leave room for it. For example, perhaps by renaming Rng to InsecureRng or whatever.

I think I'm convinced by this, yeah: let's keep the types distinct, and then we can provide a trait for any code that wants to be generic over the RNG. But I still think this should be future work, to avoid increasing the initial surface area we need to get consensus on.

Stability between versions of Rust when seeded

This is a little concerning to me because it feels like a very strong guarantee that will lock us into something for eternity. But, I confess, I am a bit ignorant here, and perhaps this isn't as big of a deal as I think it is. Do other language ecosystems guarantee stability of seeded insecure RNGs?

To the extent other languages provide stability guarantees, yes, some other languages/libraries do provide seeded RNGs and guarantee that the same seed produces the same sequence of values. For instance, Python provides an insecure seedable RNG in the random module, and separately provides secure random number generation in the secrets module which does not support seeding.

@pitaj
Copy link

pitaj commented Jun 11, 2024

Should the random function take a range, or is that left to future additions? Generating a uniform random value within a range is important especially for secure rng because otherwise people might use % (introducing bias).

@CryZe
Copy link

CryZe commented Jun 11, 2024

To me it feels like we are trying to go too high level too quickly, especially with the Random trait that opens up all sorts of questions about ranges and distributions support. And even how slices would work with it. Wouldn't it make sense to cut this down (for now) to just upstream the getrandom crate, or in other words the following API for just filling a buffer with cryptographically secure randomness?

pub fn getrandom(dest: &mut [u8]) -> Result<(), Error>;

and / or:

pub fn getrandom_uninit(dest: &mut [MaybeUninit<u8>]) -> Result<&mut [u8], Error>;

It feels like a big mistake not to support slices, as you would otherwise need to create a loop to fill the buffer, causing a huge syscall overhead that isn't necessary, because the underlying OS primitives all (from what I'm seeing) support buffers. In fact std internally already has its own getrandom abstraction (imp::fill_bytes) that it uses for secure randomness for all the platforms. Together with the rand crate being built on top of the getrandom crate this highlights that this is really the core cross platform abstraction that you really need in the end. Everything else is just nice to have on top.

I'm certainly not opposed to having more higher level APIs available in the initial API as well, but a getrandom equivalent is imo an absolute must.

@joshtriplett
Copy link
Member Author

@pitaj wrote:

Should the random function take a range, or is that left to future additions? Generating a uniform random value within a range is important especially for secure rng because otherwise people might use % (introducing bias).

That's in the future work section. I absolutely think we should, but I'm trying to minimize the initial surface area.

@joshtriplett
Copy link
Member Author

To me it feels like we are trying to go too high level to quickly, especially with the Random trait that opens up all sorts of questions about ranges and distributions support. Wouldn't it make sense to cut this down (for now) to just upstream the getrandom crate, or in other words the following API for just filling a buffer with cryptographically secure randomness?

pub fn getrandom(dest: &mut [u8]) -> Result<(), Error>;

That solves a completely different problem. That's a building block that would live underneath the higher-level interface most people are actually looking for, and it wouldn't solve the problem most people are pulling in crates for. An order of magnitude more crates depend on rand than on getrandom.

@joshtriplett
Copy link
Member Author

joshtriplett commented Jun 11, 2024

It feels like a big mistake not to support slices

The initial RFC does support arrays: you can do let array: [u8; 4096] = random();. And we can implement that reasonably efficiently.

@cuviper
Copy link
Member

cuviper commented Jun 11, 2024

Will the stable RNG be stable across different targets? (especially bit-size and endianness)

@joshtriplett
Copy link
Member Author

@cuviper It should be, yes. I'll document that.

@joshtriplett joshtriplett changed the title Simple random number generation - secure or seedable, not both Simple secure random number generation Jun 11, 2024
@joshtriplett
Copy link
Member Author

@rust-lang/libs-api I've now split this ACP.

This ACP provides only the simple secure random number generation. It has a mention, in the "Future work" section, of a possible "phase 2" design for a seedable secure RNG.

The separate ACP #394 provides the seedable insecure random number generation that's stable across Rust versions.

@BartMassey
Copy link

Thanks @joshtriplett for this. It's a lot.

The phrase "cryptographically secure" is being used here. We should be clear what is meant. [Nakov, Wikipedia]. Prediction resistance, presumably — would we allow something weaker? State compromise resistance, probably?

@programmerjake
Copy link
Member

as a proposed algorithm, I think https://en.wikipedia.org/wiki/Fortuna_(PRNG) could be good. it would collect entropy from rdrand-style instructions (if available, with a way to disable using them since some people think they're backdoored) and from the OS. imo rdrand, even if backdoored, is just going to make the RNG more random, since they're not the only entropy source.
it might be nice to also have an api for adding more entropy (no actual entropy estimation needed, which I think is one of the benefits of Fortuna).

@scottmcm
Copy link
Member

scottmcm commented Jun 11, 2024

What led you to pick this working on integers & bool, rather than exposing something more raw? For example, one basic version of this would be to offer an impl Read that let you fill [u8]s, and that would seem helpful for things like wasm targets as it would be able to return io::Errors, since there's often no secure randomness available.

For the implementation side, do we really need to maintain something in std? Can we just be a thin layer around CryptGenRandom//dev/urandom/whatever? Obviously something user-mode with a bunch of state could be faster, but it's also bigger, so maybe it's fine to say "look, this will give you the basic entropy access, which you can use directly if you have simple needs, but if you want more complicated things or different performance trade-offs, get a crate". I'd much rather delegate the "secure" part to the OS if possible (that can be patched outside what we do), since getting a fresh "yup we're secure" audit for a standard library implementation every release seems impractical. Or, said from the other direction, if I care about security being able to pin a crate version across rust releases sounds like a good thing.

The solution sketch here mentions "thread-local" in the documentation for the function. Is that observable at all? Is there a way that the caller could tell anything different between it being thread-local or global and mutexed?

@programmerjake
Copy link
Member

relying on the OS and not implementing your own CSRNG may be best, plus some OSes go to great lengths to make getting randomness fast, e.g. https://www.phoronix.com/news/Linux-Random-vDSO-2024

@joshtriplett
Copy link
Member Author

@scottmcm wrote:

What led you to pick this working on integers & bool, rather than exposing something more raw? For example, one basic version of this would be to offer an impl Read that let you fill [u8]s,

This interface also allows writing let buf: [u8; 1024] = random(); if you'd like. ("Fill an existing buffer" is in the future work section.) The reason I'm proposing something based on trait Random is to provide usability and support a variety of potential types. (The Random trait is intentionally opaque for now.)

I don't think providing exclusively an interface for filling buffers will serve many potential users; with such an interface, most users would end up having to wrap it for usability, at which point they're using an external crate anyway, so we didn't solve the underlying problem of "why do I need an external crate just to get a random number?". I think we need

and that would seem helpful for things like wasm targets as it would be able to return io::Errors, since there's often no secure randomness available.

I would argue that most applications with a dependency on secure randomness are not prepared to do without it, and panicking is the appropriate response. We could add fallible interfaces like rand's try_fill in the future, but in practice I think those will see substantially less use.

For the implementation side, do we really need to maintain something in std? Can we just be a thin layer around CryptGenRandom//dev/urandom/whatever?

We can use any implementation we want. That includes directly using OS randomness (e.g. on a target that provides it via something like the Linux VDSO) if available, which has multiple advantages, including support for unusual things like vmfork operations (forking a virtual machine and reseeding randomness). This ACP is not attempting to commit to any particular implementation strategy. I do expect, in practice, that there may exist some targets for which we need to use a seeded CSPRNG, because we can get small amounts of randomness but not large amounts.

The solution sketch here mentions "thread-local" in the documentation for the function. Is that observable at all? Is there a way that the caller could tell anything different between it being thread-local or global and mutexed?

I rewrote the doc comment. For the secure RNG, there's no way to tell the difference. In practice, I don't think we should use global-and-mutexed for performance reasons, but that's an issue of implementation quality, not something this ACP is trying to mandate.

(The difference would only arise for seedable RNGs, which this ACP is not proposing.)

@ChrisDenton
Copy link
Member

I'm in favour of a high level API but if we then have two APIs that use an OS entropy source (Rng and RandomState) then it becomes increasingly silly that we don't expose direct access to it. So it'd be good to have a raw entropy_source API.

@joshtriplett
Copy link
Member Author

@matklad I don't think this is an incorrect category: both of those use cases are fundamentally about randomness, and both should be a RandomSource. I think this is purely a question of "should the default RandomSource be cryptographically secure". And on balance I think it's safer for the default to be cryptographically secure, because a mistake in the direction of using randomness that's "too secure" is mostly harmless, while a mistake in the direction of using randomness that's insecure is dangerous.

We can always provide an insecure, seedable random source, if people want fast or reproducible randomness.

Whether we achieve cryptographically secure randomness by getting it directly from the OS every time or by using a userspace CSPRNG seems to me like an implementation detail. The tradeoffs there are between performance (OS randomness is likely to be slower, though current Linux is in the process of fixing that), complexity (OS randomness is simpler), and some potentially useful OS features (such as support for OS-triggered reseeding for use cases like forking virtual machines).

@clarfonthey
Copy link

Since there isn't a tracking issue yet, might as well comment here:

I think that it would make a lot more sense to lean into what these APIs are designed for, seeding RNGs, rather than just treating them like a proper random number source. From that perspective, using the term "entropy" like the other ACP might make a bit more sense.

We could then potentially do something similar to #[global_allocator] and offer a #[global_entropy] attribute that mimics the allocator API, to be used by stuff like RandomState. That way, it feels a lot more like an internal API to be used by other RNG sources instead of a fully-featured RNG source in its own right.

This could also help simplify the move of HashMap to the alloc crate, since RandomState and co. could still exist there, even though you would need to use std if you want to use the system entropy source.

@joshtriplett
Copy link
Member Author

rather than just treating them like a proper random number source

The point of this ACP is to be an easily available random number source; everything else is an implementation detail.

We could certainly debate whether we should be using OS randomness directly or to seed our own CSPRNG, but whichever one we do, the point is to provide a useful default random number source, not to provide solely an ingredient for users to build their own. (We should also provide the trait that helps users build their own in an interface-compatible way, but we should have a useful default implementation of that trait.)

We could then potentially do something similar to #[global_allocator] and offer a #[global_entropy] attribute

I think it'd be a serious mistake to allow replacing the default RNG, and would lead to people not wanting to use the default RNG lest it be replaced by something insecure (or with less clear security properties). People can always use a different RandomSource if they want one, by explicitly accepting one in their API.

@matklad
Copy link
Member

matklad commented Aug 15, 2024

both should be a RandomSource.

I personally lean significantly towards no answer to this question. As I've learned from rust-lang/rust#129120, what I mentioned under "there are two ways you can implement DefaultRng" seems to be a non-theoretical concern: some underlying OS API instruct you to use the function to seed a user-space CSPRNG.

This puts us in bind with

// In `std::random`:
vary by target and by Rust version.
#[derive(Default, Copy, Clone, Debug)]
pub struct DefaultRng;

setup: if we don't cache the OS-produced entropy in an in-process CSPRNG, than we are violating API expectations, which probably leads to significantly reduced performance on some OSes. If we do cache, then we'll need to solve the nasty problem of where to cache, which requires us to have thread-local storage, needs special handling during fork, and makes people generating keypairs uneasy because, ideally, they'd prefer for there to be nothing between the key material and the underlying OS API.

Consider this alternative:

pub mod crypto {
    /// Get entropy directly from OS, corresponds to single syscall/vDSO invocation.
    pub fn get_entropy(buf: &mut MaybeUninit<[u8]>) -> &mut [u8]
}

pub mod random {
    pub struct DefaultRandomSource(ASecurePrng);
    impl DefaultRandomSource {
        // Get an instance of cryptographically secure random number generator seeded
        // from `crypto::get_entropy`.
        pub fn new() -> DefaultRandomSource {
            let mut seed_buf = [u8; 128];
            let seed = crate::crypto::get_entropy(&mut seed_buf);
            DefaultRandomSource::with_seed(&seed)
        }

        pub fn with_seed(seed: &[u8]) -> DefaultRandomSource {
        }
    }
}

It dodges the tradeoff. get_entropy is guaranteed to call into the relevant OS APIs directly, DefaultRandomSource is guaranteed to cache, and the user can be flexible about how they use this cache. They might stuff it into a thread local, they might pass it as a parameter, or they might just call DefaultRandomSource::new() every time they need randmoness.

@joboet
Copy link
Member

joboet commented Aug 15, 2024

(casually dropping in to mention #159 which aims to solve the "get entropy from the system" part)

@joshtriplett
Copy link
Member Author

@matklad Again, that still just seems like an argument for what the default RandomSource should be; there's no obvious reason a "guaranteed to be directly from the OS" source couldn't be a RandomSource rather than a different function.

@clarfonthey
Copy link

clarfonthey commented Aug 15, 2024

The point of this ACP is to be an easily available random number source; everything else is an implementation detail.

If that's the case, then I think that it would be wrong to simply offer "a random i32" instead of "a random integer within this range, that happens to be an i32". I also was reading this comment and under the impression that we were just using system entropy APIs, and that PRNGs or CSPRNGs were kind of out-of-spec.

Like, don't get me wrong, I like the idea of effectively incorporating the rand crate into libstd. But I don't think that we should encourage users to either:

  1. implement their own PRNG
  2. use the random numbers incorrectly (rand() % 6)

And I personally think that it's unlikely we'll get rand incorporated into the standard library any time soon, due to the large API surface.

I think it'd be a serious mistake to allow replacing the default RNG, and would lead to people not wanting to use the default RNG lest it be replaced by something insecure (or with less clear security properties). People can always use a different RandomSource if they want one, by explicitly accepting one in their API.

Note that I'm not proposing this for users to replace the OS RNG; I'm proposing it so that no_std targets can provide their own RNG source. We could just make it a function instead of a trait in that case.

@matklad
Copy link
Member

matklad commented Aug 15, 2024

I think something like

pub mod crypto {
    struct OsRandomSource;
    impl crate::random::RandomSource for OsRandomSource { ... }
}

pub mod random {
    struct DefaultRandomSource(ASecurePrng);
    impl DefaultRandomSource {
       pub fn new() -> DefaultRandomSource;
    }
    impl RandomSource for DefaultRandomSource { ... }
}

would also address my concern about mixing up two distinct uses of randomness, yeah, as long as we don't try to shove "get randomness from OS directly" and "get randomness sufficiently quickly" in literally the same function in an elf file.

I'd guess even just

pub mod random {
    struct DefaultRandomSource(ASecurePrng);
    impl DefaultRandomSource {
       pub fn new() -> DefaultRandomSource;
    }
    impl RandomSource for DefaultRandomSource { ... }
}

would address it: here we explicitly don't provide direct access to OS randomness, and instead just give you an RNG for a guessing game or hash map init.

It's just that pub struct DefaultRng; forces us to bind to OS directly (as we definitely don't want to do thread local shenanigans), but that would actually make it a poor RNG for a guessing-game style randomness (due to violating underlying platform expectations about how randomness is to be used).

@clarfonthey
Copy link

clarfonthey commented Aug 15, 2024

I mean, like I said, I think that the bar for including a proper "RNG" in the standard library would require at minimum the ability to generate a uniform integer or float in a given range, which itself is quite a lot of code from rand already.

And the default RNG should be cryptographically secure, if we're only providing one.

Since just offering bytes from the OS entropy fails both of those by default, I would say it's fit for seeding other RNGs, but not offering as a general random number generator. And having an interface for no_std users to implement this functionality would make HashMap more portable without barring all of the RandomState code to these users just because the OS entropy isn't available.

@joshtriplett
Copy link
Member Author

joshtriplett commented Aug 15, 2024

And I personally think that it's unlikely we'll get rand incorporated into the standard library any time soon, due to the large API surface.

the ability to generate a uniform integer or float in a given range

I don't think we'll end up with all of rand, but as mentioned in the future work section, we should have a way to ask for random numbers in a given range, as well as other common operations like shuffles and random selection.

It's just that pub struct DefaultRng; forces us to bind to OS directly (as we definitely don't want to do thread local shenanigans),

Having to explicitly pass around an RNG state would be painful for many common uses; having global functions like random() (and in the future, random ranges, shuffles, and similar) is convenient, and already implies the same thing that DefaultRng does.

I'm hoping that on at least some platforms we can avoid doing thread-local shenanigans, but we may end up there eventually on platforms that lack a fast getrandom or one that the OS doesn't complain about usage of.

@clarfonthey
Copy link

Yeah, I should clarify, there are plenty of things in rand that definitely aren't needed in libstd, since they have quite a few complex distributions that should stay in a dedicated crate. Although the code for generating uniform floats in particular is quite complex, which is why I implied that it was a large portion of rand's code. I haven't looked into it and could be entirely wrong.

Of course, libstd has its own handful of very complex routines too. The entire formatting circuitry comes to mind. So, it wouldn't be too much of a stretch to include that in libcore, since it should get LTO'd out if it's unused.

@joboet
Copy link
Member

joboet commented Aug 16, 2024

I'm hoping that on at least some platforms we can avoid doing thread-local shenanigans, but we may end up there eventually on platforms that lack a fast getrandom or one that the OS doesn't complain about usage of.

That should be the case on most platforms:

Platform Usable source
Windows ProcessPrng is usable for all purposes
macOS CCRandomGenerateBytes or arc4random_buf (IIRC the latter uses the former) both use AES since version 10.12 (our minimum) are recommended in the getentropy documentation for when large quantities are needed [no link, Apple doesn't upload their man pages]
FreeBSD arc4random_buf (it uses ChaCha20 since FreeBSD 12.0 which appears to be our minimum)
OpenBSD arc4random_buf (it also uses ChaCha20 since OpenBSD 5.5, no idea what our minimum is)
NetBSD arc4random_buf (same story since at least 2014)
DragonFly arc4random_buf (same story)
ESP-IDF esp_fill_random is the only source and should be fast enough it's not, we need to do our own thing
Fuchsia arc4random_buf is probably better than cprng_draw, they warn against using it for large quantities of random data
... too lazy to check

The only problem is Linux, getrandom isn't really meant to be used for large quantities and arc4random_buf was only added to glibc in 2022, so it's way beyond our minimum. We'd probably need to do our own thing as fallback.

@jeffparsons
Copy link

arc4random_buf was only added to glibc in 2022, so it's way beyond our minimum. We'd probably need to do our own thing as fallback.

Given that it's a new API in std, falling back to thread-local shenanigans for even slightly older systems seems totally fine to me as a user. Either I didn't read the docs and so I probably don't care too much, or I did read the docs and so I knew what I was getting myself into.

(I'm ignoring issues like the maintenance or testing burden of having extra code paths.)

/2c

@joboet
Copy link
Member

joboet commented Aug 17, 2024

I've opened rust-lang/rust#129201 as an alternative version that uses arc4random_buf. On Linux, that function is actually implemented as an getrandom wrapper, so I've stayed with the current version. getrandom should be fine for generating large amounts of data except for it being slow. Let's just wait until it is implemented in the vDSO, that should then speed things up. and it's just gotten a vDSO version, so speed shouldn't be a concern anymore.

@joshtriplett
Copy link
Member Author

joshtriplett commented Aug 17, 2024

@joboet I think "it's functional in all supported versions but only performant on newer versions" is sufficient to avoid maintaining the fallback, at least for Linux. In practice, even the non-VDSO version is likely to be fast enough for most users.

joboet added a commit to joboet/rust that referenced this issue Aug 22, 2024
joboet added a commit to joboet/rust that referenced this issue Aug 22, 2024
joboet added a commit to joboet/rust that referenced this issue Aug 22, 2024
joboet added a commit to joboet/rust that referenced this issue Sep 22, 2024
workingjubilee added a commit to workingjubilee/rustc that referenced this issue Sep 23, 2024
…shtriplett

std: implement the `random` feature (alternative version)

Implements the ACP rust-lang/libs-team#393.

This PR is an alternative version of rust-lang#129120 that replaces `getentropy` with `CCRandomGenerateBytes` (on macOS) and `arc4random_buf` (other BSDs), since that function is not suited for generating large amounts of data and should only be used to seed other CPRNGs. `CCRandomGenerateBytes`/`arc4random_buf` on the other hand is (on modern platforms) just as secure and uses its own, very strong CPRNG (ChaCha20 on the BSDs, AES on macOS) periodically seeded with `getentropy`.
bors added a commit to rust-lang-ci/rust that referenced this issue Sep 23, 2024
…triplett

std: implement the `random` feature (alternative version)

Implements the ACP rust-lang/libs-team#393.

This PR is an alternative version of rust-lang#129120 that replaces `getentropy` with `CCRandomGenerateBytes` (on macOS) and `arc4random_buf` (other BSDs), since that function is not suited for generating large amounts of data and should only be used to seed other CPRNGs. `CCRandomGenerateBytes`/`arc4random_buf` on the other hand is (on modern platforms) just as secure and uses its own, very strong CPRNG (ChaCha20 on the BSDs, AES on macOS) periodically seeded with `getentropy`.
joboet added a commit to joboet/rust that referenced this issue Sep 23, 2024
workingjubilee added a commit to workingjubilee/rustc that referenced this issue Sep 23, 2024
…shtriplett

std: implement the `random` feature (alternative version)

Implements the ACP rust-lang/libs-team#393.

This PR is an alternative version of rust-lang#129120 that replaces `getentropy` with `CCRandomGenerateBytes` (on macOS) and `arc4random_buf` (other BSDs), since that function is not suited for generating large amounts of data and should only be used to seed other CPRNGs. `CCRandomGenerateBytes`/`arc4random_buf` on the other hand is (on modern platforms) just as secure and uses its own, very strong CPRNG (ChaCha20 on the BSDs, AES on macOS) periodically seeded with `getentropy`.
workingjubilee added a commit to workingjubilee/rustc that referenced this issue Sep 23, 2024
…shtriplett

std: implement the `random` feature (alternative version)

Implements the ACP rust-lang/libs-team#393.

This PR is an alternative version of rust-lang#129120 that replaces `getentropy` with `CCRandomGenerateBytes` (on macOS) and `arc4random_buf` (other BSDs), since that function is not suited for generating large amounts of data and should only be used to seed other CPRNGs. `CCRandomGenerateBytes`/`arc4random_buf` on the other hand is (on modern platforms) just as secure and uses its own, very strong CPRNG (ChaCha20 on the BSDs, AES on macOS) periodically seeded with `getentropy`.
rust-timer added a commit to rust-lang-ci/rust that referenced this issue Sep 23, 2024
Rollup merge of rust-lang#129201 - joboet:random_faster_sources, r=joshtriplett

std: implement the `random` feature (alternative version)

Implements the ACP rust-lang/libs-team#393.

This PR is an alternative version of rust-lang#129120 that replaces `getentropy` with `CCRandomGenerateBytes` (on macOS) and `arc4random_buf` (other BSDs), since that function is not suited for generating large amounts of data and should only be used to seed other CPRNGs. `CCRandomGenerateBytes`/`arc4random_buf` on the other hand is (on modern platforms) just as secure and uses its own, very strong CPRNG (ChaCha20 on the BSDs, AES on macOS) periodically seeded with `getentropy`.
github-actions bot pushed a commit to rust-lang/miri that referenced this issue Sep 25, 2024
std: implement the `random` feature (alternative version)

Implements the ACP rust-lang/libs-team#393.

This PR is an alternative version of #129120 that replaces `getentropy` with `CCRandomGenerateBytes` (on macOS) and `arc4random_buf` (other BSDs), since that function is not suited for generating large amounts of data and should only be used to seed other CPRNGs. `CCRandomGenerateBytes`/`arc4random_buf` on the other hand is (on modern platforms) just as secure and uses its own, very strong CPRNG (ChaCha20 on the BSDs, AES on macOS) periodically seeded with `getentropy`.
@tbu-
Copy link

tbu- commented Sep 27, 2024

If that's the case, then I think that it would be wrong to simply offer "a random i32" instead of "a random integer within this range, that happens to be an i32"

I want to highlight this comment. I also think that it's a mistake to expose the wrong kind of primitive to generate a dice roll (like the motivating example).

Is there a real-world use case for generating a uniformly random i32 that wouldn't be served with a "fill byte buffer" kind of API?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ACP-accepted API Change Proposal is accepted (seconded with no objections) api-change-proposal A proposal to add or alter unstable APIs in the standard libraries T-libs-api
Projects
None yet
Development

No branches or pull requests