Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AllocError does not store additional information #106

Open
Zymlex opened this issue Sep 8, 2022 · 6 comments
Open

AllocError does not store additional information #106

Zymlex opened this issue Sep 8, 2022 · 6 comments

Comments

@Zymlex
Copy link

Zymlex commented Sep 8, 2022

It would be nice if AllocError became a trait (or associated type) for which each implementation could add additional information, such as the cause of the error and something else.

@CAD97
Copy link

CAD97 commented Sep 8, 2022

Not a trait, but associated type.

The error type for collections' try_reserve contains reserved API space for adding an associated allocation failure error type, so this is at least possible.

The main questions then are

  • object safety — dyn Allocator for dynamic allocators working is a key design goal;
  • defaulting — default provided methods may want to create an error, so either they need to only be defaulted when the error type is defaulted, or there needs to be a default trait bound guaranteeing default constructability; and
  • what trait bounds do we put on the associated error type and where do we require them — notably, the Error trait only just very recently was made available in core.

Plus, for the majority of cases, allocators won't have any context to provide beyond just the size of the requested allocation, which shouldn't be part of the error type in order to avoid the performance overhead in the typically very hot code.

Plus, by construction, this wouldn't be available to anything using the global allocator. The utility of custom errors is marginal due to that.

@Urhengulas
Copy link

It would be nice if AllocError became a trait (or associated type) for which each implementation could add additional information, such as the cause of the error and something else.

@Zymlex Out of interest: What additional information are you thinking about? What is your context for that? I am recently thinking about this a lot in the context of safety-critical software.

@Zymlex
Copy link
Author

Zymlex commented Sep 13, 2022

At least the error details.
For example, on Windows, HeapAlloc provides two error causes STATUS_NO_MEMORY and STATUS_ACCESS_VIOLATION. The first in some cases is not critical and can be solved, the second is hinting at a possible bug and problems.
I do not exclude that some allocators may provide additional information about the error.

In cases where the data size is unknown, necessary to allocate space for a future error in advance (on application launch). It would be nice if it were possible to specify whether this feature is enabled and how much memory to allocate in advance.

@Urhengulas
Copy link

Thanks.

You might want to have a look into the previous discussion about an associated error type #23. It kind of ended in "sounds generally interesting, but no one has a concrete usecase" (see #23 (comment)).

@kornelski
Copy link

kornelski commented Jan 31, 2024

In terms of use-cases, I would like to remove Layout from TryReserveError{Kind} types (to reduce code bloat from large Results around finish_grow and try_ functions), but still leave possibility of getting more detailed error information for users who really want this.

From what I've seen, most OOM handlers only care about size, and even then only as an FYI to print it as a potential clue for analyzing an OOM abort. Use-cases for including Layout.align are even weaker — mainly as a clue for guessing that OOM happened due to types having alignment is so incredibly large it can't be satisfied due to technical limits of the allocator, or address space fragmentation, but that case is so exceptionally rare, that anybody using such trick will already know that at the call site anyway.

https://internals.rust-lang.org/t/questioning-usefulness-of-alignment-in-oom-errors-and-handlers/20233/1

But one-size-fits-all TryReserveError is not going to work for everyone. The Layout despite being relatively large, is not actually that informative, because it's only for guessing the real reason. It's not supplied by the allocator itself. So guessing that maybe OOM was due to large alignment is not as good as having allocator return some CustomError::AlignmentWasTooLargeIndeed.

I can imagine users adding verbose information to allocation errors in debug or beta-testing builds, and switching it to zero-sized types for production builds. Having a custom type for errors would make it possible, instead of TryReserveError being too big for some, and not detailed enough for others.

#121

@amunra
Copy link

amunra commented Sep 27, 2024

QuestDB developer here. We're beginning to compile with nightly specifically for the availability of allocators.

Traking details in the allocation error would be very very useful indeed. I'll provide some context, in hopes it can be of use.

A database needs to continue working, even when it's close to running out of memory. The way we achieve this is by internally tracking how much memory we've allocated from the global allocator in an atomic counter.

We then calculate a soft limit that we avoid mallocing beyond. This means that we can prevent allocations early before the system allocator fails and avoid the OOM killer. When the soft memory limit is hit, we end up failing the inividual query or database operation that caused the issue. This is all implemented in Java as a bunch of atomic counters: We're now exposing it to Rust.

To do this, we're using multiple dedicated allocators, one per database component (querying, data ingestion, data compaction, etc). When things fail we want to track details of what went wrong:

  • Which component?
  • Which how much memory was requested?
  • What was the configured soft limit?

Our current hacky work-around is to place these details in a thread local and take its contents when we map the AllocError or TryReserveError in our own error type. It's not 100% reliable, as it relies on no further allocations between the Rust allocator's error and the conversion to our own type. It's got the same reliability semantics as libc's errno :-)

The thread local hack also relies on our code not being async. We have parts of the code that use async as well. It'll get even messier there.

I wonder if the error_generic_member_access feature might help (or not) in this context.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants