Mimir query engine: add memory consumption per query limit #8230
Merged
jhesketh approved these changes Jun 3, 2024
Awesome stuff, looks really good to me. I agree with the planned work (specifically exposing the metric), but makes sense to split it up like this.
charleskorn added a commit that referenced this pull request Jun 4, 2024
narqo pushed a commit to narqo/grafana-mimir that referenced this pull request Jun 6, 2024
)
* Make formatting consistent
* Initial version of `LimitingPool`
* Move to `operator` package
* Move `FPoint` and `HPoint` pools next to `LimitingPool`
* Add methods for slices of `HPoint` to `LimitingPool`
* Use `LimitingPool` everywhere
* Move pool to its own package and introduce interface
* Move pool interface to `types` package
* Add documentation for `err-mimir-max-in-memory-samples-per-query`
* Add limit CLI flag and config option
* Add (failing) tests
* Fix linting warnings
* Add another test case
* Add more slice types to `LimitedPool`.
* Rework limit to use estimated memory consumption, rather than a number of samples
* Ensure float and bool slices are cleared.
* Update tests to use bytes rather than samples limit
* Add limit to list of experimental features
* Add changelog entry
* Fix linting warning
* Fix description of error
* Remove unnecessary interface and early enforcement of limit
* Fix flag name
* Remove unnecessary interface
* Remove unused methods
* Address PR feedback
narqo pushed a commit to narqo/grafana-mimir that referenced this pull request Jun 6, 2024
…afana#8247)
* Move interface definitions to `types` package
* Move operators to `operators` package
* Add changelog entry
What this PR does
This PR implements a per-query estimated memory consumption limit for the Mimir query engine.
The estimate is based on the primary contributors to memory consumption: samples (eg. `promql.FPoint`s) and running totals (eg. the slices of `float64`s used by `sum()` aggregations).
The estimate ignores other contributions to a query's memory consumption like chunks and series labels. These could be added in the future if need be.
The limit is enforced as slices are created during the query, and is based on the capacity of the slice created, not the size requested. These are not necessarily the same: we use bucketed pools for each of these slice types, and the pool will allocate a slice of capacity equal to the bucket that will hold the requested size, which will always be greater than or equal to the requested size. This means the estimate more closely tracks the actual memory utilisation of the query, but may be slightly higher than otherwise expected.
The estimate is generally accurate, except for:
* native histograms, whose size is estimated (see `nativeHistogramSampleSizeFactor` in `limiting_pool.go`), as tracking the true size of each native histogram would be very expensive. A future improvement would be to make `nativeHistogramSampleSizeFactor` configurable, but I think this is fine for now.
Enforcing the limit adds up to 1% latency overhead to some benchmarks, but this seems worthwhile.
This change also required some shuffling of types between packages to help ensure that the underlying pools are not accessed directly and all allocations go through the limit-enforcing methods. In the interests of keeping this PR as small as possible, I haven't done all the refactoring I'd like to do and will do this in a future PR. In particular, I'd like to move the `Operator` interface and `RingBuffer` type to the `types` package, and rename the `operator` package to `operators`.
I'd also like to use the `PeakEstimatedMemoryConsumptionBytes` in a metric and log it on query traces, but this too will come in a follow-up PR.
Which issue(s) this PR fixes or relates to
(none)
Checklist
`CHANGELOG.md` updated - the order of entries should be `[CHANGE]`, `[FEATURE]`, `[ENHANCEMENT]`, `[BUGFIX]`.
`about-versioning.md` updated with experimental features.