Add a generic chunk pool for batch.NewChunkMergeIterator #5110
Conversation
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
Overall LGTM. Is it worth benchmarking the case where the number of chunks varies to confirm there are no issues with the expected pattern of behaviour?
```diff
@@ -28,6 +28,8 @@ type mergeIterator struct {
 	currErr error
 }
 
+// newMergeIterator returns an iterator that merges generic chunks in batches.
+// This functions must not hold a reference to the `cs` slice.
```
[nit]
```diff
-// This functions must not hold a reference to the `cs` slice.
+// This function must not hold a reference to the `cs` slice.
```
```go
func getGenericChunkSlice(n int) []GenericChunk {
	sn := genericChunkSlicePool.Get()
	if sn == nil || cap(sn) < n {
```
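For context, a self-contained sketch of how such a slice pool typically looks. A plain `sync.Pool` with a type assertion is assumed here (the PR seems to use a typed wrapper, since `cap()` is called directly on the value returned by `Get()`), and `GenericChunk` is stubbed out so the snippet compiles on its own:

```go
package batch

import "sync"

// GenericChunk is stubbed out so the sketch compiles on its own; the real
// type lives in the querier batch package.
type GenericChunk struct{}

// A plain sync.Pool is assumed here, which may differ from the PR.
var genericChunkSlicePool sync.Pool

// getGenericChunkSlice returns a zero-length slice with capacity >= n,
// reusing a pooled slice when one is large enough.
func getGenericChunkSlice(n int) []GenericChunk {
	v := genericChunkSlicePool.Get()
	if v == nil {
		return make([]GenericChunk, 0, n)
	}
	s := v.([]GenericChunk)
	if cap(s) < n {
		return make([]GenericChunk, 0, n)
	}
	return s[:0]
}

// putGenericChunkSlice hands a slice back for reuse. Storing the slice
// header directly trips staticcheck SA6002 but works; a production
// version might pool *[]GenericChunk instead.
func putGenericChunkSlice(s []GenericChunk) {
	genericChunkSlicePool.Put(s[:0])
}
```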
Would some kind of pool with different buckets for different ranges of `n` be better here, to avoid the pool only ever having large slices?
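A minimal sketch of what such a bucketed pool could look like, reusing the `GenericChunk` stub and `sync` import from the sketch above; the bucket sizes are illustrative, not from the PR:

```go
// Buckets are chosen ahead of time; requests above the largest bucket
// are allocated directly and never pooled.
var bucketSizes = []int{64, 256, 1024, 4096}
var bucketedPools = make([]sync.Pool, len(bucketSizes))

// getBucketed returns a zero-length slice with capacity >= n from the
// smallest bucket that fits, so small requests never pin large slices.
func getBucketed(n int) []GenericChunk {
	for i, size := range bucketSizes {
		if n <= size {
			if v := bucketedPools[i].Get(); v != nil {
				return v.([]GenericChunk)[:0]
			}
			return make([]GenericChunk, 0, size)
		}
	}
	return make([]GenericChunk, 0, n) // oversized: don't pool
}

// putBucketed files a slice under the largest bucket its capacity covers,
// preserving the cap >= bucket size invariant that getBucketed relies on.
func putBucketed(s []GenericChunk) {
	for i := len(bucketSizes) - 1; i >= 0; i-- {
		if cap(s) >= bucketSizes[i] {
			bucketedPools[i].Put(s[:0])
			return
		}
	}
}
```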
+1. In a multi-tenant installation I assume you get all kinds of different-sized queries, so you'll end up increasing the memory footprint over time by having more and more large slices in the pool. Maybe measure first with a histogram in some busy environment to get a good idea of the buckets?
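A sketch of how that measurement could look with the Prometheus Go client; the metric name and bucket layout here are assumptions, not from the PR:

```go
import "github.com/prometheus/client_golang/prometheus"

// chunkSliceLength would record how many chunks each query asks for, so
// pool bucket boundaries can be derived from real traffic.
var chunkSliceLength = prometheus.NewHistogram(prometheus.HistogramOpts{
	Name:    "cortex_querier_chunk_slice_length",
	Help:    "Requested generic chunk slice length per merge iterator.",
	Buckets: prometheus.ExponentialBuckets(8, 4, 8), // 8 up to ~131k
})

func init() { prometheus.MustRegister(chunkSliceLength) }

// Observe at the point where the slice is requested, e.g. at the top of
// getGenericChunkSlice: chunkSliceLength.Observe(float64(n))
```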
Note: I've checked that this is invoked after applying the max chunks per query limit.
We use this code in the store-gateways. The way we size slabs is to account for the worst-case or p99 number of chunks per series. This is an example for slices of chunks.

So if a series can have at most 300 chunks, use a slab size of 300. If all series have fewer chunks, e.g. 3, then one slab can fit 100 series. The overhead isn't huge, but you also get a reduced number of overall allocations and maybe even better CPU cache locality, since sequential slices are sequential in memory.
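A minimal slab-pool sketch of the sizing idea described above, again reusing the `GenericChunk` stub; this is not the store-gateway's actual implementation:

```go
// slabPool carves fixed-size slabs into per-series sub-slices.
type slabPool struct {
	slabSize int            // worst-case / p99 chunks per series
	slab     []GenericChunk // current slab being carved
}

// get returns a length-n slice carved from the current slab, starting a
// new slab when the current one can't fit the request.
func (p *slabPool) get(n int) []GenericChunk {
	if n > p.slabSize {
		return make([]GenericChunk, n) // oversized: own allocation
	}
	if len(p.slab)+n > cap(p.slab) {
		p.slab = make([]GenericChunk, 0, p.slabSize)
	}
	start := len(p.slab)
	p.slab = p.slab[:start+n]
	// The full slice expression caps the result so appends can't
	// clobber neighbouring sub-slices in the same slab.
	return p.slab[start : start+n : start+n]
}
```

With `slabSize: 300` and series of 3 chunks each, one slab holds 100 series' slices in a single allocation.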
I converted this PR to a draft since, from the above comment thread, there seem to be multiple different ways this can be done. I need to determine which is better here. This PR is not my top priority at the moment, so I will come back to it sometime later (but if anyone wants to pursue this further, feel free!)
I'm closing this PR since I believe no one is actively working on it. We can reopen it anytime.
What this PR does
Adds a chunk pool to `batch.NewChunkMergeIterator`. Here are the benchmark results:

Note that the benchmark is an ideal case where the length of the slice likely remains the same. In the real world, where the slice length will differ between queries, the gains might be different, but I expect net gains overall.
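For reference, a benchmark shape that would exercise the varied-size case raised in review, using the pool sketch from earlier in this thread; the sizes and names are illustrative, not the PR's actual benchmark:

```go
import "testing"

// BenchmarkGetGenericChunkSlice cycles through a mix of sizes to
// approximate real query mixes rather than the ideal fixed-size case.
func BenchmarkGetGenericChunkSlice(b *testing.B) {
	sizes := []int{10, 100, 1000}
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		s := getGenericChunkSlice(sizes[i%len(sizes)])
		putGenericChunkSlice(s)
	}
}
```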
Checklist

- `CHANGELOG.md` updated - the order of entries should be `[CHANGE]`, `[FEATURE]`, `[ENHANCEMENT]`, `[BUGFIX]`