
Add a generic chunk pool for batch.NewChunkMergeIterator #5110

Closed

Conversation

@codesome (Member) commented May 31, 2023

What this PR does

Adds a chunk pool to batch.NewChunkMergeIterator. Here are the benchmark results:

benchmark                                                                                                        old ns/op     new ns/op     delta
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_1000_samples_per_chunk:_100_duplication_factor:_1-10     3194638       3174288       -0.64%
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_1000_samples_per_chunk:_100_duplication_factor:_3-10     9529323       9468404       -0.64%
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_100_samples_per_chunk:_100_duplication_factor:_1-10      321109        318679        -0.76%
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_100_samples_per_chunk:_100_duplication_factor:_3-10      952047        946160        -0.62%
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_1_samples_per_chunk:_100_duplication_factor:_1-10        3695          3652          -1.16%
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_1_samples_per_chunk:_100_duplication_factor:_3-10        10721         10670         -0.48%

benchmark                                                                                                        old allocs     new allocs     delta
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_1000_samples_per_chunk:_100_duplication_factor:_1-10     1004           1003           -0.10%
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_1000_samples_per_chunk:_100_duplication_factor:_3-10     3008           3007           -0.03%
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_100_samples_per_chunk:_100_duplication_factor:_1-10      104            103            -0.96%
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_100_samples_per_chunk:_100_duplication_factor:_3-10      308            307            -0.32%
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_1_samples_per_chunk:_100_duplication_factor:_1-10        5              4              -20.00%
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_1_samples_per_chunk:_100_duplication_factor:_3-10        11             10             -9.09%

benchmark                                                                                                        old bytes     new bytes     delta
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_1000_samples_per_chunk:_100_duplication_factor:_1-10     73200         48848         -33.27%
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_1000_samples_per_chunk:_100_duplication_factor:_3-10     285186        212123        -25.62%
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_100_samples_per_chunk:_100_duplication_factor:_1-10      7824          5140          -34.30%
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_100_samples_per_chunk:_100_duplication_factor:_3-10      30560         22402         -26.70%
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_1_samples_per_chunk:_100_duplication_factor:_1-10        120           96            -20.00%
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_1_samples_per_chunk:_100_duplication_factor:_3-10        472           392           -16.95%

Note that the benchmark is an ideal case, where the length of the slice likely remains the same between iterations. In the real world, where the slice length will differ between queries, the gains might be different, but I expect net gains overall.
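For readers unfamiliar with the pattern, here is a minimal sketch of this kind of slice pooling, assuming a plain sync.Pool behind genericChunkSlicePool; the putGenericChunkSlice helper and the exact reuse logic are illustrative assumptions, not necessarily what this PR implements:

```go
package batch

import "sync"

// Illustrative pool; the PR's actual pool type may differ.
var genericChunkSlicePool sync.Pool

// getGenericChunkSlice returns a []GenericChunk of length n, reusing a
// pooled slice when one with enough capacity is available.
func getGenericChunkSlice(n int) []GenericChunk {
	if v := genericChunkSlicePool.Get(); v != nil {
		if sn := v.([]GenericChunk); cap(sn) >= n {
			return sn[:n]
		}
	}
	return make([]GenericChunk, n)
}

// putGenericChunkSlice (hypothetical name) hands a slice back to the pool.
// Entries are zeroed first so the pooled backing array does not pin chunks.
func putGenericChunkSlice(s []GenericChunk) {
	var zero GenericChunk
	for i := range s {
		s[i] = zero
	}
	genericChunkSlicePool.Put(s[:0]) //nolint:staticcheck // SA6002: boxing the slice header is an accepted cost
}
```

The usual caveat of this get/put discipline is that nothing may retain the slice after it is returned to the pool, which is exactly why newMergeIterator is documented below as not holding a reference to the cs slice.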

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@codesome codesome requested a review from a team as a code owner May 31, 2023 05:51
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
@charleskorn (Contributor) left a comment

Overall LGTM. Is it worth benchmarking the case where the number of chunks varies to confirm there are no issues with the expected pattern of behaviour?
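One hypothetical shape for such a benchmark, with makeChunks standing in for the existing benchmark's chunk builder and newIter for a wrapper around batch.NewChunkMergeIterator (both helpers, and the bool-returning Next, are assumptions for brevity, not the repository's actual API):

```go
package batch

import "testing"

// Sketch: cycle through different chunk counts so the pooled slice is
// requested at varying sizes, approximating a mixed query workload.
func BenchmarkChunkMergeIterator_VaryingChunkCounts(b *testing.B) {
	counts := []int{1, 100, 1000}
	for i := 0; i < b.N; i++ {
		chunks := makeChunks(counts[i%len(counts)]) // hypothetical helper
		iter := newIter(chunks)                     // hypothetical wrapper around batch.NewChunkMergeIterator
		for iter.Next() {
		}
	}
}
```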

@@ -28,6 +28,8 @@ type mergeIterator struct {
	currErr error
}

// newMergeIterator returns an iterator that merges generic chunks in batches.
// This functions must not hold a reference to the `cs` slice.
Contributor

[nit]

Suggested change
- // This functions must not hold a reference to the `cs` slice.
+ // This function must not hold a reference to the `cs` slice.


func getGenericChunkSlice(n int) []GenericChunk {
	sn := genericChunkSlicePool.Get()
	if sn == nil || cap(sn) < n {
Contributor

Would some kind of pool with different buckets for different ranges of n be better here, to avoid the pool only ever having large slices?

Contributor

+1. In a multi-tenant installation I assume you get all kinds of different-size queries, so you'll end up increasing the memory footprint over time by accumulating more and more large slices in the pool. Maybe measure first with a histogram in some busy environment to get a good idea of the bucket boundaries?

Note: I've checked that this is invoked after applying the max chunks per query limit.
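To make the bucketing idea concrete, here is a minimal sketch assuming power-of-two capacity classes; bucketedPool and its methods are hypothetical names, and the histogram suggested above would inform the real bucket boundaries:

```go
package batch

import (
	"math/bits"
	"sync"
)

// bucketedPool keeps one sync.Pool per power-of-two capacity class, so a
// request for a small slice is never served from (or later returned as)
// a very large pooled slice.
type bucketedPool struct {
	buckets [20]sync.Pool // capacity classes 1<<0 ... 1<<19
}

func (p *bucketedPool) get(n int) []GenericChunk {
	b := 0
	if n > 1 {
		b = bits.Len(uint(n - 1)) // smallest class with 1<<b >= n
	}
	if b >= len(p.buckets) {
		return make([]GenericChunk, n) // too large to pool
	}
	if v := p.buckets[b].Get(); v != nil {
		return v.([]GenericChunk)[:n]
	}
	return make([]GenericChunk, n, 1<<b)
}

func (p *bucketedPool) put(s []GenericChunk) {
	c := cap(s)
	if c == 0 || c&(c-1) != 0 {
		return // only capacities created by get (exact powers of two) are pooled
	}
	var zero GenericChunk
	for i := range s {
		s[i] = zero // avoid pinning chunk data from the pool
	}
	if b := bits.Len(uint(c)) - 1; b < len(p.buckets) {
		p.buckets[b].Put(s[:0]) //nolint:staticcheck
	}
}
```

This bounds the wasted capacity per request to roughly 2x while keeping large and small slices in separate pools.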

Contributor

You can maybe use SlabPool (code), which can pack smaller slices together into one big slice. We used to have a bucketed pool, but it was unused and removed in #4996.

Contributor

We use this code in the store-gateways. The way we size slabs is to account for the worst-case or p99 number of chunks per series. This is an example for slices of chunks.

So if a series can have at most 300 chunks, use a slab size of 300. If all series have fewer chunks, e.g. 3, then one slab can fit 100 series. The overhead isn't huge, and you also get fewer overall allocations and maybe even better CPU cache locality, since sequential slices are sequential in memory.
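A simplified sketch of that slab idea follows; it is not Mimir's actual SlabPool API, and slabAllocator with its fields is purely illustrative:

```go
package batch

// slabAllocator carves many small slices out of one large backing array
// (a "slab"), so each per-series slice is not a separate heap allocation
// and consecutive slices are adjacent in memory.
type slabAllocator struct {
	slabSize int
	current  []GenericChunk
}

// get returns a length-n slice from the current slab, starting a new slab
// when the remaining capacity is insufficient.
func (a *slabAllocator) get(n int) []GenericChunk {
	if n > a.slabSize {
		return make([]GenericChunk, n) // oversized requests bypass the slab
	}
	if cap(a.current)-len(a.current) < n {
		a.current = make([]GenericChunk, 0, a.slabSize)
	}
	start := len(a.current)
	a.current = a.current[:start+n]
	return a.current[start : start+n : start+n] // full slice expr: appends can't clobber neighbours
}
```

With slabSize = 300 and series of 3 chunks each, one allocation serves 100 series, which is the arithmetic in the comment above.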

@krajorama krajorama self-requested a review May 31, 2023 09:09
@codesome codesome marked this pull request as draft May 31, 2023 15:00
@codesome (Member, Author)

I converted this PR to a draft since, from the comment thread above, there seem to be multiple ways this could be done, and we need to determine which is best. This PR is not my top priority at the moment, so I will come back to it sometime later (but if anyone wants to pursue it further, feel free!).

@pracucci (Collaborator)

I'm closing this PR since I believe no one is actively working on it. We can reopen it anytime.

@pracucci pracucci closed this Aug 24, 2023