
Add a generic chunk pool for batch.NewChunkMergeIterator #5110

Closed

Conversation

@codesome (Member) commented May 31, 2023

What this PR does

Adds a chunk pool to batch.NewChunkMergeIterator. Here are the benchmark results:

benchmark                                                                                                        old ns/op     new ns/op     delta
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_1000_samples_per_chunk:_100_duplication_factor:_1-10     3194638       3174288       -0.64%
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_1000_samples_per_chunk:_100_duplication_factor:_3-10     9529323       9468404       -0.64%
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_100_samples_per_chunk:_100_duplication_factor:_1-10      321109        318679        -0.76%
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_100_samples_per_chunk:_100_duplication_factor:_3-10      952047        946160        -0.62%
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_1_samples_per_chunk:_100_duplication_factor:_1-10        3695          3652          -1.16%
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_1_samples_per_chunk:_100_duplication_factor:_3-10        10721         10670         -0.48%

benchmark                                                                                                        old allocs     new allocs     delta
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_1000_samples_per_chunk:_100_duplication_factor:_1-10     1004           1003           -0.10%
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_1000_samples_per_chunk:_100_duplication_factor:_3-10     3008           3007           -0.03%
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_100_samples_per_chunk:_100_duplication_factor:_1-10      104            103            -0.96%
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_100_samples_per_chunk:_100_duplication_factor:_3-10      308            307            -0.32%
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_1_samples_per_chunk:_100_duplication_factor:_1-10        5              4              -20.00%
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_1_samples_per_chunk:_100_duplication_factor:_3-10        11             10             -9.09%

benchmark                                                                                                        old bytes     new bytes     delta
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_1000_samples_per_chunk:_100_duplication_factor:_1-10     73200         48848         -33.27%
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_1000_samples_per_chunk:_100_duplication_factor:_3-10     285186        212123        -25.62%
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_100_samples_per_chunk:_100_duplication_factor:_1-10      7824          5140          -34.30%
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_100_samples_per_chunk:_100_duplication_factor:_3-10      30560         22402         -26.70%
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_1_samples_per_chunk:_100_duplication_factor:_1-10        120           96            -20.00%
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_1_samples_per_chunk:_100_duplication_factor:_3-10        472           392           -16.95%

Note that the benchmark is an ideal case, where the length of the slice likely remains the same between iterations. In the real world, where the slice length will differ between queries, the gains might be different, but I expect net gains overall.
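For readers unfamiliar with the pattern, here is a minimal sketch of this kind of slice pooling, assuming a plain sync.Pool behind genericChunkSlicePool; the putGenericChunkSlice helper and the exact reuse logic are illustrative assumptions, not necessarily what this PR implements:

```go
package batch

import "sync"

// Illustrative pool; the PR's actual pool type may differ.
var genericChunkSlicePool sync.Pool

// getGenericChunkSlice returns a []GenericChunk of length n, reusing a
// pooled slice when one with enough capacity is available.
func getGenericChunkSlice(n int) []GenericChunk {
	if v := genericChunkSlicePool.Get(); v != nil {
		if sn := v.([]GenericChunk); cap(sn) >= n {
			return sn[:n]
		}
	}
	return make([]GenericChunk, n)
}

// putGenericChunkSlice (hypothetical name) hands a slice back to the pool.
// Entries are zeroed first so the pooled backing array does not pin chunks.
func putGenericChunkSlice(s []GenericChunk) {
	var zero GenericChunk
	for i := range s {
		s[i] = zero
	}
	genericChunkSlicePool.Put(s[:0]) //nolint:staticcheck // SA6002: boxing the slice header is an accepted cost
}
```

The usual caveat of this get/put discipline is that nothing may retain the slice after it is returned to the pool, which is exactly why newMergeIterator is documented below as not holding a reference to the cs slice.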

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@codesome codesome requested a review from a team as a code owner May 31, 2023 05:51
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
@charleskorn (Contributor) left a comment

Overall LGTM. Is it worth benchmarking the case where the number of chunks varies to confirm there are no issues with the expected pattern of behaviour?
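One hypothetical shape for such a benchmark, with makeChunks standing in for the existing benchmark's chunk builder and newIter for a wrapper around batch.NewChunkMergeIterator (both helpers, and the bool-returning Next, are assumptions for brevity, not the repository's actual API):

```go
package batch

import "testing"

// Sketch: cycle through different chunk counts so the pooled slice is
// requested at varying sizes, approximating a mixed query workload.
func BenchmarkChunkMergeIterator_VaryingChunkCounts(b *testing.B) {
	counts := []int{1, 100, 1000}
	for i := 0; i < b.N; i++ {
		chunks := makeChunks(counts[i%len(counts)]) // hypothetical helper
		iter := newIter(chunks)                     // hypothetical wrapper around batch.NewChunkMergeIterator
		for iter.Next() {
		}
	}
}
```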

@@ -28,6 +28,8 @@ type mergeIterator struct {
	currErr error
}

// newMergeIterator returns an iterator that merges generic chunks in batches.
// This functions must not hold a reference to the `cs` slice.
Contributor

[nit]

Suggested change
- // This functions must not hold a reference to the `cs` slice.
+ // This function must not hold a reference to the `cs` slice.


func getGenericChunkSlice(n int) []GenericChunk {
	sn := genericChunkSlicePool.Get()
	if sn == nil || cap(sn) < n {
Contributor

Would some kind of pool with different buckets for different ranges of n be better here, to avoid the pool only ever having large slices?

Contributor

+1. In a multi-tenant installation I assume you get all kinds of different-size queries, so you'll end up increasing the memory footprint over time by accumulating more and more large slices in the pool. Maybe measure first with a histogram in some busy environment to get a good idea of the bucket boundaries?

Note: I've checked that this is invoked after applying the max chunks per query limit.
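To make the bucketing idea concrete, here is a minimal sketch assuming power-of-two capacity classes; bucketedPool and its methods are hypothetical names, and the histogram suggested above would inform the real bucket boundaries:

```go
package batch

import (
	"math/bits"
	"sync"
)

// bucketedPool keeps one sync.Pool per power-of-two capacity class, so a
// request for a small slice is never served from (or later returned as)
// a very large pooled slice.
type bucketedPool struct {
	buckets [20]sync.Pool // capacity classes 1<<0 ... 1<<19
}

func (p *bucketedPool) get(n int) []GenericChunk {
	b := 0
	if n > 1 {
		b = bits.Len(uint(n - 1)) // smallest class with 1<<b >= n
	}
	if b >= len(p.buckets) {
		return make([]GenericChunk, n) // too large to pool
	}
	if v := p.buckets[b].Get(); v != nil {
		return v.([]GenericChunk)[:n]
	}
	return make([]GenericChunk, n, 1<<b)
}

func (p *bucketedPool) put(s []GenericChunk) {
	c := cap(s)
	if c == 0 || c&(c-1) != 0 {
		return // only capacities created by get (exact powers of two) are pooled
	}
	var zero GenericChunk
	for i := range s {
		s[i] = zero // avoid pinning chunk data from the pool
	}
	if b := bits.Len(uint(c)) - 1; b < len(p.buckets) {
		p.buckets[b].Put(s[:0]) //nolint:staticcheck
	}
}
```

This bounds the wasted capacity per request to roughly 2x while keeping large and small slices in separate pools.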

Contributor

You can maybe use SlabPool (code), which can pack smaller slices together into one big slice. We used to have a bucketed pool, but it was unused and removed in #4996.

Contributor

We use this code in the store-gateways. The way we size slabs is to account for the worst-case or p99 number of chunks per series. This is an example for slices of chunks.

So if a series can have at most 300 chunks, use a slab size of 300. If all series have fewer chunks, e.g. 3, then one slab can fit 100 series. The overhead isn't huge, and you also get fewer overall allocations and maybe even better CPU cache locality, since sequential slices are sequential in memory.
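A simplified sketch of that slab idea follows; it is not Mimir's actual SlabPool API, and slabAllocator with its fields is purely illustrative:

```go
package batch

// slabAllocator carves many small slices out of one large backing array
// (a "slab"), so each per-series slice is not a separate heap allocation
// and consecutive slices are adjacent in memory.
type slabAllocator struct {
	slabSize int
	current  []GenericChunk
}

// get returns a length-n slice from the current slab, starting a new slab
// when the remaining capacity is insufficient.
func (a *slabAllocator) get(n int) []GenericChunk {
	if n > a.slabSize {
		return make([]GenericChunk, n) // oversized requests bypass the slab
	}
	if cap(a.current)-len(a.current) < n {
		a.current = make([]GenericChunk, 0, a.slabSize)
	}
	start := len(a.current)
	a.current = a.current[:start+n]
	return a.current[start : start+n : start+n] // full slice expr: appends can't clobber neighbours
}
```

With slabSize = 300 and series of 3 chunks each, one allocation serves 100 series, which is the arithmetic in the comment above.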

@krajorama krajorama self-requested a review May 31, 2023 09:09
@codesome codesome marked this pull request as draft May 31, 2023 15:00
@codesome (Member, Author)

I converted this PR to a draft since, from the comment thread above, there seem to be multiple ways this could be done, and we need to determine which is best. This PR is not my top priority at the moment, so I will come back to it sometime later (but if anyone wants to pursue it further, feel free!).

@pracucci (Collaborator)

I'm closing this PR since I believe no one is actively working on it. We can reopen it anytime.

@pracucci pracucci closed this Aug 24, 2023