Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reuse iterator in batch.NewChunkMergeIterator #5097

Merged
merged 9 commits into from
May 31, 2023
Merged

Conversation

codesome
Copy link
Member

@codesome codesome commented May 29, 2023

What this PR does

Implements the re-use of iterator for batch.NewChunkMergeIterator. Although querier uses two other chunk iterators, since this one is enabled by default (and I guess most commonly), I gave this one a try first.

While this benchmark verifies the re-use of iterators is working for this one, it was not trivial to verify that at the query level, it is indeed passing down the used iterator for re-use - I need some help verifying this. See #5097 (comment)

Here are the benchmark results (without the chunk pool that was added and later removed from this PR):

benchmark                                                                                                        old ns/op     new ns/op     delta
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_1000_samples_per_chunk:_100_duplication_factor:_1-10     3184123       3214727       +0.96%
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_1000_samples_per_chunk:_100_duplication_factor:_3-10     9495067       9539729       +0.47%
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_100_samples_per_chunk:_100_duplication_factor:_1-10      320369        321496        +0.35%
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_100_samples_per_chunk:_100_duplication_factor:_3-10      949704        955471        +0.61%
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_1_samples_per_chunk:_100_duplication_factor:_1-10        4313          3685          -14.56%
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_1_samples_per_chunk:_100_duplication_factor:_3-10        12090         10822         -10.49%

benchmark                                                                                                        old allocs     new allocs     delta
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_1000_samples_per_chunk:_100_duplication_factor:_1-10     1013           1004           -0.89%
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_1000_samples_per_chunk:_100_duplication_factor:_3-10     3023           3008           -0.50%
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_100_samples_per_chunk:_100_duplication_factor:_1-10      113            104            -7.96%
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_100_samples_per_chunk:_100_duplication_factor:_3-10      323            308            -4.64%
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_1_samples_per_chunk:_100_duplication_factor:_1-10        14             5              -64.29%
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_1_samples_per_chunk:_100_duplication_factor:_3-10        26             11             -57.69%

benchmark                                                                                                        old bytes     new bytes     delta
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_1000_samples_per_chunk:_100_duplication_factor:_1-10     75192         73200         -2.65%
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_1000_samples_per_chunk:_100_duplication_factor:_3-10     289689        285184        -1.56%
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_100_samples_per_chunk:_100_duplication_factor:_1-10      9816          7824          -20.29%
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_100_samples_per_chunk:_100_duplication_factor:_3-10      35064         30560         -12.85%
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_1_samples_per_chunk:_100_duplication_factor:_1-10        2112          120           -94.32%
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_1_samples_per_chunk:_100_duplication_factor:_3-10        4976          472           -90.51%

Which issue(s) this PR fixes or relates to

Addresses the useful part of https://github.com/grafana/mimir/pull/4886/files/6c4ec7372d19c84ee2072285fd653333f7338589#r1201931006

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@codesome codesome requested a review from a team as a code owner May 29, 2023 11:15
@codesome codesome requested a review from charleskorn May 29, 2023 11:15
benchmark                                                                                                        old ns/op     new ns/op     delta
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_1000_samples_per_chunk:_100_duplication_factor:_1-10     3184123       3214727       +0.96%
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_1000_samples_per_chunk:_100_duplication_factor:_3-10     9495067       9539729       +0.47%
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_100_samples_per_chunk:_100_duplication_factor:_1-10      320369        321496        +0.35%
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_100_samples_per_chunk:_100_duplication_factor:_3-10      949704        955471        +0.61%
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_1_samples_per_chunk:_100_duplication_factor:_1-10        4313          3685          -14.56%
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_1_samples_per_chunk:_100_duplication_factor:_3-10        12090         10822         -10.49%

benchmark                                                                                                        old allocs     new allocs     delta
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_1000_samples_per_chunk:_100_duplication_factor:_1-10     1013           1004           -0.89%
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_1000_samples_per_chunk:_100_duplication_factor:_3-10     3023           3008           -0.50%
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_100_samples_per_chunk:_100_duplication_factor:_1-10      113            104            -7.96%
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_100_samples_per_chunk:_100_duplication_factor:_3-10      323            308            -4.64%
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_1_samples_per_chunk:_100_duplication_factor:_1-10        14             5              -64.29%
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_1_samples_per_chunk:_100_duplication_factor:_3-10        26             11             -57.69%

benchmark                                                                                                        old bytes     new bytes     delta
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_1000_samples_per_chunk:_100_duplication_factor:_1-10     75192         73200         -2.65%
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_1000_samples_per_chunk:_100_duplication_factor:_3-10     289689        285184        -1.56%
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_100_samples_per_chunk:_100_duplication_factor:_1-10      9816          7824          -20.29%
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_100_samples_per_chunk:_100_duplication_factor:_3-10      35064         30560         -12.85%
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_1_samples_per_chunk:_100_duplication_factor:_1-10        2112          120           -94.32%
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_1_samples_per_chunk:_100_duplication_factor:_3-10        4976          472           -90.51%

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
@codesome
Copy link
Member Author

I modified BenchmarkDistributorQuerier_Select with this diff

The diff
--- a/pkg/querier/distributor_queryable_test.go
+++ b/pkg/querier/distributor_queryable_test.go
@@ -8,6 +8,7 @@ package querier
import (
  "context"
  "fmt"
+	"github.com/grafana/mimir/pkg/querier/batch"
  "math"
  "strconv"
  "testing"
@@ -572,17 +573,27 @@ func BenchmarkDistributorQuerier_Select(b *testing.B) {
  d.On("QueryStream", mock.Anything, mock.Anything, mock.Anything, mock.Anything).Return(response, nil)

  ctx := user.InjectOrgID(context.Background(), "0")
-	queryable := newDistributorQueryable(d, mergeChunks, newMockConfigProvider(0), nil, log.NewNopLogger())
+	queryable := newDistributorQueryable(d, batch.NewChunkMergeIterator, newMockConfigProvider(0), nil, log.NewNopLogger())
  querier, err := queryable.Querier(ctx, math.MinInt64, math.MaxInt64)
  require.NoError(b, err)

  b.ResetTimer()
+	b.ReportAllocs()

  for n := 0; n < b.N; n++ {
  	seriesSet := querier.Select(true, &storage.SelectHints{Start: math.MinInt64, End: math.MaxInt64})
  	if seriesSet.Err() != nil {
  		b.Fatal(seriesSet.Err())
  	}
+
+		var it chunkenc.Iterator
+		for seriesSet.Next() {
+			ss := seriesSet.At()
+			it = ss.Iterator(it)
+			if it.Err() != nil {
+				b.Fatal(it.Err())
+			}
+		}
  }
}

and here are the benchmark results. The big difference is likely because there is only 1 chunk with 1 sample. So it does not represent real-world load. But it confirms that the re-use is happening properly.

benchmark                                 old ns/op     new ns/op     delta
BenchmarkDistributorQuerier_Select-10     25454832      21200190      -16.71%

benchmark                                 old allocs     new allocs     delta
BenchmarkDistributorQuerier_Select-10     180129         90129          -49.96%

benchmark                                 old bytes     new bytes     delta
BenchmarkDistributorQuerier_Select-10     23246433      3300197       -85.80%

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
@codesome codesome requested a review from pracucci May 29, 2023 12:40
@charleskorn
Copy link
Contributor

Implements the re-use of iterator for batch.NewChunkMergeIterator. Although querier uses two other chunk iterators, since this one is enabled by default (and I guess most commonly), I gave this one a try first.

Under what circumstances would we want to enable the other kinds of iterators? (I'm wondering if we should separately deprecate and eventually remove the others)

Copy link
Contributor

@charleskorn charleskorn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM.

It's interesting that BenchmarkNewChunkMergeIterator_CreateAndIterate shows a slight CPU time degradation for the test cases with larger numbers of chunks - any ideas why this is the case? I can't see an explanation for this.

pkg/querier/batch/batch.go Show resolved Hide resolved
pkg/querier/batch/batch.go Outdated Show resolved Hide resolved
pkg/querier/batch/non_overlapping.go Outdated Show resolved Hide resolved
pkg/querier/batch/non_overlapping.go Outdated Show resolved Hide resolved
pkg/querier/batch/non_overlapping_test.go Outdated Show resolved Hide resolved
@codesome
Copy link
Member Author

Under what circumstances would we want to enable the other kinds of iterators?

I personally don't know. I saw that this one is mostly used, so ignored the other two in the meantime.

It's interesting that BenchmarkNewChunkMergeIterator_CreateAndIterate shows a slight CPU time degradation for the test cases with larger numbers of chunks - any ideas why this is the case?

The first run showed slight decrease, but it was slightly higher. Don't know the reason myself as well.

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
Copy link
Collaborator

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice job! The code changes look safe to me, modulo the pool. Not saying the pool is unsafe, but a bit hard to reason about and I would defer it to a dedicated PR (see related comment about it).

pkg/querier/iterators/chunk_merge_iterator.go Outdated Show resolved Hide resolved
pkg/querier/matrix.go Outdated Show resolved Hide resolved
batches: make(batchStream, 0, len(its)),
batchesBuf: make(batchStream, len(its)),
css := partitionChunks(gc)
putGenericChunkSlice(gc)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Design-wise I'm not a big fan of putting the slice to the pool here. This gives for granted the caller hasn't retained a reference to the slice and won't use it anymore, but this is not defined anywhere (e.g. not commented in this function and bubbled up in the callers chain).

Before we deep dive into how to make this more robust: how much benefit does this pool give? Do we really need it, in particular do we really need it in this PR or it's something we can address in a separate PR to have a dedicated conversation?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't need it in this PR, but the benchmark show some gains. The difference is just the pool (both the benchmarks re-used the iterators). I can open a different PR for this.

benchmark                                                                                                        old ns/op     new ns/op     delta
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_1000_samples_per_chunk:_100_duplication_factor:_1-10     3232329       3207756       -0.76%
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_1000_samples_per_chunk:_100_duplication_factor:_3-10     9581962       9544844       -0.39%
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_100_samples_per_chunk:_100_duplication_factor:_1-10      323507        321477        -0.63%
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_100_samples_per_chunk:_100_duplication_factor:_3-10      956364        952313        -0.42%
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_1_samples_per_chunk:_100_duplication_factor:_1-10        3703          3681          -0.59%
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_1_samples_per_chunk:_100_duplication_factor:_3-10        10817         10755         -0.57%

benchmark                                                                                                        old allocs     new allocs     delta
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_1000_samples_per_chunk:_100_duplication_factor:_1-10     1004           1003           -0.10%
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_1000_samples_per_chunk:_100_duplication_factor:_3-10     3008           3007           -0.03%
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_100_samples_per_chunk:_100_duplication_factor:_1-10      104            103            -0.96%
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_100_samples_per_chunk:_100_duplication_factor:_3-10      308            307            -0.32%
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_1_samples_per_chunk:_100_duplication_factor:_1-10        5              4              -20.00%
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_1_samples_per_chunk:_100_duplication_factor:_3-10        11             10             -9.09%

benchmark                                                                                                        old bytes     new bytes     delta
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_1000_samples_per_chunk:_100_duplication_factor:_1-10     73205         48651         -33.54%
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_1000_samples_per_chunk:_100_duplication_factor:_3-10     285187        212730        -25.41%
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_100_samples_per_chunk:_100_duplication_factor:_1-10      7824          5138          -34.33%
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_100_samples_per_chunk:_100_duplication_factor:_3-10      30560         22402         -26.70%
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_1_samples_per_chunk:_100_duplication_factor:_1-10        120           96            -20.00%
BenchmarkNewChunkMergeIterator_CreateAndIterate/chunks:_1_samples_per_chunk:_100_duplication_factor:_3-10        472           392           -16.95%

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have opened #5110 for the pool and I have made the pool usage a little more robust.

pkg/querier/batch/merge.go Show resolved Hide resolved
@pracucci
Copy link
Collaborator

Under what circumstances would we want to enable the other kinds of iterators?

At Grafana Labs we only use the default -querier.batch-iterators=true. I'm also not aware of anyone in the community using any other iterator on purpose. The other iterators are there since years and I think we could just deprecated them and remove them (also I don't see much value spending time implementing the iterators reuse for them).

@charleskorn
Copy link
Contributor

At Grafana Labs we only use the default -querier.batch-iterators=true. I'm also not aware of anyone in the community using any other iterator on purpose. The other iterators are there since years and I think we could just deprecated them and remove them (also I don't see much value spending time implementing the iterators reuse for them).

I've created #5107 to track this.

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
Copy link
Collaborator

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@pracucci pracucci merged commit 1826aea into main May 31, 2023
@pracucci pracucci deleted the codesome/reuse-iter branch May 31, 2023 06:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants