
Store-gateway: Add metrics for fine-grained chunks caching #4213

Closed · wants to merge 2 commits

Conversation

Contributor

@dimitarvdimitrov dimitarvdimitrov commented Feb 9, 2023

Signed-off-by: Dimitar Dimitrov dimitar.dimitrov@grafana.com

Which issue(s) this PR fixes or relates to

Related to #3939; upstreams part of #3968

What this PR does

Proposes changes to the metrics that the store-gateway exposes, to give more granular insight into chunk sizes throughout the loading process. At this point the PR should be a no-op. I'm opening it mainly as a preliminary discussion for the follow-up PR, which will also record these metrics.

This PR adds a stage label to these four metrics:

  • cortex_bucket_store_series_data_fetched
  • cortex_bucket_store_series_data_size_fetched_bytes
  • cortex_bucket_store_series_data_touched
  • cortex_bucket_store_series_data_size_touched_bytes

The stage label is empty for series and postings, so for those data types the metrics behave exactly as before. It is also empty for chunks when the store-gateway is non-streaming, or when fine-grained chunks caching for the streaming store-gateway is disabled (as of this PR, that caching is not yet implemented).

When fine-grained caching is enabled, the stage label will have the following values for data_type="chunks":

  • cortex_bucket_store_series_data_fetched
    • normal - the number of chunks that we had to fetch; this doesn't include refetched chunks, but it does include chunks outside of the request's minT/maxT
    • refetch - the number of chunks from ranges that we had to refetch because they were underfetched
  • cortex_bucket_store_series_data_size_fetched_bytes
    • normal - the raw size of chunk ranges fetched from both the cache and the bucket; this doesn't include overfetched bytes caused by the partitioner
      • I am not sure whether we should include the partitioner effect in this. If we do, we can draw only weaker conclusions about how effective our chunk length estimation is, because the fetched bytes can include almost arbitrary bytes that we have little control over
    • refetch - same as normal, but includes only the raw range sizes that had to be refetched
  • cortex_bucket_store_series_data_touched
    • parsed - the number of chunks that we parsed; this is equal to cortex_bucket_store_series_data_fetched{stage="normal"} and is here for completeness
    • selected - the number of chunks that were selected because they overlap with the request's minT/maxT; the value of this is included in parsed
  • cortex_bucket_store_series_data_size_touched_bytes
    • parsed - the total bytes from the raw ranges that were enough to parse the range; this is the total size of raw ranges minus the bytes we overfetched due to overestimating the chunk size
    • selected - the raw size of chunks that were selected because they overlap with the request's minT/maxT
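As a rough illustration of how the per-request stats might map onto the new (data_type, stage) label pairs, here is a minimal, stdlib-only Go sketch. The queryStats fields and the stageLabels helper are hypothetical simplifications for this discussion, not the store-gateway's actual recording code:

```go
package main

import "fmt"

// queryStats loosely mirrors the store-gateway's per-request stats;
// the field names here are illustrative, not the exact upstream ones.
type queryStats struct {
	chunksFetched   int // fetched chunks, excluding refetches
	chunksRefetched int // chunks from underfetched ranges that were refetched
	chunksParsed    int // parsed chunks (equal to chunksFetched per the PR text)
	chunksTouched   int // selected: chunks overlapping the request's minT/maxT
}

// stageLabels is a hypothetical helper mapping the stats onto the
// (data_type, stage) label pairs described above.
func stageLabels(s queryStats, fineGrainedCaching bool) map[[2]string]int {
	m := map[[2]string]int{}
	if !fineGrainedCaching {
		// Without fine-grained caching the stage label stays empty,
		// so the metrics behave exactly as before.
		m[[2]string{"chunks", ""}] = s.chunksFetched
		return m
	}
	m[[2]string{"chunks", "normal"}] = s.chunksFetched
	m[[2]string{"chunks", "refetch"}] = s.chunksRefetched
	m[[2]string{"chunks", "parsed"}] = s.chunksParsed
	m[[2]string{"chunks", "selected"}] = s.chunksTouched
	return m
}

func main() {
	stats := queryStats{chunksFetched: 10, chunksRefetched: 2, chunksParsed: 10, chunksTouched: 7}
	fmt.Println(stageLabels(stats, true))
}
```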

To make these changes I also had to update the dashboards so that we don't double-count stage="parsed" and stage="selected", since the chunks they record overlap.
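To illustrate the dashboard concern, a hedged PromQL sketch: the metric and label names come from this PR, but the exact dashboard queries (and whether the metric is recorded as a histogram, hence the _sum suffix) may differ.

```promql
# stage="parsed" already includes stage="selected", so a panel should pin
# a single stage rather than summing across all stage values:
sum(rate(cortex_bucket_store_series_data_size_touched_bytes_sum{data_type="chunks", stage="parsed"}[5m]))
```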

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
@dimitarvdimitrov
Contributor Author

I'll keep this in draft until I get some feedback. I'll also add a changelog entry after that.

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
Collaborator

@pracucci pracucci left a comment


I'm 👍 👍 👍 to improving these metrics. I left a few comments, but overall LGTM.

s.metrics.seriesDataSizeTouched.WithLabelValues("chunks").Observe(float64(stats.chunksTouchedSizeSum))
s.metrics.seriesDataSizeFetched.WithLabelValues("chunks").Observe(float64(stats.chunksFetchedSizeSum))
s.metrics.seriesDataTouched.WithLabelValues("postings", "").Observe(float64(stats.postingsTouched))
s.metrics.seriesDataFetched.WithLabelValues("postings", "").Observe(float64(stats.postingsFetched))
Collaborator


In the metric description of the "fetched" metric, can you clarify it's "fetched from the object storage", please? And in the metric of "touched" can you clarify it's "either fetched from the object storage or cache"?

// Currently fineGrainedChunksCachingEnabled is always false. After it is configurable,
// these stats will be recorded by the new caching implementation.
if s.fineGrainedChunksCachingEnabled && s.maxSeriesPerBatch > 0 {
	s.metrics.seriesDataFetched.WithLabelValues("chunks", "normal").Observe(float64(stats.chunksFetched))
Collaborator


Have you considered renaming "normal" to "fetch"? Basically it's a normal "fetch".

Collaborator


[nit] Also, does it make sense to call it "fetched" and "refetched", given we have "parsed" and "selected"?

Comment on lines +44 to +47
chunksParsed int
chunksParsedSizeSum int
chunksRefetched int
chunksRefetchedSizeSum int
Collaborator


It's hard to review this PR because I would like to see how these are tracked.

s.metrics.seriesDataTouched.WithLabelValues("chunks", "parsed").Observe(float64(stats.chunksParsed))
s.metrics.seriesDataSizeTouched.WithLabelValues("chunks", "parsed").Observe(float64(stats.chunksParsedSizeSum))

s.metrics.seriesDataTouched.WithLabelValues("chunks", "selected").Observe(float64(stats.chunksTouched))
Collaborator


Have you considered renaming "selected" to "returned"? To my understanding these are the chunks returned because they are within the query min/max time range.
