Commit

Merge remote-tracking branch 'origin/main' into yuri/native-hist-remote-ruler-reads-dashboard
duricanikolic committed Jul 23, 2024
2 parents 588ceff + 4410382 commit 632cda5
Showing 90 changed files with 761 additions and 1,344 deletions.
17 changes: 16 additions & 1 deletion CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,10 @@
* [CHANGE] Query-frontend: Remove deprecated `frontend.align_queries_with_step` YAML configuration. The configuration option has been moved to per-tenant and default `limits` since Mimir 2.12. #8733 #8735
* [CHANGE] Store-gateway: Change default of `-blocks-storage.bucket-store.max-concurrent` to 200. #8768
* [CHANGE] Added new metric `cortex_compactor_disk_out_of_space_errors_total`, which counts how many times a compaction failed because the compactor ran out of disk space. Alert on any increase. #8237 #8278
* [CHANGE] Store-gateway: Remove experimental parameter `-blocks-storage.bucket-store.series-selection-strategy`. The default strategy is now `worst-case`. #8702
* [CHANGE] Store-gateway: Rename `-blocks-storage.bucket-store.series-selection-strategies.worst-case-series-preference` to `-blocks-storage.bucket-store.series-fetch-preference` and promote to stable. #8702
* [CHANGE] Querier, store-gateway: remove deprecated `-querier.prefer-streaming-chunks-from-store-gateways=true`. Streaming from store-gateways is now always enabled. #8696
* [CHANGE] Ingester: remove deprecated `-ingester.return-only-grpc-errors`. #8699
* [FEATURE] Querier: add experimental streaming PromQL engine, enabled with `-querier.query-engine=mimir`. #8422 #8430 #8454 #8455 #8360 #8490 #8508 #8577 #8671
* [FEATURE] Experimental Kafka-based ingest storage. #6888 #6894 #6929 #6940 #6951 #6974 #6982 #7029 #7030 #7091 #7142 #7147 #7148 #7153 #7160 #7193 #7349 #7376 #7388 #7391 #7393 #7394 #7402 #7404 #7423 #7424 #7437 #7486 #7503 #7508 #7540 #7621 #7682 #7685 #7694 #7695 #7696 #7697 #7701 #7733 #7734 #7741 #7752 #7838 #7851 #7871 #7877 #7880 #7882 #7887 #7891 #7925 #7955 #7967 #8031 #8063 #8077 #8088 #8135 #8176 #8184 #8194 #8216 #8217 #8222 #8233 #8503 #8542 #8579 #8657 #8686 #8688 #8703 #8706 #8708 #8738 #8750
* What it is:
Expand Down Expand Up @@ -67,14 +71,19 @@
* Overview dashboard: status, read/write latency and queries/ingestion per sec panels, `cortex_request_duration_seconds` metric. #7674 #8502 #8791
* Writes dashboard: `cortex_request_duration_seconds` metric. #8757 #8791
* Reads dashboard: `cortex_request_duration_seconds` metric. #8752
* Rollout progress dashboard. #8779
* Rollout progress dashboard: `cortex_request_duration_seconds` metric. #8779
* Alertmanager dashboard: `cortex_request_duration_seconds` metric. #8792
* Ruler dashboard: `cortex_request_duration_seconds` metric. #8795
* Queries dashboard: `cortex_request_duration_seconds` metric. #8800
* Remote ruler reads dashboard: `cortex_request_duration_seconds` metric.
* [ENHANCEMENT] Alerts: `MimirRunningIngesterReceiveDelayTooHigh` alert has been tuned to be more reactive to high receive delay. #8538
* [ENHANCEMENT] Dashboards: improve end-to-end latency and strong read consistency panels when experimental ingest storage is enabled. #8543
* [ENHANCEMENT] Dashboards: Add panels for monitoring ingester autoscaling when not using ingest-storage. These panels are disabled by default, but can be enabled using the `autoscaling.ingester.enabled: true` config option. #8484
* [ENHANCEMENT] Dashboards: add panels to show writes to experimental ingest storage backend in the "Mimir / Ruler" dashboard, when `_config.show_ingest_storage_panels` is enabled. #8732
* [ENHANCEMENT] Dashboards: show all series in tooltips on time series dashboard panels. #8748
* [ENHANCEMENT] Dashboards: add compactor autoscaling panels to "Mimir / Compactor" dashboard. The panels are disabled by default, but can be enabled by setting `_config.autoscaling.compactor.enabled` to `true`. #8777
* [ENHANCEMENT] Alerts: added `MimirKafkaClientBufferedProduceBytesTooHigh` alert. #8763
* [ENHANCEMENT] Dashboards: added "Kafka produced records / sec" panel to "Mimir / Writes" dashboard. #8763
* [BUGFIX] Dashboards: fix "current replicas" in autoscaling panels when HPA is not active. #8566
* [BUGFIX] Alerts: do not fire `MimirRingMembersMismatch` during the migration to experimental ingest storage. #8727

Expand All @@ -94,6 +103,12 @@
### Mimirtool

* [CHANGE] Analyze Rules: Count recording rules used in rules group as used. #6133
* [CHANGE] Remove deprecated `--rule-files` flag in favor of CLI arguments for the following commands: #8701
* `mimirtool rules load`
* `mimirtool rules sync`
* `mimirtool rules diff`
* `mimirtool rules check`
* `mimirtool rules prepare`

### Mimir Continuous Test

Expand Down
55 changes: 6 additions & 49 deletions cmd/mimir/config-descriptor.json
Expand Up @@ -1865,17 +1865,6 @@
"fieldType": "boolean",
"fieldCategory": "advanced"
},
{
"kind": "field",
"name": "prefer_streaming_chunks_from_store_gateways",
"required": false,
"desc": "Request store-gateways stream chunks. Store-gateways will only respond with a stream of chunks if the target store-gateway supports this, and this preference will be ignored by store-gateways that do not support this.",
"fieldValue": null,
"fieldDefaultValue": true,
"fieldFlag": "querier.prefer-streaming-chunks-from-store-gateways",
"fieldType": "boolean",
"fieldCategory": "experimental"
},
{
"kind": "field",
"name": "streaming_chunks_per_ingester_series_buffer_size",
Expand Down Expand Up @@ -3469,17 +3458,6 @@
"fieldType": "int",
"fieldCategory": "advanced"
},
{
"kind": "field",
"name": "return_only_grpc_errors",
"required": false,
"desc": "When enabled only gRPC errors will be returned by the ingester.",
"fieldValue": null,
"fieldDefaultValue": true,
"fieldFlag": "ingester.return-only-grpc-errors",
"fieldType": "boolean",
"fieldCategory": "deprecated"
},
{
"kind": "field",
"name": "use_ingester_owned_series_for_limits",
Expand Down Expand Up @@ -9492,35 +9470,14 @@
},
{
"kind": "field",
"name": "series_selection_strategy",
"required": false,
"desc": "This option controls the strategy to selection of series and deferring application of matchers. A more aggressive strategy will fetch less posting lists at the cost of more series. This is useful when querying large blocks in which many series share the same label name and value. Supported values (most aggressive to least aggressive): speculative, worst-case, worst-case-small-posting-lists, all.",
"fieldValue": null,
"fieldDefaultValue": "worst-case",
"fieldFlag": "blocks-storage.bucket-store.series-selection-strategy",
"fieldType": "string",
"fieldCategory": "experimental"
},
{
"kind": "block",
"name": "series_selection_strategies",
"name": "series_fetch_preference",
"required": false,
"desc": "",
"blockEntries": [
{
"kind": "field",
"name": "worst_case_series_preference",
"required": false,
"desc": "This option is only used when blocks-storage.bucket-store.series-selection-strategy=worst-case. Increasing the series preference results in fetching more series than postings. Must be a positive floating point number.",
"fieldValue": null,
"fieldDefaultValue": 0.75,
"fieldFlag": "blocks-storage.bucket-store.series-selection-strategies.worst-case-series-preference",
"fieldType": "float",
"fieldCategory": "experimental"
}
],
"desc": "This parameter controls the trade-off in fetching series versus fetching postings to fulfill a series request. Increasing the series preference results in fetching more series and reducing the volume of postings fetched. Reducing the series preference results in the opposite. Increase this parameter to reduce the rate of fetched series bytes (see \"Mimir / Queries\" dashboard) or API calls to the object store. Must be a positive floating point number.",
"fieldValue": null,
"fieldDefaultValue": null
"fieldDefaultValue": 0.75,
"fieldFlag": "blocks-storage.bucket-store.series-fetch-preference",
"fieldType": "float",
"fieldCategory": "advanced"
}
],
"fieldValue": null,
Expand Down
10 changes: 2 additions & 8 deletions cmd/mimir/help-all.txt.tmpl
Expand Up @@ -671,12 +671,10 @@ Usage of ./cmd/mimir/mimir:
Max size - in bytes - of a gap for which the partitioner aggregates together two bucket GET object requests. (default 524288)
-blocks-storage.bucket-store.posting-offsets-in-mem-sampling int
Controls what is the ratio of postings offsets that the store will hold in memory. (default 32)
-blocks-storage.bucket-store.series-fetch-preference float
This parameter controls the trade-off in fetching series versus fetching postings to fulfill a series request. Increasing the series preference results in fetching more series and reducing the volume of postings fetched. Reducing the series preference results in the opposite. Increase this parameter to reduce the rate of fetched series bytes (see "Mimir / Queries" dashboard) or API calls to the object store. Must be a positive floating point number. (default 0.75)
-blocks-storage.bucket-store.series-hash-cache-max-size-bytes uint
Max size - in bytes - of the in-memory series hash cache. The cache is shared across all tenants and it's used only when query sharding is enabled. (default 1073741824)
-blocks-storage.bucket-store.series-selection-strategies.worst-case-series-preference float
[experimental] This option is only used when blocks-storage.bucket-store.series-selection-strategy=worst-case. Increasing the series preference results in fetching more series than postings. Must be a positive floating point number. (default 0.75)
-blocks-storage.bucket-store.series-selection-strategy string
[experimental] This option controls the strategy to selection of series and deferring application of matchers. A more aggressive strategy will fetch less posting lists at the cost of more series. This is useful when querying large blocks in which many series share the same label name and value. Supported values (most aggressive to least aggressive): speculative, worst-case, worst-case-small-posting-lists, all. (default "worst-case")
-blocks-storage.bucket-store.sync-dir string
Directory to store synchronized TSDB index headers. This directory is not required to be persisted between restarts, but it's highly recommended in order to improve the store-gateway startup time. (default "./tsdb-sync/")
-blocks-storage.bucket-store.sync-interval duration
Expand Down Expand Up @@ -1553,8 +1551,6 @@ Usage of ./cmd/mimir/mimir:
[experimental] CPU utilization limit, as CPU cores, for CPU/memory utilization based read request limiting. Use 0 to disable it.
-ingester.read-path-memory-utilization-limit uint
[experimental] Memory limit, in bytes, for CPU/memory utilization based read request limiting. Use 0 to disable it.
-ingester.return-only-grpc-errors
[deprecated] When enabled only gRPC errors will be returned by the ingester. (default true)
-ingester.ring.consul.acl-token string
ACL Token used to interact with Consul.
-ingester.ring.consul.cas-retry-delay duration
Expand Down Expand Up @@ -1913,8 +1909,6 @@ Usage of ./cmd/mimir/mimir:
If true, when querying ingesters, only the minimum required ingesters required to reach quorum will be queried initially, with other ingesters queried only if needed due to failures from the initial set of ingesters. Enabling this option reduces resource consumption for the happy path at the cost of increased latency for the unhappy path. (default true)
-querier.minimize-ingester-requests-hedging-delay duration
Delay before initiating requests to further ingesters when request minimization is enabled and the initially selected set of ingesters have not all responded. Ignored if -querier.minimize-ingester-requests is not enabled. (default 3s)
-querier.prefer-streaming-chunks-from-store-gateways
[experimental] Request store-gateways stream chunks. Store-gateways will only respond with a stream of chunks if the target store-gateway supports this, and this preference will be ignored by store-gateways that do not support this. (default true)
-querier.promql-experimental-functions-enabled
[experimental] Enable experimental PromQL functions. This config option should be set on query-frontend too when query sharding is enabled.
-querier.query-engine string
Expand Down
8 changes: 0 additions & 8 deletions docs/sources/mimir/configure/about-versioning.md
Expand Up @@ -146,7 +146,6 @@ The following features are currently experimental:
- `-ingester.client.circuit-breaker.cooldown-period`
- Querier
- Use of Redis cache backend (`-blocks-storage.bucket-store.metadata-cache.backend=redis`)
- Streaming chunks from store-gateway to querier (`-querier.prefer-streaming-chunks-from-store-gateways`)
- Limiting queries based on the estimated number of chunks that will be used (`-querier.max-estimated-fetched-chunks-per-query-multiplier`)
- Max concurrency for tenant federated queries (`-tenant-federation.max-concurrent`)
- Maximum response size for active series queries (`-querier.active-series-results-max-size-bytes`)
Expand All @@ -167,7 +166,6 @@ The following features are currently experimental:
- `-query-scheduler.querier-forget-delay`
- Store-gateway
- Use of Redis cache backend (`-blocks-storage.bucket-store.chunks-cache.backend=redis`, `-blocks-storage.bucket-store.index-cache.backend=redis`, `-blocks-storage.bucket-store.metadata-cache.backend=redis`)
- `-blocks-storage.bucket-store.series-selection-strategy`
- Eagerly loading some blocks on startup even when lazy loading is enabled `-blocks-storage.bucket-store.index-header.eager-loading-startup-enabled`
- Read-write deployment mode
- API endpoints:
Expand Down Expand Up @@ -212,14 +210,8 @@ For details about what _deprecated_ means, see [Parameter lifecycle]({{< relref

The following features or configuration parameters are currently deprecated and will be **removed in Mimir 2.14**:

- Ingester
- `-ingester.return-only-grpc-errors`
- Ingester client
- `-ingester.client.report-grpc-codes-in-instrumentation-label-enabled`
- Mimirtool
- the flag `--rule-files`
- Querier
- the flag `-querier.prefer-streaming-chunks-from-store-gateways`
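The removed and renamed Mimir flags listed in this commit's CHANGELOG can be checked against a deployment's arguments before upgrading. The following is a minimal sketch, not part of Mimir: `checkArgs`, `flagName`, and the message wording are hypothetical, and only the flag names come from the CHANGELOG entries in this commit. The `mimirtool --rule-files` flag belongs to a different binary and is not covered.

```go
package main

import (
	"fmt"
	"strings"
)

// removed lists the Mimir flags that this commit's CHANGELOG reports as removed.
var removed = map[string]bool{
	"blocks-storage.bucket-store.series-selection-strategy": true,
	"querier.prefer-streaming-chunks-from-store-gateways":   true,
	"ingester.return-only-grpc-errors":                      true,
}

// renamed maps an old flag name to the new name given in the CHANGELOG.
var renamed = map[string]string{
	"blocks-storage.bucket-store.series-selection-strategies.worst-case-series-preference": "blocks-storage.bucket-store.series-fetch-preference",
}

// flagName extracts the name from "-name", "--name", or "-name=value".
// It reports false for arguments that are not flags.
func flagName(arg string) (string, bool) {
	if !strings.HasPrefix(arg, "-") {
		return "", false
	}
	name := strings.TrimLeft(arg, "-")
	if i := strings.Index(name, "="); i >= 0 {
		name = name[:i]
	}
	return name, name != ""
}

// checkArgs returns one finding per removed or renamed flag found in args.
// This helper is hypothetical and exists only for illustration.
func checkArgs(args []string) []string {
	var findings []string
	for _, arg := range args {
		name, ok := flagName(arg)
		if !ok {
			continue
		}
		if removed[name] {
			findings = append(findings, fmt.Sprintf("-%s: removed, delete it", name))
		} else if newName, ok := renamed[name]; ok {
			findings = append(findings, fmt.Sprintf("-%s: renamed to -%s", name, newName))
		}
	}
	return findings
}

func main() {
	args := []string{
		"-target=all",
		"-ingester.return-only-grpc-errors=true",
		"-blocks-storage.bucket-store.series-selection-strategies.worst-case-series-preference=0.5",
	}
	for _, finding := range checkArgs(args) {
		fmt.Println(finding)
	}
}
```

Exact-match lookups keep `series-selection-strategy` and `series-selection-strategies.worst-case-series-preference` from being confused with each other.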

The following features or configuration parameters are currently deprecated and will be **removed in a future release (to be announced)**:

Expand Down
35 changes: 9 additions & 26 deletions docs/sources/mimir/configure/configuration-parameters/index.md
Expand Up @@ -1265,10 +1265,6 @@ instance_limits:
# CLI flag: -ingester.error-sample-rate
[error_sample_rate: <int> | default = 10]
# (deprecated) When enabled only gRPC errors will be returned by the ingester.
# CLI flag: -ingester.return-only-grpc-errors
[return_only_grpc_errors: <boolean> | default = true]
# (experimental) When enabled, only series currently owned by ingester according
# to the ring are used when checking user per-tenant series limit.
# CLI flag: -ingester.use-ingester-owned-series-for-limits
Expand Down Expand Up @@ -1452,12 +1448,6 @@ store_gateway_client:
# CLI flag: -querier.shuffle-sharding-ingesters-enabled
[shuffle_sharding_ingesters_enabled: <boolean> | default = true]
# (experimental) Request store-gateways stream chunks. Store-gateways will only
# respond with a stream of chunks if the target store-gateway supports this, and
# this preference will be ignored by store-gateways that do not support this.
# CLI flag: -querier.prefer-streaming-chunks-from-store-gateways
[prefer_streaming_chunks_from_store_gateways: <boolean> | default = true]
# (advanced) Number of series to buffer per ingester when streaming chunks from
# ingesters.
# CLI flag: -querier.streaming-chunks-per-ingester-buffer-size
Expand Down Expand Up @@ -4151,22 +4141,15 @@ bucket_store:
# CLI flag: -blocks-storage.bucket-store.batch-series-size
[streaming_series_batch_size: <int> | default = 5000]
# (experimental) This option controls the strategy to selection of series and
# deferring application of matchers. A more aggressive strategy will fetch
# less posting lists at the cost of more series. This is useful when querying
# large blocks in which many series share the same label name and value.
# Supported values (most aggressive to least aggressive): speculative,
# worst-case, worst-case-small-posting-lists, all.
# CLI flag: -blocks-storage.bucket-store.series-selection-strategy
[series_selection_strategy: <string> | default = "worst-case"]
series_selection_strategies:
# (experimental) This option is only used when
# blocks-storage.bucket-store.series-selection-strategy=worst-case.
# Increasing the series preference results in fetching more series than
# postings. Must be a positive floating point number.
# CLI flag: -blocks-storage.bucket-store.series-selection-strategies.worst-case-series-preference
[worst_case_series_preference: <float> | default = 0.75]
# (advanced) This parameter controls the trade-off in fetching series versus
# fetching postings to fulfill a series request. Increasing the series
# preference results in fetching more series and reducing the volume of
# postings fetched. Reducing the series preference results in the opposite.
# Increase this parameter to reduce the rate of fetched series bytes (see
# "Mimir / Queries" dashboard) or API calls to the object store. Must be a
# positive floating point number.
# CLI flag: -blocks-storage.bucket-store.series-fetch-preference
[series_fetch_preference: <float> | default = 0.75]
tsdb:
# Directory to store TSDBs (including WAL) in the ingesters. This directory is
Expand Down
18 changes: 18 additions & 0 deletions docs/sources/mimir/manage/mimir-runbooks/_index.md
Expand Up @@ -1485,6 +1485,24 @@ How to **investigate**:
- Check if ingesters are processing too many records, and they need to be scaled up (vertically or horizontally).
- Check actual error in logs to see whether the `-ingest-storage.kafka.wait-strong-read-consistency-timeout` or the request timeout has been hit first.

### MimirKafkaClientBufferedProduceBytesTooHigh

This alert fires when the Kafka client buffer, used to write incoming write requests to Kafka, is getting full.

How it **works**:

- Distributor and ruler encapsulate write requests into Kafka records and send them to Kafka.
- The Kafka client has a limit on the total byte size of buffered records that are either not yet sent to Kafka or sent to Kafka but not acknowledged yet.
- When the limit is reached, the Kafka client stops producing more records and fast fails.
- The limit is configured via `-ingest-storage.kafka.producer-max-buffered-bytes`.
- The default limit is configured intentionally high, so that when the buffer utilization gets close to the limit, this indicates that there's probably an issue.
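
The fast-fail behavior described above can be pictured with a toy model. This is only a sketch under stated assumptions: `boundedProducer`, `ErrBufferFull`, and their methods are hypothetical names, and the code is not the Kafka client that Mimir uses. The `maxBufferedBytes` field plays the role of `-ingest-storage.kafka.producer-max-buffered-bytes`.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrBufferFull is a hypothetical error standing in for the client's fast-fail.
var ErrBufferFull = errors.New("producer buffer full")

// boundedProducer is a toy model of a byte-bounded produce buffer.
// It is an illustration only, not the Kafka client used by Mimir.
type boundedProducer struct {
	maxBufferedBytes int // limit on the total size of buffered records
	bufferedBytes    int // bytes of records that are not acknowledged yet
}

// Produce buffers a record, or fails immediately if the record would push
// the buffer past its limit.
func (p *boundedProducer) Produce(record []byte) error {
	if p.bufferedBytes+len(record) > p.maxBufferedBytes {
		return ErrBufferFull
	}
	p.bufferedBytes += len(record)
	return nil
}

// Ack releases n bytes once the corresponding records are acknowledged.
func (p *boundedProducer) Ack(n int) {
	p.bufferedBytes -= n
	if p.bufferedBytes < 0 {
		p.bufferedBytes = 0
	}
}

// Utilization returns the fraction of the buffer currently in use.
func (p *boundedProducer) Utilization() float64 {
	return float64(p.bufferedBytes) / float64(p.maxBufferedBytes)
}

func main() {
	p := &boundedProducer{maxBufferedBytes: 10}
	fmt.Println(p.Produce(make([]byte, 6))) // <nil>
	fmt.Println(p.Produce(make([]byte, 6))) // producer buffer full
	p.Ack(6)                                // the first record is acknowledged
	fmt.Println(p.Produce(make([]byte, 6))) // <nil>
}
```

In this model, a utilization that stays close to 1.0 means records are being acknowledged more slowly than they are produced, which is the condition the alert is meant to surface.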

How to **investigate**:

- Query the `cortex_ingest_storage_writer_buffered_produce_bytes{quantile="1.0"}` metric to see the actual buffer utilization peaks.
- If the high buffer utilization is isolated to a small set of pods, then there might be an issue in the client pods.
- If the high buffer utilization is spread across all or most pods, then there might be an issue in Kafka.
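
The last two investigation steps amount to a simple triage rule, which can be sketched as follows. The function name `triage`, both thresholds, and the returned messages are assumptions made for illustration. The runbook does not define what counts as "a small set" or "most" pods, so adjust the fractions to your deployment.

```go
package main

import "fmt"

// triage suggests where to look first, given the peak buffer utilization
// (0.0 to 1.0) observed per pod. Both thresholds are illustrative
// assumptions, not values defined by Mimir.
func triage(peakByPod map[string]float64, highUtilization, mostPodsFraction float64) string {
	if len(peakByPod) == 0 {
		return "no data"
	}
	// Count the pods whose peak utilization is at or above the threshold.
	high := 0
	for _, u := range peakByPod {
		if u >= highUtilization {
			high++
		}
	}
	switch {
	case high == 0:
		return "no pod is close to the limit"
	case float64(high)/float64(len(peakByPod)) >= mostPodsFraction:
		return "most pods affected: check Kafka"
	default:
		return "few pods affected: check the client pods"
	}
}

func main() {
	peaks := map[string]float64{
		"distributor-0": 0.95,
		"distributor-1": 0.10,
		"distributor-2": 0.12,
		"distributor-3": 0.08,
	}
	fmt.Println(triage(peaks, 0.8, 0.5))
}
```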

### Ingester is overloaded when consuming from Kafka

This runbook covers the case where an ingester is overloaded when ingesting metrics data (consuming) from Kafka.
Expand Down