Jsonnet: add support to autoscale ruler-querier replicas based on in-flight queries (#8060)

Signed-off-by: Marco Pracucci <marco@pracucci.com>
pracucci authored May 6, 2024
1 parent 5f2d280 commit 4b38a8b
Showing 5 changed files with 44 additions and 1 deletion.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -93,6 +93,7 @@
* [ENHANCEMENT] Shuffle-sharding: add `$._config.shuffle_sharding.ingest_storage_partitions_enabled` and `$._config.shuffle_sharding.ingester_partitions_shard_size` options, that allow configuring partitions shard size in ingest-storage mode. #7804
* [ENHANCEMENT] Rollout-operator: upgrade to v0.14.0.
* [ENHANCEMENT] Add `_config.autoscaling_querier_predictive_scaling_enabled` to scale the querier based on in-flight queries observed 7 days earlier. #7775
* [ENHANCEMENT] Add support to autoscale ruler-querier replicas based on in-flight queries, in addition to the existing CPU- and memory-based scaling. #8060
* [BUGFIX] Guard against missing samples in KEDA queries. #7691
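As a sketch of how the new option fits into a jsonnet deployment: only `autoscaling_ruler_querier_workers_target_utilization` is introduced by this commit; the enabled flag and the min-replicas field shown below are assumptions based on the surrounding config naming (`autoscaling_ruler_querier_max_replicas` does appear in this diff).

```jsonnet
// Sketch only: fields other than the workers target utilization are assumed
// from Mimir's existing autoscaling config conventions, not from this commit.
{
  _config+:: {
    autoscaling_ruler_querier_enabled: true,  // assumed existing flag
    autoscaling_ruler_querier_min_replicas: 2,  // illustrative values
    autoscaling_ruler_querier_max_replicas: 10,
    // New in this commit: also scale on in-flight queries.
    // A value <= 0 disables the in-flight queries trigger.
    autoscaling_ruler_querier_workers_target_utilization: 0.75,
  },
}
```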

### Mimirtool
@@ -2382,6 +2382,14 @@ spec:
threshold: "955630223"
name: ruler_querier_memory_hpa_default
type: prometheus
- metadata:
ignoreNullValues: "false"
metricName: cortex_ruler_querier_queries_hpa_default
query: sum(max_over_time(cortex_query_scheduler_inflight_requests{container="ruler-query-scheduler",namespace="default",quantile="0.5"}[1m]))
serverAddress: http://prometheus.default:9090/prometheus
threshold: "7"
name: cortex_ruler_querier_queries_hpa_default
type: prometheus
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
@@ -6,6 +6,7 @@
autoscaling_querier_target_utilization: targetUtilization,
autoscaling_ruler_querier_cpu_target_utilization: targetUtilization,
autoscaling_ruler_querier_memory_target_utilization: targetUtilization,
autoscaling_ruler_querier_workers_target_utilization: targetUtilization,
autoscaling_distributor_cpu_target_utilization: targetUtilization,
autoscaling_distributor_memory_target_utilization: targetUtilization,
autoscaling_ruler_cpu_target_utilization: targetUtilization,
8 changes: 8 additions & 0 deletions operations/mimir-tests/test-autoscaling-generated.yaml
@@ -2382,6 +2382,14 @@ spec:
threshold: "1073741824"
name: ruler_querier_memory_hpa_default
type: prometheus
- metadata:
ignoreNullValues: "false"
metricName: cortex_ruler_querier_queries_hpa_default
query: sum(max_over_time(cortex_query_scheduler_inflight_requests{container="ruler-query-scheduler",namespace="default",quantile="0.5"}[1m]))
serverAddress: http://prometheus.default:9090/prometheus
threshold: "6"
name: cortex_ruler_querier_queries_hpa_default
type: prometheus
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
27 changes: 26 additions & 1 deletion operations/mimir/autoscaling.libsonnet
@@ -15,6 +15,7 @@
autoscaling_ruler_querier_max_replicas: error 'you must set autoscaling_ruler_querier_max_replicas in the _config',
autoscaling_ruler_querier_cpu_target_utilization: 1,
autoscaling_ruler_querier_memory_target_utilization: 1,
autoscaling_ruler_querier_workers_target_utilization: 0.75,  // Target to utilize 75% of ruler-querier workers at peak traffic, leaving 25% of headroom for higher peaks.

autoscaling_distributor_enabled: false,
autoscaling_distributor_min_replicas: error 'you must set autoscaling_distributor_min_replicas in the _config',
@@ -342,6 +343,7 @@
with_cortex_prefix=false,
weight=1,
scale_down_period=null,
extra_triggers=[],
):: self.newScaledObject(
name, $._config.namespace, {
min_replica_count: replicasWithWeight(min_replicas, weight),
@@ -380,7 +382,7 @@
// up or down unexpectedly. See https://keda.sh/docs/2.13/scalers/prometheus/ for more info.
ignore_null_values: false,
},
],
] + extra_triggers,
},
),

@@ -471,6 +473,29 @@
max_replicas=$._config.autoscaling_ruler_querier_max_replicas,
cpu_target_utilization=$._config.autoscaling_ruler_querier_cpu_target_utilization,
memory_target_utilization=$._config.autoscaling_ruler_querier_memory_target_utilization,
extra_triggers=if $._config.autoscaling_ruler_querier_workers_target_utilization <= 0 then [] else [
{
local name = 'ruler-querier-queries',
local querier_max_concurrent = $.ruler_querier_args['querier.max-concurrent'],

metric_name: 'cortex_%s_hpa_%s' % [std.strReplace(name, '-', '_'), $._config.namespace],

// Each ruler-query-scheduler tracks, at regular intervals, the number of in-flight requests
// (both enqueued and currently executing queries) as a summary. With the following query we
// aim to have enough querier workers to run the max observed in-flight requests 50% of the time.
//
// This metric covers the case where queries are piling up in the ruler-query-scheduler queue
// but ruler-querier replicas are not scaled up by the other scaling metrics (CPU and memory),
// because resource utilization is not increasing significantly.
query: 'sum(max_over_time(cortex_query_scheduler_inflight_requests{container="ruler-query-scheduler",namespace="%s",quantile="0.5"}[1m]))' % [$._config.namespace],

threshold: '%d' % std.floor(querier_max_concurrent * $._config.autoscaling_ruler_querier_workers_target_utilization),

// Do not let KEDA use the value "0" as scaling metric if the query returns no result
// (e.g. query-scheduler is crashing).
ignore_null_values: false,
},
],
),

ruler_querier_deployment: overrideSuperIfExists(
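The threshold arithmetic in the trigger above can be checked in isolation: the scaling threshold is the floor of the configured `querier.max-concurrent` multiplied by the workers target utilization. The concurrency value 8 below is hypothetical, not taken from this commit.

```jsonnet
// Illustrative only: 8 is a hypothetical querier.max-concurrent value.
local querier_max_concurrent = 8;
local workers_target_utilization = 0.75;

// Mirrors the threshold computation in the trigger above:
// floor(8 * 0.75) = 6, formatted as the string KEDA expects.
'%d' % std.floor(querier_max_concurrent * workers_target_utilization)
```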
