Mimir HPA/autoscaling #3379

Closed · 2 tasks · Tracked by #3039
QuentinBisson opened this issue Apr 2, 2024 · 6 comments
@QuentinBisson

QuentinBisson commented Apr 2, 2024

Motivation

One reason we switched to Mimir is that it is far better in terms of scalability and reliability. To take full advantage of that, though, we have to add autoscaling to our Mimir instances. So let's learn how to do this.

Todo

  • Introduce autoscaling for Mimir to scale horizontally
  • Check for feasible thresholds and limits, and test them

Outcome

  • We know how to set up autoscaling for Mimir, and what does and doesn't make sense for achieving Mimir high availability.
  • We have a working PoC of autoscaling Mimir that we can decide to keep or scrap.
@QuantumEnigmaa

As mentioned in this issue, the Mimir chart currently doesn't have an HPA for any component except the gateway (which is stateless). It does support KEDA, though.

Currently, the scale-down process must be done manually: remove the desired replica from the ingress so that it doesn't receive any new data, flush all of its data to object storage, and then finally delete it. But as with any manual process, this could take longer than expected, especially if we end up having to regularly scale the workload up and down on several installations.

Because of this, I would suggest we contribute upstream to add an HPA for the ingesters in the right way, following the approach taken for Loki: grafana/loki#8684
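For reference, here is a minimal sketch of what KEDA-based autoscaling for a stateless Mimir component such as the distributor could look like. It assumes KEDA is installed and a Prometheus endpoint is reachable; the target name, query, and threshold are illustrative, not values from the chart:

```yaml
# Hypothetical KEDA ScaledObject for the Mimir distributor (illustrative values only).
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: mimir-distributor
spec:
  scaleTargetRef:
    name: mimir-distributor          # Deployment to scale (assumed name)
  minReplicaCount: 3
  maxReplicaCount: 30
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring.svc:9090  # assumed Prometheus URL
        # Example query: CPU usage of the distributor containers
        query: sum(rate(container_cpu_usage_seconds_total{container="distributor"}[5m]))
        threshold: "4"               # target value per replica; to be tuned via load testing
```

KEDA creates and manages the underlying HPA itself, so no separate HPA manifest is needed for the component.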

@QuantumEnigmaa

PR for the distributor created: grafana/mimir#7839

Now heading for the ingester PR.

@QuantumEnigmaa QuantumEnigmaa self-assigned this Apr 8, 2024
@QuantumEnigmaa

PR for the ingester: grafana/mimir#7843

@QuantumEnigmaa

PR for the querier: grafana/mimir#7870

@QuantumEnigmaa

Updates concerning the upstream PRs:

  • Ingester: the PR is blocked because other contributors are still laying the foundations needed to perform horizontal autoscaling of the ingester. It is therefore not currently possible to add an HPA for the ingester, and we'll need to wait for that work to progress.

  • Querier: the PR was rejected because, according to the upstream maintainers, KEDA is already offered as a horizontal autoscaling solution, so adding yet another one in the form of the basic HPA is unnecessary and would only bring additional maintenance work upstream. They therefore prefer users to rely on KEDA for horizontal autoscaling of this component.

  • Distributor: no activity on this PR yet, but since the component is similar to the querier in terms of autoscaling (i.e. it already relies on KEDA), I expect the outcome will be the same.

Concerning the gateway component, though, since the basic HPA is already supported upstream, I enabled it on golem to check whether its deployment would go smoothly, and it did. However, if we want to be sure we can rely on it, we should run some load testing on the gateway while the HPA is deployed. @Rotfuks is taking care of creating a dedicated issue.
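For illustration, a basic autoscaling/v2 HPA of the kind the chart enables for the gateway might look roughly like this; the deployment name, replica bounds, and CPU target are assumptions to be validated by the load test, not values taken from the chart:

```yaml
# Hypothetical HPA for the stateless Mimir gateway, scaling on CPU (illustrative values only).
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: mimir-gateway
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: mimir-gateway              # assumed Deployment name
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 75     # target to be confirmed by load testing
```

Being stateless, the gateway can scale down without the flush-and-remove steps the ingesters need, which is why the plain HPA is enough here.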

@QuantumEnigmaa

Since the upstream PRs have been rejected, we decided to create our own HPAs for the distributor and the querier and have those enabled by default.
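As a sketch of what such an in-house HPA could look like (shown for the distributor; the querier one would be analogous), assuming CPU- and memory-based targets; the names, bounds, and percentages are illustrative, not the values we shipped:

```yaml
# Hypothetical in-house HPA for the Mimir distributor, scaling on CPU and memory (illustrative).
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: mimir-distributor
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: mimir-distributor          # assumed Deployment name
  minReplicas: 2
  maxReplicas: 15
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
```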
