chore: add support for not setting replicas helm chart #6373


Contributor

@jmichalek132 jmichalek132 commented Oct 13, 2023

What this PR does

Introduces support for not setting replicas, specifically so you can use autoscaling with components which are safe to autoscale.

As I mentioned in #3430 (comment), I did some testing.

I started with:

  {{- if ne .Values.distributor.replicas "null"  }}
  replicas: {{ .Values.distributor.replicas }}
  {{- end }}

but Helm fails when replicas is a number, because ne cannot compare a number with the string "null":

Error: template: mimir-distributed/templates/distributor/distributor-dep.yaml:11:9: executing "mimir-distributed/templates/distributor/distributor-dep.yaml" at <ne .Values.distributor.replicas "null">: error calling ne: incompatible types for comparison

So I tried:

  {{- if .Values.distributor.replicas }}
  replicas: {{ .Values.distributor.replicas }}
  {{- end }}

When replicas is a number (e.g. the default, 1), it works.
When it's set to null, replicas is not set, which is what we want.
But when it's set to 0, replicas is also left unset, because 0 is falsy in templates. This is not a common case IMHO, but it would break.
Would that be okay, or should I look for another way to do it?

Which issue(s) this PR fixes or relates to

Fixes #3430 (comment)

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@jmichalek132
Contributor Author

I opened the draft PR as you requested, @dimitarvdimitrov. For reference, this is how the Loki Helm chart handles it: https://github.com/grafana/helm-charts/blob/main/charts/loki-distributed/templates/ingester/statefulset-ingester.yaml#L15
which is an option too, but would require some documentation about the fact that setting it to enabled will just allow you to add the HPA but will not create it.
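
For readers unfamiliar with that pattern, here is a minimal sketch of the flag-guarded approach described above, adapted to the distributor. The autoscaling.enabled key is hypothetical for this chart, and the snippet is not copied from the Loki chart; it only illustrates that the flag suppresses replicas without creating an HPA.

```yaml
# values.yaml (hypothetical keys, for illustration only)
# distributor:
#   replicas: 1
#   autoscaling:
#     enabled: false

# templates/distributor/distributor-dep.yaml (sketch)
spec:
  {{- if not .Values.distributor.autoscaling.enabled }}
  replicas: {{ .Values.distributor.replicas }}
  {{- end }}
```

With this shape, enabling the flag only drops the replicas field; the user still has to supply the HPA themselves, which is the documentation burden mentioned above.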

@dimitarvdimitrov
Contributor

but would require some documentation about the fact that setting it to enabled will just allow you to add the HPA but will not create it.

Thanks for pointing this out. I think having autoscaling.enabled: true without actually creating any VPA/HPA is counterintuitive. I prefer the path that you've taken in this PR.

Contributor

@dimitarvdimitrov dimitarvdimitrov left a comment

The current approach LGTM.

Are we doing this only for the distributor deployment? I think the next steps are to document the behaviour in values.yaml and maybe add a comment above that complex if expression to explain its purpose (especially the check for int64 and float64, which isn't trivial). And finally, add a changelog entry in operations/helm/..../CHANGELOG.md.

@jmichalek132
Contributor Author

jmichalek132 commented Oct 26, 2023

Will do, I was planning to add it for distributor, querier, and query-frontend. Just wanted to get the logic right first.
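
As a rough illustration of how this would be used once those three components are covered, a values override might look like the sketch below. The query_frontend key matches the name that appears in the chart's own warning output; the HorizontalPodAutoscaler, its names, and its thresholds are assumptions that depend on the release and are not part of this PR.

```yaml
# custom-values.yaml: leave replicas unset so an autoscaler owns the count
distributor:
  replicas: null
querier:
  replicas: null
query_frontend:
  replicas: null
---
# Separate file: an HPA managed outside the chart (all names are hypothetical)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: mimir-distributor
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: mimir-distributor   # assumed Deployment name; check your release
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

Leaving replicas out of the rendered Deployment means a later helm upgrade does not reset the count the autoscaler has chosen.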

@jmichalek132 jmichalek132 force-pushed the jm-chore-add-support-for-not-setting-replicas-helm-chart branch from 61f6092 to 285ad25 Compare October 27, 2023 12:46
@jmichalek132 jmichalek132 marked this pull request as ready for review October 27, 2023 12:50
@jmichalek132 jmichalek132 requested a review from a team as a code owner October 27, 2023 12:50
@jmichalek132
Contributor Author

Looking into the failed pipeline.

@jmichalek132
Contributor Author

Looking into the failed pipeline.

Seems to be because I forgot to do this:
"A PR check will fail if you forget to update the compiled manifests, and you can use make build-helm-tests to update them and make check-helm-tests to verify them after commit. The PR check will also fail if static checks on the intermediate YAML manifests fail."
But when I try to run make build-helm-tests, it fails for me with:

test-oss-topology-spread-constraints-values (stderr): coalesce.go:286: warning: cannot overwrite table with non table for mimir-distributed.query_frontend.topologySpreadConstraints (map[maxSkew:1 topologyKey:kubernetes.io/hostname whenUnsatisfiable:ScheduleAnyway])
enterprise-https-values (stderr): /bin/sed: couldn't open temporary file operations/helm/tests/enterprise-https-values-generated/mimir-distributed/templates/sedcXB2Yq: Permission denied
scheduler-name-values (stderr): /bin/sed: couldn't open temporary file operations/helm/tests/scheduler-name-values-generated/mimir-distributed/templates/sedqmIEjb: Permission denied
small-values (stderr): /bin/sed: couldn't open temporary file operations/helm/tests/small-values-generated/mimir-distributed/templates/sed1mskPf: Permission denied
large-values (stderr): /bin/sed: couldn't open temporary file operations/helm/tests/large-values-generated/mimir-distributed/templates/sedjqmbd2: Permission denied
openshift-values (stderr): /bin/sed: couldn't open temporary file operations/helm/tests/openshift-values-generated/mimir-distributed/templates/sedc4TZO5: Permission denied
test-oss-values (stderr): /bin/sed: couldn't open temporary file operations/helm/tests/test-oss-values-generated/mimir-distributed/templates/sedNdPRYl: Permission denied
test-enterprise-configmap-values (stderr): /bin/sed: couldn't open temporary file operations/helm/tests/test-enterprise-configmap-values-generated/mimir-distributed/templates/sedoJVpx4: Permission denied
graphite-enabled-values (stderr): /bin/sed: couldn't open temporary file operations/helm/tests/graphite-enabled-values-generated/mimir-distributed/templates/sed1x1fq8: Permission denied
test-enterprise-k8s-1.25-values (stderr): /bin/sed: couldn't open temporary file operations/helm/tests/test-enterprise-k8s-1.25-values-generated/mimir-distributed/templates/sedOJUrqC: Permission denied
test-enterprise-legacy-label-values (stderr): /bin/sed: couldn't open temporary file operations/helm/tests/test-enterprise-legacy-label-values-generated/mimir-distributed/templates/sedWFsqWq: Permission denied
gateway-enterprise-values (stderr): /bin/sed: couldn't open temporary file operations/helm/tests/gateway-enterprise-values-generated/mimir-distributed/templates/sedw7HS2q: Permission denied
test-enterprise-values (stderr): /bin/sed: couldn't open temporary file operations/helm/tests/test-enterprise-values-generated/mimir-distributed/templates/sedEWPqEa: Permission denied
test-oss-topology-spread-constraints-values (stderr): /bin/sed: couldn't open temporary file operations/helm/tests/test-oss-topology-spread-constraints-values-generated/mimir-distributed/templates/sedH4MtHc: Permission denied
gateway-nginx-values (stderr): /bin/sed: couldn't open temporary file operations/helm/tests/gateway-nginx-values-generated/mimir-distributed/templates/sedch06mB: Permission denied
test-oss-k8s-1.25-values (stderr): /bin/sed: couldn't open temporary file operations/helm/tests/test-oss-k8s-1.25-values-generated/mimir-distributed/templates/overrides-exporter/sedKcEHS5: Permission denied
test-vault-agent-values (stderr): /bin/sed: couldn't open temporary file operations/helm/tests/test-vault-agent-values-generated/mimir-distributed/templates/sed72x8dj: Permission denied
test-oss-logical-multizone-values (stderr): /bin/sed: couldn't open temporary file operations/helm/tests/test-oss-logical-multizone-values-generated/mimir-distributed/templates/sedqsplbA: Permission denied
test-oss-multizone-values (stderr): /bin/sed: couldn't open temporary file operations/helm/tests/test-oss-multizone-values-generated/mimir-distributed/templates/sedIeTLQY: Permission denied
metamonitoring-values (stderr): /bin/sed: couldn't open temporary file operations/helm/tests/metamonitoring-values-generated/mimir-distributed/templates/sedr870Ig: Permission denied
helm template PID 58 exited with non-zero exit code 123. Aborting.
make: *** [Makefile:495: build-helm-tests] Error 123
make: Leaving directory '/go/src/github.com/grafana/mimir'

real    0m12.229s
user    0m0.034s
sys     0m0.028s
make: *** [build-helm-tests] Error 2

Looking into it more.

@jmichalek132
Contributor Author

Turns out it was caused by this (see the screenshot); disabling it helped.

[Screenshot 2023-10-28 at 22 32 06]

@@ -8,7 +8,10 @@ metadata:
{{- toYaml .Values.distributor.annotations | nindent 4 }}
namespace: {{ .Release.Namespace | quote }}
spec:
# If replicas is not number (when using values file it's float64, when using --set arg it's int64) and is false (i.e. null) don't set it
{{- if or (or (kindIs "int64" .Values.distributor.replicas) (kindIs "float64" .Values.distributor.replicas)) (.Values.distributor.replicas) }}
Contributor

it would be better to have this in a named template, so we don't repeat ourselves, but I think I can address this in a follow-up PR
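
A follow-up along those lines might look like the sketch below. The helper name mimir.replicasLine is hypothetical and this has not been tested against the chart; it simply moves the same condition into a define block so that each component template can reuse it.

```yaml
{{/*
Hypothetical helper: emits a "replicas:" line unless the value is null.
Numbers are kept even when 0 (float64 from a values file, int64 from --set).
*/}}
{{- define "mimir.replicasLine" -}}
{{- if or (or (kindIs "int64" .) (kindIs "float64" .)) . -}}
replicas: {{ . }}
{{- end -}}
{{- end -}}

# Usage in a Deployment template (sketch):
spec:
  {{- with include "mimir.replicasLine" .Values.distributor.replicas }}
  {{- . | nindent 2 }}
  {{- end }}
```

The with guard skips the line entirely when the helper returns an empty string, so a null value leaves no stray whitespace under spec.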

Contributor

@dimitarvdimitrov dimitarvdimitrov left a comment

LGTM, thanks for the contribution and I appreciate the patience 🙂

@dimitarvdimitrov dimitarvdimitrov merged commit 1cdfe2c into grafana:main Nov 1, 2023
30 checks passed