Jsonnet / Helm: improve distributors graceful shutdown #7361

Merged: 2 commits, Feb 12, 2024
Changes from 1 commit
1 change: 1 addition & 0 deletions operations/helm/charts/mimir-distributed/CHANGELOG.md
@@ -44,6 +44,7 @@ Entries should include a reference to the Pull Request that introduced the chang
* [ENHANCEMENT] Add `jaegerReporterMaxQueueSize` Helm value for all components where configuring `JAEGER_REPORTER_MAX_QUEUE_SIZE` makes sense, and override the Jaeger client's default value of 100 for components expected to generate many trace spans. #7068 #7086 #7259
* [ENHANCEMENT] Rollout-operator: upgraded to v0.10.1. #7125
* [ENHANCEMENT] Query-frontend: configured `-shutdown-delay`, `-server.grpc.keepalive.max-connection-age` and termination grace period to reduce the likelihood of queries hitting terminated query-frontends. #7129
* [ENHANCEMENT] Distributor: reduced `-server.grpc.keepalive.max-connection-age` from `2m` to `60s` and configured `-shutdown-delay` to `90s` in order to reduce the chances of failed gRPC write requests when distributors gracefully shutdown. #7361
* [ENHANCEMENT] Add the possibility to create a dedicated serviceAccount for the `ruler` component by setting `ruler.serviceAccount.create` to true in the values. #7132
* [ENHANCEMENT] nginx, Gateway: set `proxy_http_version: 1.1` to proxy to HTTP 1.1. #5040
* [ENHANCEMENT] Gateway: make Ingress/Route host templateable. #7218
@@ -47,10 +47,14 @@ spec:
- "-target=distributor"
- "-config.expand-env=true"
- "-config.file=/etc/mimir/mimir.yaml"
# Force gRPC clients connecting to distributor to reconnect periodically in order to have them re-resolve endpoints and discover new replicas.
- "-server.grpc.keepalive.max-connection-age=2m"
# When write requests go through distributors via gRPC, we want gRPC clients to re-resolve the distributors DNS
# endpoint before the distributor process is terminated, in order to avoid any failures during graceful shutdown.
# To achieve it, we set a shutdown delay greater than the gRPC max connection age, and we set an even higher
# termination grace period (to give some extra buffer to the process to gracefully shutdown).
Contributor:

[nit] Should we mention the termination grace period here, since it is actually not set in this yaml?

@pracucci (Collaborator, Author), Feb 12, 2024:

You raised a very good point. I should have changed the termination grace period in Helm too. It wasn't detected by the linter because we currently skip the diff check for the termination grace period (I will try to work on it in a follow-up PR).

Addressed in: 7c8a44c

Contributor:

Thank you @pracucci

- "-server.grpc.keepalive.max-connection-age=60s"
- "-server.grpc.keepalive.max-connection-age-grace=5m"
- "-server.grpc.keepalive.max-connection-idle=1m"
- "-shutdown-delay=90s"
{{- if .Values.ingester.zoneAwareReplication.migration.enabled }}
{{- if not .Values.ingester.zoneAwareReplication.migration.writePath }}
- "-ingester.ring.zone-awareness-enabled=false"
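
To make the ordering described in the code comment concrete, here is a minimal sketch of how the three timers could sit together in a Kubernetes pod spec. It is not copied from the chart: the `terminationGracePeriodSeconds` value of 100 is an assumption, chosen only so that it sits above the 90s shutdown delay, and the container name is illustrative. The two flags and their values are the ones from this diff.

```yaml
# Illustrative fragment only; the grace period value and container name are assumptions.
spec:
  template:
    spec:
      # Longest timer: Kubernetes waits this long after SIGTERM before sending SIGKILL.
      # Assumed value, picked to leave a small buffer above -shutdown-delay.
      terminationGracePeriodSeconds: 100
      containers:
        - name: distributor
          args:
            - "-target=distributor"
            # Shortest timer: clients are asked to reconnect (and re-resolve DNS) about once a minute.
            - "-server.grpc.keepalive.max-connection-age=60s"
            # Middle timer: the process keeps serving for 90s after SIGTERM,
            # so connections opened before the signal should have been recycled by then.
            - "-shutdown-delay=90s"
```

The intended relationship is max connection age < shutdown delay < termination grace period. As far as I know, gRPC-Go adds up to 10% jitter to the max connection age, so a 60s setting can stretch to roughly 66s, which still leaves headroom under the 90s delay.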
@@ -54,10 +54,14 @@ spec:
- "-target=distributor"
- "-config.expand-env=true"
- "-config.file=/etc/mimir/mimir.yaml"
# Force gRPC clients connecting to distributor to reconnect periodically in order to have them re-resolve endpoints and discover new replicas.
- "-server.grpc.keepalive.max-connection-age=2m"
# When write requests go through distributors via gRPC, we want gRPC clients to re-resolve the distributors DNS
# endpoint before the distributor process is terminated, in order to avoid any failures during graceful shutdown.
# To achieve it, we set a shutdown delay greater than the gRPC max connection age, and we set an even higher
# termination grace period (to give some extra buffer to the process to gracefully shutdown).
- "-server.grpc.keepalive.max-connection-age=60s"
- "-server.grpc.keepalive.max-connection-age-grace=5m"
- "-server.grpc.keepalive.max-connection-idle=1m"
- "-shutdown-delay=90s"
volumeMounts:

- mountPath: /certs
@@ -54,10 +54,14 @@ spec:
- "-target=distributor"
- "-config.expand-env=true"
- "-config.file=/etc/mimir/mimir.yaml"
# Force gRPC clients connecting to distributor to reconnect periodically in order to have them re-resolve endpoints and discover new replicas.
- "-server.grpc.keepalive.max-connection-age=2m"
# When write requests go through distributors via gRPC, we want gRPC clients to re-resolve the distributors DNS
# endpoint before the distributor process is terminated, in order to avoid any failures during graceful shutdown.
# To achieve it, we set a shutdown delay greater than the gRPC max connection age, and we set an even higher
# termination grace period (to give some extra buffer to the process to gracefully shutdown).
- "-server.grpc.keepalive.max-connection-age=60s"
- "-server.grpc.keepalive.max-connection-age-grace=5m"
- "-server.grpc.keepalive.max-connection-idle=1m"
- "-shutdown-delay=90s"
volumeMounts:
- name: config
mountPath: /etc/mimir
@@ -54,10 +54,14 @@ spec:
- "-target=distributor"
- "-config.expand-env=true"
- "-config.file=/etc/mimir/mimir.yaml"
# Force gRPC clients connecting to distributor to reconnect periodically in order to have them re-resolve endpoints and discover new replicas.
- "-server.grpc.keepalive.max-connection-age=2m"
# When write requests go through distributors via gRPC, we want gRPC clients to re-resolve the distributors DNS
# endpoint before the distributor process is terminated, in order to avoid any failures during graceful shutdown.
# To achieve it, we set a shutdown delay greater than the gRPC max connection age, and we set an even higher
# termination grace period (to give some extra buffer to the process to gracefully shutdown).
- "-server.grpc.keepalive.max-connection-age=60s"
- "-server.grpc.keepalive.max-connection-age-grace=5m"
- "-server.grpc.keepalive.max-connection-idle=1m"
- "-shutdown-delay=90s"
volumeMounts:
- name: config
mountPath: /etc/mimir
@@ -54,10 +54,14 @@ spec:
- "-target=distributor"
- "-config.expand-env=true"
- "-config.file=/etc/mimir/mimir.yaml"
# Force gRPC clients connecting to distributor to reconnect periodically in order to have them re-resolve endpoints and discover new replicas.
- "-server.grpc.keepalive.max-connection-age=2m"
# When write requests go through distributors via gRPC, we want gRPC clients to re-resolve the distributors DNS
# endpoint before the distributor process is terminated, in order to avoid any failures during graceful shutdown.
# To achieve it, we set a shutdown delay greater than the gRPC max connection age, and we set an even higher
# termination grace period (to give some extra buffer to the process to gracefully shutdown).
- "-server.grpc.keepalive.max-connection-age=60s"
- "-server.grpc.keepalive.max-connection-age-grace=5m"
- "-server.grpc.keepalive.max-connection-idle=1m"
- "-shutdown-delay=90s"
volumeMounts:
- name: config
mountPath: /etc/mimir
@@ -54,10 +54,14 @@ spec:
- "-target=distributor"
- "-config.expand-env=true"
- "-config.file=/etc/mimir/mimir.yaml"
# Force gRPC clients connecting to distributor to reconnect periodically in order to have them re-resolve endpoints and discover new replicas.
- "-server.grpc.keepalive.max-connection-age=2m"
# When write requests go through distributors via gRPC, we want gRPC clients to re-resolve the distributors DNS
# endpoint before the distributor process is terminated, in order to avoid any failures during graceful shutdown.
# To achieve it, we set a shutdown delay greater than the gRPC max connection age, and we set an even higher
# termination grace period (to give some extra buffer to the process to gracefully shutdown).
- "-server.grpc.keepalive.max-connection-age=60s"
- "-server.grpc.keepalive.max-connection-age-grace=5m"
- "-server.grpc.keepalive.max-connection-idle=1m"
- "-shutdown-delay=90s"
volumeMounts:
- name: config
mountPath: /etc/mimir
@@ -54,10 +54,14 @@ spec:
- "-target=distributor"
- "-config.expand-env=true"
- "-config.file=/etc/mimir/mimir.yaml"
# Force gRPC clients connecting to distributor to reconnect periodically in order to have them re-resolve endpoints and discover new replicas.
- "-server.grpc.keepalive.max-connection-age=2m"
# When write requests go through distributors via gRPC, we want gRPC clients to re-resolve the distributors DNS
# endpoint before the distributor process is terminated, in order to avoid any failures during graceful shutdown.
# To achieve it, we set a shutdown delay greater than the gRPC max connection age, and we set an even higher
# termination grace period (to give some extra buffer to the process to gracefully shutdown).
- "-server.grpc.keepalive.max-connection-age=60s"
- "-server.grpc.keepalive.max-connection-age-grace=5m"
- "-server.grpc.keepalive.max-connection-idle=1m"
- "-shutdown-delay=90s"
volumeMounts:
- name: config
mountPath: /etc/mimir
@@ -51,10 +51,14 @@ spec:
- "-target=distributor"
- "-config.expand-env=true"
- "-config.file=/etc/mimir/mimir.yaml"
# Force gRPC clients connecting to distributor to reconnect periodically in order to have them re-resolve endpoints and discover new replicas.
- "-server.grpc.keepalive.max-connection-age=2m"
# When write requests go through distributors via gRPC, we want gRPC clients to re-resolve the distributors DNS
# endpoint before the distributor process is terminated, in order to avoid any failures during graceful shutdown.
# To achieve it, we set a shutdown delay greater than the gRPC max connection age, and we set an even higher
# termination grace period (to give some extra buffer to the process to gracefully shutdown).
- "-server.grpc.keepalive.max-connection-age=60s"
- "-server.grpc.keepalive.max-connection-age-grace=5m"
- "-server.grpc.keepalive.max-connection-idle=1m"
- "-shutdown-delay=90s"
volumeMounts:
- name: config
mountPath: /etc/mimir
@@ -54,10 +54,14 @@ spec:
- "-target=distributor"
- "-config.expand-env=true"
- "-config.file=/etc/mimir/mimir.yaml"
# Force gRPC clients connecting to distributor to reconnect periodically in order to have them re-resolve endpoints and discover new replicas.
- "-server.grpc.keepalive.max-connection-age=2m"
# When write requests go through distributors via gRPC, we want gRPC clients to re-resolve the distributors DNS
# endpoint before the distributor process is terminated, in order to avoid any failures during graceful shutdown.
# To achieve it, we set a shutdown delay greater than the gRPC max connection age, and we set an even higher
# termination grace period (to give some extra buffer to the process to gracefully shutdown).
- "-server.grpc.keepalive.max-connection-age=60s"
- "-server.grpc.keepalive.max-connection-age-grace=5m"
- "-server.grpc.keepalive.max-connection-idle=1m"
- "-shutdown-delay=90s"
volumeMounts:
- name: config
mountPath: /etc/mimir
@@ -54,10 +54,14 @@ spec:
- "-target=distributor"
- "-config.expand-env=true"
- "-config.file=/etc/mimir/mimir.yaml"
# Force gRPC clients connecting to distributor to reconnect periodically in order to have them re-resolve endpoints and discover new replicas.
- "-server.grpc.keepalive.max-connection-age=2m"
# When write requests go through distributors via gRPC, we want gRPC clients to re-resolve the distributors DNS
# endpoint before the distributor process is terminated, in order to avoid any failures during graceful shutdown.
# To achieve it, we set a shutdown delay greater than the gRPC max connection age, and we set an even higher
# termination grace period (to give some extra buffer to the process to gracefully shutdown).
- "-server.grpc.keepalive.max-connection-age=60s"
- "-server.grpc.keepalive.max-connection-age-grace=5m"
- "-server.grpc.keepalive.max-connection-idle=1m"
- "-shutdown-delay=90s"
volumeMounts:
- name: config
mountPath: /etc/mimir
@@ -55,10 +55,14 @@ spec:
- "-target=distributor"
- "-config.expand-env=true"
- "-config.file=/etc/mimir/mimir.yaml"
# Force gRPC clients connecting to distributor to reconnect periodically in order to have them re-resolve endpoints and discover new replicas.
- "-server.grpc.keepalive.max-connection-age=2m"
# When write requests go through distributors via gRPC, we want gRPC clients to re-resolve the distributors DNS
# endpoint before the distributor process is terminated, in order to avoid any failures during graceful shutdown.
# To achieve it, we set a shutdown delay greater than the gRPC max connection age, and we set an even higher
# termination grace period (to give some extra buffer to the process to gracefully shutdown).
- "-server.grpc.keepalive.max-connection-age=60s"
- "-server.grpc.keepalive.max-connection-age-grace=5m"
- "-server.grpc.keepalive.max-connection-idle=1m"
- "-shutdown-delay=90s"
volumeMounts:
- name: config
mountPath: /etc/mimir
@@ -54,10 +54,14 @@ spec:
- "-target=distributor"
- "-config.expand-env=true"
- "-config.file=/etc/mimir/mimir.yaml"
# Force gRPC clients connecting to distributor to reconnect periodically in order to have them re-resolve endpoints and discover new replicas.
- "-server.grpc.keepalive.max-connection-age=2m"
# When write requests go through distributors via gRPC, we want gRPC clients to re-resolve the distributors DNS
# endpoint before the distributor process is terminated, in order to avoid any failures during graceful shutdown.
# To achieve it, we set a shutdown delay greater than the gRPC max connection age, and we set an even higher
# termination grace period (to give some extra buffer to the process to gracefully shutdown).
- "-server.grpc.keepalive.max-connection-age=60s"
- "-server.grpc.keepalive.max-connection-age-grace=5m"
- "-server.grpc.keepalive.max-connection-idle=1m"
- "-shutdown-delay=90s"
volumeMounts:
- name: config
mountPath: /etc/mimir
@@ -51,10 +51,14 @@ spec:
- "-target=distributor"
- "-config.expand-env=true"
- "-config.file=/etc/mimir/mimir.yaml"
# Force gRPC clients connecting to distributor to reconnect periodically in order to have them re-resolve endpoints and discover new replicas.
- "-server.grpc.keepalive.max-connection-age=2m"
# When write requests go through distributors via gRPC, we want gRPC clients to re-resolve the distributors DNS
# endpoint before the distributor process is terminated, in order to avoid any failures during graceful shutdown.
# To achieve it, we set a shutdown delay greater than the gRPC max connection age, and we set an even higher
# termination grace period (to give some extra buffer to the process to gracefully shutdown).
- "-server.grpc.keepalive.max-connection-age=60s"
- "-server.grpc.keepalive.max-connection-age-grace=5m"
- "-server.grpc.keepalive.max-connection-idle=1m"
- "-shutdown-delay=90s"
volumeMounts:
- name: config
mountPath: /etc/mimir
@@ -54,10 +54,14 @@ spec:
- "-target=distributor"
- "-config.expand-env=true"
- "-config.file=/etc/mimir/mimir.yaml"
# Force gRPC clients connecting to distributor to reconnect periodically in order to have them re-resolve endpoints and discover new replicas.
- "-server.grpc.keepalive.max-connection-age=2m"
# When write requests go through distributors via gRPC, we want gRPC clients to re-resolve the distributors DNS
# endpoint before the distributor process is terminated, in order to avoid any failures during graceful shutdown.
# To achieve it, we set a shutdown delay greater than the gRPC max connection age, and we set an even higher
# termination grace period (to give some extra buffer to the process to gracefully shutdown).
- "-server.grpc.keepalive.max-connection-age=60s"
- "-server.grpc.keepalive.max-connection-age-grace=5m"
- "-server.grpc.keepalive.max-connection-idle=1m"
- "-shutdown-delay=90s"
volumeMounts:
- name: config
mountPath: /etc/mimir
@@ -54,10 +54,14 @@ spec:
- "-target=distributor"
- "-config.expand-env=true"
- "-config.file=/etc/mimir/mimir.yaml"
# Force gRPC clients connecting to distributor to reconnect periodically in order to have them re-resolve endpoints and discover new replicas.
- "-server.grpc.keepalive.max-connection-age=2m"
# When write requests go through distributors via gRPC, we want gRPC clients to re-resolve the distributors DNS
# endpoint before the distributor process is terminated, in order to avoid any failures during graceful shutdown.
# To achieve it, we set a shutdown delay greater than the gRPC max connection age, and we set an even higher
# termination grace period (to give some extra buffer to the process to gracefully shutdown).
- "-server.grpc.keepalive.max-connection-age=60s"
- "-server.grpc.keepalive.max-connection-age-grace=5m"
- "-server.grpc.keepalive.max-connection-idle=1m"
- "-shutdown-delay=90s"
volumeMounts:
- name: config
mountPath: /etc/mimir
@@ -54,10 +54,14 @@ spec:
- "-target=distributor"
- "-config.expand-env=true"
- "-config.file=/etc/mimir/mimir.yaml"
# Force gRPC clients connecting to distributor to reconnect periodically in order to have them re-resolve endpoints and discover new replicas.
- "-server.grpc.keepalive.max-connection-age=2m"
# When write requests go through distributors via gRPC, we want gRPC clients to re-resolve the distributors DNS
# endpoint before the distributor process is terminated, in order to avoid any failures during graceful shutdown.
# To achieve it, we set a shutdown delay greater than the gRPC max connection age, and we set an even higher
# termination grace period (to give some extra buffer to the process to gracefully shutdown).
- "-server.grpc.keepalive.max-connection-age=60s"
- "-server.grpc.keepalive.max-connection-age-grace=5m"
- "-server.grpc.keepalive.max-connection-idle=1m"
- "-shutdown-delay=90s"
volumeMounts:
- name: config
mountPath: /etc/mimir