Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG] helm chart: add readinessProbe and let probes be configurable #1242

Merged
merged 1 commit into from
Jan 9, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
19 changes: 16 additions & 3 deletions helm-chart/binderhub/templates/deployment.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -121,10 +121,23 @@ spec:
ports:
- containerPort: 8585
name: binder
{{- if .Values.deployment.readinessProbe.enabled }}
readinessProbe:
httpGet:
path: {{ .Values.config.BinderHub.base_url | default "/" }}versions
port: binder
initialDelaySeconds: {{ .Values.deployment.readinessProbe.initialDelaySeconds }}
periodSeconds: {{ .Values.deployment.readinessProbe.periodSeconds }}
timeoutSeconds: {{ .Values.deployment.readinessProbe.timeoutSeconds }}
failureThreshold: {{ .Values.deployment.readinessProbe.failureThreshold }}
{{- end }}
{{- if .Values.deployment.livenessProbe.enabled }}
livenessProbe:
httpGet:
path: {{ .Values.config.BinderHub.base_url | default "/" }}versions
port: binder
initialDelaySeconds: 10
periodSeconds: 5
timeoutSeconds: 10
initialDelaySeconds: {{ .Values.deployment.livenessProbe.initialDelaySeconds }}
periodSeconds: {{ .Values.deployment.livenessProbe.periodSeconds }}
timeoutSeconds: {{ .Values.deployment.livenessProbe.timeoutSeconds }}
failureThreshold: {{ .Values.deployment.livenessProbe.failureThreshold }}
{{- end }}
12 changes: 12 additions & 0 deletions helm-chart/binderhub/values.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -145,6 +145,18 @@ jupyterhub:
enabled: false

deployment:
readinessProbe:
enabled: true
initialDelaySeconds: 0
periodSeconds: 5
failureThreshold: 1000 # we rely on the liveness probe to resolve issues if needed
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For my education: what does the comment mean/try to say? Why not use a startup probe which seems to be made for exactly our case of "pod can take time to start up"?

I think setting the failure threshold to 1000 means that the readiness probe will never(*) become "not ready" again after it becomes ready for the first time.

Why not use an initial delay similar to the liveness probe to cover the expected time the process needs to start up (I think the initial delay only starts counting after the container image has been pulled but couldn't confirm that).

(*) in reality it would become "not ready" if the probe fails for ~5000s, so about 1.3hours

Copy link
Member Author

@consideRatio consideRatio Jan 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think setting the failure threshold to 1000 means that the readiness probe will never(*) become "not ready" again after it becomes ready for the first time.

Yepp!

Why not use an initial delay similar to the liveness probe to cover the expected time the process needs to start up (I think the initial delay only starts counting after the container image has been pulled but couldn't confirm that).

An initial delay of the readinessProbe would delay the first probe. If you know that the pod startup time is between 1-2 seconds, then it could make sense to set the initial delay to 2 seconds for example in order to optimize the speed to become ready. If you would set the startup time to 1 second, and it typically takes 1-2 seconds to startup the container, then you delayed the speed of getting the pod Ready initially.


readinessProbe

  • Required to make a pod Ready.
  • A failure of this -> the container is becoming Unready
  • Ready is required for the pod to receive traffic from k8s Services
  • Ready is required for kubectl rollout status --wait checks of a deployment resource.
  • A readinessProbe with a very lrage failureThreshold makes it act as a delay to become Ready during startup, but will then always remain Ready.
  • To let a readinessProbe fail following having become ready, the purpose would be to temporarily remove it from receiving traffic from k8s Services, perhaps to recover or similar, but this is only useful in specific situations I don't think is applicable to our Binder pod.

livenessProbe

  • A failure of this -> the container is getting restarted

startupProbe

  • k8s 1.16+ required
  • doesn't influence the pod's Ready status
  • serves the purpose only to complement the livenessProbe for slow startups, see this docs

Meaning of comment?
I meant to convey that we rely on the livenessProbe's functionality to recover from bad states, not by simply making the pod Unready which can be unhelpful.

Reference
startupProbe and such is discussed also in jupyterhub/zero-to-jupyterhub-k8s#1732 (comment)

timeoutSeconds: 3
livenessProbe:
enabled: true
initialDelaySeconds: 10
periodSeconds: 5
failureThreshold: 3
timeoutSeconds: 10
labels: {}

dind:
Expand Down