
Health Probes do not gracefully shutdown traffic for externalTrafficPolicy Cluster services #3499

Closed
JoelSpeed opened this issue Mar 6, 2023 · 12 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@JoelSpeed
Contributor

JoelSpeed commented Mar 6, 2023

What happened:

I have been investigating disruption of services behind load balancers that use externalTrafficPolicy: Cluster (herein ETP Cluster). Currently, for an ETP Cluster service, the Azure health probes are configured to check the node port of the application itself using a blind TCP check. They can be configured to use an HTTP(S) based check, but this only allows the path of the check to be specified; it does not allow you to specify a custom port.

When draining an application behind an ETP Local service, assuming you have configured graceful shutdown correctly, you can return a negative health probe during the graceful termination period while still accepting new connections. This lets you signal to the load balancer that you are going away before you actually do, allowing it to route new connections to other nodes.
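
To make that pattern concrete, here is a minimal sketch of a backend configured for graceful shutdown; the image name, port, probe path and durations are hypothetical placeholders, and the application is assumed to start failing its readiness endpoint on SIGTERM while continuing to serve connections for the remainder of the grace period:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-app                           # hypothetical name
spec:
  replicas: 2
  selector:
    matchLabels:
      k8s-app: test
  template:
    metadata:
      labels:
        k8s-app: test
    spec:
      terminationGracePeriodSeconds: 60
      containers:
      - name: app
        image: example.com/test-app:latest # placeholder image
        ports:
        - containerPort: 8080
        readinessProbe:
          httpGet:
            path: /readyz                  # app flips this to failing on SIGTERM...
            port: 8080
          periodSeconds: 5
          failureThreshold: 2
        # ...while continuing to accept and serve connections until the grace
        # period expires, giving the load balancer time to observe the negative
        # probe and drain the node before connections are actually refused.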

In an ETP Cluster service, this doesn't work. With the same backend, during the graceful shutdown period, the application will start reporting a negative readiness probe within the cluster and will be removed from the service endpoints. To the node port, however, this is completely opaque: it has no idea that the pod has been removed from the node.

As soon as the application goes into an unready state, kube-proxy (or equivalent) on the node will start routing traffic to other endpoints within the service. The load balancer still believes that the instance is healthy, no matter how you configure the health checks. Only once the node itself can no longer route traffic to these other endpoints do the health probes start to fail.
Because we are, effectively, health checking another load balancer, this can only happen once the node port itself can no longer accept new connections (typically, once kube-proxy has shut down).

What you expected to happen:

I should be able to configure a health probe that tells Azure that my instance will shortly no longer be able to serve traffic, allowing it to remove the instance from service before the instance stops accepting new connections.

How to reproduce it (as minimally and precisely as possible):

I've created a test that you can run which measures how much a service is disrupted when we upgrade a cluster (we run this within CI as part of OpenShift's release process).

To run it, you will need:

  • A copy of oc on your PATH (download here)
  • A checkout of the disruption-test branch from github.com/JoelSpeed/origin
  • An Azure cluster running, with your KUBECONFIG var pointing to the cluster
  • To execute go test ./test/e2e/upgrade/service/disruption-test -v -timeout=60m from the checked-out repo above

Note: this test takes approximately 30 minutes to run and will fail if there is any disruption to the service. We are typically seeing between 3 and 6 seconds of disruption, whereas we see 0 seconds with the equivalent ETP Local service.

The test runs through each node in the cluster, drains the instance, reboots it, then uncordons it before moving on to the next instance.

Anything else we need to know?:

I don't think you necessarily need to reproduce this, as we can discuss it logically.

Since the service and kube-proxy handle the load balancing within the cluster, health checking the application itself does not help Azure decide whether to route traffic to the instance. Instead, we need to ascertain whether the instance is able to route traffic to the backend. That depends on whatever is implementing the node ports, which in most cases is kube-proxy.

Based on that, we should logically be checking kube-proxy's health, and not the node port itself.

In fact, GCP came to the same conclusion: for an ETP Cluster service (internal and external), they check port 10256 with path /healthz by default, which is exposed by kube-proxy.
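
For reference, that endpoint is controlled by kube-proxy's healthzBindAddress setting; a minimal KubeProxyConfiguration excerpt showing the upstream default (nothing here is specific to this cluster):

apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
# Serves /healthz on every node; this is the port and path the GCP default
# probe (and the proposal below) targets. 0.0.0.0:10256 is the upstream default.
healthzBindAddress: "0.0.0.0:10256"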

I've been testing a patch (very much WIP right now) that moves Azure's default ETP Cluster check to match GCP's, and in some testing with the above tests I have achieved 0s of disruption on Azure as well. Because the Azure LB can now see when the instance is about to go unhealthy, it can remove the instance from service earlier, meaning fewer dropped connections.

I would like to propose that we make that patch a feature of the Azure CCM and make sure that, by default, when no other health probe configuration is specified, Azure ETP Cluster services health check port 10256 with path /healthz instead of their current behaviour.

Some other notes:

  • I've been testing with ovn-kubernetes rather than kube-proxy. I'm working with the team there, who have made ovn-k behave in the same way as kube-proxy for my testing, so the results should be the same.
  • We also tried this with a patch that prevents Nodes from being removed from the load balancer when they are unready according to their status. We believe this is causing separate issues and I will raise a separate issue for it. Without the improved health checks, we were seeing 7-15s of disruption with this patch; with the improved health checks, we saw 0s.
  • @damdo and @ricky-rav have been helping me to put together the various patches we needed to get to this point.
  • I was using the kubelet's graceful node shutdown feature to test this (see the configuration sketch after this list). Without it, disruption is around 4s while not removing instances from the load balancer, which is still better than the 7-15 seconds we were seeing.
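
For reference, a minimal sketch of the kubelet configuration fields that enable the graceful node shutdown feature mentioned above; the durations shown are illustrative, not the values used in these tests:

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Gives pods time to terminate gracefully when the node itself shuts down.
shutdownGracePeriod: 60s               # illustrative value
shutdownGracePeriodCriticalPods: 20s   # illustrative value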

Environment:

  • Kubernetes version (use kubectl version): 1.26.2
  • Cloud provider or hardware configuration: Azure
  • OS (e.g: cat /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools: OpenShift 4.13 pre-release
  • Network plugin and version (if this is a network-related bug): ovn-kubernetes (OpenShift 4.13 pre-release)
  • Others:
@JoelSpeed
Contributor Author

I think this KEP, which talks about graceful draining of connections, might be related. I wonder if the Azure maintainers have seen it?

@MartinForReal
Contributor

The following annotation has been added to change the probe port number for a service. FYI:
service.beta.kubernetes.io/port_{port}_health-probe_port
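
For illustration, assuming a Service exposing port 80, that annotation would be applied like this (whether an arbitrary target port is accepted is discussed further down the thread):

metadata:
  annotations:
    # Redirects the Azure LB health probe for service port 80 to port 10256.
    service.beta.kubernetes.io/port_80_health-probe_port: "10256"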

@JoelSpeed
Contributor Author

Hi @MartinForReal, thanks for the suggestion. Have you confirmed that this actually works with the Azure cloud provider?

I'm on PTO right now so don't have a laptop handy, but I did go through the code pretty thoroughly before submitting this issue and came to the conclusion that there was no way to override the port. I saw other annotations of a similar format, but this one didn't appear to be supported in Azure.

@MartinForReal
Contributor

I think it is added.

probePort, err := consts.GetHealthProbeConfigOfPortFromK8sSvcAnnotation(serviceManifest.Annotations, port.Port, consts.HealthProbeParamsPort, func(s *string) error {

@JoelSpeed
Contributor Author

Reviewing that, it looks like we need to have the port in the service for it to be used as a health check, right?

return fmt.Errorf("port %s not found in service", *s)

So in that case, you'd expect to add a dummy port 10256 to the service, and then set service.beta.kubernetes.io/port_10256_health-probe_only, have I understood correctly?

@JoelSpeed
Contributor Author

JoelSpeed commented Mar 29, 2023

I've tried the above today with no luck. If I specify a port annotation to redirect the health check port on my service to port 10256, it errors because there is no port 10256 within the service. If I were to create such a port, that doesn't actually help me because I can't force its node port to be 10256.

The service I was trying is as below:

apiVersion: v1
kind: Service
metadata:
  name: test-service
  annotations:
    service.beta.kubernetes.io/port_80_health-probe_port: "10256"
    service.beta.kubernetes.io/port_10256_no_probe_rule: "true"
    service.beta.kubernetes.io/port_10256_no_lb_rule: "true"
spec:
  selector:
    k8s-app: test
  ports:
  - name: app
    protocol: TCP
    port: 80
  - name: healthz
    protocol: TCP
    port: 10256
  type: LoadBalancer
  externalTrafficPolicy: Cluster

So how can I get the load balancer health probe port to be an arbitrary port that isn't within the bounds of the service?

@JoelSpeed
Contributor Author

Cc @alexanderConstantinescu. I know you've been working on improving disruption to services; could you check out this issue and see if you agree, or if there's anything more you can add?

bertinatto pushed a commit to bertinatto/kubernetes that referenced this issue May 6, 2023
…rs from lb when unready

workaround to mitigate issue: kubernetes-sigs/cloud-provider-azure#3500
bug: https://issues.redhat.com/browse/OCPBUGS-7359

UPSTREAM: <carry>: legacy-cloud-providers: azure: use kube-proxy based health probes by default

See
issue: kubernetes-sigs/cloud-provider-azure#3499
bug: https://issues.redhat.com/browse/OCPBUGS-7359
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 23, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 22, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot closed this as not planned Mar 23, 2024