
Pod Stuck in Terminating state if tainted node is power cycled.... #118286

Closed
nmathew opened this issue May 26, 2023 · 22 comments
Labels
kind/bug: Categorizes issue or PR as related to a bug.
needs-triage: Indicates an issue or PR lacks a `triage/foo` label and requires one.
sig/node: Categorizes an issue or PR as relevant to SIG Node.
triage/needs-information: Indicates an issue needs more information in order to work on it.

Comments


nmathew commented May 26, 2023

What happened?

If I power cycle the node after tainting it, certain pods get stuck in the "Terminating" state (these pods usually take more time to delete).

kubectl taint nodes control-plane-dc275-2-master-n5-1003 node-role.kubernetes.io/master:NoExecute
poweroff -f --reboot

The pod is stuck in the Terminating state.

But if I remove the taint immediately after the poweroff, the pod eventually gets removed.

If I remove the taint only after the node comes back up, the pod stays stuck in the Terminating state.
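
(For reference, the taint can be removed again with kubectl's trailing `-` syntax; node name taken from the commands above:)

kubectl taint nodes control-plane-dc275-2-master-n5-1003 node-role.kubernetes.io/master:NoExecute-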

Kubernetes Version: v1.27.1

What did you expect to happen?

The pod should get removed.

How can we reproduce it (as minimally and precisely as possible)?

kubectl taint nodes control-plane-dc275-2-master-n5-1003 node-role.kubernetes.io/master:NoExecute
poweroff -f --reboot

Anything else we need to know?

The pods that get stuck in the Terminating state are usually ones that take more time to get deleted.

Kubernetes version

v1.27.1

Cloud provider

none

OS version

SLES 15 SP4

Install tools

none

Container runtime (CRI) and version (if applicable)

containerd

Related plugins (CNI, CSI, ...) and versions (if applicable)

No response

@nmathew added the kind/bug label May 26, 2023
@k8s-ci-robot added the needs-sig label May 26, 2023
@k8s-ci-robot (Contributor)

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot added the needs-triage label May 26, 2023

nmathew commented May 29, 2023

Not seen in kubernetes 1.26.1


nmathew commented May 31, 2023

/sig node

@k8s-ci-robot added the sig/node label and removed the needs-sig label May 31, 2023

nmathew commented Jun 1, 2023

@smarterclayton @wojtek-t @bobbypage any pointers?


harche commented Jun 7, 2023

/cc

@SergeyKanzhelev (Member)

/triage needs-information

Can you share the kubelet.log after the restart? Maybe also from before the restart, to see what state those Pods were in at the moment of the power cycle.

What is the time difference between un-tainting immediately and un-tainting after the node comes up? What does "immediately" mean here?
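
For reference, on a systemd-managed node the two log captures could be collected roughly like this (a sketch; assumes the kubelet runs as the kubelet systemd unit):

# Kubelet logs from the current boot (i.e. after the power cycle)
journalctl -u kubelet -b 0 > kubelet-after.txt
# Kubelet logs from the previous boot (i.e. before the power cycle)
journalctl -u kubelet -b -1 > kubelet-before.txt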

@k8s-ci-robot added the triage/needs-information label Jun 7, 2023
@zyriljamez

This issue is reproduced with k8s 1.27.2 as well.

After tainting the node, the node is power cycled within 5 seconds. Some of the pods get stuck in the "Terminating" state.

I tried restarting the kubelet multiple times; the pods were not removed. But as soon as the taint is removed and the kubelet is restarted, the "Terminating" pods are removed.
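
The sequence described above, roughly, as a sketch (node name taken from the original report; adjust for your cluster):

# Taint the node, then power cycle it within ~5 seconds
kubectl taint nodes control-plane-dc275-2-master-n5-1003 node-role.kubernetes.io/master:NoExecute
poweroff -f --reboot

# After the node is back up: restarting the kubelet alone does not remove the stuck pods
systemctl restart kubelet

# Removing the taint and restarting the kubelet clears the "Terminating" pods
kubectl taint nodes control-plane-dc275-2-master-n5-1003 node-role.kubernetes.io/master:NoExecute-
systemctl restart kubelet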


mboukhalfa commented Jun 9, 2023

> Can you share the kubelet.log after restart? Maybe also before restart to see what state those Pods were at the moment of power cycle.

I reproduced this issue; please find the kubelet logs attached:

kubelet.txt


bobbypage commented Jun 12, 2023

Can you please provide the kubectl get pod {SOME_POD} -o yaml of one of the pods stuck in Terminating? It would be good to see the state of it.

I tried to play around with a kind cluster to repro this but was unable to. Can you please provide more detailed repro instructions?

In my repro attempt, the pod stayed in Terminating until the kubelet restarted. After the kubelet restarted, the pod ended up getting deleted.


bobbypage commented Jun 12, 2023

Here's my repro attempt using a kind cluster... after the kubelet is restarted, the pod is deleted, so I am not able to repro the behavior mentioned in the issue. Is there something I'm missing here?

$ k get nodes
NAME                 STATUS   ROLES           AGE     VERSION
kind-control-plane   Ready    control-plane   8m57s   v1.27.1
kind-worker          Ready    <none>          8m32s   v1.27.1

$ kubectl run debug-sleep --image=k8s.gcr.io/busybox:latest --restart=Always --pod-running-timeout=20m sleep 9999999
pod/debug-sleep created


$ k get pods -A -o wide
NAMESPACE            NAME                                         READY   STATUS    RESTARTS   AGE     IP            NODE                 NOMINATED NODE   READINESS GATES
default              debug-sleep                                  1/1     Running   0          8s      10.244.1.3    kind-worker          <none>           <none>
kube-system          coredns-5d78c9869d-6rtd9                     1/1     Running   0          2m50s   10.244.0.4    kind-control-plane   <none>           <none>
kube-system          coredns-5d78c9869d-s2g95                     1/1     Running   0          2m50s   10.244.0.2    kind-control-plane   <none>           <none>
kube-system          etcd-kind-control-plane                      1/1     Running   0          3m4s    192.168.8.3   kind-control-plane   <none>           <none>
kube-system          kindnet-fh952                                1/1     Running   0          2m50s   192.168.8.3   kind-control-plane   <none>           <none>
kube-system          kindnet-kqfqd                                1/1     Running   0          2m43s   192.168.8.2   kind-worker          <none>           <none>
kube-system          kube-apiserver-kind-control-plane            1/1     Running   0          3m6s    192.168.8.3   kind-control-plane   <none>           <none>
kube-system          kube-controller-manager-kind-control-plane   1/1     Running   0          3m4s    192.168.8.3   kind-control-plane   <none>           <none>
kube-system          kube-proxy-gvvt9                             1/1     Running   0          2m50s   192.168.8.3   kind-control-plane   <none>           <none>
kube-system          kube-proxy-snkml                             1/1     Running   0          2m43s   192.168.8.2   kind-worker          <none>           <none>
kube-system          kube-scheduler-kind-control-plane            1/1     Running   0          3m4s    192.168.8.3   kind-control-plane   <none>           <none>
local-path-storage   local-path-provisioner-6bc4bddd6b-xcctc      1/1     Running   0          2m50s   10.244.0.3    kind-control-plane   <none>           <none>

# Simulating the issue by tainting the node and stopping the kubelet.

$ kubectl taint nodes kind-worker node-role.kubernetes.io/master:NoExecute

$ docker exec -it kind-worker /bin/bash
root@kind-worker:/# systemctl stop kubelet
root@kind-worker:/#

# Pod is terminating, below:

$ k get pods -A -o wide
NAMESPACE            NAME                                         READY   STATUS        RESTARTS   AGE     IP            NODE                 NOMINATED NODE   READINESS GATES
default              debug-sleep                                  1/1     Terminating   0          3m38s   10.244.1.3    kind-worker          <none>           <none>
kube-system          coredns-5d78c9869d-6rtd9                     1/1     Running       0          6m20s   10.244.0.4    kind-control-plane   <none>           <none>
kube-system          coredns-5d78c9869d-s2g95                     1/1     Running       0          6m20s   10.244.0.2    kind-control-plane   <none>           <none>
kube-system          etcd-kind-control-plane                      1/1     Running       0          6m34s   192.168.8.3   kind-control-plane   <none>           <none>
kube-system          kindnet-fh952                                1/1     Running       0          6m20s   192.168.8.3   kind-control-plane   <none>           <none>
kube-system          kindnet-kqfqd                                1/1     Running       0          6m13s   192.168.8.2   kind-worker          <none>           <none>
kube-system          kube-apiserver-kind-control-plane            1/1     Running       0          6m36s   192.168.8.3   kind-control-plane   <none>           <none>
kube-system          kube-controller-manager-kind-control-plane   1/1     Running       0          6m34s   192.168.8.3   kind-control-plane   <none>           <none>
kube-system          kube-proxy-gvvt9                             1/1     Running       0          6m20s   192.168.8.3   kind-control-plane   <none>           <none>
kube-system          kube-proxy-snkml                             1/1     Running       0          6m13s   192.168.8.2   kind-worker          <none>           <none>
kube-system          kube-scheduler-kind-control-plane            1/1     Running       0          6m34s   192.168.8.3   kind-control-plane   <none>           <none>
local-path-storage   local-path-provisioner-6bc4bddd6b-xcctc      1/1     Running       0          6m20s   10.244.0.3    kind-control-plane   <none>           <none>


$ k get pod debug-sleep -o yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: "2023-06-12T23:34:07Z"
  deletionGracePeriodSeconds: 30
  deletionTimestamp: "2023-06-12T23:35:30Z"
  labels:
    run: debug-sleep
  name: debug-sleep
  namespace: default
  resourceVersion: "814"
  uid: e8e04d42-b620-4973-9fab-a6700e9994f8
spec:
  containers:
  - args:
    - sleep
    - "9999999"
    image: k8s.gcr.io/busybox:latest
    imagePullPolicy: Always
    name: debug-sleep
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-rpvrk
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: kind-worker
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: kube-api-access-rpvrk
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2023-06-12T23:35:00Z"
    message: 'Taint manager: deleting due to NoExecute taint'
    reason: DeletionByTaintManager
    status: "True"
    type: DisruptionTarget
  - lastProbeTime: null
    lastTransitionTime: "2023-06-12T23:34:07Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2023-06-12T23:34:09Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2023-06-12T23:34:09Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2023-06-12T23:34:07Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: containerd://86896d23ada950bcd3ea50526fa51e32423ee4d722dd257e570838f93cedc079
    image: k8s.gcr.io/busybox:latest
    imageID: sha256:36a4dca0fe6fb2a5133dc11a6c8907a97aea122613fa3e98be033959a0821a1f
    lastState: {}
    name: debug-sleep
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2023-06-12T23:34:09Z"
  hostIP: 192.168.8.2
  phase: Running
  podIP: 10.244.1.3
  podIPs:
  - ip: 10.244.1.3
  qosClass: BestEffort
  startTime: "2023-06-12T23:34:07Z"


$ docker exec -it kind-worker /bin/bash
root@kind-worker:/# systemctl start kubelet

# Pod is deleted after the kubelet is started

$ k get pods -A -o wide
NAMESPACE            NAME                                         READY   STATUS    RESTARTS   AGE     IP            NODE                 NOMINATED NODE   READINESS GATES
kube-system          coredns-5d78c9869d-6rtd9                     1/1     Running   0          6m53s   10.244.0.4    kind-control-plane   <none>           <none>
kube-system          coredns-5d78c9869d-s2g95                     1/1     Running   0          6m53s   10.244.0.2    kind-control-plane   <none>           <none>
kube-system          etcd-kind-control-plane                      1/1     Running   0          7m7s    192.168.8.3   kind-control-plane   <none>           <none>
kube-system          kindnet-fh952                                1/1     Running   0          6m53s   192.168.8.3   kind-control-plane   <none>           <none>
kube-system          kindnet-kqfqd                                1/1     Running   0          6m46s   192.168.8.2   kind-worker          <none>           <none>
kube-system          kube-apiserver-kind-control-plane            1/1     Running   0          7m9s    192.168.8.3   kind-control-plane   <none>           <none>
kube-system          kube-controller-manager-kind-control-plane   1/1     Running   0          7m7s    192.168.8.3   kind-control-plane   <none>           <none>
kube-system          kube-proxy-gvvt9                             1/1     Running   0          6m53s   192.168.8.3   kind-control-plane   <none>           <none>
kube-system          kube-proxy-snkml                             1/1     Running   0          6m46s   192.168.8.2   kind-worker          <none>           <none>
kube-system          kube-scheduler-kind-control-plane            1/1     Running   0          7m7s    192.168.8.3   kind-control-plane   <none>           <none>
local-path-storage   local-path-provisioner-6bc4bddd6b-xcctc      1/1     Running   0          6m53s   10.244.0.3    kind-control-plane   <none>           <none>


nmathew commented Jun 13, 2023

Hi @bobbypage,

systemctl stop kubelet is a graceful shutdown, right? I was able to reproduce the issue with poweroff -f --reboot, which is not a graceful shutdown.

Maybe kill -9 {kubelet pid} would reproduce this issue.

-Nobin
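
For a kind-based repro, an ungraceful kubelet kill inside the worker container might look like the sketch below (note: systemd in the kind node will usually restart the kubelet right away, so this only approximates a crash, not the prolonged outage of a real power cycle):

# Kill the kubelet without a graceful shutdown (assumes pidof is available in the node image)
docker exec kind-worker bash -c 'kill -9 $(pidof kubelet)'

# A harder simulation: freeze the whole node container for a while, then resume it
docker pause kind-worker
sleep 60
docker unpause kind-worker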


mboukhalfa commented Jun 13, 2023

> Here's my repro attempt using a kind cluster... after the kubelet is restarted, the pod is deleted, so I am not able to repro the behavior mentioned in the issue. Is there something I'm missing here?

You probably need to increase the number of pod replicas!

@mboukhalfa (Member)

> Can you please provide the kubectl get pod {SOME_POD} -o yaml of one the pods stuck in terminating? It would be good to see the state of it.

Here is the YAML of a pod stuck in the Terminating state:

[metal3@test1-8ggvx ~]$ kubectl get pods  my-deployment-544587cc57-262wz -o yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    cni.projectcalico.org/containerID: 00d1c2a7e039917a3e4aa8dab55b146695f395caaa472a19d9c4df48674bd20d
    cni.projectcalico.org/podIP: ""
    cni.projectcalico.org/podIPs: ""
  creationTimestamp: "2023-06-09T09:07:30Z"
  deletionGracePeriodSeconds: 30
  deletionTimestamp: "2023-06-09T14:29:49Z"
  generateName: my-deployment-544587cc57-
  labels:
    app: my-app
    pod-template-hash: 544587cc57
  name: my-deployment-544587cc57-262wz
  namespace: default
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: my-deployment-544587cc57
    uid: ae3840ab-62a9-42fb-8bef-7f17106016a7
  resourceVersion: "123282"
  uid: 230e9786-cb81-429a-aa14-d5da148b86cb
spec:
  containers:
  - image: nginx:latest
    imagePullPolicy: Always
    name: my-container
    ports:
    - containerPort: 80
      protocol: TCP
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-l762r
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: test1-55b88586c9xcp4bd-jwz2v
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: kube-api-access-l762r
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2023-06-09T14:29:14Z"
    message: 'Taint manager: deleting due to NoExecute taint'
    reason: DeletionByTaintManager
    status: "True"
    type: DisruptionTarget
  message: 'Pod was rejected: Predicate TaintToleration failed'
  phase: Failed
  reason: TaintToleration
  startTime: "2023-06-09T09:08:48Z"


mboukhalfa commented Jun 13, 2023

@bobbypage
My repro was using a Metal3 cluster; now I tried with kind. I tested with 25 pods: when I stopped the kubelet, many pods were terminating, and after starting it again some got removed but a few kept terminating, as shown below. So I think you need to increase the replicas in your environment.
[screenshot: pods stuck in Terminating]

@SergeyKanzhelev (Member)

@bobbypage do you have enough to repro?

@mboukhalfa (Member)

@bobbypage @SergeyKanzhelev is any other information needed?


bobbypage commented Jun 22, 2023

Thanks @mboukhalfa

I was able to repro now. I followed the exact same steps as #118286 (comment), but instead of running 1 replica, I ran a deployment with 50 replicas.

$ kubectl create deployment debug-sleep --image=k8s.gcr.io/busybox:latest --replicas=50 -- sleep 999999

Here is one of the pods that is stuck in "Terminating" as per kubectl

$ k get pod  debug-sleep-85dbbfc897-2lpwd -o yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: "2023-06-22T18:48:22Z"
  deletionGracePeriodSeconds: 30
  deletionTimestamp: "2023-06-22T18:49:49Z"
  generateName: debug-sleep-85dbbfc897-
  labels:
    app: debug-sleep
    pod-template-hash: 85dbbfc897
  name: debug-sleep-85dbbfc897-2lpwd
  namespace: default
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: debug-sleep-85dbbfc897
    uid: 256c2eff-2fd1-4409-8593-d8d29c897d5d
  resourceVersion: "2355"
  uid: f1e3e13d-89e3-43f6-8498-a0252564e9a7
spec:
  containers:
  - command:
    - sleep
    - "9999999"
    image: k8s.gcr.io/busybox:latest
    imagePullPolicy: Always
    name: busybox
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-hbbtr
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: kind-worker
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: kube-api-access-hbbtr
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2023-06-22T18:49:14Z"
    message: 'Taint manager: deleting due to NoExecute taint'
    reason: DeletionByTaintManager
    status: "True"
    type: DisruptionTarget
  message: 'Pod was rejected: Predicate TaintToleration failed'
  phase: Failed
  reason: TaintToleration
  startTime: "2023-06-22T18:48:22Z"

Note: the phase is Failed, the pod failed kubelet admission due to "Predicate TaintToleration failed", and the deletion timestamp is set, but the pod is not deleted.

The kubelet logs - https://gist.github.com/bobbypage/d8f59c6d73d9527fbef3e5d8d1fe3f33

Jun 22 18:49:52 kind-worker kubelet[6270]: I0622 18:49:52.511976    6270 status_manager.go:893] "Delaying pod deletion as the phase is non-terminal" phase=Failed pod="default/debug-sleep-85dbbfc897-2lpwd" podUID=f1e3e13d-89e3-43f6-8498-a0252564e9a7

which is the same as in #118472 (comment).
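
For anyone checking whether they are hitting the same case, a quick way to spot such pods and the corresponding kubelet log entries could be (a sketch; assumes jq is installed and the kubelet runs under systemd):

# Pods that have a deletion timestamp set but are stuck in phase Failed
kubectl get pods -A -o json | jq -r '.items[] | select(.metadata.deletionTimestamp != null and .status.phase == "Failed") | .metadata.namespace + "/" + .metadata.name'

# Matching kubelet log entries on the node
journalctl -u kubelet | grep "Delaying pod deletion as the phase is non-terminal"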

@bobbypage (Member)

I expect that this is a duplicate of #118472

@mboukhalfa (Member)

@bobbypage thanks. I noticed that a fix for that issue has already been merged (#118497) and cherry-picked in the open PR #118841. I am waiting for a release with one of them so I can test whether it fixes the issue here.

@mboukhalfa (Member)

I just tested with v1.27.4 and this issue did not happen! As you expected @bobbypage, the PRs related to #118472 fixed this as well.
@nmathew I think this issue can be closed now.

@rphillips (Member)

/close

@k8s-ci-robot (Contributor)

@rphillips: Closing this issue.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
