PD pods created due to failover are not deleted after the failed pods recover #2161

Closed
DanielZhangQD opened this issue Apr 13, 2020 · 3 comments · Fixed by #2300

@DanielZhangQD
Contributor

Bug Report

What version of Kubernetes are you using?

1.12.8
What version of TiDB Operator are you using?

1.1.0-rc.1
What storage classes exist in the Kubernetes cluster and what are used for PD/TiKV pods?

local-storage
What's the status of the TiDB cluster pods?

Running
What did you do?

I have 3 PD pods. pd-1 failed, and after 5 minutes pd-3 was created. Because I only have 3 nodes, pd-3 stayed in the Pending state. After another 5 minutes, pd-4 was created and was also Pending. Then pd-1 became Ready again.
What did you expect to see?
pd-3 and pd-4 should be deleted, because all 3 of my original PD pods are Ready again.
What did you see instead?
pd-3 and pd-4 remain in the Pending state forever.
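
For reference, a minimal way to watch this sequence as it happens (the namespace is assumed to be dan10, matching the status below):

# watch the PD pods while failover creates pd-3/pd-4 and pd-1 comes back
kubectl -n dan10 get pods -w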

@DanielZhangQD
Contributor Author

PD status:

  pd:
    failureMembers:
      dan10-pd-1:
        createdAt: "2020-04-11T15:22:36Z"
        memberDeleted: true
        memberID: "8148274967755769641"
        podName: dan10-pd-1
        pvcUID: 94b6cae1-7a43-11ea-bc25-b654036b575c
      dan10-pd-3:
        createdAt: "2020-04-12T14:44:36Z"
        memberDeleted: true
        memberID: "8570340967003158874"
        podName: dan10-pd-3
        pvcUID: ef6ca412-7c09-11ea-bc25-b654036b575c
    image: pingcap/pd:v3.1.0-rc
    leader:
      clientURL: http://dan10-pd-0.dan10-pd-peer.dan10.svc:2379
      health: true
      id: "11967912460155284990"
      lastTransitionTime: "2020-04-09T09:22:34Z"
      name: dan10-pd-0
    members:
      dan10-pd-0:
        clientURL: http://dan10-pd-0.dan10-pd-peer.dan10.svc:2379
        health: true
        id: "11967912460155284990"
        lastTransitionTime: "2020-04-09T09:22:34Z"
        name: dan10-pd-0
      dan10-pd-1:
        clientURL: http://dan10-pd-1.dan10-pd-peer.dan10.svc:2379
        health: true
        id: "17795000137901483009"
        lastTransitionTime: "2020-04-12T14:45:11Z"
        name: dan10-pd-1
      dan10-pd-2:
        clientURL: http://dan10-pd-2.dan10-pd-peer.dan10.svc:2379
        health: true
        id: "17406365543539376262"
        lastTransitionTime: "2020-04-09T09:22:34Z"
        name: dan10-pd-2
    phase: Normal
    statefulSet:
      collisionCount: 0
      currentReplicas: 5
      currentRevision: dan10-pd-8485bd776b
      observedGeneration: 3
      readyReplicas: 3
      replicas: 5
      updateRevision: dan10-pd-8485bd776b
      updatedReplicas: 5
    synced: true
    unjoinedMembers:
      dan10-pd-3:
        createdAt: "2020-04-13T01:43:30Z"
        podName: dan10-pd-3
        pvcUID: 28993250-7ccc-11ea-bc25-b654036b575c
      dan10-pd-4:
        createdAt: "2020-04-13T01:43:30Z"
        podName: dan10-pd-4
        pvcUID: 23bb9287-7ccc-11ea-bc25-b654036b575c

Pod Status:

dan10-pd-0                         1/1     Running   0          3d16h
dan10-pd-1                         1/1     Running   0          33h
dan10-pd-2                         1/1     Running   0          3d16h
dan10-pd-3                         0/1     Pending   0          11h
dan10-pd-4                         0/1     Pending   0          11h
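
For completeness, the two dumps above can be collected with kubectl; this is a rough sketch assuming the TidbCluster and namespace are both named dan10, as in the status output:

# full TidbCluster status, including failureMembers and unjoinedMembers
kubectl -n dan10 get tidbcluster dan10 -o yaml

# pod list showing the Pending failover pods pd-3 and pd-4
kubectl -n dan10 get pods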

@DanielZhangQD DanielZhangQD added this to the v1.1.0 milestone Apr 14, 2020
@cofyc cofyc self-assigned this Apr 15, 2020
@jamiechapmanbrn

We're running into the same problem with tikv. Is there a recommended workaround for this?

@DanielZhangQD
Contributor Author

We're running into the same problem with tikv. Is there a recommended workaround for this?

@jamiechapmanbrn TiDB Operator does not delete the TiKV pods that were created during failover, because removing them involves data migration. If the failed TiKV pods recover, you can follow the doc here to remove the newly created pods.
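
For anyone hitting this with TiKV before the fix, a rough sketch of how to locate what the doc asks you to clean up (the cluster and namespace names are placeholders; this is only the inspection step, not the full documented procedure):

# failover records kept in the TidbCluster status (TiKV's analogue of the PD failureMembers above)
kubectl -n <namespace> get tidbcluster <cluster> -o jsonpath='{.status.tikv.failureStores}'

# the extra pods created during failover, which stay Pending in this report
kubectl -n <namespace> get pods | grep Pending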
