Fix PD failover #2570

Merged
merged 8 commits into from
May 27, 2020

Yisaer
Contributor

@Yisaer Yisaer commented May 27, 2020

What problem does this PR solve?

Limit which PD members are considered when checking PD failover. If a PD member is not managed by the operator, the operator won't fail over when that unmanaged PD member fails.

Does this PR introduce a user-facing change?:

`Operator` won't fail over a failed `PD` member that is not managed by `Operator`.
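
The behavior described above can be illustrated with a minimal sketch. It is not the operator's actual code: the `pdMember` type, the `desired` set, and the `isManaged` and `failoverCandidates` helpers are hypothetical stand-ins, assumed here only to show the idea of skipping unhealthy members that the operator does not manage.

```go
package main

import (
	"fmt"
	"sort"
)

// pdMember is a hypothetical, simplified stand-in for a PD member status entry.
type pdMember struct {
	Name    string
	Healthy bool
}

// isManaged reports whether podName belongs to the set of pods the operator
// is expected to manage. The desired set is an assumption of this sketch.
func isManaged(desired map[string]struct{}, podName string) bool {
	_, ok := desired[podName]
	return ok
}

// failoverCandidates returns the sorted names of unhealthy members that the
// operator manages. Unhealthy members outside the desired set are skipped.
func failoverCandidates(members map[string]pdMember, desired map[string]struct{}) []string {
	var out []string
	for podName, m := range members {
		if m.Healthy {
			continue
		}
		if !isManaged(desired, podName) {
			// Not managed by the operator: never a failover candidate.
			continue
		}
		out = append(out, podName)
	}
	sort.Strings(out)
	return out
}

func main() {
	desired := map[string]struct{}{"demo-pd-0": {}, "demo-pd-1": {}, "demo-pd-2": {}}
	members := map[string]pdMember{
		"demo-pd-0":     {Name: "demo-pd-0", Healthy: true},
		"demo-pd-1":     {Name: "demo-pd-1", Healthy: false},
		"external-pd-0": {Name: "external-pd-0", Healthy: false},
	}
	fmt.Println(failoverCandidates(members, desired)) // prints [demo-pd-1]
}
```

In this sketch `external-pd-0` is unhealthy but is not in the desired set, so only `demo-pd-1` is reported.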

@cofyc cofyc mentioned this pull request May 27, 2020
9 tasks
for podName, pdMember := range tc.Status.PD.Members {
if !podNames.Has(podName) {
Contributor

I think we should update tryToMarkAPeerAsFailure instead of this spot. Why ignore the non-managed members in the cluster health check?

Contributor Author

That's OK with me. Updated.

@Yisaer Yisaer requested a review from DanielZhangQD May 27, 2020 06:02
for podName, pdMember := range tc.Status.PD.Members {
if !podNames.Has(podName) {
Contributor

Should we keep this consistent with the TiKV handling?

Contributor Author

Updated.

@Yisaer Yisaer requested a review from DanielZhangQD May 27, 2020 06:47
@@ -125,6 +125,12 @@ func (pf *pdFailover) tryToMarkAPeerAsFailure(tc *v1alpha1.TidbCluster) error {
if pdMember.LastTransitionTime.IsZero() {
continue
}
if !pf.isPodDesired(tc, podName) {
Contributor

Should we also delete the non-desired members from the failure members, as done in https://github.com/pingcap/tidb-operator/pull/2560/files#diff-cff8f7143431f3d6302182e300a81909R58?

Contributor

I'm not sure it's necessary right now, since all failure members will be cleared. Anyway, we can revise it later.

weekface previously approved these changes May 27, 2020
Co-authored-by: Yecheng Fu <cofyc.jackson@gmail.com>
@cofyc
Contributor

cofyc commented May 27, 2020

/merge

@sre-bot
Contributor

sre-bot commented May 27, 2020

Your auto merge job has been accepted, waiting for:

  • 2504

@sre-bot
Contributor

sre-bot commented May 27, 2020

/run-all-tests

@sre-bot sre-bot merged commit c75ae12 into pingcap:master May 27, 2020
@DanielZhangQD
Contributor

/run-cherry-picker

sre-bot pushed a commit to sre-bot/tidb-operator that referenced this pull request May 28, 2020
Signed-off-by: sre-bot <sre-bot@pingcap.com>
@sre-bot sre-bot mentioned this pull request May 28, 2020
@sre-bot
Contributor

sre-bot commented May 28, 2020

cherry pick to release-1.1 in PR #2577

sre-bot added a commit to sre-bot/tidb-operator that referenced this pull request May 28, 2020