
Do we need to remove node on failure? #54

Closed
JackyChiu opened this issue Mar 3, 2020 · 5 comments · Fixed by #63
Labels
research Research topic and discuss with team

Comments

@JackyChiu
Contributor

From #15 (comment), we saw that etcd won't allow changes to the cluster if it isn't healthy. The case we are concerned about is a node in the cluster going down: we think this might jeopardize the cluster's health and prevent further changes.

If we do need to remove it on failure, we need to build a mechanism for it.

@JackyChiu JackyChiu added the research Research topic and discuss with team label Mar 3, 2020
@JackyChiu JackyChiu self-assigned this Mar 3, 2020
@bkzhang
Contributor

bkzhang commented Mar 6, 2020

Use MemberRemove to remove the node. To keep track of which nodes are healthy, a possible solution is to call cli.Status and check StatusResponse.Errors. Another possibility is to use a watcher to detect when a node changes; however, we need the node's peerURL to remove the member, so this may not be possible.

@JackyChiu
Contributor Author

JackyChiu commented Mar 7, 2020

I just realized that this is only an issue if the cluster loses quorum, i.e., in a 3-node cluster where one node goes down, adding a new member will still work. Therefore, it is not mandatory to explicitly remove a node on failure.

Will confirm this with a test. If this is true, I think we can ignore this issue.

EDIT: nope, looks like this is an issue: 23871c3
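The majority rule behind this reasoning can be checked with a few lines of Go. This minimal sketch assumes the usual Raft majority rule (quorum is n/2 + 1 with integer division); `quorum` and `hasQuorum` are illustrative helpers, not part of any etcd API. It only describes a cluster of fixed size; a membership change alters n itself.

```go
package main

import "fmt"

// quorum returns how many voting members must agree before a Raft-based
// cluster such as etcd can commit a write or a membership change.
func quorum(members int) int { return members/2 + 1 }

// hasQuorum reports whether the healthy members still form a majority.
func hasQuorum(members, healthy int) bool { return healthy >= quorum(members) }

func main() {
	for n := 1; n <= 5; n++ {
		fmt.Printf("members=%d quorum=%d tolerates %d failure(s)\n", n, quorum(n), n-quorum(n))
	}
	fmt.Println(hasQuorum(3, 2)) // true: 3 nodes with 1 down still have quorum
}
```

The loop shows why 4 members tolerate no more failures than 3: both can lose only one node.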

@JackyChiu
Contributor Author

https://github.com/etcd-io/etcd/blob/master/Documentation/faq.md#should-i-add-a-member-before-removing-an-unhealthy-member

This might be why my tests fail: when I try to add a fourth node to a 3-node cluster with one node failing, the resulting 4-member cluster requires a quorum of 3.
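The two orderings the FAQ compares can be modeled as a toy calculation. This is a simplified sketch, assuming a newly added member counts as unhealthy until it has started and joined; it ignores Raft details such as which configuration commits the change. `state`, `addMember` and `removeFailed` are hypothetical names used only for illustration.

```go
package main

import "fmt"

// quorum is the Raft majority for a cluster of the given size.
func quorum(members int) int { return members/2 + 1 }

// state tracks the total voting members and how many of them are up.
// Assumption: a newly added member counts as down until it starts.
type state struct{ members, healthy int }

func (s state) ok() bool { return s.healthy >= quorum(s.members) }

// addMember grows the cluster without adding a healthy member yet.
func (s state) addMember() state { return state{s.members + 1, s.healthy} }

// removeFailed drops a member that was already down.
func (s state) removeFailed() state { return state{s.members - 1, s.healthy} }

func show(label string, s state) {
	fmt.Printf("%-14s members=%d quorum=%d healthy=%d ok=%v\n",
		label, s.members, quorum(s.members), s.healthy, s.ok())
}

func main() {
	start := state{members: 3, healthy: 2} // 3-node cluster, 1 failed

	show("add first:", start.addMember())                   // 4 members, needs 3, has 2
	show("remove first:", start.removeFailed())             // 2 members, needs 2, has 2
	show("  then add:", start.removeFailed().addMember())   // 3 members, needs 2, has 2
}
```

Under these assumptions, adding first leaves 2 healthy members in a 4-member cluster that needs 3, while removing the failed member first keeps a majority at every step.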

@JackyChiu
Contributor Author

Moving this back to TODO to prioritize the project's poster fair slides and final paper.


2 participants