Bring a leader election budget mechanism to etcd leader election to stabilize cluster availability and reliability #17326

Closed
armstrongli opened this issue Jan 26, 2024 · 3 comments

Comments

@armstrongli

armstrongli commented Jan 26, 2024

What would you like to be added?

bring a leader election budget mechanism to etcd leader election.

goal: a member with a negative budget is not voted for within a configured period

h2. overall workflow:

h3. normal flow:

  1. all members have a default budget when joining the cluster (e.g. 10)
  2. every member keeps a record of the other members and their budgets (e.g. a:10, b:10, c:10, ...)
  3. every member reduces a candidate's budget after voting for it. e.g. a starts a vote, then b has: a:9, c:10.
  4. after the cluster has been stable for a while (e.g. 20min), the budget is reset to the default budget

h3. cluster with a bad guy

  1. all members have a default budget when joining the cluster (e.g. 10)
  2. every member keeps a record of the other members and their budgets (e.g. a:10, b:10, c:10, ...)
  3. every member reduces a candidate's budget after voting for it. e.g. a starts a vote, then b has: a:9, c:10.
  4. a is a bad guy: it can't reach b & c and loses leadership soon after gaining it
  5. b or c starts a leader election and becomes the leader
  6. a starts another leader election and becomes the leader again in the next term
  7. repeat #5 and #6 until:
  8. a starts a leader election, but b and c's budget for a is -1, so they won't vote for it anymore. every vote request from a also postpones the budget reset, so a cannot become the leader.
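
The bad-guy scenario can be sketched as a small simulation. This is an illustration under stated assumptions, not etcd behavior: `voter`, `onVoteRequest`, and the tick-based `resetAfter` are hypothetical, and time is modeled as integer ticks rather than minutes.

```go
package main

import "fmt"

const (
	defaultBudget = 10 // example value from the proposal
	resetAfter    = 20 // quiet ticks required before the budget is restored (assumed unit)
)

// voter is one member's view of a single peer's budget.
type voter struct {
	budget      int
	lastRequest int // tick of the most recent vote request from that peer
}

// onVoteRequest decides whether to grant a vote at the given tick.
// Every request, granted or not, postpones the budget reset.
func (v *voter) onVoteRequest(tick int) bool {
	if tick-v.lastRequest >= resetAfter {
		v.budget = defaultBudget // the peer stayed quiet long enough
	}
	v.lastRequest = tick
	if v.budget < 0 {
		return false // negative budget: refuse to vote
	}
	v.budget--
	return true
}

func main() {
	bViewOfA := &voter{budget: defaultBudget}

	// a campaigns on every tick; b grants votes until the budget goes negative.
	granted := 0
	for tick := 1; tick <= 15; tick++ {
		if bViewOfA.onVoteRequest(tick) {
			granted++
		}
	}
	fmt.Println(granted, bViewOfA.budget) // 11 -1

	// Only after a stays quiet for resetAfter ticks does b vote for it again.
	fmt.Println(bViewOfA.onVoteRequest(15 + resetAfter)) // true
}
```

With a default budget of 10, the budget reaches -1 after the eleventh granted vote, which matches step 8: from then on every request is refused and also restarts the quiet period.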

Why is this needed?

etcd is built on mutual trust. everything goes well in the common case where the network is good, disk IO is good, etc., and all the followers follow the new leader in the new term.

but it can't survive the scenario where some member is unstable and makes the cluster thrash on leader election. bringing in a trust mechanism is a good way to let etcd survive such scenarios.

@serathius
Member

Have you enabled the --pre-vote flag on etcd (enabled by default in v3.5)? It should prevent a faulty member from continuously forcing leader re-election. It works by adding an additional pre-election phase, in which healthy members can reject a leader election if the cluster is healthy. So a faulty member's request for election will be rejected.
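
A simplified Go model of the rejection described above, assuming a healthy member judges cluster health by whether it has heard from its leader within the election timeout. The `follower` type and its fields are illustrative and are not etcd's raft implementation, which also compares logs before granting a vote.

```go
package main

import "fmt"

// follower is a toy model of a member receiving a pre-vote request.
type follower struct {
	term            uint64
	hasLeader       bool
	ticksSinceHeard int // ticks since the last message from the leader
	electionTimeout int
}

// grantPreVote reports whether the follower would support a campaign
// for candidateNextTerm. It refuses while it still hears from a leader.
func (f *follower) grantPreVote(candidateNextTerm uint64) bool {
	leaderAlive := f.hasLeader && f.ticksSinceHeard < f.electionTimeout
	if leaderAlive {
		return false // cluster looks healthy: reject the election
	}
	return candidateNextTerm > f.term
}

func main() {
	healthy := &follower{term: 5, hasLeader: true, ticksSinceHeard: 1, electionTimeout: 10}
	fmt.Println(healthy.grantPreVote(6)) // false

	leaderless := &follower{term: 5, hasLeader: false, electionTimeout: 10}
	fmt.Println(leaderless.grantPreVote(6)) // true
}
```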

@armstrongli
Author

we don't have the flag enabled. i'll take a look at this option and investigate.

@armstrongli
Author

@serathius thank you very much for the info. i wasn't aware of this feature. according to the feature description, it fulfills our requirement:

- For instance, a flaky(or rejoining) member may drop in and out, and start campaign. This member will end up with a higher term, and ignore all incoming messages with lower term. In this case, a new leader eventually need to get elected, thus disruptive to cluster availability. Raft implements Pre-Vote phase to prevent this kind of disruptions. If enabled, Raft runs an additional phase of election to check if pre-candidate can get enough votes to win an election.
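
The effect on terms can be illustrated with a toy Go model. It assumes an isolated member that never wins a pre-vote round; `campaign` is a hypothetical helper, not part of any Raft library.

```go
package main

import "fmt"

// campaign models one election attempt and returns the member's term afterwards.
// Without pre-vote the term is bumped on every attempt. With pre-vote the term
// is bumped only if the pre-vote round was won.
func campaign(term uint64, preVote, preVoteWon bool) uint64 {
	if preVote && !preVoteWon {
		return term // stay at the current term, so peers are not disrupted
	}
	return term + 1
}

func main() {
	const attempts = 5
	plain, guarded := uint64(7), uint64(7)
	for i := 0; i < attempts; i++ {
		plain = campaign(plain, false, false)    // flaky member without pre-vote
		guarded = campaign(guarded, true, false) // same member with pre-vote, always losing it
	}
	fmt.Println(plain, guarded) // 12 7
}
```

Without pre-vote, the flaky member comes back with a higher term and forces a new election. With pre-vote, its term stays where it was, so it rejoins as a follower.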
