Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Leases may be auto-renewed indefinitely due to leader elections #9888

Closed
jpbetz opened this issue Jun 26, 2018 · 8 comments
Closed

Leases may be auto-renewed indefinitely due to leader elections #9888

jpbetz opened this issue Jun 26, 2018 · 8 comments

Comments

@jpbetz
Copy link
Contributor

jpbetz commented Jun 26, 2018

Per the etcd ops guide: "Once a majority of members works, the etcd cluster elects a new leader automatically and returns to a healthy state. The new leader extends timeouts automatically for all leases. This mechanism ensures no lease expires due to server side unavailability."

Kubernetes event leases are 1hr by default are never renewed, and if not revoked according to their original TTL, the volume of events can eventually exceed the etcd storage space limit is exceeded or some other limit is hit (e.g. lease count results in excessively expensive revoke operations).

Would it be possible to either:

a) Persist remaining TTL for long leases so they are expired on-time after leader changes (say, TTLs of, say more than 5min ?)
b) Allow auto-renew to be disabled when creating leases

From @xiang90's comment on kubernetes/kubernetes#65497 it sounds like this might need to persist remaining durations to support either.

xref: kubernetes/kubernetes#65497

@jpbetz
Copy link
Contributor Author

jpbetz commented Jun 26, 2018

c.f. #9526. Thanks for finding @gyuho

@xiang90
Copy link
Contributor

xiang90 commented Jun 26, 2018

a) Do not auto-renew TTLs for long leases (say, TTLs of, say more than 5min ?)

As I suggested in the k8s issue, a better approach is to persist the remaining TTL of long leases lazily. It wont hurt performance :)

@jpbetz
Copy link
Contributor Author

jpbetz commented Jun 26, 2018

@xiang90 Sounds good, I'll update the description to match. What do you mean by "lazily"?

@jpbetz jpbetz self-assigned this Jun 26, 2018
@xiang90
Copy link
Contributor

xiang90 commented Jun 26, 2018

@jpbetz

If we persist the deadline of all the leases through raft for every keepalive request, then after a new leader is elected it wont need to refresh all leases (since it knows the deadline already). But this can be super expensive when we have a lot of keepalives for short leases(i believe chubby paper mentioned this too). Yes, some users will send keepalive at second level.

However, if we persist nothing then we have to do a refresh which leads to the problem you just described. We can make a tradeoff by persisting the deadline of a long lease every X minutes (or only every X minutes only if it is refreshed by a keepalive)

@jpbetz
Copy link
Contributor Author

jpbetz commented Jun 26, 2018

Got it. Thanks @xiang90!

@wojtek-t
Copy link
Contributor

+1 to what Xiang described - this sounds really reasonable.

@jpbetz
Copy link
Contributor Author

jpbetz commented Jun 29, 2018

Great, I'll write up a short "design doc" for how we plan to implement and circulate it for review.

@jpbetz
Copy link
Contributor Author

jpbetz commented Jul 14, 2018

@xiang90 Here's an approach that seems to work well and I believe is inline with what was proposed above: #9924. Let me know if this differs from what you were thinking.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

4 participants