
Skip leadership check if the etcd instance is actively processing heartbeats #18428

Merged: 1 commit merged into main from improve_leaseRenew_20240809 on Aug 14, 2024

Conversation


@ahrtr ahrtr commented Aug 9, 2024

Resolve #18069

Will post benchmark results later.

Please read https://github.com/etcd-io/etcd/blob/main/CONTRIBUTING.md#contribution-flow.


ahrtr commented Aug 9, 2024

Summary

This PR resolves the performance issue #18069

Main branch

Summary:
  Total:	25.8324 secs.
  Slowest:	0.1114 secs.
  Fastest:	0.0001 secs.
  Average:	0.0024 secs.
  Stddev:	0.0030 secs.
  Requests/sec:	38711.1170

Response time histogram:
  0.0001 [1]	|
  0.0112 [986525]	|∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  0.0223 [10104]	|
  0.0335 [1905]	|
  0.0446 [712]	|
  0.0557 [247]	|
  0.0669 [243]	|
  0.0780 [234]	|
  0.0891 [21]	|
  0.1003 [5]	|
  0.1114 [3]	|

Latency distribution:
  10% in 0.0008 secs.
  25% in 0.0012 secs.
  50% in 0.0017 secs.
  75% in 0.0027 secs.
  90% in 0.0043 secs.
  95% in 0.0060 secs.
  99% in 0.0132 secs.
  99.9% in 0.0406 secs.

This PR

Summary:
  Total:	14.8739 secs.
  Slowest:	0.0824 secs.
  Fastest:	0.0000 secs.
  Average:	0.0012 secs.
  Stddev:	0.0014 secs.
  Requests/sec:	67231.7847

Response time histogram:
  0.0000 [1]	|
  0.0083 [996078]	|∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  0.0165 [2981]	|
  0.0247 [548]	|
  0.0330 [185]	|
  0.0412 [99]	|
  0.0494 [26]	|
  0.0577 [41]	|
  0.0659 [39]	|
  0.0741 [0]	|
  0.0824 [2]	|

Latency distribution:
  10% in 0.0004 secs.
  25% in 0.0006 secs.
  50% in 0.0009 secs.
  75% in 0.0014 secs.
  90% in 0.0021 secs.
  95% in 0.0028 secs.
  99% in 0.0063 secs.
  99.9% in 0.0160 secs.

Revert of #16822 (checked out 2f03bbc5dd34b4250c9baece58754e8180786bd4)

Summary:
  Total:	14.6155 secs.
  Slowest:	0.0637 secs.
  Fastest:	0.0000 secs.
  Average:	0.0011 secs.
  Stddev:	0.0012 secs.
  Requests/sec:	68420.5991

Response time histogram:
  0.0000 [1]	|
  0.0064 [993920]	|∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  0.0128 [4897]	|
  0.0191 [705]	|
  0.0255 [216]	|
  0.0319 [158]	|
  0.0383 [55]	|
  0.0446 [20]	|
  0.0510 [7]	|
  0.0574 [16]	|
  0.0637 [5]	|

Latency distribution:
  10% in 0.0004 secs.
  25% in 0.0006 secs.
  50% in 0.0009 secs.
  75% in 0.0013 secs.
  90% in 0.0019 secs.
  95% in 0.0025 secs.
  99% in 0.0054 secs.
  99.9% in 0.0136 secs.
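
The exact benchmark invocation is not shown in the thread. Given the branch name (improve_leaseRenew_20240809) and the fact that #16822 added the check to the lease-renew path, a load generator along the lines of the sketch below (hypothetical, not the tool actually used) would exercise that path; the 100 x 10000 renewals match the roughly one million requests in the histograms above:

package main

import (
	"context"
	"log"
	"sync"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"127.0.0.1:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	// One long-lived lease; every KeepAliveOnce call goes through the
	// lease-renew path that is guarded by ensureLeadership on the leader.
	lease, err := cli.Grant(context.Background(), 60)
	if err != nil {
		log.Fatal(err)
	}

	const workers, perWorker = 100, 10000
	start := time.Now()
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < perWorker; j++ {
				if _, err := cli.KeepAliveOnce(context.Background(), lease.ID); err != nil {
					log.Fatal(err)
				}
			}
		}()
	}
	wg.Wait()
	log.Printf("Requests/sec: %.1f", float64(workers*perWorker)/time.Since(start).Seconds())
}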


ahrtr commented Aug 9, 2024

cc @fuweid @jmhbnz @serathius @qixiaoyang0 @ranandfigma @caesarxuchao

This PR resolves the performance issue with a minimal change.

@codecov-commenter

codecov-commenter commented Aug 9, 2024


Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 68.85%. Comparing base (df4e472) to head (3aa072a).
Report is 207 commits behind head on main.

Current head 3aa072a differs from pull request most recent head b8b0cf8

Please upload reports for the commit b8b0cf8 to get more accurate results.


Additional details and impacted files
Files with missing lines      Coverage            Δ
server/etcdserver/raft.go     87.16% <100.00%>    (+0.42%) ⬆️
server/etcdserver/server.go   81.71% <100.00%>    (+0.04%) ⬆️

... and 23 files with indirect coverage changes

@@            Coverage Diff             @@
##             main   #18428      +/-   ##
==========================================
+ Coverage   68.83%   68.85%   +0.02%     
==========================================
  Files         420      420              
  Lines       35475    35490      +15     
==========================================
+ Hits        24420    24438      +18     
+ Misses       9631     9621      -10     
- Partials     1424     1431       +7     

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Commit: Skip leadership check if the etcd instance is actively processing heartbeats

Signed-off-by: Benjamin Wang <benjamin.ahrtr@gmail.com>

@ah8ad3 ah8ad3 left a comment


Thanks for the implementation, it looks clean. LGTM.
I have one question: imagine a scenario where leadership is lost because two candidates are competing with similar terms and equal votes. Does that mean that after merging this, the new vote will start with more delay, and possibly the cluster will be down?

I know this scenario is hard to reproduce; I'm asking to find out the behavior.


ahrtr commented Aug 9, 2024

I have one question: imagine a scenario where leadership is lost because two candidates are competing with similar terms and equal votes.
Does that mean that after merging this, the new vote will start with more delay, and possibly the cluster will be down?

This PR has nothing to do with the raft protocol itself. It just tries to avoid an unnecessary ensureLeadership check, so as to avoid an unnecessary performance reduction.
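
In other words, the change adds a cheap fast path in front of the existing check. A minimal sketch of the shape of the change (confirmLeadership is a hypothetical name for the pre-existing slow path; the real diff is quoted further below):

// ensureLeadership returns early when recent heartbeat ticks show the node
// is active, skipping the expensive confirmation on the lease-renew path.
func (s *EtcdServer) ensureLeadership() bool {
	if s.isActive() {
		// Heartbeats were processed within the last 3 * tickMs, so the node
		// is not stuck (e.g. by a stalled disk write); skip the costly check.
		return true
	}
	// Slow path: the full leadership confirmation introduced by #16822.
	return s.confirmLeadership() // hypothetical helper name
}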


@caesarxuchao caesarxuchao left a comment


LGTM. Thanks for the quick fix!


@jmhbnz jmhbnz left a comment


LGTM - Nice work @ahrtr


@ivanvc ivanvc left a comment


LGTM. Thanks, Benjamin.

@k8s-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahrtr, caesarxuchao, ivanvc, jmhbnz

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


ahrtr commented Aug 14, 2024

Thanks all for the review.

@ahrtr ahrtr deleted the improve_leaseRenew_20240809 branch August 14, 2024 20:36
@@ -904,10 +904,26 @@ func (s *EtcdServer) revokeExpiredLeases(leases []*lease.Lease) {
})
}

// isActive checks if the etcd instance is still actively processing the
// heartbeat message (ticks). It returns false if no heartbeat has been
// received within 3 * tickMs.
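
The body of isActive is omitted from this excerpt. A minimal sketch consistent with the doc comment above (lastHeartbeatNano is a hypothetical field name assumed to be updated atomically on every processed tick, not the actual field in the PR):

import (
	"sync/atomic"
	"time"
)

// isActive reports whether this instance processed a heartbeat tick within
// the last 3 * tickMs. Sketch only; see the hypothetical field note above.
func (s *EtcdServer) isActive() bool {
	last := atomic.LoadInt64(&s.lastHeartbeatNano)
	window := 3 * time.Duration(s.Cfg.TickMs) * time.Millisecond
	return time.Now().UnixNano()-last <= int64(window)
}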
Member


Why 3 election ticks? I think the correct value is somewhere between TickMs (default 100ms) and ElectionMs (default 1s), meaning that 3 makes sense for the default config, but I would argue it can be problematic for cases where ElectionMs < 3 * TickMs.

Member Author


Why 3 election ticks?

Read #18428 (comment).

I would argue it can be problematic for cases where ElectionMs < 3 * TickMs

I am not worried about this:

  • Checking activity should only be related to tickMs.
  • Also, if ElectionMs < 3 * TickMs, it is a very inappropriate election timeout value.

Member


Also, if ElectionMs < 3 * TickMs, it is a very inappropriate election timeout value.

Then let's codify that it is inappropriate, either warning or failing validation.
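
A sketch of what failing validation could look like at startup (illustrative only; the function name and error wording are hypothetical, not etcd's actual validation code):

import "fmt"

// validateActivityWindow rejects configurations where the activity window
// used by isActive (3 * TickMs) would exceed the election timeout.
// Illustrative sketch, not the actual etcd validation.
func validateActivityWindow(tickMs, electionMs uint) error {
	if electionMs < 3*tickMs {
		return fmt.Errorf("election timeout %dms should be at least 3x the heartbeat interval %dms",
			electionMs, tickMs)
	}
	return nil
}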

// ensureLeadership checks whether current member is still the leader.
func (s *EtcdServer) ensureLeadership() bool {
	lg := s.Logger()

	if s.isActive() {
Member


This is an incorrect assumption for 5-node clusters; getting a tick from one member is not an effective way to confirm that no other leader was elected. Imagine a 3/2 network split with the leader on the side of 2. With your change, the leader can keep thinking it's active by receiving ticks from one member, while on the other side of the network three members have already elected a new leader.

Member Author


It has nothing to do with how many nodes the cluster has. The intention is to guard against the case where the node is stuck by itself, e.g. a stalled write.

Imagine a 3/2 network split with the leader on the side of 2. With your change, the leader can keep thinking it's active by receiving ticks from one member

It's active, but it won't be the leader any more. It will step down automatically in such a case due to losing quorum.
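
For context, the automatic step-down described here comes from raft's quorum check; in the go.etcd.io/raft API this corresponds to the CheckQuorum flag (a configuration sketch, not etcd's actual setup code):

import "go.etcd.io/raft/v3"

// With CheckQuorum set, a leader that fails to hear from a quorum of
// followers within ElectionTick ticks steps down to follower, so a leader
// stranded on the minority side of a partition stops acting as leader.
var cfg = &raft.Config{
	ID:              0x1,
	ElectionTick:    10, // ticks without quorum contact before stepping down
	HeartbeatTick:   1,
	Storage:         raft.NewMemoryStorage(),
	CheckQuorum:     true,
	MaxSizePerMsg:   1 << 20,
	MaxInflightMsgs: 256,
}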

Member Author


Note that the raft protocol handles the network partition case perfectly.

Member


It's active, but it won't be the leader any more.

We are talking about the roughly one second after a network partition: until health probes have failed for ElectionMs, the old leader will still think it's the leader before it resigns.

@serathius

The proposed solution is not compatible with etcd configuration options and doesn't work for 5-node clusters. Please consider reverting and redesigning.


ahrtr commented Aug 15, 2024

The proposed solution is not compatible with etcd configuration options and doesn't work for 5-node clusters. Please consider reverting and redesigning.

I don't agree; please see my comments above. The only minor possible improvement is to guard that the threshold < electionTimeout, but I am not worried about it, as mentioned in #18428 (comment).

Development

Successfully merging this pull request may close these issues.

Performance Issue: CPU usage of the leader node increased by 20%
8 participants