Implement HA E2e for downgrades #13696

Merged: 8 commits merged into etcd-io:main on May 6, 2022
Conversation

serathius
Member

prefixArgs := []string{e2e.CtlBinPath, "--endpoints", strings.Join(epc.EndpointsV3(), ",")}
t.Log("Write keys to ensure wal snapshot is created so cluster version set is snapshotted")
var err error
e2e.ExecuteWithTimeout(t, 20*time.Second, func() {
	for i := 0; i < 10; i++ {
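The snippet above is cut off mid-loop. A rough sketch of how the key-writing loop might continue, assuming the e2e.SpawnWithExpect helper used elsewhere in these tests (the key/value strings are illustrative, not necessarily the PR's exact code):

```go
		// Each put adds WAL entries; once enough entries accumulate, a snapshot
		// is taken, which also persists the cluster version set at bootstrap.
		key := fmt.Sprintf("%d", i)
		err = e2e.SpawnWithExpect(append(prefixArgs, "put", key, "value"), "OK")
		if err != nil {
			break
		}
	}
})
if err != nil {
	t.Fatalf("failed to write keys: %v", err)
}
```

Declaring err outside the closure lets the test fail after ExecuteWithTimeout returns rather than inside the spawned function.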
Contributor

In https://github.com/etcd-io/etcd/pull/13686/files, you are removing snapshotCount from the newCluster method.

I assume this will cause this method to stop working...

Member Author

Those two PRs will conflict, as #13686 removes the need to set snapshotCount and write keys to force a snapshot. It should not break the tests; I will just need to rebase the second change and fix the conflict.

tests/e2e/cluster_downgrade_test.go — 2 outdated review threads (resolved)
for i := 0; i < len(epc.Procs); i++ {
	t.Logf("Upgrading member %d", i)
	stopEtcd(t, epc.Procs[i])
	startEtcd(t, epc.Procs[i], currentEtcdBinary)
Contributor

Shall we validate here that the cluster is not upgraded if a quorum is not yet upgraded (1st iteration),
and that the cluster gets updated if a quorum (including the leader?) gets upgraded?

Member Author

Yes, I considered it here before. One note: all members (not just a quorum) need to be upgraded before the cluster version is updated. The cluster version indicates the minimal server version in the cluster.
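To make the suggested check concrete: etcd exposes both versions on each member's /version HTTP endpoint, so a test could assert that the reported etcdcluster version only moves to the new value after the last member has been upgraded. A minimal sketch (the helper name and URL handling are illustrative, not the PR's actual helpers):

```go
import (
	"encoding/json"
	"net/http"
)

// versionInfo mirrors the JSON document returned by etcd's /version endpoint,
// e.g. {"etcdserver":"3.6.0-alpha.0","etcdcluster":"3.5.0"}.
type versionInfo struct {
	Server  string `json:"etcdserver"`
	Cluster string `json:"etcdcluster"`
}

// fetchVersion reads the server and cluster version from one member.
func fetchVersion(clientURL string) (versionInfo, error) {
	var v versionInfo
	resp, err := http.Get(clientURL + "/version")
	if err != nil {
		return v, err
	}
	defer resp.Body.Close()
	err = json.NewDecoder(resp.Body).Decode(&v)
	return v, err
}
```

Inside the upgrade loop, the cluster version would be expected to stay at the old value until the final member restarts on the new binary, matching the reply above.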

@@ -491,3 +493,25 @@ func (epc *EtcdProcessCluster) WithStopSignal(sig os.Signal) (ret os.Signal) {
	}
	return ret
}

func (epc *EtcdProcessCluster) Leader(ctx context.Context) (EtcdProcess, error) {
Contributor

nit: empty line to separate it from the import block.

What worries me is that this method is relatively 'heavy' (3 connections made), and it is easy to write a very slow test that recalculates the 'leader' before each iteration.

Mitigations:

  1. Call it 'FindLeaderHeavy'.
  2. Cache the leader as part of EtcdProcessCluster and have 2 methods: Leader() and RefreshLeader().
    Leader() would only call RefreshLeader() when there is no populated cache.

The test owner would need to call RefreshLeader() in all the cases where the leader is expected to change (see the sketch below).
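A rough sketch of mitigation 2, assuming a clientv3-based status lookup, an added leaderIdx cache field, and the usual clientv3/errors imports; the names here are illustrative and not the actual etcd test framework API:

```go
// Leader returns the cached leader, doing the expensive lookup only when
// the cache is empty (leaderIdx < 0).
func (epc *EtcdProcessCluster) Leader(ctx context.Context) (EtcdProcess, error) {
	if epc.leaderIdx < 0 {
		if err := epc.RefreshLeader(ctx); err != nil {
			return nil, err
		}
	}
	return epc.Procs[epc.leaderIdx], nil
}

// RefreshLeader asks every member for its status and caches the index of the
// member whose ID matches the reported leader ID. Tests call this explicitly
// whenever the leader is expected to have changed.
func (epc *EtcdProcessCluster) RefreshLeader(ctx context.Context) error {
	for i, proc := range epc.Procs {
		endpoint := proc.EndpointsV3()[0]
		cli, err := clientv3.New(clientv3.Config{Endpoints: []string{endpoint}})
		if err != nil {
			return err
		}
		resp, err := cli.Status(ctx, endpoint)
		cli.Close()
		if err != nil {
			return err
		}
		if resp.Header.MemberId == resp.Leader {
			epc.leaderIdx = i
			return nil
		}
	}
	return errors.New("no leader found")
}
```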

Member Author

I think 'heavy' here is very relative. Creating 3 connections could indeed be considered heavy; however, it is still lighter than all the other methods that create a whole process.

As this function is only needed for this test (we are looking for a log line generated by the leader), I have removed it from the Cluster.

@codecov-commenter

Codecov Report

Merging #13696 (9c796b9) into main (6af7601) will decrease coverage by 0.17%.
The diff coverage is n/a.

❗ Current head 9c796b9 differs from pull request most recent head 963a78d. Consider uploading reports for the commit 963a78d to get more accurate results


@@            Coverage Diff             @@
##             main   #13696      +/-   ##
==========================================
- Coverage   72.76%   72.59%   -0.18%     
==========================================
  Files         467      467              
  Lines       38278    38278              
==========================================
- Hits        27854    27788      -66     
- Misses       8623     8681      +58     
- Partials     1801     1809       +8     
Flag Coverage Δ
all 72.59% <ø> (-0.18%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
server/proxy/grpcproxy/register.go 69.76% <0.00%> (-9.31%) ⬇️
client/pkg/v3/tlsutil/tlsutil.go 83.33% <0.00%> (-8.34%) ⬇️
server/etcdserver/api/v3rpc/lease.go 74.68% <0.00%> (-7.60%) ⬇️
client/v3/leasing/util.go 91.66% <0.00%> (-6.67%) ⬇️
etcdctl/ctlv3/command/lease_command.go 65.34% <0.00%> (-5.95%) ⬇️
raft/rafttest/node.go 95.00% <0.00%> (-5.00%) ⬇️
client/v3/leasing/cache.go 87.77% <0.00%> (-5.00%) ⬇️
server/storage/mvcc/watchable_store.go 85.50% <0.00%> (-3.63%) ⬇️
server/etcdserver/util.go 85.71% <0.00%> (-3.18%) ⬇️
client/v3/experimental/recipes/key.go 75.34% <0.00%> (-2.74%) ⬇️
... and 16 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 6af7601...963a78d.

@serathius
Member Author

I would like to have this merged to clean up my backlog. This is mostly a test change, so it is safe.
cc @ptabor @ahrtr @spzala

@ptabor ptabor merged commit 02c0321 into etcd-io:main May 6, 2022
@serathius serathius deleted the ha-e2e branch June 15, 2023 20:40