Implement HA E2e for downgrades #13696

Merged: 8 commits merged into etcd-io:main on May 6, 2022
Conversation

serathius
Member

prefixArgs := []string{e2e.CtlBinPath, "--endpoints", strings.Join(epc.EndpointsV3(), ",")}
t.Log("Write keys to ensure wal snapshot is created so cluster version set is snapshotted")
var err error
e2e.ExecuteWithTimeout(t, 20*time.Second, func() {
	for i := 0; i < 10; i++ {
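The snippet above is cut off mid-loop. A rough sketch of how the key-writing loop might continue, assuming the e2e.SpawnWithExpect helper used elsewhere in these tests (the key/value strings are illustrative, not necessarily the PR's exact code):

```go
		// Each put adds WAL entries; once enough entries accumulate, a snapshot
		// is taken, which also persists the cluster version set at bootstrap.
		key := fmt.Sprintf("%d", i)
		err = e2e.SpawnWithExpect(append(prefixArgs, "put", key, "value"), "OK")
		if err != nil {
			break
		}
	}
})
if err != nil {
	t.Fatalf("failed to write keys: %v", err)
}
```

Declaring err outside the closure lets the test fail after ExecuteWithTimeout returns rather than inside the spawned function.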
Contributor

In https://github.com/etcd-io/etcd/pull/13686/files, you are removing snapshotCount from the newCluster method.

I assume this will cause this method to stop working...

Member Author

Those two PRs will conflict, as #13686 removes the need to set snapshotCount and write keys to force a snapshot. It should not break the tests; I will just need to rebase the second change and fix the conflict.

tests/e2e/cluster_downgrade_test.go — 2 outdated review threads (resolved)
for i := 0; i < len(epc.Procs); i++ {
	t.Logf("Upgrading member %d", i)
	stopEtcd(t, epc.Procs[i])
	startEtcd(t, epc.Procs[i], currentEtcdBinary)
Contributor

Shall we validate here that the cluster is not upgraded if a quorum is not yet upgraded (1st iteration),
and that the cluster gets updated if a quorum (including the leader?) gets upgraded?

Member Author

Yes, I considered it here before. One note: all members (not just a quorum) need to be upgraded before the cluster version is updated. The cluster version indicates the minimal server version in the cluster.
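To make the suggested check concrete: etcd exposes both versions on each member's /version HTTP endpoint, so a test could assert that the reported etcdcluster version only moves to the new value after the last member has been upgraded. A minimal sketch (the helper name and URL handling are illustrative, not the PR's actual helpers):

```go
import (
	"encoding/json"
	"net/http"
)

// versionInfo mirrors the JSON document returned by etcd's /version endpoint,
// e.g. {"etcdserver":"3.6.0-alpha.0","etcdcluster":"3.5.0"}.
type versionInfo struct {
	Server  string `json:"etcdserver"`
	Cluster string `json:"etcdcluster"`
}

// fetchVersion reads the server and cluster version from one member.
func fetchVersion(clientURL string) (versionInfo, error) {
	var v versionInfo
	resp, err := http.Get(clientURL + "/version")
	if err != nil {
		return v, err
	}
	defer resp.Body.Close()
	err = json.NewDecoder(resp.Body).Decode(&v)
	return v, err
}
```

Inside the upgrade loop, the cluster version would be expected to stay at the old value until the final member restarts on the new binary, matching the reply above.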

@@ -491,3 +493,25 @@ func (epc *EtcdProcessCluster) WithStopSignal(sig os.Signal) (ret os.Signal) {
	}
	return ret
}

func (epc *EtcdProcessCluster) Leader(ctx context.Context) (EtcdProcess, error) {
Contributor

nit: empty line to separate it from the import block.

What worries me is that this method is relatively 'heavy' (3 connections made), and it is easy to write a very slow test that recalculates the 'leader' before each iteration.

Mitigations:

  1. Call it 'FindLeaderHeavy'.
  2. Cache the leader as part of EtcdProcessCluster and have 2 methods: Leader() and RefreshLeader().
    Leader() would only call RefreshLeader() when there is no populated cache.

The test owner would need to call RefreshLeader() in all the cases where the leader is expected to change (see the sketch below).
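A rough sketch of mitigation 2, assuming a clientv3-based status lookup, an added leaderIdx cache field, and the usual clientv3/errors imports; the names here are illustrative and not the actual etcd test framework API:

```go
// Leader returns the cached leader, doing the expensive lookup only when
// the cache is empty (leaderIdx < 0).
func (epc *EtcdProcessCluster) Leader(ctx context.Context) (EtcdProcess, error) {
	if epc.leaderIdx < 0 {
		if err := epc.RefreshLeader(ctx); err != nil {
			return nil, err
		}
	}
	return epc.Procs[epc.leaderIdx], nil
}

// RefreshLeader asks every member for its status and caches the index of the
// member whose ID matches the reported leader ID. Tests call this explicitly
// whenever the leader is expected to have changed.
func (epc *EtcdProcessCluster) RefreshLeader(ctx context.Context) error {
	for i, proc := range epc.Procs {
		endpoint := proc.EndpointsV3()[0]
		cli, err := clientv3.New(clientv3.Config{Endpoints: []string{endpoint}})
		if err != nil {
			return err
		}
		resp, err := cli.Status(ctx, endpoint)
		cli.Close()
		if err != nil {
			return err
		}
		if resp.Header.MemberId == resp.Leader {
			epc.leaderIdx = i
			return nil
		}
	}
	return errors.New("no leader found")
}
```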

Member Author

I think 'heavy' here is very relative. Creating 3 connections could indeed be considered heavy; however, it is still lighter than all the other methods that create a whole process.

As this function is only needed for this test (we are looking for a log line generated by the leader), I have removed it from the Cluster.

@codecov-commenter

Codecov Report

Merging #13696 (9c796b9) into main (6af7601) will decrease coverage by 0.17%.
The diff coverage is n/a.

❗ Current head 9c796b9 differs from pull request most recent head 963a78d. Consider uploading reports for the commit 963a78d to get more accurate results


@@            Coverage Diff             @@
##             main   #13696      +/-   ##
==========================================
- Coverage   72.76%   72.59%   -0.18%     
==========================================
  Files         467      467              
  Lines       38278    38278              
==========================================
- Hits        27854    27788      -66     
- Misses       8623     8681      +58     
- Partials     1801     1809       +8     
Flag Coverage Δ
all 72.59% <ø> (-0.18%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
server/proxy/grpcproxy/register.go 69.76% <0.00%> (-9.31%) ⬇️
client/pkg/v3/tlsutil/tlsutil.go 83.33% <0.00%> (-8.34%) ⬇️
server/etcdserver/api/v3rpc/lease.go 74.68% <0.00%> (-7.60%) ⬇️
client/v3/leasing/util.go 91.66% <0.00%> (-6.67%) ⬇️
etcdctl/ctlv3/command/lease_command.go 65.34% <0.00%> (-5.95%) ⬇️
raft/rafttest/node.go 95.00% <0.00%> (-5.00%) ⬇️
client/v3/leasing/cache.go 87.77% <0.00%> (-5.00%) ⬇️
server/storage/mvcc/watchable_store.go 85.50% <0.00%> (-3.63%) ⬇️
server/etcdserver/util.go 85.71% <0.00%> (-3.18%) ⬇️
client/v3/experimental/recipes/key.go 75.34% <0.00%> (-2.74%) ⬇️
... and 16 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 6af7601...963a78d.

@serathius
Member Author

I would like to have this merged to clean up my backlog. This is mostly a test change, so it is safe.
cc @ptabor @ahrtr @spzala

@ptabor ptabor merged commit 02c0321 into etcd-io:main May 6, 2022
@serathius serathius deleted the ha-e2e branch June 15, 2023 20:40