Downgrade cluster setversion fatal error #13365

Closed
LeoYang90 opened this issue Sep 23, 2021 · 2 comments

LeoYang90 commented Sep 23, 2021

Steps to reproduce the bug:

Step 1

Start a server built from the master branch (version 3.6.0-pre).

{"level":"info","ts":"2021-09-23T06:03:07.722Z","caller":"embed/etcd.go:308","msg":"starting an etcd server","etcd-version":"3.6.0-pre","git-sha":"13cd383","go-version":"go1.16.6","go-os":"linux","go-arch":"amd64","downgrade-check-interval":"5s"}

Step 2

Enable a downgrade from 3.6.0-pre to target version 3.5.0.

{"level":"info","ts":"2021-09-23T14:31:31.258+0800","caller":"membership/cluster.go:762","msg":"The server is ready to downgrade","target-version":"3.5.0","server-version":"3.6.0-pre"}
{"level":"warn","ts":"2021-09-23T14:31:31.259+0800","caller":"version/monitor.go:148","msg":"remotes server has mismatching etcd version","remote-member-id":"8e9e05c52164694d","current-server-version":"3.6.0","go-os":"darwin","go-arch":"amd64","max-cpu-set":8,"max-cpu-available":8,"member-initialized":true,"name":"default","data-dir":"default.etcd","wal-dir":"","wal-dir-dedicated":"","member-dir":"default.etcd/member","force-new-cluster":false,"heartbeat-interval":"100ms","election-timeout":"1s","initial-election-tick-advance":true,"snapshot-count":100000,"snapshot-catchup-entries":5000,"initial-advertise-peer-urls":["http://localhost:2380"],"listen-peer-urls":["http://localhost:2380"],"advertise-client-urls":["http://localhost:2379"],"listen-client-urls":["http://localhost:2379"],"listen-metrics-urls":[],"cors":["*"],"host-whitelist":["*"],"initial-cluster":"","initial-cluster-state":"new","initial-cluster-token":"","quota-size-bytes":2147483648,"pre-vote":true,"initial-corrupt-check":false,"corrupt-check-time-interval":"0s","auto-compaction-mode":"periodic","auto-compaction-retention":"0s","auto-compaction-interval":"0s","discovery-url":"","discovery-proxy":"","downgrade-check-interval":"5s"}

Step 3

Restart the server with a binary built from release-3.5.0.

{"level":"info","ts":"2021-09-23T14:36:28.820+0800","caller":"embed/etcd.go:307","msg":"starting an etcd server","etcd-version":"3.5.0","git-sha":"Not provided (use ./build instead of go build)","go-version":"go1.16.2","go-os":"darwin","go-arch":"amd64","max-cpu-set":8,"max-cpu-available":8,"member-initialized":true,"name":"default","data-dir":"default.etcd","wal-dir":"","wal-dir-dedicated":"","member-dir":"default.etcd/member","force-new-cluster":false,"heartbeat-interval":"100ms","election-timeout":"1s","initial-election-tick-advance":true,"snapshot-count":100000,"snapshot-catchup-entries":5000,"initial-advertise-peer-urls":["http://localhost:2380"],"listen-peer-urls":["http://localhost:2380"],"advertise-client-urls":["http://localhost:2379"],"listen-client-urls":["http://localhost:2379"],"listen-metrics-urls":[],"cors":["*"],"host-whitelist":["*"],"initial-cluster":"","initial-cluster-state":"new","initial-cluster-token":"","quota-size-bytes":2147483648,"pre-vote":true,"initial-corrupt-check":false,"corrupt-check-time-interval":"0s","auto-compaction-mode":"periodic","auto-compaction-retention":"0s","auto-compaction-interval":"0s","discovery-url":"","discovery-proxy":"","downgrade-check-interval":"5s"}

{"level":"info","ts":"2021-09-23T14:36:36.072+0800","caller":"membership/downgrade.go:50","msg":"cluster is downgrading to target version","target-cluster-version":"3.5.0","determined-cluster-version":"3.6","current-server-version":"3.5.0"}
{"level":"info","ts":"2021-09-23T14:36:36.072+0800","caller":"api/capability.go:75","msg":"enabled capabilities for version","cluster-version":"3.6"}

{"level":"info","ts":"2021-09-23T14:36:38.522+0800","caller":"membership/downgrade.go:50","msg":"cluster is downgrading to target version","target-cluster-version":"3.5.0","determined-cluster-version":"3.6","current-server-version":"3.5.0"}
{"level":"warn","ts":"2021-09-23T14:36:38.523+0800","caller":"etcdserver/util.go:166","msg":"apply request took too long","took":"2.354064467s","expected-duration":"100ms","prefix":"","request":"ID:7587857361149586436 Method:\"PUT\" Path:\"/0/version\" Val:\"3.6.0\" ","response":""}

{"level":"info","ts":"2021-09-23T14:36:40.311+0800","caller":"membership/cluster.go:837","msg":"The server is ready to downgrade","target-version":"3.5.0","server-version":"3.5.0"}
{"level":"warn","ts":"2021-09-23T14:36:40.313+0800","caller":"etcdserver/util.go:166","msg":"apply request took too long","took":"1.78869471s","expected-duration":"100ms","prefix":"","request":"header:<ID:7587857361149586438 > downgrade_info_set:<enabled:true ver:\"3.5.0\" > ","response":""}

{"level":"info","ts":"2021-09-23T14:36:40.476+0800","caller":"etcdserver/server.go:2481","msg":"updating cluster version using v2 API","from":"3.6","to":"3.5"}
{"level":"info","ts":"2021-09-23T14:36:41.165+0800","caller":"membership/cluster.go:523","msg":"updated cluster version","cluster-id":"cdf818194e3a8c32","local-member-id":"8e9e05c52164694d","from":"3.6","to":"3.5"}

{"level":"info","ts":"2021-09-23T14:36:42.766+0800","caller":"api/capability.go:75","msg":"enabled capabilities for version","cluster-version":"3.5"}
{"level":"warn","ts":"2021-09-23T14:36:42.766+0800","caller":"etcdserver/util.go:166","msg":"apply request took too long","took":"2.247252695s","expected-duration":"100ms","prefix":"","request":"ID:7587857361235437828 Method:\"PUT\" Path:\"/0/version\" Val:\"3.5.0\" ","response":""}

{"level":"info","ts":"2021-09-23T14:36:42.766+0800","caller":"etcdserver/server.go:2500","msg":"cluster version is updated","cluster-version":"3.5"}
{"level":"info","ts":"2021-09-23T14:36:42.766+0800","caller":"etcdserver/server.go:2573","msg":"the cluster has been downgraded","cluster-version":"3.5.0"}
{"level":"warn","ts":"2021-09-23T14:36:44.461+0800","caller":"etcdserver/util.go:166","msg":"apply request took too long","took":"1.694203871s","expected-duration":"100ms","prefix":"","request":"header:<ID:7587857361235437830 > downgrade_info_set:<> ","response":""}

Step 4

Restart the server with the 3.6.0-pre binary again.

{"level":"info","ts":"2021-09-23T14:44:01.803+0800","caller":"embed/etcd.go:308","msg":"starting an etcd server","etcd-version":"3.6.0-pre","git-sha":"Not provided (use ./build instead of go build)","go-version":"go1.16.2","go-os":"darwin","go-arch":"amd64","max-cpu-set":8,"max-cpu-available":8,"member-initialized":true,"name":"default","data-dir":"default.etcd","wal-dir":"","wal-dir-dedicated":"","member-dir":"default.etcd/member","force-new-cluster":false,"heartbeat-interval":"100ms","election-timeout":"1s","initial-election-tick-advance":true,"snapshot-count":100000,"snapshot-catchup-entries":5000,"initial-advertise-peer-urls":["http://localhost:2380"],"listen-peer-urls":["http://localhost:2380"],"advertise-client-urls":["http://localhost:2379"],"listen-client-urls":["http://localhost:2379"],"listen-metrics-urls":[],"cors":["*"],"host-whitelist":["*"],"initial-cluster":"","initial-cluster-state":"new","initial-cluster-token":"","quota-size-bytes":2147483648,"pre-vote":true,"initial-corrupt-check":false,"corrupt-check-time-interval":"0s","auto-compaction-mode":"periodic","auto-compaction-retention":"0s","auto-compaction-interval":"0s","discovery-url":"","discovery-proxy":"","downgrade-check-interval":"5s"}

{"level":"info","ts":"2021-09-23T14:44:04.401+0800","caller":"api/capability.go:76","msg":"enabled capabilities for version","cluster-version":"3.5"}
{"level":"info","ts":"2021-09-23T14:44:04.401+0800","caller":"membership/cluster.go:282","msg":"set cluster version from store","cluster-version":"3.5"}

{"level":"info","ts":"2021-09-23T14:44:04.495+0800","caller":"etcdserver/server.go:560","msg":"starting etcd server","local-member-id":"8e9e05c52164694d","local-server-version":"3.6.0-pre","cluster-id":"cdf818194e3a8c32","cluster-version":"3.5"}

{"level":"info","ts":"2021-09-23T14:44:06.368+0800","caller":"api/capability.go:76","msg":"enabled capabilities for version","cluster-version":"3.6"}
{"level":"warn","ts":"2021-09-23T14:44:06.369+0800","caller":"etcdserver/util.go:166","msg":"apply request took too long","took":"1.872048495s","expected-duration":"100ms","prefix":"","request":"ID:7587857361149586436 Method:\"PUT\" Path:\"/0/version\" Val:\"3.6.0\" ","response":""}

{"level":"info","ts":"2021-09-23T14:44:08.887+0800","caller":"membership/cluster.go:762","msg":"The server is ready to downgrade","target-version":"3.5.0","server-version":"3.6.0-pre"}
{"level":"warn","ts":"2021-09-23T14:44:08.889+0800","caller":"etcdserver/util.go:166","msg":"apply request took too long","took":"2.518755058s","expected-duration":"100ms","prefix":"","request":"header:<ID:7587857361149586438 > downgrade_info_set:<enabled:true ver:\"3.5.0\" > ","response":""}

{"level":"info","ts":"2021-09-23T14:44:09.356+0800","caller":"membership/cluster.go:520","msg":"updated cluster version","cluster-id":"cdf818194e3a8c32","local-member-id":"8e9e05c52164694d","from":"3.6","to":"3.5"}


{"level":"fatal","ts":"2021-09-23T14:44:11.161+0800","caller":"membership/downgrade.go:59","msg":"invalid downgrade; server version is not allowed to join when downgrade is enabled","current-server-version":"3.6.0-pre","target-cluster-version":"3.5.0","stacktrace":"go.etcd.io/etcd/server/v3/etcdserver/api/membership.mustDetectDowngrade\n\t/Users/leo/AliCoding/etcd/server/etcdserver/api/membership/downgrade.go:59\ngo.etcd.io/etcd/server/v3/etcdserver/api/membership.(*RaftCluster).SetVersion\n\t/Users/leo/AliCoding/etcd/server/etcdserver/api/membership/cluster.go:537\ngo.etcd.io/etcd/server/v3/etcdserver.(*applierV2store).Put\n\t/Users/leo/AliCoding/etcd/server/etcdserver/apply_v2.go:101\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).applyV2Request\n\t/Users/leo/AliCoding/etcd/server/etcdserver/apply_v2.go:135\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).applyEntryNormal\n\t/Users/leo/AliCoding/etcd/server/etcdserver/server.go:1910\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).apply\n\t/Users/leo/AliCoding/etcd/server/etcdserver/server.go:1842\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).applyEntries\n\t/Users/leo/AliCoding/etcd/server/etcdserver/server.go:1083\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).applyAll\n\t/Users/leo/AliCoding/etcd/server/etcdserver/server.go:905\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).run.func8\n\t/Users/leo/AliCoding/etcd/server/etcdserver/server.go:837\ngo.etcd.io/etcd/pkg/v3/schedule.(*fifo).run\n\t/Users/leo/AliCoding/etcd/pkg/schedule/schedule.go:157"}
Exiting.

RaftLog:

Index 4


STACK

go.etcd.io/etcd/server/v3/etcdserver/api/membership.(*RaftCluster).SetDowngradeInfo at cluster.go:759
go.etcd.io/etcd/server/v3/etcdserver.(*applierV3backend).DowngradeInfoSet at apply.go:966
go.etcd.io/etcd/server/v3/etcdserver.(*applierV3backend).Apply at apply.go:157
<autogenerated>:2
go.etcd.io/etcd/server/v3/etcdserver.(*authApplierV3).Apply at apply_auth.go:61
go.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).applyEntryNormal at server.go:1932
go.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).apply at server.go:1842
go.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).applyEntries at server.go:1083
go.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).applyAll at server.go:905
go.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).run.func8 at server.go:837
go.etcd.io/etcd/pkg/v3/schedule.(*fifo).run at schedule.go:157
runtime.goexit at asm_amd64.s:1371
 - Async stack trace
go.etcd.io/etcd/pkg/v3/schedule.NewFIFOScheduler at schedule.go:70

SetDowngradeInfo

func (c *RaftCluster) SetDowngradeInfo(d *DowngradeInfo, shouldApplyV3 ShouldApplyV3) {
	c.Lock()
	defer c.Unlock()

	if c.be != nil && shouldApplyV3 {
		c.be.MustSaveDowngradeToBackend(d)
	}

	c.downgradeInfo = d

	...
}
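In this snippet, the backend write is gated on `shouldApplyV3`, but `c.downgradeInfo = d` runs unconditionally. Assuming that entries re-applied on restart are passed with `shouldApplyV3` false (our reading of the apply path, not confirmed in this report), re-applying the Index 4 entry would re-enable the downgrade in memory even though the backend still records it as disabled. The sketch below models only that asymmetry; the `downgradeInfo` and `cluster` types are hypothetical stand-ins for etcd's `DowngradeInfo` and `RaftCluster`, a plain `bool` stands in for `ShouldApplyV3`, and locking is omitted.

```go
package main

import "fmt"

// downgradeInfo is a hypothetical stand-in for membership.DowngradeInfo.
type downgradeInfo struct {
	Enabled       bool
	TargetVersion string
}

// cluster is a hypothetical stand-in for RaftCluster with a fake backend.
type cluster struct {
	backend  downgradeInfo // what is persisted
	inMemory downgradeInfo // what the downgrade check would read
}

// setDowngradeInfo mirrors the shape of SetDowngradeInfo: the backend write
// is gated on shouldApplyV3, the in-memory assignment is not.
func (c *cluster) setDowngradeInfo(d downgradeInfo, shouldApplyV3 bool) {
	if shouldApplyV3 {
		c.backend = d
	}
	c.inMemory = d
}

func main() {
	// After Step 3 the backend records the downgrade as disabled.
	c := &cluster{}
	// Re-applying the Index 4 entry with shouldApplyV3 assumed false still
	// re-enables the downgrade in memory.
	c.setDowngradeInfo(downgradeInfo{Enabled: true, TargetVersion: "3.5.0"}, false)
	fmt.Printf("backend: %+v\nin memory: %+v\n", c.backend, c.inMemory)
}
```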

Index 7


STACK

go.etcd.io/etcd/server/v3/etcdserver/api/membership.mustDetectDowngrade at downgrade.go:59
go.etcd.io/etcd/server/v3/etcdserver/api/membership.(*RaftCluster).SetVersion at cluster.go:537
go.etcd.io/etcd/server/v3/etcdserver.(*applierV2store).Put at apply_v2.go:101
go.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).applyV2Request at apply_v2.go:135
go.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).applyEntryNormal at server.go:1910
go.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).apply at server.go:1842
go.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).applyEntries at server.go:1083
go.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).applyAll at server.go:905
go.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).run.func8 at server.go:837
go.etcd.io/etcd/pkg/v3/schedule.(*fifo).run at schedule.go:157
runtime.goexit at asm_amd64.s:1371
 - Async stack trace
go.etcd.io/etcd/pkg/v3/schedule.NewFIFOScheduler at schedule.go:70
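Read alongside the Step 4 log, the two stacks suggest that the restarted 3.6.0-pre member re-applies entries written during Steps 1 to 3: the `/0/version` PUT of 3.6.0, the Index 4 `downgrade_info_set` that enables a downgrade to 3.5.0, and the Index 7 `/0/version` PUT of 3.5.0, where it aborts before reaching the later `downgrade_info_set:<>` entry that disables the downgrade again. The sketch below is a hypothetical, heavily simplified model of that replay, not etcd code: the `majorMinor`, `entry`, and `member` types, the `apply` and `replay` functions, and returning an error instead of calling `lg.Fatal` are all assumptions for illustration. It also assumes the member starts Step 4 with the downgrade disabled, as persisted after Step 3.

```go
package main

import (
	"errors"
	"fmt"
)

// majorMinor is a hypothetical stand-in for a version trimmed to major.minor.
type majorMinor struct{ major, minor int }

// entry is a hypothetical, simplified raft entry: either a
// downgrade_info_set request or a v2 PUT of /0/version.
type entry struct {
	desc            string
	isDowngradeSet  bool
	downgradeTarget *majorMinor // nil models downgrade_info_set:<> (disabled)
	clusterVersion  majorMinor  // used only by /0/version PUTs
}

// member holds only the in-memory state relevant to this issue.
type member struct {
	local     majorMinor  // version of the running binary
	cluster   majorMinor  // current cluster version
	downgrade *majorMinor // in-memory downgrade target; nil when disabled
}

var errInvalidDowngrade = errors.New("invalid downgrade; server version is not allowed to join when downgrade is enabled")

// apply mirrors the behavior seen in the logs: downgrade info is always
// updated in memory, and every cluster-version update re-runs the check.
func (m *member) apply(e entry) error {
	if e.isDowngradeSet {
		m.downgrade = e.downgradeTarget
		return nil
	}
	m.cluster = e.clusterVersion
	if m.downgrade != nil && *m.downgrade != m.local {
		return fmt.Errorf("%s: %w", e.desc, errInvalidDowngrade)
	}
	return nil
}

// replay applies entries in order and stops at the first failure,
// returning how many entries were applied successfully.
func replay(m *member, log []entry) (int, error) {
	for i, e := range log {
		if err := m.apply(e); err != nil {
			return i, err
		}
	}
	return len(log), nil
}

// reportedLog lists the entries seen in the Step 3 and Step 4 logs, in order.
func reportedLog() []entry {
	v35 := majorMinor{3, 5}
	return []entry{
		{desc: "PUT /0/version 3.6.0", clusterVersion: majorMinor{3, 6}},
		{desc: "downgrade_info_set enabled 3.5.0 (Index 4)", isDowngradeSet: true, downgradeTarget: &v35},
		{desc: "PUT /0/version 3.5.0 (Index 7)", clusterVersion: v35},
		{desc: "downgrade_info_set:<>", isDowngradeSet: true},
	}
}

func main() {
	// Step 4: a 3.6 binary starting with the downgrade disabled.
	n, err := replay(&member{local: majorMinor{3, 6}}, reportedLog())
	fmt.Println("3.6 member applied", n, "entries, err:", err)

	// For contrast, a 3.5 binary applying the same sequence.
	n, err = replay(&member{local: majorMinor{3, 5}}, reportedLog())
	fmt.Println("3.5 member applied", n, "entries, err:", err)
}
```

Under this model the 3.6 member stops after two entries with the "invalid downgrade" error, while a 3.5 member applies all four, which is consistent with the difference between Step 3 and Step 4.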

SetVersion:

func (c *RaftCluster) SetVersion(ver *semver.Version, onSet func(*zap.Logger, *semver.Version), shouldApplyV3 ShouldApplyV3) {
	...
	oldVer := c.version
	c.version = ver
	mustDetectDowngrade(c.lg, c.version, c.downgradeInfo)
}

func mustDetectDowngrade(lg *zap.Logger, cv *semver.Version, d *DowngradeInfo) {
	lv := semver.Must(semver.NewVersion(version.Version))
	// only keep major.minor version for comparison against cluster version
	lv = &semver.Version{Major: lv.Major, Minor: lv.Minor}

	// if the cluster enables downgrade, check local version against downgrade target version.
	if d != nil && d.Enabled && d.TargetVersion != "" {
		if lv.Equal(*d.GetTargetVersion()) {
			if cv != nil {
				lg.Info(
					"cluster is downgrading to target version",
					zap.String("target-cluster-version", d.TargetVersion),
					zap.String("determined-cluster-version", version.Cluster(cv.String())),
					zap.String("current-server-version", version.Version),
				)
			}
			return
		}
		lg.Fatal(
			"invalid downgrade; server version is not allowed to join when downgrade is enabled",
			zap.String("current-server-version", version.Version),
			zap.String("target-cluster-version", d.TargetVersion),
		)
	}

	...
}
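The enabled-downgrade branch of `mustDetectDowngrade` can be exercised in isolation. The sketch below is a stdlib-only approximation, not etcd code: the hypothetical `parseMajorMinor` replaces `semver`, the hypothetical `checkDowngrade` returns an error where etcd calls `lg.Fatal`, and the elided remainder of the function is not modeled. It also compares the target on major.minor only, which assumes the target has a `.0` patch version, as in this report (`3.5.0`).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseMajorMinor is a hypothetical helper that keeps only major.minor of a
// version string such as "3.6.0-pre".
func parseMajorMinor(v string) (major, minor int, err error) {
	parts := strings.SplitN(v, ".", 3)
	if len(parts) < 2 {
		return 0, 0, fmt.Errorf("malformed version %q", v)
	}
	if major, err = strconv.Atoi(parts[0]); err != nil {
		return 0, 0, err
	}
	if minor, err = strconv.Atoi(parts[1]); err != nil {
		return 0, 0, err
	}
	return major, minor, nil
}

// checkDowngrade approximates the enabled-downgrade branch of
// mustDetectDowngrade; the target is assumed to carry a .0 patch version.
func checkDowngrade(localVersion string, enabled bool, targetVersion string) error {
	if !enabled || targetVersion == "" {
		return nil // the elided rest of mustDetectDowngrade is not modeled
	}
	lMaj, lMin, err := parseMajorMinor(localVersion)
	if err != nil {
		return err
	}
	tMaj, tMin, err := parseMajorMinor(targetVersion)
	if err != nil {
		return err
	}
	if lMaj == tMaj && lMin == tMin {
		return nil
	}
	return fmt.Errorf("invalid downgrade; server version is not allowed to join when downgrade is enabled (current-server-version=%s, target-cluster-version=%s)", localVersion, targetVersion)
}

func main() {
	// Step 3: a 3.5.0 binary while a downgrade to 3.5.0 is enabled.
	fmt.Println(checkDowngrade("3.5.0", true, "3.5.0"))
	// Step 4: a 3.6.0-pre binary seeing the same in-memory downgrade info.
	fmt.Println(checkDowngrade("3.6.0-pre", true, "3.5.0"))
	// Downgrade disabled (downgrade_info_set:<>): the branch is skipped.
	fmt.Println(checkDowngrade("3.6.0-pre", false, ""))
}
```

With the inputs from the logs, 3.5.0 passes and 3.6.0-pre fails, because after trimming the comparison is 3.6 against 3.5.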

DB:

default.etcd.zip

@serathius
Member

Just looking into the issue. Unfortunately, downgrade is still in development and is expected to be broken. I'm working on implementing downgrades in #13168. I don't think we should fix this panic on its own, as the core feature is still not working. Feel free to reach out to me on the issue I linked; I'm happy to discuss the progress of the work and how we can collaborate.

stale bot commented Dec 24, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Dec 24, 2021
@stale stale bot closed this as completed Jan 14, 2022