release: v19.2.3 #44455

Closed
19 tasks done
miretskiy opened this issue Jan 28, 2020 · 7 comments

miretskiy (Contributor) commented Jan 28, 2020

Candidate 2353f82

Nightly Suite:
https://teamcity.cockroachdb.com/viewLog.html?buildId=1719820&buildTypeId=Cockroach_Nightlies_NightlySuite

Deployment Dashboards

Release process checklist

Prep date: TBD (usually the Monday one week before the release)

One day after prep date:

Release date: Feb 4 or Feb 5 (moving one day forward)

miretskiy self-assigned this Jan 28, 2020

miretskiy (Contributor, Author) commented Jan 30, 2020

Outdated, see signoff list below.

Roachtest failures: https://teamcity.cockroachdb.com/viewLog.html?buildId=1717648&buildTypeId=Cockroach_Nightlies_NightlySuite

Signoffs

bulkio

- [ ] schemachange/mixed/tpcc
- [ ] cdc/tpcc-1000/rangefeed=true

kv

- [ ] tpcc/headroom/n4cpu16
- [ ] overload/tpcc_olap/nodes=3/cpu=8/w=50/c=96
- [ ] tpccbench/nodes=6/cpu=16/multi-az
- [ ] tpcc/mixed-headroom/n5cpu16

appdev

- [ ] django
- [ ] gopg
- [ ] jepsen/monotonic/split
- [ ] sqlalchemy
- [ ] TestRandomSyntaxSQLSmith

miretskiy (Contributor, Author) commented Jan 30, 2020

Restarting release. New candidate SHA: 2353f82

nathanstilwell (Contributor) commented Feb 4, 2020

Roachtest failures: https://teamcity.cockroachdb.com/buildConfiguration/Cockroach_Nightlies_NightlySuite/1719820

Signoffs

KV

  • tpcc/headroom/n4cpu16 - log
  • jepsen/monotonic/split - log
  • overload/tpcc_olap/nodes=3/cpu=8/w=50/c=96 - log
  • tpccbench/nodes=6/cpu=16/multi-az - log
  • tpcc/mixed-headroom/n5cpu16 - log

appdev

cdc

  • cdc/tpcc-1000/rangefeed=true - log

sql-schema

  • schemachange/mixed/tpcc - log

ajwerner (Contributor) commented Feb 4, 2020

@nvanbenschoten is going to look further into the cdc rangefeed crash. For context, a node died with the following panic:

panic: resolved timestamp 1580429579.909877007,0 equal to or above timestamp of operation {<nil> txn_id:0b8710be-7217-4e65-ac35-7b4cf5eb5e51 txn_key:"\303\211\367\001\247\221\210" txn_min_timestamp:<wall_time:1580429574307327120 > timestamp:<wall_time:1580429574307327120 >  <nil> <nil> <nil> <nil>} [recovered]
	panic: resolved timestamp 1580429579.909877007,0 equal to or above timestamp of operation {<nil> txn_id:0b8710be-7217-4e65-ac35-7b4cf5eb5e51 txn_key:"\303\211\367\001\247\221\210" txn_min_timestamp:<wall_time:1580429574307327120 > timestamp:<wall_time:1580429574307327120 >  <nil> <nil> <nil> <nil>}
goroutine 569858 [running]:
panic(0x3b17820, 0xc017f527e0)
	/usr/local/go/src/runtime/panic.go:565 +0x2c5 fp=0xc07e973458 sp=0xc07e9733c8 pc=0x78d495
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).Recover(0xc0005aafc0, 0x4a752a0, 0xc030f170b0)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:181 +0x121 fp=0xc07e9734b8 sp=0xc07e973458 pc=0x12d17d1
runtime.call32(0x0, 0x427bb00, 0xc052f7dd90, 0x1800000018)
	/usr/local/go/src/runtime/asm_amd64.s:519 +0x3b fp=0xc07e9734e8 sp=0xc07e9734b8 pc=0x7bb50b
panic(0x3b17820, 0xc017f527e0)
	/usr/local/go/src/runtime/panic.go:522 +0x1b5 fp=0xc07e973578 sp=0xc07e9734e8 pc=0x78d385
github.com/cockroachdb/cockroach/pkg/storage/rangefeed.(*resolvedTimestamp).assertOpAboveRTS(0xc03795a380, 0x0, 0xc0324b91d0, 0x0, 0x0, 0x0, 0x0, 0x15eed053091c2c90, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/rangefeed/resolved_timestamp.go:250 +0x16f fp=0xc07e973610 sp=0xc07e973578 pc=0x1b409cf
github.com/cockroachdb/cockroach/pkg/storage/rangefeed.(*resolvedTimestamp).consumeLogicalOp(0xc03795a380, 0x0, 0xc0324b91d0, 0x0, 0x0, 0x0, 0x0, 0x773885)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/rangefeed/resolved_timestamp.go:145 +0x448 fp=0xc07e9736e8 sp=0xc07e973610 pc=0x1b402e8
github.com/cockroachdb/cockroach/pkg/storage/rangefeed.(*resolvedTimestamp).ConsumeLogicalOp(0xc03795a380, 0x0, 0xc0324b91d0, 0x0, 0x0, 0x0, 0x0, 0x8)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/rangefeed/resolved_timestamp.go:131 +0x4d fp=0xc07e973738 sp=0xc07e9736e8 pc=0x1b3fe3d
github.com/cockroachdb/cockroach/pkg/storage/rangefeed.(*Processor).consumeLogicalOps(0xc03795a2c0, 0x4a751e0, 0xc02335f3c0, 0xc03a51ed80, 0x3, 0x4)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/rangefeed/processor.go:523 +0x19c fp=0xc07e973820 sp=0xc07e973738 pc=0x1b3c8ec
github.com/cockroachdb/cockroach/pkg/storage/rangefeed.(*Processor).consumeEvent(0xc03795a2c0, 0x4a751e0, 0xc02335f3c0, 0xc03a51ed80, 0x3, 0x4, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/rangefeed/processor.go:472 +0x21f fp=0xc07e973888 sp=0xc07e973820 pc=0x1b3c71f
github.com/cockroachdb/cockroach/pkg/storage/rangefeed.(*Processor).Start.func1(0x4a752a0, 0xc030f170b0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/rangefeed/processor.go:241 +0xa9c fp=0xc07e973f78 sp=0xc07e973888 pc=0x1b4344c
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunWorker.func1(0xc026fa0a20, 0xc0005aafc0, 0xc030f17080)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:196 +0xfb fp=0xc07e973fc8 sp=0xc07e973f78 pc=0x12d396b
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1337 +0x1 fp=0xc07e973fd0 sp=0xc07e973fc8 pc=0x7bd221
created by github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunWorker
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:189 +0xa8
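The panic is raised by an invariant check (`assertOpAboveRTS` in the trace): judging by the message, every logical operation the rangefeed processor consumes must carry a timestamp strictly above the current resolved timestamp. The sketch below reproduces only that comparison, using a hypothetical simplified `timestamp` type (wall time in nanoseconds plus a logical counter) rather than CockroachDB's real types, and assumes the logical component omitted from the operation's printed `wall_time` is zero. Applied to the two values from the panic, it shows the resolved timestamp was already about 5.6 seconds ahead of the operation.

```go
package main

import "fmt"

const nanosPerSecond int64 = 1000000000

// timestamp is a hypothetical, simplified stand-in for an HLC timestamp:
// wall time in nanoseconds plus a logical counter.
type timestamp struct {
	WallTime int64
	Logical  int32
}

// less reports whether t orders strictly before u (wall time first, then logical).
func (t timestamp) less(u timestamp) bool {
	return t.WallTime < u.WallTime || (t.WallTime == u.WallTime && t.Logical < u.Logical)
}

// String formats t like the panic message: seconds.nanoseconds,logical.
func (t timestamp) String() string {
	return fmt.Sprintf("%d.%09d,%d", t.WallTime/nanosPerSecond, t.WallTime%nanosPerSecond, t.Logical)
}

// checkOpAboveRTS (hypothetical helper) returns an error instead of panicking
// when the operation's timestamp is not strictly above the resolved timestamp.
func checkOpAboveRTS(rts, op timestamp) error {
	if !rts.less(op) {
		return fmt.Errorf("resolved timestamp %s equal to or above timestamp of operation %s", rts, op)
	}
	return nil
}

func main() {
	// Values copied from the panic above.
	rts := timestamp{WallTime: 1580429579909877007}
	op := timestamp{WallTime: 1580429574307327120}
	if err := checkOpAboveRTS(rts, op); err != nil {
		fmt.Println(err)
	}
	gap := rts.WallTime - op.WallTime
	fmt.Printf("resolved timestamp leads the op by %.3fs\n", float64(gap)/1e9)
}
```

An operation this far below the resolved timestamp means a write surfaced after the rangefeed had already promised that nothing at or below that time would appear.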

nvanbenschoten (Member) commented:

I was remembering correctly. We also saw this in #43056 (comment):

cdc/tpcc-1000/rangefeed=true
We saw this fail with the following panic:
panic: resolved timestamp 1576278178.164280366,1 equal to or above timestamp of operation {<nil> txn_id:71aadccd-f7f3-4129-9657-78dd88c76fbc txn_key:"\304\211\367\001\222\221\210" txn_min_timestamp:<wall_time:1576278175277997537 > timestamp:<wall_time:1576278175277997537 > <nil> <nil> <nil> <nil>}
I recently saw this a few times when stressing #43121 and it went away with @tbg's recent fix in #42939. We've also seen it before with these cdc roachtests. I feel comfortable signing off on it because it's not a recent regression.

The bug that #42939 fixed was present in 19.2, and we haven't backported that change, so I think it's likely that this is what we're seeing. We have two options:

  1. Ignore this crash; it's not new in v19.2.3.
  2. Backport #42939 (storage: respect closed timestamp in tryReproposeWithNewLeaseIndex) and include it in this release.

I think that change is small and safe enough to backport, but I'd like to get @tbg to sign off on that as well. Was there a reason we didn't backport it before? The backport was mentioned in #42939 (review).

andreimatei (Contributor) commented:

I've checked off all the "KV" ones because I didn't see anything interesting in any of the failures.

nathanstilwell (Contributor) commented:

schemachange/mixed/tpcc signed off here: https://cockroachlabs.slack.com/archives/CARM0PTK4/p1580919731008700
