
storage: respect closed timestamp in tryReproposeWithNewLeaseIndex #42939

Merged

merged 1 commit into cockroachdb:master from tbg:lai on Dec 13, 2019

Conversation

tbg
Member

@tbg tbg commented Dec 4, 2019

Prior to this commit, tryReproposeWithNewLeaseIndex would not consult
the closed timestamp tracker. This meant that a proposal could end up
with a new lease index at a timestamp that consumers of closed timestamp
updates had already considered immutable. In other words, this broke
closed timestamp guarantees and, by extension, also had the potential
to break CDC.

This path is taken rarely and is unlikely to have caused problems in
practice, though it certainly could have.

See #42821 for details.

Fixes #42821

Release note (bug fix): fix a bug which could lead to follower reads or
CDC updates that did not reflect the full set of data at the timestamp.
This bug was never observed in practice and would rarely cause issues;
one of the necessary ingredients is an aggressive closed timestamp
interval.
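
For orientation, here is a minimal sketch of the kind of guard the commit describes; the helper name and plumbing are illustrative, not the actual CockroachDB code:

```
package storage

import (
	"fmt"

	"github.com/cockroachdb/cockroach/pkg/util/hlc"
)

// checkReproposeClosedTS is an illustrative stand-in for the check this
// PR adds: reproposing a command under a new lease applied index is only
// safe while its write timestamp remains above the closed timestamp
// published to followers. Otherwise the repropose would mutate state
// that follower reads and CDC already treat as immutable.
func checkReproposeClosedTS(writeTS, closedTS hlc.Timestamp) error {
	if !closedTS.Less(writeTS) {
		return fmt.Errorf(
			"cannot repropose write at %s: at or below closed timestamp %s",
			writeTS, closedTS)
	}
	return nil
}
```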

@cockroach-teamcity
Member

This change is Reviewable

@tbg tbg requested a review from ajwerner December 4, 2019 12:52
Contributor

@ajwerner ajwerner left a comment

The basic idea makes sense. Good find and nice fix.

Reviewed 1 of 2 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @nvanbenschoten)


pkg/roachpb/batch.go, line 62 at r1 (raw file):

	if txn := ba.Txn; txn != nil {
		ts := txn.ReadTimestamp
		ts.Forward(txn.DeprecatedOrigTimestamp)

Can this be removed in 20.2?


pkg/storage/replica_application_result.go, line 214 at r1 (raw file):

	maxLeaseIndex, pErr := r.propose(ctx, p)
	if pErr != nil {
		log.Warningf(ctx, "failed to repropose with new lease index: %s", pErr)

Can you help me understand why you didn't want to log here and you did want to log above?


pkg/storage/replica_application_result.go, line 220 at r1 (raw file):

		// will go to the client, so send something it can digest.
		//
		// WIP(tbg): is this the right error?

This is a bummer. We want an error which will be retried in DistSender. A NotLeaseHolderError will do the trick, but it seems wrong. I think I'd be in favor of adding a new error over this.

Member Author

@tbg tbg left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)


pkg/roachpb/batch.go, line 62 at r1 (raw file):

Previously, ajwerner wrote…

Can this be removed in 20.2?

I think so, but let's remove this thing in its own PR.


pkg/storage/replica_application_result.go, line 214 at r1 (raw file):

Previously, ajwerner wrote…

Can you help me understand why you didn't want to log here and you did want to log above?

I removed this and added logging at the caller, to avoid introducing extra log.Warningf for the new error return in this method.


pkg/storage/replica_application_result.go, line 220 at r1 (raw file):

Previously, ajwerner wrote…

This is a bummer. We want an error which will be retried in DistSender. A NotLeaseHolderError will do the trick, but it seems wrong. I think I'd be in favor of adding a new error over this.

Hmm - this needs a backport, so I don't think we should be adding a new error. The leaseholder error has a "custom message" field; I can use it to disambiguate this error should it pop up somewhere. Agreed that there's not much to love about returning this error. More generally, there's a lot not to love about this whole method. I chatted with @nvanbenschoten yesterday and we both thought that there's something we should do around handling ErrProposalDropped (#21849) and avoiding raft forwarding entirely (#37906). The idea is that the combination "illegal lease applied index && proposal hasn't applied yet" just can't happen if you don't forward proposals. This needs some fleshing out, but I'd very much like to remove all of this reordering, and thus this subtle code, altogether. (The bug count for this code is >5.)
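
For reference, roachpb.NotLeaseHolderError does carry a free-form CustomMsg field; a sketch of tagging the error on this path (the helper and message text are illustrative, not the actual change):

```
package storage

import "github.com/cockroachdb/cockroach/pkg/roachpb"

// newReproposalRejectedError is a hypothetical helper: reuse
// NotLeaseHolderError so the backport stays wire-compatible, but set
// CustomMsg so this repropose failure is distinguishable from a genuine
// lease mismatch in logs and client errors.
func newReproposalRejectedError(rangeID roachpb.RangeID) *roachpb.Error {
	return roachpb.NewError(&roachpb.NotLeaseHolderError{
		RangeID:   rangeID,
		CustomMsg: "failed to repropose with new lease index while respecting the closed timestamp",
	})
}
```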

Member Author

@tbg tbg left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)


pkg/storage/replica_application_result.go, line 220 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Hmm - this needs a backport, so I don't think we should be adding a new error. The leaseholder error has a "custom message" field; I can use it to disambiguate this error should it pop up somewhere. Agreed that there's not much to love about returning this error. More generally, there's a lot not to love about this whole method. I chatted with @nvanbenschoten yesterday and we both thought that there's something we should do around handling ErrProposalDropped (#21849) and avoiding raft forwarding entirely (#37906). The idea is that the combination "illegal lease applied index && proposal hasn't applied yet" just can't happen if you don't forward proposals. This needs some fleshing out, but I'd very much like to remove all of this reordering, and thus this subtle code, altogether. (The bug count for this code is >5.)

Wrote up the comment: #21849 (comment)

Contributor

@ajwerner ajwerner left a comment

How do we write tests to detect closed timestamp violations such as this one? Something that might show up if we set the closed timestamp interval to something short and then added some chaos to the raft transport, perhaps?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)


pkg/roachpb/batch.go, line 62 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

I think so, but let's remove this thing in its own PR.

You need this while in 20.1. I think I was mostly just asking for a TODO to get rid of this forwarding in 20.2, but I suppose that will just fall out of the removal of that field.


pkg/storage/replica_application_result.go, line 214 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

I removed this and added logging at the caller, to avoid introducing extra log.Warningf for the new error return in this method.

Ack.


pkg/storage/replica_application_result.go, line 220 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Wrote up the comment: #21849 (comment)

Thanks. Makes sense.

Member Author

@tbg tbg left a comment

How do we write tests to detect closed timestamp violations such as this one? Something that might show up if we set the closed timestamp interval to something short and then added some chaos to the raft transport, perhaps?

I think there are two ingredients we need:

  1. an invariant that detects CT violations (i.e. something like: query the tracker in .propose to validate that the MLAI is tracked; and/or with each applied Raft command, verify that the MVCC snapshot being changed isn't immutable yet)
  2. run through situations in which a violation is more likely.

Honestly, to establish confidence that we have fixed this particular bug, I would rather spend energy on simplifying our architecture so that we just handle things in a simpler way that doesn't even have this problem (i.e. there's only one path into Raft, and it "obviously" goes through the tracker). But even after that, having some way to establish confidence that our closed timestamps are in fact properly closed does seem desirable. An always-on assertion to that end could be good, coupled with tests in which we run with an aggressive closed timestamp duration, as you suggested.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)
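
As a sketch of the second ingredient above, a test could shrink the closed timestamp target so violations surface quickly. The cluster settings below were current as of this PR; the helper shape is assumed:

```
package storage_test

import (
	"testing"

	"github.com/cockroachdb/cockroach/pkg/testutils/sqlutils"
)

// configureAggressiveClosedTS is an illustrative helper: run a test
// cluster with a very short closed timestamp target so that any
// proposal slipping below the closed timestamp trips an assertion (or
// a rangefeed check) quickly.
func configureAggressiveClosedTS(t *testing.T, sqlDB *sqlutils.SQLRunner) {
	sqlDB.Exec(t, `SET CLUSTER SETTING kv.closed_timestamp.target_duration = '10ms'`)
	sqlDB.Exec(t, `SET CLUSTER SETTING kv.closed_timestamp.close_fraction = 0.9`)
}
```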

@ajwerner
Contributor

ajwerner commented Dec 9, 2019

Honestly, to establish confidence that we have fixed this particular bug, I would rather spend energy on simplifying our architecture so that we just handle things in a simpler way that doesn't even have this problem (i.e. there's only one path into Raft, and it "obviously" goes through the tracker).

Fair. Thanks for the thoughts on the testing strategy. I'll keep those in mind as I look toward moving some of the closed timestamp subsystem into the raft transport and work to make closed timestamps more of a per-range concept than today's store-level concept.

Member

@nvanbenschoten nvanbenschoten left a comment

Here's a test that indicates that this works: #43121. In general, the assertions in rangefeed are actually pretty good at catching these issues.

Reviewed 2 of 2 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @tbg)


pkg/roachpb/batch.go, line 59 at r1 (raw file):

// GetActiveTimestamp returns the timestamp at which the request will read.
func (ba *BatchRequest) GetActiveTimestamp() hlc.Timestamp {

Isn't ba.Timestamp set to this timestamp in SetActiveTimestamp? Why do we need this accessor?
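
Piecing the two quoted hunks together, the accessor under discussion plausibly reads as follows; this is a reconstruction, and the final fallthrough to ba.Timestamp is an assumption:

```
// In pkg/roachpb/batch.go (reconstructed from the hunks quoted in this
// review; not copied from the PR).

// GetActiveTimestamp returns the timestamp at which the request will read.
func (ba *BatchRequest) GetActiveTimestamp() hlc.Timestamp {
	if txn := ba.Txn; txn != nil {
		// Transactional requests read at the transaction's read
		// timestamp, forwarded by the deprecated original timestamp for
		// compatibility with older nodes.
		ts := txn.ReadTimestamp
		ts.Forward(txn.DeprecatedOrigTimestamp)
		return ts
	}
	// Assumed: non-transactional requests read at the batch timestamp.
	return ba.Timestamp
}
```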


pkg/storage/replica_application_result.go, line 223 at r1 (raw file):

		lhErr := roachpb.NewError(newNotLeaseHolderError(
			r.mu.state.Lease,
			cmd.proposal.command.ProposerReplica.StoreID,

I think I just removed this. If we're here then don't we know that this was proposed locally?


pkg/storage/replica_application_result.go, line 238 at r1 (raw file):

	// NB: The caller already promises that the lease check succeeded, meaning
	// the sequence numbers match, implying that the lease epoch hasn't changed
	// from what it was under the proposal-time lease.

This all checks out, but I still think it would look less troubling if we pulled the lease epoch off of cmd. Is that possible?

Member Author

@tbg tbg left a comment

Neat, thanks for writing that test!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)


pkg/storage/replica_application_result.go, line 223 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I think I just removed this. If we're here then don't we know that this was proposed locally?

Yes, we do. Updated.


pkg/storage/replica_application_result.go, line 238 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

This all checks out, but I still think it would look less troubling if we pulled the lease epoch off of cmd. Is that possible?

It's not available there, or are you saying I should stash it on ProposalData? I agree that it'd be nice not to have to do mental yoga over the epoch.

Member

@nvanbenschoten nvanbenschoten left a comment

thanks for writing that test!

It wasn't meant to test this, so I should be thanking you for deflaking my test.

:lgtm:

Reviewed 2 of 2 files at r2.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @tbg)


pkg/storage/replica_application_result.go, line 238 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

It's not available there, or are you saying I should stash it on ProposalData? I agree that it'd be nice not to have to do mental yoga over the epoch.

Oh you need the epoch, not the sequence, never mind.

@tbg
Member Author

tbg commented Dec 13, 2019

TFTR!

bors r=nvanbenschoten

@craig
Contributor

craig bot commented Dec 13, 2019

Build failed (retrying...)

@craig
Contributor

craig bot commented Dec 13, 2019

Build failed

@tbg
Member Author

tbg commented Dec 13, 2019

bors r=nvanbenschoten

flake was #38428

craig bot pushed a commit that referenced this pull request Dec 13, 2019

42939: storage: respect closed timestamp in tryReproposeWithNewLeaseIndex r=nvanbenschoten a=tbg

Co-authored-by: Tobias Schottdorf <tobias.schottdorf@gmail.com>
@craig
Contributor

craig bot commented Dec 13, 2019

Build succeeded

@craig craig bot merged commit 20600dc into cockroachdb:master Dec 13, 2019
@tbg tbg deleted the lai branch December 13, 2019 12:38
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this pull request Dec 13, 2019
I noticed that there weren't any storage-level integration tests that
verified that an active rangefeed processor would push transactions that
were holding up its resolved timestamp from progressing. This commit adds
such a test.

Periodically flakes with the following panic unless rebased on cockroachdb#42939:
```
panic: resolved timestamp 1576109922.890013000,0 equal to or above timestamp of operation {<nil> txn_id:d599eb68-cf29-4e15-90be-554b04f10a8b txn_key:"\275\211\211\210" txn_min_timestamp:<wall_time:1576109922877135000 > timestamp:<wall_time:1576109922883114000 logical:2 >  <nil> <nil> <nil> <nil>}
```

Release note: None
craig bot pushed a commit that referenced this pull request Dec 13, 2019
43121: storage: create TestReplicaRangefeedPushesTransactions r=nvanbenschoten a=nvanbenschoten

I noticed that there weren't any storage-level integration tests that verified that an active rangefeed processor would push transactions that were holding up its resolved timestamp from progressing. This commit adds such a test.

Periodically flakes under stress with the following panic unless rebased on #42939:
```
panic: resolved timestamp 1576109922.890013000,0 equal to or above timestamp of operation {<nil> txn_id:d599eb68-cf29-4e15-90be-554b04f10a8b txn_key:"\275\211\211\210" txn_min_timestamp:<wall_time:1576109922877135000 > timestamp:<wall_time:1576109922883114000 logical:2 >  <nil> <nil> <nil> <nil>}
```

Release note: None

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
@nvanbenschoten nvanbenschoten mentioned this pull request Feb 4, 2020
craig bot pushed a commit that referenced this pull request Jan 16, 2024
117801: kvserver: acquire `Replica.mu` when returning reproposal error r=erikgrinaker a=erikgrinaker

Discovered while working on #117612. Looks like it's been there since #42939.

This is a rare error, and these fields are unlikely to change while we're holding the raft mutex, so seems very unlikely to have caused any problems -- I could be convinced we shouldn't backport this, on the off-chance I've missed a potential deadlock.

Epic: none
Release note: None

117833: *: un-skip many tests that were skipped over the last month r=rail a=rickystewart

This partially reverts the following commits:

```
e615aec
99884f8
99f9760
75c91c2
5c64a22
c1778b7
381c96b
f9b20ae
8608499
eb71a83
337777f
```

These tests were skipped due to failures under RBE, typically under `deadlock` and `race`. With the addition of the `heavy` pool for these tests, we anticipate the decreased load will cause these tests to stop failing, therefore we un-skip the skipped tests.

Epic: CRDB-8308
Release note: None

Co-authored-by: Erik Grinaker <grinaker@cockroachlabs.com>
Co-authored-by: Ricky Stewart <ricky@cockroachlabs.com>