-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
engine: Handle put when all writes are ignored and there are no committed values #44697
Conversation
The following sequence currently triggers an assertion error inside `mvcc.go`, where it should succeed without error instead: 1. start txn 2. put at seq = 10 3. ignore seqs 5-15 4. put at seq = 20 The error is: `previous intent of the transaction with the same epoch not found for ...` (mvcc.go:1650) See the enclosed test case. Release note: none
@itsbilal @nvanbenschoten your input/opinion would be appreciated |
This is falling into roughly the same situation as cockroach/pkg/storage/engine/mvcc.go Lines 1593 to 1596 in 5224918
To see this, let's remember the partial equivalence between a transaction restart (increasing its epoch) and a transaction rollback (adding to its IgnoredSeqNums). Here, we have a txn overwriting a key that it has previously written. However, the previous value is now ignored, so it doesn't want to move the existing value into its intent history. In the txn restart case, we wipe the entire sequence history, because we know it's all now ignored. In this case, we don't necessarily want to wipe the entire sequence history, but we should avoid adding the existing value to the sequence history when we overwrite it. So the change we want is basically just (though it should be cleaned up): diff --git a/pkg/storage/engine/mvcc.go b/pkg/storage/engine/mvcc.go
index 595b51ad3e..325aaa2ef1 100644
--- a/pkg/storage/engine/mvcc.go
+++ b/pkg/storage/engine/mvcc.go
@@ -1642,15 +1642,9 @@ func mvccPutInternal(
// If the epoch of the transaction doesn't match the epoch of the
// intent, blow away the intent history.
if txn.Epoch == meta.Txn.Epoch {
- if existingVal == nil {
- // This case shouldn't pop up, but it is worth asserting
- // that it doesn't. We shouldn't write invalid intents
- // to the history.
- return errors.Errorf(
- "previous intent of the transaction with the same epoch not found for %s (%+v)",
- metaKey, txn)
+ if existingVal != nil {
+ buf.newMeta.AddToIntentHistory(prevIntentSequence, prevIntentValBytes)
}
- buf.newMeta.AddToIntentHistory(prevIntentSequence, prevIntentValBytes)
} else {
buf.newMeta.IntentHistory = nil
} |
Thanks Nathan for the early analysis. @itsbilal what do you think? Also Nathan what do you mean with "it needs to be cleaned up"? |
Nathan's overview makes sense to me - we shouldn't be throwing an error in this case, instead just not writing the existing value in the intent history. If I'm reading correctly, we should also be safe to just blow up the intent history just like we do with txn restarts; the only case where we'll have cockroach/pkg/storage/engine/mvcc.go Lines 1542 to 1564 in 5224918
I'm assuming, by "cleanup", Nathan is referring to updating comments in |
Yes, by cleanup I just meant that the comment will need to be updated. There might also be a better way to structure the code after this change. I'll leave that for you to decide. |
Ok thanks for the insights. Just this discussion is sufficient to unblock other parts of my work, but I'd prefer if @itsbilal could own this PR and bring it to completion with a fix. (And maybe extend the test suite to include other types of MVCC operations than Puts to check we have all paths covered) |
The second commit fixes the issue and updates the test that Raphael wrote. This is ready for a review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM but don't take my word for it
Reviewed 2 of 2 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten, @petermattis, and @sumeerbhola)
pkg/storage/engine/testdata/mvcc_histories/put_after_rollback, line 26 at r2 (raw file):
txn_ignore_seqs seqs=(5-35) txn_step seq=40 del v=b
I'd like to see more tests with other op types (cput at least)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz, @petermattis, and @sumeerbhola)
pkg/storage/engine/mvcc.go, line 1542 at r2 (raw file):
been rolled back, either due to transaction retries or transaction savepoint rollbacks.
pkg/storage/engine/mvcc.go, line 1639 at r2 (raw file):
writeTimestamp = metaTimestamp } // Since an intent with a smaller sequence number exists for the
Is there a reason why this logic is split from the // It's possible that the existing value is nil if the intent on the key
logic? That's part of what made this unclear.
40e3e48
to
bcb03aa
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz, @nvanbenschoten, @petermattis, and @sumeerbhola)
pkg/storage/engine/mvcc.go, line 1542 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
been rolled back, either due to transaction retries or transaction savepoint rollbacks.
Done.
pkg/storage/engine/mvcc.go, line 1639 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Is there a reason why this logic is split from the
// It's possible that the existing value is nil if the intent on the key
logic? That's part of what made this unclear.
No reason really. Good point; I'll move that down as well.
pkg/storage/engine/testdata/mvcc_histories/put_after_rollback, line 26 at r2 (raw file):
Previously, knz (kena) wrote…
I'd like to see more tests with other op types (cput at least)
Done. Added cput test. They all use mvccPutInternal
in the end so it shouldn't change too much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz, @petermattis, and @sumeerbhola)
pkg/storage/engine/mvcc.go, line 1542 at r2 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
Done.
Thanks. While you're here, do you mind wrapping this entire comment to 80 chars?
pkg/storage/engine/mvcc.go, line 1644 at r3 (raw file):
// transaction are ignored, and there are no past committed values at this // key. if txn.Epoch == meta.Txn.Epoch && existingVal != nil {
I just realized that this isn't equivalent to what we had in #44697 (comment). Specifically, we're now going to wipe the intent history if the existingVal
is nil.
That reveals something I think we have wrong here in both attempts. existingVal
isn't the existing provisional value. Instead, it's the latest non-reverted value in the entire sequence history. It doesn't make sense that we'd be adding something from the sequence history back into the sequence history. We only want to add something here if the current provisional value is not ignored. So the semantics we want are something like:
curProvVal := ... // not existingVal
if txn.Epoch == meta.Txn.Epoch {
if !enginepb.TxnSeqIsIgnored(meta.Txn.Sequence, txn.IgnoredSeqNums) {
buf.newMeta.AddToIntentHistory(meta.Txn.Sequence, curProvVal)
}
} else {
buf.newMeta.IntentHistory = nil
}
bcb03aa
to
b0c0cdc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz, @nvanbenschoten, @petermattis, and @sumeerbhola)
pkg/storage/engine/mvcc.go, line 1542 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Thanks. While you're here, do you mind wrapping this entire comment to 80 chars?
Done.
pkg/storage/engine/mvcc.go, line 1644 at r3 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
I just realized that this isn't equivalent to what we had in #44697 (comment). Specifically, we're now going to wipe the intent history if the
existingVal
is nil.That reveals something I think we have wrong here in both attempts.
existingVal
isn't the existing provisional value. Instead, it's the latest non-reverted value in the entire sequence history. It doesn't make sense that we'd be adding something from the sequence history back into the sequence history. We only want to add something here if the current provisional value is not ignored. So the semantics we want are something like:curProvVal := ... // not existingVal if txn.Epoch == meta.Txn.Epoch { if !enginepb.TxnSeqIsIgnored(meta.Txn.Sequence, txn.IgnoredSeqNums) { buf.newMeta.AddToIntentHistory(meta.Txn.Sequence, curProvVal) } } else { buf.newMeta.IntentHistory = nil }
Yes, that difference from your comment is intentional, and something I observed and expanded on in my comment. Basically, if all provisional writes are ignored including the current one, we should be blowing up the intent history. We don't have to do it, but it's a minor performance win. This is the epochs-matching but existingVal == nil
.
I've copied the semantics in the example you provided - thanks for that! Only difference is I still retained the existingValue != nil
in the conditional. PTAL!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 2 files at r4.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal, @knz, @petermattis, and @sumeerbhola)
pkg/storage/engine/mvcc.go, line 1644 at r3 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
Yes, that difference from your comment is intentional, and something I observed and expanded on in my comment. Basically, if all provisional writes are ignored including the current one, we should be blowing up the intent history. We don't have to do it, but it's a minor performance win. This is the epochs-matching but
existingVal == nil
.I've copied the semantics in the example you provided - thanks for that! Only difference is I still retained the
existingValue != nil
in the conditional. PTAL!
Thanks! Can you construct a test case where the previous bug would have resulted in incorrect behavior? Mind adding that to a mvcc_histories
test?
pkg/storage/engine/mvcc.go, line 1551 at r4 (raw file):
// Aliases existingVal when the current provisional value is readable // and not rolled back (due to a savepoint rollback or txn restart). var curProvVal *roachpb.Value
nit: consider changing this to a bool (currentProvisionalNotIgnored
or something less verbose but with the same meaning) and using existingVal
below when this is true
.
…nored Updates mvccPutInternal to account for the case where all past writes by this transaction at a key have been rolled back, and there was no past committed value. This is denoted by txn.Epoch == meta.txn.Epoch, and existingVal == nil (even after the inconsistent mvccGetInternal). The intent history should be cleared in this case. Also updates comments in the function to account for this exception. Release note: None.
b0c0cdc
to
6c21735
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz, @nvanbenschoten, @petermattis, and @sumeerbhola)
pkg/storage/engine/mvcc.go, line 1644 at r3 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Thanks! Can you construct a test case where the previous bug would have resulted in incorrect behavior? Mind adding that to a
mvcc_histories
test?
Done. I couldn't think of an example where it causes a genuine correctness issue (assuming the set of ignored sequence numbers can only increase), but I wrote one case where it was adding repeated values as well as already-committed values to the ignore list.
There were no correctness issues because any erroneous additions to the ignore list were happening when the current write was already ignored, so meta.Txn.Sequence
was on the ignore list. That value would never have surfaced unless that sequence number got un-ignored.
pkg/storage/engine/mvcc.go, line 1551 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit: consider changing this to a bool (
currentProvisionalNotIgnored
or something less verbose but with the same meaning) and usingexistingVal
below when this istrue
.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 2 files at r5.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz, @petermattis, and @sumeerbhola)
pkg/storage/engine/mvcc.go, line 1644 at r3 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
Done. I couldn't think of an example where it causes a genuine correctness issue (assuming the set of ignored sequence numbers can only increase), but I wrote one case where it was adding repeated values as well as already-committed values to the ignore list.
There were no correctness issues because any erroneous additions to the ignore list were happening when the current write was already ignored, so
meta.Txn.Sequence
was on the ignore list. That value would never have surfaced unless that sequence number got un-ignored.
Thanks. You can take or leave this last suggestion - either way this LGTM. I'm realizing that clearing that intent history now depends on whether there are any other versions from other txns picked up in the if existingVal == nil {
branch above. That's pretty strange, even if it won't actually cause any correctness issues.
This whole decision of what to do with the intent history is difficult because we're poking at it from so far away. It might just be easier to move the AddToHistory
up to where you currently have curProvNotIgnored = true
and the IntentHistory = nil
up to exactly where you want it; an else branch of if txn.Epoch == meta.Txn.Epoch /* last write inside txn */ {
and an else branch of if prevValueWritten {
. That will make this all much more explicit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz and @petermattis)
pkg/storage/engine/mvcc.go, line 1540 at r5 (raw file):
last write outside txn
what does this phrase mean? Is it synonymous with different epoch since the metadata we found had the same txn ID?
I find this function very confusing, and have a very basic question:
- we have read an intent MVCCMetadata for this transaction
- is it correct to say that meta.Timestamp is guaranteed to refer to the provisional value, so we can just read it (why the complicated call to mvccGetInternal())?
- so first decide whether we need to (a) read it or not, and do the read if needed (b) delete it or not (not if the new provisional value will be at the same timestamp).
- then construct the new MVCCMetadata based on the old MVCCMetadata, new write, ignoredSeqNums and the read value. This could be a separate function -- it doesn't need an iterator.
- in a batch: write the new MVCCMetadata; write the provisional value delete if needed; write the new provisional value.
I don't claim the above is simpler, but mainly wanting to figure out what nuances I am missing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the reviews!
bors r+
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz, @petermattis, and @sumeerbhola)
pkg/storage/engine/mvcc.go, line 1644 at r3 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Thanks. You can take or leave this last suggestion - either way this LGTM. I'm realizing that clearing that intent history now depends on whether there are any other versions from other txns picked up in the
if existingVal == nil {
branch above. That's pretty strange, even if it won't actually cause any correctness issues.This whole decision of what to do with the intent history is difficult because we're poking at it from so far away. It might just be easier to move the
AddToHistory
up to where you currently havecurProvNotIgnored = true
and theIntentHistory = nil
up to exactly where you want it; an else branch ofif txn.Epoch == meta.Txn.Epoch /* last write inside txn */ {
and an else branch ofif prevValueWritten {
. That will make this all much more explicit.
I also find it weird that existingVal == nil
is being used to denote blowing up the intent history, but it is correct, and the comments should make it easier to understand.
My only issue with making the moves you suggested would be that it'd move the intent history change before the valueFn
execution, and the comments around that call specifically state that valueFn
should be called before the provisional value is moved around. If I do move the AddToIntentHistory call up, and valueFn
returns an error, there's a chance we could have the provisional value lying both in the intent history as well as in the mvcc keyspace. I'll leave it as-is. Thanks for the review!
pkg/storage/engine/mvcc.go, line 1540 at r5 (raw file):
Previously, sumeerbhola wrote…
last write outside txn
what does this phrase mean? Is it synonymous with different epoch since the metadata we found had the same txn ID?
I find this function very confusing, and have a very basic question:
- we have read an intent MVCCMetadata for this transaction
- is it correct to say that meta.Timestamp is guaranteed to refer to the provisional value, so we can just read it (why the complicated call to mvccGetInternal())?
- so first decide whether we need to (a) read it or not, and do the read if needed (b) delete it or not (not if the new provisional value will be at the same timestamp).
- then construct the new MVCCMetadata based on the old MVCCMetadata, new write, ignoredSeqNums and the read value. This could be a separate function -- it doesn't need an iterator.
- in a batch: write the new MVCCMetadata; write the provisional value delete if needed; write the new provisional value.
I don't claim the above is simpler, but mainly wanting to figure out what nuances I am missing.
That's a very good question, and something that also tripped me up at first. Basically, mvccGetInternal
, unlike the full scanner backed MVCCGet()
, is not aware of sequence numbers at all. If we just issue the mvccGetInternal
read rightaway, we might get a value that has now been rolled back. That's why the call to that function is complicated, and could happen multiple times in the same method.
The MVCCMetadata construction is currently wound up in the buf *putBuffer
for efficiency, if you want to take a look.
All of this should already be happening in a batch in practice. The actual management of batches happens in KV land , and the Put Raft command only accepts a "ReadWriter" as its input, so while it could also be an engine, we should probably not be doing batch management at this level.
44697: engine: Handle put when all writes are ignored and there are no committed values r=itsbilal a=knz (This is really an issue, filed as a PR because it contains a test case in the right place) The following sequence currently triggers an assertion error inside `mvcc.go`, where it should succeed without error instead: 1. start txn 2. put at seq = 10 3. ignore seqs 5-15 4. put at seq = 20 The error is: `previous intent of the transaction with the same epoch not found for ...` (mvcc.go:1650) See the enclosed test case. Release note: none Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net> Co-authored-by: Bilal Akhtar <bilal@cockroachlabs.com>
Build succeeded |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal, @knz, and @petermattis)
pkg/storage/engine/mvcc.go, line 1540 at r5 (raw file):
Basically, mvccGetInternal, unlike the full scanner backed MVCCGet(), is not aware of sequence numbers at all. If we just issue the mvccGetInternal read rightaway, we might get a value that has now been rolled back. That's why the call to that function is complicated, and could happen multiple times in the same method.
Sorry about being slow here, but I still don't understand. If we have an intent it must point to the provisional value, so we can do a Get()
for a specific key.
The MVCCMetadata construction is currently wound up in the buf *putBuffer for efficiency, if you want to take a look.
All of this should already be happening in a batch in practice. The actual management of batches happens in KV land , and the Put Raft command only accepts a "ReadWriter" as its input, so while it could also be an engine, we should probably not be doing batch management at this level.
I should not have said "batch" since I didn't mean the code-level abstraction -- just substitute Writer.Put()
for what I said.
My original comment was asking whether one could restructure the code that has found an intent to something like:
var existingValue *roachpb.Value
if needToPlaceExistingValueInHistory(meta, txn) {
existingValue = iter.GetValue(MakeKey(key, meta.Timestamp)) // assuming a simple GetValue() method that gets the value at a particular MVCCKey
}
overwrite := meta.Timestamp == writeTimestamp
if !overwrite {
writer.Clear(MakeKey(key, meta.Timestamp)
}
if existingValue != nil {
meta = fixHistory(meta, txn, existingValue)
}
meta = updateMetaForLatestWrite(meta, txn) // updates Txn, Timestamp, KeyBytes, ValueBytes.
writer.Put(MakeKey(key, 0), meta)
writer.Put(MakeKey(key, writeTimestamp), value)
I am overlooking the complexity introduced by MVCCStats, but longer term is it possible to simplify? I am not suggesting doing that soon, and just trying to understand the possibilities -- we will make changes when we introduce the segregated lock table so that could be a good time to do extensive refactoring.
pkg/storage/engine/mvcc.go, line 1540 at r5 (raw file):
The rollback sequence numbers are specified in Your simplified overview seems to make sense to me, and is something we can consider down the line for sure. I'll have to compare that side-by-side with the current long-winded implementation to see if it's missing anything. There's definitely some historic cruft in the current implementation, but given how critical this code is, we've all been hesitant to refactor it significantly lest it breaks something. And this PR is a perfect example of how we've tried to shoehorn new functionality into existing paradigms instead of completely rethinking this code. |
(This is really an issue, filed as a PR because it contains a test case in the right place)
The following sequence currently triggers an assertion error inside
mvcc.go
, where it should succeed without error instead:The error is:
previous intent of the transaction with the same epoch not found for ...
(mvcc.go:1650)See the enclosed test case.
Release note: none