Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs: new tech note on TxnCoordSender #42116

Merged
merged 1 commit into from
Nov 26, 2019

Conversation

knz
Copy link
Contributor

@knz knz commented Nov 1, 2019

I am splitting off this tech note from #41569 so it can become a document that can be consumed standalone.

Direct link for easier consumption/review:
https://github.com/knz/cockroach/blob/20191101-tcs-tech-note/docs/tech-notes/txn_coord_sender.md

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@knz
Copy link
Contributor Author

knz commented Nov 1, 2019

@tbg you wanted to have a look, I think this is mature enough for you to read (and dump additional knowledge into if you have it).

@nvanbenschoten I acknowledge that this tech note, as-is, is not doing justice to all the logic and intelligence present in the various interceptors. If you have reference links or extra contents to add, I'll take it.

@knz
Copy link
Contributor Author

knz commented Nov 1, 2019

TODO(self):

  • integrate the discussion on WriteTooOldError from the mailing list.

Copy link
Member

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice write up. I added a few comments.

Reviewed 19 of 19 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @knz, and @nvanbenschoten)


docs/tech-notes/txn_coord_sender.md, line 453 at r1 (raw file):

object and the TCS to become unusable. Indeed, there is no good reason
for that. It is actually silly, as a SQL client may legitimately want
to continue using the txn object after detecting a logical error (eg

I wouldn't call this silly, just poor design. Whenever we extract information from an error that counts as a read, we have to make sure that read is accounted for by the KV store. For example, a ConditionFailedError from a CPut is in fact a successful read; if this isn't accounted for properly at the KV layer (timestamp cache update), there will be anomalies (CPut has code to do exactly that here). I generally find it unfortunate that we're relying on errors to return what are really logical results, and I hope that we buy out of that as much as we can. For CPut, for example, we'd have ConditionalPutResponse carry a flag that tells us the actual value and whether a write was carried out. I suspect we're using errors for some of these only because errors eagerly halt the processing of the current batch.
Continuing past errors generally needs a good amount of idempotency (for example, getting a timeout during a CPut and retrying the CPut without seqnos could read-your-own-write). We had no way of doing that prior to the intent history and seqnos.

By the way, in case you haven't stumbled upon this yet, the txn span refresher has a horrible contract with DistSender, where DistSender returns a partially populated response batch on errors, from which the refresher then picks out spans for its refresh set. I'm wondering how this hasn't backfired yet.


docs/tech-notes/txn_coord_sender.md, line 486 at r1 (raw file):

  complexity to process writes, concurrency would be limited by this
  counter, until/unless we can guarantee that two separate TCSs
  generate separate operation identifiers.

I wonder if we also need their read/write spans to be non-overlapping. There's all sort of weird stuff if they do overlap, though maybe it's not actually illegal. Either way, not something we're going to do today or tomorrow.


docs/tech-notes/txn_coord_sender.md, line 503 at r1 (raw file):

  then at later instant t2 is augmented by a LeafTxn whose
  state was part of its "past" using the original txn object.
  How to combine the states from the two "generations" of the txn object?

The real problem I see here is similar to allowing leaves to write. You have to reason about the semantics in case the spans accessed overlap. If we've answered that, I'm not aware of any specific problem with updating the state in the way you're describing. As long as there is a distinguished error handling actor, this looks just like what txn.Update already does (except we need to merge the whole state, like refresh spans, etc). Curious if you had a concrete problem in mind.
PS I'm totally fine not wanting to change this restriction, just working on my understanding.


docs/tech-notes/txn_coord_sender.md, line 549 at r1 (raw file):

- read operations operate "at" a particular seqnum: a MVCC read that
  encounters an intent ignores the values written at later seqnums
  and returns the most recent value at that seqnum instead.

Isn't this one-offy? If I want to retry a CPut idempotently (because the first one got context canceled, who knows), I'd want it to not see its own seqno.

@knz
Copy link
Contributor Author

knz commented Nov 4, 2019

If I want to retry a CPut idempotently (because the first one got context canceled, who knows), I'd want it to not see its own seqno.

I just realized it's worse than that, the cput has to ignore seqnums for its condition too (it shouldn't see a rolled back value when testing the condition).

Copy link
Contributor

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz and @nvanbenschoten)


docs/tech-notes/txn_coord_sender.md, line 384 at r1 (raw file):

     its "identity" (ID, epoch) is unchanged.

     For example, txn refreshes are processes automatically

are processed


docs/tech-notes/txn_coord_sender.md, line 413 at r1 (raw file):

  This group contains 3 kinds of errors:

  1. *transaction aborts* (`TransactionAbortedError`), which occurs when

TransactionAbortedError trashes the TCS, but not the client.Txn. The client.Txn will recognize the error an instantiate a new TCS.


docs/tech-notes/txn_coord_sender.md, line 461 at r1 (raw file):

inside the TCS may become different "on the way out" (back to
client.Txn and the SQL layer) from what it was "on the way in". It is
currently the responsibility of the client (SQL layer), which may have

What are you referring to here? What state is SQL picking up on?

Copy link
Member

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I understand it, ConditionalPut demonstrates something we missed when introducing seqnos: commands that combine both a read and a write operation need two seqnums, one for its reads and one for its writes. Isn't the correct thing to use seqno-1 for the read portion? And of course, like any other read, it also needs to ignore the rolled back sequences, as you point out. I don't see a major problem here, just something we overlooked so far.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz and @nvanbenschoten)

Copy link
Contributor Author

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the correct thing to use seqno-1 for the read portion?

Yes Nathan has convinced me this is true. Added a note to this effect.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @nvanbenschoten, and @tbg)


docs/tech-notes/txn_coord_sender.md, line 384 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

are processed

Done.


docs/tech-notes/txn_coord_sender.md, line 413 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

TransactionAbortedError trashes the TCS, but not the client.Txn. The client.Txn will recognize the error an instantiate a new TCS.

Your prompt invited me to investigate more and I found out the situation is more complicated than that. There's in fact two categories of abort errors.


docs/tech-notes/txn_coord_sender.md, line 453 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

I wouldn't call this silly, just poor design. Whenever we extract information from an error that counts as a read, we have to make sure that read is accounted for by the KV store. For example, a ConditionFailedError from a CPut is in fact a successful read; if this isn't accounted for properly at the KV layer (timestamp cache update), there will be anomalies (CPut has code to do exactly that here). I generally find it unfortunate that we're relying on errors to return what are really logical results, and I hope that we buy out of that as much as we can. For CPut, for example, we'd have ConditionalPutResponse carry a flag that tells us the actual value and whether a write was carried out. I suspect we're using errors for some of these only because errors eagerly halt the processing of the current batch.
Continuing past errors generally needs a good amount of idempotency (for example, getting a timeout during a CPut and retrying the CPut without seqnos could read-your-own-write). We had no way of doing that prior to the intent history and seqnos.

By the way, in case you haven't stumbled upon this yet, the txn span refresher has a horrible contract with DistSender, where DistSender returns a partially populated response batch on errors, from which the refresher then picks out spans for its refresh set. I'm wondering how this hasn't backfired yet.

Added your comments in the tech note.


docs/tech-notes/txn_coord_sender.md, line 461 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

What are you referring to here? What state is SQL picking up on?

Added comment to clarify.


docs/tech-notes/txn_coord_sender.md, line 486 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

I wonder if we also need their read/write spans to be non-overlapping. There's all sort of weird stuff if they do overlap, though maybe it's not actually illegal. Either way, not something we're going to do today or tomorrow.

Added note.


docs/tech-notes/txn_coord_sender.md, line 503 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

The real problem I see here is similar to allowing leaves to write. You have to reason about the semantics in case the spans accessed overlap. If we've answered that, I'm not aware of any specific problem with updating the state in the way you're describing. As long as there is a distinguished error handling actor, this looks just like what txn.Update already does (except we need to merge the whole state, like refresh spans, etc). Curious if you had a concrete problem in mind.
PS I'm totally fine not wanting to change this restriction, just working on my understanding.

My opinion is that this is the wrong problem to solve. These complex questions, which may or may not have good answers, should not be necessary to answer in the first place. Enabling concurrency between Root and Leafs on the same spans is just a horrible mess.


docs/tech-notes/txn_coord_sender.md, line 549 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Isn't this one-offy? If I want to retry a CPut idempotently (because the first one got context canceled, who knows), I'd want it to not see its own seqno.

Done.

@knz
Copy link
Contributor Author

knz commented Nov 26, 2019

Going to merge this as it now has enough content to base further work on.

Also it's needed to ground the discussion on #42766.

bors r+

craig bot pushed a commit that referenced this pull request Nov 26, 2019
42116: docs: new tech note on TxnCoordSender r=knz a=knz

I am splitting off this tech note from #41569 so it can become a document that can be consumed standalone.

Direct link for easier consumption/review:
https://github.com/knz/cockroach/blob/20191101-tcs-tech-note/docs/tech-notes/txn_coord_sender.md

Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net>
@craig
Copy link
Contributor

craig bot commented Nov 26, 2019

Build succeeded

@craig craig bot merged commit ee398a5 into cockroachdb:master Nov 26, 2019
@knz knz deleted the 20191101-tcs-tech-note branch November 26, 2019 15:54
Copy link
Contributor

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @knz, @nvanbenschoten, and @tbg)


docs/tech-notes/txn_coord_sender.md, line 413 at r1 (raw file):

Previously, knz (kena) wrote…

Your prompt invited me to investigate more and I found out the situation is more complicated than that. There's in fact two categories of abort errors.

Are there? I'm not sure what you're referring to here, I think it's wrong. AOST read under GC doesn't have anything to do with a TransactionAbortedError, does it?
TransactionAbortedError means one thing - some other transaction blew away some of our intents.


docs/tech-notes/txn_coord_sender.md, line 417 at r2 (raw file):

     internal CRDB mechanism, but not due to a logical
     error. Intuitively, this corresponds to errors due to the
     distributed nature of CRDB which would not occur in a

Deadlock between multiple transactions is the case where transaction aborts happen because of "user fault". That's analogous to single-node systems - there too sometimes transactions are aborted and forced to release their locks.

@knz
Copy link
Contributor Author

knz commented Nov 26, 2019

Can you suggest an extension to the text to make it better?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants