Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rfc: add an RFC describing an improved client.Txn interface #42766

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

andreimatei
Copy link
Contributor

... allowing for concurrent use of a txn.

Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @bdarnell, @knz, @nvanbenschoten, and @tbg)


docs/RFCS/20191125_txn_interface_improvements.md, line 53 at r1 (raw file):

  requesting a refresh
* the `TxnCoordSender` successfully refreshes all the requests prior to *r* and
  *r1* (note that r*'s read spans have not yet been recorded) and bumps the

note that r


docs/RFCS/20191125_txn_interface_improvements.md, line 188 at r1 (raw file):

An alternative to the concurrency part is to simply dissallow concurrency. We
have write pipelining on the server-side, which is already a form of

Write pipelining is a much less effective form of concurrency when the leaseholder is in a remote region.

I think it’s worth looking at different cases where write concurrency on top of pipelining might occur.

Two that come to mind are

  • The implementation of some schema changes

Currently they perform multiple batches to write to different system tables. These tables rarely have a collocated leaseholder in global deployments.

  • CTEs with multiple mutations

These will be executed sequentially IIUC.

In both these cases it might be even better to fuse separate write batches into a single batch than to run batches in parallel (up to some point). However, especially for the first case, fusing batches makes it difficult to craft abstractions which operate on a client.Txn.

I’m not sure I have a concrete proposal re: fusing batches as it feels like a radically different programming model. Enabling parallel writes seems more approachable.

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for putting these thoughts in writing. I recognize your proposal from a month or two ago. I like the API you propose too, and 💯 on aligning the proposal with the savepoints RFC.

Reviewed 1 of 1 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @bdarnell, @nvanbenschoten, and @tbg)


docs/RFCS/20191125_txn_interface_improvements.md, line 19 at r1 (raw file):

concurrent actors before restarting.

# Motivation

I think some xrefs with the TCS tech note would be useful in this section.


docs/RFCS/20191125_txn_interface_improvements.md, line 128 at r1 (raw file):

### Refreshes

I think the alternatives in this section would benefit from sequence diagrams akin to those in the tech note. The visual sequencing would clarify the prosaic explanation.


docs/RFCS/20191125_txn_interface_improvements.md, line 183 at r1 (raw file):

allow further operations on the handle that encountered the retriable error.
However, this seems likely to confuse users.

The TCS tech note identifies various kinds of errors, not all of which are retry errors. What are the non-retry errors that would keep a txnhandle able to process further requests?


docs/RFCS/20191125_txn_interface_improvements.md, line 187 at r1 (raw file):

## Rationale and Alternatives

An alternative to the concurrency part is to simply dissallow concurrency. We

nit: disallow


docs/RFCS/20191125_txn_interface_improvements.md, line 200 at r1 (raw file):

behavior of concurrent overlapping reads and writes is undefined. Is there a
reason to allow a particular handle to be associated with a different read
sequence number so it can be isolated from concurrent overlapping writes?

I could see the constraint checks for multiple previous CTEs in a single statement run concurrently with each other at different timestamps? This would be an optimization of course, but maybe it's desirable. I'd ask Radu.


docs/RFCS/20191125_txn_interface_improvements.md, line 206 at r1 (raw file):

extant concurrent actors , whereas the point of leaves is to allow txn state to
be disseminated to remote nodes, updated , and collected back. So I don't think
there's much to unify, but perhaps someone can open my eyes.

I'd say that a Leaf is a particular kind of txnhandle, one that can do fewer things. But it probably should implement the interface.


docs/RFCS/20191125_txn_interface_improvements.md, line 208 at r1 (raw file):

there's much to unify, but perhaps someone can open my eyes.
3. Should we take the opportunity and make a `client.Txn` unusable after a
`TransactionAbortedError`? The current code is pretty surprising - the

yes!

Copy link
Contributor

@bdarnell bdarnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @nvanbenschoten, and @tbg)


docs/RFCS/20191125_txn_interface_improvements.md, line 142 at r1 (raw file):

recorded and their response is returned to the client), then we can refresh.
We’d also block new requests from starting. The problem here is that, if one of
these concurrent requests causes the refresh to fail, it’s too to retry that

s/too/too late/


docs/RFCS/20191125_txn_interface_improvements.md, line 144 at r1 (raw file):

these concurrent requests causes the refresh to fail, it’s too to retry that
request at the new read timestamp (since the result was already delivered to the
client). This seems suboptimal, but it’s also analogous the handling of all the

Just to be explicit, the downside here is that we'd return a user-visible retry error, right? Would option 2 or 3 be able to avoid that?

... allowing for concurrent use of a txn.

Release note: None
@bobvawter
Copy link
Member

bobvawter commented Mar 10, 2020

CLA assistant check
All committers have signed the CLA.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
X-noremind Bots won't notify about PRs with X-noremind
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants