rfc: add an RFC describing an improved client.Txn interface, allowing for concurrent use of a txn

Release note: None
andreimatei committed Nov 27, 2019
1 parent d905be8 commit 45788cf
Showing 1 changed file with 240 additions and 0 deletions: docs/RFCS/20191125_txn_interface_improvements.md

- Feature Name: client.Txn interface v2
- Status: draft
- Start Date: 2019-11-15
- Authors: Andrei
- RFC PR: (PR # after acceptance of initial draft)
- Cockroach Issue: [#25329](https://github.com/cockroachdb/cockroach/issues/25329), [#22615](https://github.com/cockroachdb/cockroach/issues/22615)

# Summary

This RFC discusses additions to the `client.Txn` interface that make it
thread-safe and otherwise safer. A new notion of a "transaction handle" is
introduced, allowing a sort of fork-join usage model for concurrent actors that
want to use a shared transaction. The client is put in charge of explicitly
restarting a transaction after retriable errors, instead of the current model
where the `client.Txn` restarts the transaction automatically behind the scenes. Moving this
responsibility to the client makes it possible for the client to synchronize the
concurrent actors before restarting.
See also [this technote](https://github.com/cockroachdb/cockroach/blob/master/docs/tech-notes/txn_coord_sender.md)
for more information on how transactions are implemented on the "client side".

# Motivation

The transaction (`client.Txn`) is the interface through which SQL and other
modules interact with the KV module. It is a foundational interface in our
codebase and, as such, it deserves to be clean, safe, and powerful. At the
moment, there are two big problems with it:
1) It generally doesn’t support concurrent use. Performing concurrent operations
(in particular, sending a `BatchRequest` while another one is in flight) results
in an error. Historically, this didn't result in an error, but instead in data
corruption. This restriction has surprised people repeatedly (we even had two
tests dedicated to asserting that concurrent requests work; despite the
appearances, they did not work).
There are also good reasons for wanting to perform concurrent (parallel)
operations - for example validating foreign keys in the background.
2) It requires the user to keep their state in sync with the txn's state and, the
minute they fail to do so and the states get out of sync, tragedy follows (making
it unsafe). Ignoring a retriable error (because, for example, the code swallows
all errors indiscriminately) is currently illegal. When the `client.Txn` returns
a retriable error to the client, it also restarts the transaction transparently.
Failure by the client to communicate this restart to a retry loop results in
committing half a transaction. This, too, has been a cause of surprise.

These two issues stem from the same fact: transactions react internally to
retriable errors, assuming that every request coming afterwards pertains to
the next iteration of the transaction. This precludes concurrent use because,
if there were concurrent use, there'd be no way to distinguish requests intended
for the previous epoch from requests intended for the new one.
In addition, our read span refreshing code does not currently handle
concurrency: if a request causes a refresh, any other request straddling that
refresh is not properly refreshed. Consider the following scenario:
* An actor sends a request *r*.
* While the request is in flight, another request *r1* is sent and returns,
requesting a refresh.
* The `TxnCoordSender` successfully refreshes all the requests prior to *r* and
*r1* (note that *r*'s read spans have not yet been recorded) and bumps the
transaction's read timestamp.
* *r* returns. Its result cannot simply be returned to the client; it needs to
be refreshed or retried at the bumped timestamp. The code is not currently
equipped to do so.

There's a particular case where we do allow concurrent use of a txn: DistSQL
flows that can't be completely "fused" use a `LeafTxn`. Leaves allow concurrent
use by inhibiting transparent restarts and refreshes. Disabling refreshes is a
pity. In particular, when the flow runs entirely on the gateway, we'd like to
be able to refresh in the middle of the flow. (When the flow is distributed,
refreshing is trickier; see
[#24798](https://github.com/cockroachdb/cockroach/issues/24798). The changes
proposed here still move us in the right direction for supporting refreshes in
the middle of distributed queries too.)

# Guide-level explanation

We're proposing an interface that supports concurrent use and is safe. We’ll
get safety by putting the client in charge of restarting transactions. We’ll
allow concurrent use by refusing to restart a transaction while the client is
still performing requests that belong to the previous transaction attempt.
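
To make the contract concrete, a hand-rolled retry loop under the proposed
interface might look as follows. This is a sketch only: `db.NewTxn`,
`doTransactionalWork`, and `errIsRetriable` are illustrative names, not
existing API.

```go
// A hypothetical retry loop under the proposed interface: the client,
// not client.Txn, decides when the transaction restarts.
func runTxn(ctx context.Context, db *client.DB) error {
	txn := db.NewTxn(ctx) // illustrative constructor
	for {
		err := doTransactionalWork(ctx, txn) // app logic; may Fork() handles
		if err == nil {
			err = txn.Commit(ctx)
		}
		if err == nil {
			return nil
		}
		if !errIsRetriable(err) { // illustrative classification helper
			_ = txn.Rollback(ctx)
			return err
		}
		// Before this call, the client must have Close()d all the
		// handles it forked; otherwise Restart() errors out rather
		// than silently mixing epochs.
		if err := txn.Restart(ctx); err != nil {
			return err
		}
	}
}
```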

The idea is to split the set of operations that a transaction can perform into
two: operations that can be performed concurrently and operations that assume
exclusive control over the txn. The latter operations will continue to use
the `client.Txn` struct. The former will start using a new interface:
`client.TxnHandle`. The `client.Txn` will be coded defensively
so that most illegal uses are caught. Each handle is single-threaded, but
there can be multiple handles at the same time. The `client.Txn` keeps track of
how many extant handles there are at any time, and will refuse to do dangerous
things while there are other handles out there. For backwards compatibility, and
also for the simplicity of non-concurrent use cases, the `client.Txn` will also
implement the `client.TxnHandle` interface - so if you have a `client.Txn`, you
can do everything on it (single-threadedly).

The interfaces will look like this:

```go
// A sketch of the proposed interfaces; exact method signatures are
// left unspecified at this stage.

// TxnHandle supports the operations that can be performed concurrently.
type TxnHandle interface {
	// Get(), Scan(), Put(), Delete(), etc.
	Fork() TxnClosableHandle
}

// Txn embeds the TxnHandle implementation and adds the operations that
// assume exclusive control over the txn:
// Commit(), Rollback(), Step(), GetSavepoint(), RollbackToSavepoint(),
// Restart().
type Txn struct {
	TxnHandleImpl
}

// TxnClosableHandle is a handle created via Fork(); it must be
// Close()d to rejoin the parent transaction.
type TxnClosableHandle interface {
	TxnHandle
	Close()
}
```

The Txn will keep track of the number of extant handles - created via `Fork()`
but not yet `Close()`d. While there are any, calling methods specific to the
`Txn` will error. This means that the client needs to synchronize (i.e.
`Close()`) all the handles before `Restart()`ing. Implementation-wise, all
these interfaces are wrappers on top of a shared `TxnCoordSender`.
Transactional state and timestamp management would be shared by all handles.

The point of the `TxnClosableHandle` is to make it so that there’s no `Txn.Close()`,
but there is a `Txn.Fork().Close()`.
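
As an illustration, here is what a fork-join consumer of a shared txn might
look like under the interface sketched above. The `Scan` signature, the
`roachpb.Span` input, and the `errgroup` plumbing are assumed purely for the
example.

```go
// Hypothetical fork-join use of a shared txn: each concurrent actor
// works through its own handle and Close()s it before the parent Txn
// is allowed to Restart() or Commit().
func validateFKs(ctx context.Context, txn *client.Txn, fkSpans []roachpb.Span) error {
	g, ctx := errgroup.WithContext(ctx)
	for _, sp := range fkSpans {
		h := txn.Fork() // each concurrent actor gets its own handle
		sp := sp
		g.Go(func() error {
			defer h.Close() // rejoin the parent txn when done
			_, err := h.Scan(ctx, sp.Key, sp.EndKey, 0 /* maxRows */)
			return err
		})
	}
	// Once Wait returns, all handles are Close()d; only then may the
	// caller Commit(), Rollback(), or Restart() the txn.
	return g.Wait()
}
```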

# Reference-level explanation

## Detailed design

The `client.Txn` would keep track of the number of open handles. Otherwise, all
the handles (and the `client.Txn`, which is itself a handle) share all the state
(in particular the `TxnCoordSender`). The `TxnCoordSender` already has internal
locking that allows some concurrent requests. This support will be extended, in
particular in the refresher interceptor, as we'll see below.
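
For concreteness, the handle bookkeeping could be as simple as a mutex-guarded
counter. This is a sketch (here modeling the handle as a concrete struct); the
real state would live next to the `TxnCoordSender` plumbing.

```go
package client

import (
	"errors"
	"sync"
)

// Txn tracks its extant handles (sketch; TxnCoordSender state elided).
type Txn struct {
	mu          sync.Mutex
	openHandles int // created via Fork(), not yet Close()d
}

// TxnClosableHandle is a forked handle (per-handle state elided).
type TxnClosableHandle struct {
	txn *Txn
}

// Fork registers a new handle that shares this txn's state.
func (txn *Txn) Fork() *TxnClosableHandle {
	txn.mu.Lock()
	defer txn.mu.Unlock()
	txn.openHandles++
	return &TxnClosableHandle{txn: txn}
}

// Close unregisters the handle, re-enabling Txn-only operations.
func (h *TxnClosableHandle) Close() {
	h.txn.mu.Lock()
	defer h.txn.mu.Unlock()
	h.txn.openHandles--
}

// checkNoHandles gates the Txn-only operations (Restart(), Commit(), ...).
func (txn *Txn) checkNoHandles() error {
	txn.mu.Lock()
	defer txn.mu.Unlock()
	if txn.openHandles > 0 {
		return errors.New("txn has open handles; Close() them first")
	}
	return nil
}
```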

### Refreshes

A request straddling a refresh (i.e. the request is sent, then the txn is
refreshed, then the request finishes and returns a result) is problematic
because the response cannot simply be returned to the client; the response
would also need to be refreshed at some point before the results of the request
that triggered the refresh are returned.

We have the following options for mixing refreshes and concurrent requests:

1. When a response asks for a refresh, we could block that refresh until all the
in-flight requests finish (note that the `TxnCoordSender` knows all the requests
that are in flight). Once they all finish (and once their read spans are
recorded and their responses are returned to the client), then we can refresh.
We’d also block new requests from starting. The problem here is that, if one of
these concurrent requests causes the refresh to fail, it’s too late to retry that
request at the new read timestamp (since the result was already delivered to the
client) and so we'd have to return a retriable error to the client. This seems
suboptimal, but it’s also analogous to the handling of all the requests that
finished before the refresh was requested.

2. We could also start refreshing immediately once a response asks for it. We’d
block new requests from starting. As concurrent requests finish, we’d also
refresh them one by one. If refreshing one of them fails, we’d retry that
particular request at the new read timestamp. Retrying works because of
idempotency, although it’s a bit weird - idempotency generally applies to
identical requests, not requests with different timestamps. However, I think it
works out because we don’t have requests that write different values depending
on the timestamp they’re evaluated at.
Compared to 1), this allows us to avoid returning retriable errors when one of
these concurrent requests fails to refresh.

3. We could do 2), but instead of waiting for concurrent requests to finish and
then refreshing them, we could wait for them to finish and then simply retry
them at the new timestamp without attempting to refresh them. This is more
pessimistic than 2) because, if refreshing a request is successful, that’s
cheaper than re-evaluating the request (particularly if the request has
writes that need to be replicated). Note that we’d still need to wait for the
original requests to finish rather than just canceling them, because of concerns
about evaluating the two attempts out of order.

Option 1) seems the simplest.
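
For illustration, option 1)'s gating could be implemented with a condition
variable along these lines. This is a sketch with invented names; the real
logic would live inside the refresher interceptor.

```go
package kv

import "sync"

// refreshGate serializes refreshes against in-flight requests: a
// refresh waits for in-flight requests to drain, and new requests wait
// for a pending refresh to complete.
type refreshGate struct {
	mu         sync.Mutex
	cond       *sync.Cond
	inFlight   int
	refreshing bool
}

func newRefreshGate() *refreshGate {
	g := &refreshGate{}
	g.cond = sync.NewCond(&g.mu)
	return g
}

func (g *refreshGate) beginRequest() {
	g.mu.Lock()
	defer g.mu.Unlock()
	for g.refreshing { // block new requests while a refresh is pending
		g.cond.Wait()
	}
	g.inFlight++
}

func (g *refreshGate) endRequest() {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.inFlight--
	g.cond.Broadcast() // maybe unblock a waiting refresh
}

// refresh runs doRefresh once all in-flight requests have finished.
func (g *refreshGate) refresh(doRefresh func() error) error {
	g.mu.Lock()
	g.refreshing = true
	for g.inFlight > 0 { // wait for straddling requests to drain
		g.cond.Wait()
	}
	g.mu.Unlock()

	err := doRefresh()

	g.mu.Lock()
	g.refreshing = false
	g.cond.Broadcast() // let blocked requests proceed
	g.mu.Unlock()
	return err
}
```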

### Error handling

This section is pending discussions in https://reviewable.io/reviews/cockroachdb/cockroach/41569#-Lu_-qC4B5J35Nvx0jsy

The pertinent question about error handling is how errors encountered on one
handle affect other handles.

## Rationale and Alternatives

An alternative to the concurrency part is to simply disallow concurrency. We
have write pipelining on the server side, which is already a form of
concurrency. However, concurrent read requests seem to be a very natural ask.

Andrew Werner suggested that, perhaps in addition to parallel operations, we
could encourage a different programming style where different modules contribute
operations to a single batch, which is then executed as a unit, with the results
then disseminated back to the modules. Use cases would be the implementation
of some schema changes, and common table expressions with multiple mutations.
This would be, if you will, a barrier-style computation. It seems that
sometimes we really do want parallelism, and that a concurrent model is
sometimes also the easiest to use. But this doesn't preclude different
libraries and programming styles in the future.


## Unresolved questions

1. Should different handles be allowed to use different reference sequence
numbers for reads? As it stands, the plan is to have all handles share the read
sequence number (as set by the `txn.Set()` method). In particular, if
`txn.Set()` is never called, then all the handles read "current" data, and the
behavior of concurrent overlapping reads and writes is undefined. Is there a
reason to allow a particular handle to be associated with a different read
sequence number so it can be isolated from concurrent overlapping writes?
2. For using the same txn on different nodes through distributed DistSQL flows,
we have the `RootTxn`/`LeafTxn` distinction. Is there something to unify between
handles and leaves? The point of handles is to allow the txn to keep track of
extant concurrent actors, whereas the point of leaves is to allow txn state to
be disseminated to remote nodes, updated, and collected back. So I don't think
there's much to unify, but perhaps someone can open my eyes.
3. Should we take the opportunity to make a `client.Txn` unusable after a
`TransactionAbortedError`? The current code is pretty surprising - the
`client.Txn` transparently switches the `TxnCoordSender` from under it in case
of `TransactionAbortedError`. From the client’s perspective, a
`TransactionAbortedError` is no different from any other retriable error. But in
reality an abort is quite different - the transaction doesn’t get to keep any of
the locks it acquired (the whole point of the abort is to force a txn to release
its locks). As far as KV is concerned, what follows is a completely new
transaction. The abstraction that “the transaction is the same” is a bit leaky,
since the transaction IDs printed in different logs are different. I think there’s
no state on the current `client.Txn` object that we want to keep after an abort
(or, rather, I don’t think there’s any state that we currently propagate that we
should propagate; I think there’s state we currently don’t propagate but should - the
`ObservedTimestamps`). So I think it’d be a good idea to expose this difference
to clients by not allowing a `Restart()` call after a `TransactionAbortedError`
and, instead, making the client create a new `Txn`. Tying the lifetimes of a
`client.Txn` and `TxnCoordSender` together would allow us to pool these objects.
`db.Txn(closure)`, which implements a retry loop, could still retry on
`TransactionAbortedError` like it does currently (see the sketch below).
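
For concreteness, `db.Txn` might then look roughly like this. This is an
assumed shape; the `NewTxn` constructor and the `isTxnAborted`/`isRetriable`
classification helpers are invented for the example.

```go
// Hypothetical db.Txn under the "new Txn after abort" scheme: an abort
// gets a brand-new Txn (and TxnCoordSender), while other retriable
// errors reuse the same Txn via an explicit Restart().
func (db *DB) Txn(ctx context.Context, fn func(context.Context, *Txn) error) error {
	txn := db.NewTxn(ctx)
	for {
		err := fn(ctx, txn)
		if err == nil {
			if err = txn.Commit(ctx); err == nil {
				return nil
			}
		}
		switch {
		case isTxnAborted(err):
			txn = db.NewTxn(ctx) // the aborted Txn is now unusable
		case isRetriable(err):
			if err := txn.Restart(ctx); err != nil {
				return err
			}
		default:
			_ = txn.Rollback(ctx)
			return err
		}
	}
}
```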

## Notes

I'm currently also fighting on some adjacent fronts: [allowing refreshes in
distributed DistSQL
flows](https://github.com/cockroachdb/cockroach/issues/24798) and
[rethinking the refreshing mechanics and deferment of handling some
retriable errors](https://groups.google.com/a/cockroachlabs.com/forum/#!topic/kv/V4Gw9556Pm0).
They're disentangled from this RFC, but they all touch how refreshes work.
