-
Notifications
You must be signed in to change notification settings - Fork 3.8k
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
rfc: add an RFC describing an improved client.Txn interface
... allowing for concurrent use of a txn. Release note: None
- Loading branch information
1 parent
d905be8
commit 45788cf
Showing
1 changed file
with
240 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,240 @@ | ||
- Feature Name: client.Txn interface v2 | ||
- Status: draft | ||
- Start Date: 2019-11-15 | ||
- Authors: Andrei | ||
- RFC PR: (PR # after acceptance of initial draft) | ||
- Cockroach Issue: [#25329](https://github.com/cockroachdb/cockroach/issues/25329) [#22615](https://github.com/cockroachdb/cockroach/issues/22615) | ||
|
||
# Summary | ||
|
||
This RFC discusses additions to the client.Txn interface that make it | ||
thread-safe and otherwise safer. A new notion of a "transaction handle" is | ||
introduced, allowing a sort of fork-join usage model for concurrent actors that | ||
want to use a shared transaction. The client is put in charge of explicitly | ||
restarting a transaction after retriable errors, instead of the current model | ||
where the `client.Txn` restart automatically behind the scenes. Moving this | ||
responsibility to the client makes it possible for the client to synchronize the | ||
concurrent actors before restarting. | ||
See also [this technote]( | ||
https://github.com/cockroachdb/cockroach/blob/master/docs/tech-notes/txn_coord_sender.md) | ||
for more information on how transactions are implemented on the "client side". | ||
|
||
# Motivation | ||
|
||
The transaction (`client.Txn`) is the interface through which SQL and other | ||
modules interact with the KV module. It is a foundational interface in our | ||
codebase and, as such, it deserves to be clean, safe, and powerful. At the | ||
moment, there’s two big problems with it: | ||
1) It generally doesn’t support concurrent use. Performing concurrent operations | ||
(in particular, sending a BatchRequest while another one is in flight) results | ||
in an error. Historically, this didn't result in an error, but instead in data | ||
corruption. This restriction has surprised people repeatedly (we even had two | ||
tests dedicated to asserting that concurrent requests work; despite the | ||
appearances, they did not work). | ||
There's also good reasons for wanting to perform concurrent (parallel) | ||
operations - for example validating foreign keys in the background. | ||
2) It requires the user to synchronize its state with the txn state and, the | ||
minute it fails to do so and the state get out of sync, tragedy follows (making | ||
it unsafe). Ignoring a retriable error (because, for example, you want to ignore | ||
any other error) is currently illegal. When the client.Txn returns a retriable | ||
error to the client, it also restarts the transaction transparently. Failure by | ||
the client to communicate this restart to a retry loop results in committing | ||
half a transaction. This also was a cause of surprise for people. | ||
|
||
These two issues stem from the same fact: transactions react internally to | ||
retriable errors assuming that every request coming afterwards will pertain to | ||
the next iteration of the transaction. This implies no concurrent use because, | ||
if there was concurrent use, there'd be no way to distinguish requests intented | ||
for the previous epoch versus requests intended for the new one). | ||
In addition, our read span refreshing code does not currently handle | ||
concurrency: if a requests causes a refresh, any other request straddling that | ||
refresh is not properly refreshed. Consider the following scenario: | ||
* An actor sends a request *r*. | ||
* While the request is in flight, another request *r1* is sent and returns, | ||
requesting a refresh. | ||
* The `TxnCoordSender` successfully refreshes all the requests prior to *r* and | ||
*r1* (note that *r*'s read spans have not yet been recorded) and bumps the | ||
transaction's read timestamp. | ||
* *r* returns. Its result cannot simply be returned to the client; it needs to | ||
be refreshed or retried at the bumped timestamp. The code is not currently | ||
equipped to do so. | ||
|
||
There's a particular case where we do allow concurrent use of a txn: DistSQL | ||
flows that can't be completely "fused" use a `LeafTxn`. Leaves allow concurrent | ||
use by inhibiting transparent restarts and refreshes. Disabling refreshes is a | ||
pity. In particular when the flow running entirely on the gateway, we'd like to | ||
be able to refresh in the middle of the flow. (When the flow is distributed, | ||
refreshing is trickier; see | ||
[#24798](https://github.com/cockroachdb/cockroach/issues/24798). The changes | ||
proposed here still move us in the right direction for supporting refreshes in | ||
the middle of distributed queries too). | ||
|
||
# Guide-level explanation | ||
|
||
We're proposing an interface that supports concurrent use and is safe. We’ll | ||
become safe by putting the client in charge of restarting transactions. We’ll | ||
allow for concurrent use by not allowing the client to restart transactions | ||
while it’s also performing requests that belong to the previous transaction | ||
attempt. | ||
|
||
The idea is to split the sets of operations that a transaction can perform in | ||
two: operations that can be performed concurrently and operations that assume | ||
exclusive control over the txn. The latter operations will continue to use | ||
the `client.Txn` struct. The former will start using a new interface: | ||
`client.TxnHandle`. The `client.Txn` will be coded defensively | ||
such that most illegal uses are caught. Each handle is single-threaded, but | ||
there can be multiple handles at the same time. The `client.Txn` keeps track of | ||
how many extant handles there are at any time, and will refuse to do dangerous | ||
things while there’s other handles out there. For backwards compatibility and | ||
also simplicity of non-concurrent use cases, the `client.Txn` will also contain | ||
the `client.TxnHandle` interface - so if you have a `client.Txn`, you can do | ||
everything on it (single-threadedly). | ||
|
||
The interfaces will look like this: | ||
|
||
```go | ||
client.TxnHandle: | ||
Get(), Scan(), Put(), Delete(), etc. | ||
Fork() -> returns client.TxnClosableHandle | ||
|
||
client.Txn: | ||
embeds client.TxnHandleImpl | ||
Commit(), Rollback(), Step(), GetSavepoint(), RollbackToSavepoint() | ||
Restart() | ||
|
||
client.TxnClosableHandle: | ||
embeds client.TxnHandle | ||
Close() | ||
``` | ||
|
||
The Txn will keep track of the number of extant handles - created via `Fork()` | ||
but not yet `Close()d`. While there’s any, calling methods specific to the `Txn` | ||
will error. This means that the client needs to synchronize (i.e. `Close()`) all | ||
the handles before `Restart()`ing. Implementation-wise, all these interfaces are | ||
wrappers on top of a shared `TxnCoordSender`. Transactional state and timestamp | ||
management would be shared by all handles. | ||
|
||
The point of the `TxnClosableHandle` is to make it so that there’s no `Txn.Close()`, | ||
but there is a `Txn.Fork().Close()`. | ||
|
||
# Reference-level explanation | ||
|
||
## Detailed design | ||
|
||
The `client.Txn` would keep track of the number of open handles. Otherwise, all | ||
the handles (and the `client.Txn` which is itself a handle) share all the state | ||
(in particular the `TxnCoordSender`). The `TxnCoordSender` already has internal | ||
locking for allowing some concurrent requests. This support will be extended, in | ||
particular in the refresher interceptor as we'll see. | ||
|
||
### Refreshes | ||
|
||
A request straddling a refresh (i.e. request is sent, then the txn is refreshed, | ||
then the request finishes and returns a result) is problematic because the | ||
response cannot simply be returned to the client; the response would also need | ||
to be refreshed at some point before the results of the request that triggered | ||
the refresh are returned. | ||
|
||
We have the following options for mixing refreshes and concurrent requests: | ||
|
||
1. When a response asks for a refresh, we could block that refresh until all the | ||
in-flight requests finish (note that the TxnCoordSender knows all the requests | ||
that are in flight). Once they all finish (and once their read spans are also | ||
recorded and their response is returned to the client), then we can refresh. | ||
We’d also block new requests from starting. The problem here is that, if one of | ||
these concurrent requests causes the refresh to fail, it’s too late to retry that | ||
request at the new read timestamp (since the result was already delivered to the | ||
client) and so we'd have to return a retriable error to the client. This seems | ||
suboptimal, but it’s also analogous the handling of all the requests that | ||
finished before the refresh was requested. | ||
|
||
2. We could also start refreshing immediately once a responses asks for it. We’d | ||
block new requests from starting. As concurrent requests finish, we’d also | ||
refresh them one by one. If refreshing one of them fails, we’d retry that | ||
particular request at the new read timestamp. Retrying works because of | ||
idempotency, although it’s a bit weird - idempotency generally applies for | ||
identical requests, not requests with different timestamps. However I think it | ||
works out because we don’t have requests that write different values depending | ||
on the timestamp they’re evaluated at. | ||
Compared to 1), this allows us to avoid returning retriable errors when one of | ||
these concurrent requests fails to refresh. | ||
|
||
3. We could do 2), but instead of waiting for concurrent requests to finish and | ||
then refreshing them, we could wait for them to finish and then simply retry | ||
them at the new timestamp without attempting to refresh them. This is more | ||
pessimistic than 2) because if refreshing a request is successful, that’s | ||
cheaper than re-evaluating the request (particularly since the request has | ||
writes that need to be replicated). Note that we’d still need to wait for the | ||
original requests to finish rather than just canceling them because of concerns | ||
of evaluating the two attempts out of order. | ||
|
||
Option 1) seems the simplest. | ||
|
||
### Error handling | ||
|
||
This section is pending discussions in https://reviewable.io/reviews/cockroachdb/cockroach/41569#-Lu_-qC4B5J35Nvx0jsy | ||
|
||
The pertinent question about error handling is how do errors encountered on one | ||
handle affect other handles. | ||
|
||
## Rationale and Alternatives | ||
|
||
An alternative to the concurrency part is to simply disallow concurrency. We | ||
have write pipelining on the server-side, which is already a form of | ||
concurrency. However, concurrent read requests seem to be a very natural ask. | ||
|
||
Andrew Werner suggested that, perhaps in addition to parallel operations, we | ||
could encourage a different programming style where different modules contribute | ||
operations to a single batch which is then executed as a unit and the results | ||
are then disseminated back to the modules. Use cases would be the implementation | ||
of some schema changes and common table expressions with multiple mutations. | ||
This would be, if you will, a barrier-style computation. It seems that sometimes | ||
we really want parallelism, and that a concurrent model is also the easiest to | ||
use sometimes. But this doesn't preclude different libraries and programming | ||
styles in the future. | ||
|
||
|
||
## Unresolved questions | ||
|
||
1. Should different handle be allowed to use different reference sequence | ||
numbers for reads? As it stands, the plan is to have all handles share the read | ||
sequence number (as set by the `txn Set()` method). In particular, if | ||
`txn.Set()` is never called, then all the handles read "current" data, and the | ||
behavior of concurrent overlapping reads and writes is undefined. Is there a | ||
reason to allow a particular handle to be associated with a different read | ||
sequence number so it can be isolated from concurrent overlapping writes? | ||
2. For using the same txn on different nodes through distributed DistSQL flows, | ||
we have the `RootTxn`/`LeafTxn` distinction. Is there something to unify between | ||
handles and leaves? The point of handles is to allow the txn to keep track of | ||
extant concurrent actors , whereas the point of leaves is to allow txn state to | ||
be disseminated to remote nodes, updated , and collected back. So I don't think | ||
there's much to unify, but perhaps someone can open my eyes. | ||
3. Should we take the opportunity and make a `client.Txn` unusable after a | ||
`TransactionAbortedError`? The current code is pretty surprising - the | ||
`client.Txn` transparently switches the `TxnCoordSender` from under it in case | ||
of `TransactionAbortedError`. From the client’s perspective, a | ||
`TransactionAbortedError` is no different from any other retriable error. But in | ||
reality an abort is quite different - the transaction doesn’t get to keep any of | ||
the locks it acquired (the whole point of the abort is to force a txn to release | ||
its locks). As far as KV is concerned, what follows is a completely new | ||
transaction. The abstraction that “the transaction is the same” is a bit leaky | ||
since transaction ids printed in different logs are different. I think there’s | ||
no state on the current `client.Txn` object that we want to keep after an abort | ||
(or, rather, I don’t think there’s any state that we currently propagate that we | ||
should propagate; I think there’s state we currently don’t propagate but should - the | ||
`ObservedTimestamps`). So I think it’d be a good idea to expose this difference | ||
to clients by not allowing a `Restart()` call after a `TransactionAbortedError` | ||
and, instead, making the client create a new `Txn`. Tying the lifetimes of a | ||
`client.Txn` and `TxnCoordSender` together would allow us to pool these objects | ||
together. db.Txn(closure{}), which implements a retry loop, could still retry on | ||
TransactionAbortedError like it’s doing currently. | ||
|
||
## Notes | ||
|
||
I'm currently also fighting on some adjacent fronts: [allowing refreshes in | ||
distributed DistSQL | ||
flows](https://github.com/cockroachdb/cockroach/issues/24798) and | ||
[rethinking the refreshing mechanics and deferment of handling some | ||
retriable errors](https://groups.google.com/a/cockroachlabs.com/forum/#!topic/kv/V4Gw9556Pm0). | ||
They're disentangled from this RFC, but they all touch how refreshes work. | ||
|