Merge #42668 #42864
42668: sql: add precision support for TIME/TIMETZ r=otan a=otan

Refs: #32565
Refs: #26097 

This adds support for precision for both TIME and TIMETZ. It also
includes threading through some extra parsing syntax for TIMETZ.

ALTER TABLE between TIME and TIMETZ is not supported, as the two types
have different representations.

Release note (sql change): This PR adds support for precision for
TIME/TIMETZ types (e.g. TIME(3) will truncate to milliseconds).
Previously, specifying a precision would raise a syntax error.

42864: rfcs: new RFC on fixing the halloween problem r=knz a=knz

I am splitting this RFC away from the [RFC on savepoints](#41569) because I now recognize it is a different problem with an orthogonal solution.

https://github.com/knz/cockroach/blob/20191129-rfc-halloween/docs/RFCS/20191014_halloween.md

There is no new idea in this text (I copy-pasted it from the other RFC) and the implementation PRs are already out, so if the reviewers are satisfied I would gladly merge the RFC right away as in-progress or completed.

Co-authored-by: Oliver Tan <otan@cockroachlabs.com>
Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net>
3 people committed Dec 3, 2019
3 parents 422de48 + 2c3ade7 + 2f4446b commit a7cf655
Showing 24 changed files with 743 additions and 165 deletions.
252 changes: 252 additions & 0 deletions docs/RFCS/20191014_halloween.md
@@ -0,0 +1,252 @@
- Feature Name: SQL step-wise execution
- Status: in-progress
- Start Date: 2019-10-14
- Authors: andrei knz nathan
- RFC PR: [#42864](https://github.com/cockroachdb/cockroach/pull/42864)
- Cockroach Issue:
  [#28842](https://github.com/cockroachdb/cockroach/issues/28842)
  [#33473](https://github.com/cockroachdb/cockroach/issues/33473)
  [#33475](https://github.com/cockroachdb/cockroach/issues/33475)

# Summary

This RFC proposes to introduce *sequence points* around and inside the
execution of SQL statements, so that all KV reads performed after a
sequence point observe the data at the time the sequence point was
established.

This change intends to solve issue
[#28842](https://github.com/cockroachdb/cockroach/issues/28842), which
is a serious—and classical—semantic error that can cause business
problems in customer applications; see
https://en.wikipedia.org/wiki/Halloween_Problem for details.

It's also an area where CockroachDB diverges from PostgreSQL, so
fixing this would provide a modicum of additional compatibility.

The proposed solution capitalizes on the *KV sequence numbers*
previously introduced at the KV and storage levels for an unrelated
reason (txn performance optimizations) and can be summarized as
"annotate every read request with the current read seqnum" on the
common path, with the new `Step()` API to copy the current write
seqnum as new current read seqnum.

# Motivation

See above.

# Guide-level explanation

After this change, the "read" portion of a SQL mutation operates on a
snapshot of the database taken at the moment the statement started. In
particular, it cannot see its own writes.

This guarantees e.g. that a statement like `INSERT INTO t SELECT *
FROM t` always terminates, or that `UPDATE t SET x = x+2` never
operates on the same row twice.

# Reference-level explanation

The following API is added to `TxnCoordSender` and `*client.Txn`:

```go
// Step creates a sequencing point in the current transaction. A
// sequencing point establishes a snapshot baseline for subsequent
// read operations: until the next sequencing point, read operations
// observe the data at the time the snapshot was established and
// ignore writes performed since.
//
// Before the first step is taken, the transaction operates as if
// there was a step after every write: each read to a key is able to
// see the latest write before it. This makes the step behavior
// opt-in and backward-compatible with existing code which does not
// need it.
// The method is idempotent.
Step() error

// DisableStepping disables the sequencing point behavior and
// ensures that every read can read the latest write. The
// effect remains disabled until the next call to Step().
// The method is idempotent.
DisableStepping() error
```

- `Step()` is called every time a SQL execution step is reached:
  - before the execution of each regular statement;
  - at the end of the mutations, before and in-between the FK and cascading action phases, if any.
- the implementation of `Step()` forwards the call to the RootTxn's
TxnCoordSender `Step()` method (new), which in turn saves the current
write seqnum as reference seqnum for future reads.
- `DisableStepping()` is merely a convenience, but is provided
for use with the execution part of all the DDL statements.
These currently contain many internal steps where each
internal step observes the results of steps prior. Without
`DisableStepping()`, we would need to add calls to `Step()`
throughout, which would be error-prone.

## Detailed design

The design is prototyped here:

https://github.com/cockroachdb/cockroach/pull/42854

https://github.com/cockroachdb/cockroach/pull/42862

Additional topics of interest:

- [Uniqueness violations](#Uniqueness-violations)
- [ON CONFLICT processing](#ON-CONFLICT-processing)
- [FK existence checks under a single mutation](#FK-existence-checks-under-a-single-mutation)
- [FK cascading actions under a single mutation](#FK-cascading-actions-under-a-single-mutation)
- [Multiple mutations with CTEs](#Multiple-mutations-with-CTEs)
- [Schema changes](#Schema-changes)

### Uniqueness violations

There are really two cases:

- we insert/modify a single row, and doing so creates a duplicate of
a row that was modified in a previous statement (or sequencing
step). This case is simple and transparently handled by "read at
seqnum of previous step".

- we insert/modify the same row twice inside the same mutation
statement, or two rows that are duplicates according to
some unique index.
Here the problem is seemingly that the 2nd row update will not
see the first.

However, when looking more closely there is no new problem here.

All writes to a unique index go through a KV `CPut` on the uniqueness key.
By ensuring that `CPuts` read their _condition_ at the current write
seqnum, we can always pick up the latest write and detect duplicates.

(CPut will still skip over ignored / rolled back seqnums like other KV
ops. It's only the target read seqnum that's ratcheted up to the
present for CPut, in contrast to other mvcc ops that will be blocked by
the configured target read seqnum.)

This opens the question of whether we need a variant of CPut that does
not do this. TBD. (Initial analysis says no.)

### ON CONFLICT processing

The question arises of what to do when the result of ON CONFLICT
processing changes a row in a read-modify-write fashion. For example:

```sql
INSERT INTO t(x) VALUES (1), (1) ON CONFLICT(x) DO UPDATE SET x = t.x + excluded.x
-- ^^^^^^^^ notice the dup row
```

Here conceptually the INSERT suggests that the 2nd ON CONFLICT resolution
will observe the row as it was left over by the 1st. This would not work
with "read at seqnum of previous statement".

The answer here is from a previous discussion around mutations that
observed the following:

- postgres does not support updating the same row two times in an ON
CONFLICT clause.

- it is desirable to batch / pre-compute the ON CONFLICT result values
concurrently with the mutation for performance, and we've already
established back in 2018 that the lack of support for updating the
same row twice in pg makes this optimization possible.

- the implementation was updated when bringing this logic under the CBO.

From here, it follows that we don't care about "read at seqnum"
inconsistencies as the current CBO logic already assumes that it's
fine to miss earlier conflict resolutions.

### FK existence checks under a single mutation

FK existence checks must observe the data values post-mutation. For
this we need to introduce a sequence point (call to `Step()`) between
the end of the "run" phase (where results were produced for the
client) and the FK existence checks.

This way the reads for FK existence checks can see all the writes by
the mutation.

Note that today FK existence checks are "interleaved" with mutations
on the common path, which is a useful optimization but incorrect in
some cases. This will need to be adjusted. See the following issue for details:
https://github.com/cockroachdb/cockroach/issues/33475

In 19.2/20.1 there is a new notion of "post-queries" which the CBO is
increasingly using to perform FK checks and cascading actions. These
benefit simply by adding a sequence point before the execution of each
post-query.

### FK cascading actions under a single mutation

Postgres uses post-statement triggers to process FK cascading actions
and existence checks. Cascading actions that result in mutations to
other tables themselves append more triggers to run.

Each subsequent step in this cascade of effects is able to read its
own writes (for further FK checks).

We emulate this in CockroachDB by introducing a step boundary between
iterations of the cascading algorithm.

### Multiple mutations with CTEs

It's possible for a single statement to define multiple mutations, for example:

```sql
WITH
a AS (INSERT ... RETURNING ...),
b AS (INSERT ... RETURNING ...)
SELECT ...
```

PostgreSQL does not guarantee that the effect of one mutation is
visible to another, or even to the later read-only parts of the
statement. In fact, it requires that all mutations operate
on the same data as it was at the beginning of the statement.

More specifically, from https://www.postgresql.org/docs/12/queries-with.html:

> The sub-statements in WITH are executed concurrently with each other
> and with the main query. Therefore, when using data-modifying
> statements in WITH, the order in which the specified updates
> actually happen is unpredictable. **All the statements are executed
> with the same snapshot (see Chapter 13), so they cannot “see” one
> another's effects on the target tables.** This alleviates the effects
> of the unpredictability of the actual order of row updates, and
> means that RETURNING data is the only way to communicate changes
> between different WITH sub-statements and the main query.

So with the logic proposed so far, all the mutations inside the same
statement execute from the same read seqnum.

If there is FK work to be done, the first sequencing step necessary
for FK checks (to advance the read seqnum) will only occur after all
mutations have completed.

(The observations from [Uniqueness violations](#Uniqueness-violations) above apply here as well.)

### Schema changes

Schema changers that run synchronously operate "under the sequence
point" and need no further adjustment.

Schema changers that operate asynchronously already operate under
independent `*client.Txn` instances and are thus unaffected.


## Drawbacks

None known.

## Rationale and Alternatives

No alternative was evaluated.

## Unresolved questions

None known.
3 changes: 3 additions & 0 deletions docs/generated/sql/bnf/stmt_block.bnf
@@ -2091,7 +2091,10 @@ character_without_length ::=

const_datetime ::=
'DATE'
| 'TIME' opt_timezone
| 'TIME' '(' iconst32 ')' opt_timezone
| 'TIMETZ'
| 'TIMETZ' '(' iconst32 ')'
| 'TIMESTAMP' opt_timezone
| 'TIMESTAMP' '(' iconst32 ')' opt_timezone
| 'TIMESTAMPTZ'
2 changes: 1 addition & 1 deletion pkg/ccl/changefeedccl/avro.go
@@ -230,7 +230,7 @@ func columnDescToAvroSchema(colDesc *sqlbase.ColumnDescriptor) (*avroSchemaField
return d.(*tree.DTimeTZ).TimeTZ.String(), nil
}
schema.decodeFn = func(x interface{}) (tree.Datum, error) {
return tree.ParseDTimeTZ(nil, x.(string))
return tree.ParseDTimeTZ(nil, x.(string), time.Microsecond)
}
case types.TimestampFamily:
avroType = avroLogicalType{
63 changes: 63 additions & 0 deletions pkg/sql/logictest/testdata/logic_test/time
@@ -343,3 +343,66 @@ SELECT extract('day' from time '12:00:00')

query error pgcode 22023 extract\(\): unsupported timespan: day
SELECT extract('DAY' from time '12:00:00')

subtest precision_tests

query error precision 7 out of range
select '1:00:00.001':::TIME(7)

statement ok
CREATE TABLE time_precision_test (
id integer PRIMARY KEY,
t TIME(5)
)

statement ok
INSERT INTO time_precision_test VALUES
(1,'12:00:00.123456+03:00'),
(2,'12:00:00.12345+03:00'),
(3,'12:00:00.1234+03:00'),
(4,'12:00:00.123+03:00'),
(5,'12:00:00.12+03:00'),
(6,'12:00:00.1+03:00'),
(7,'12:00:00+03:00')

query IT
SELECT * FROM time_precision_test ORDER BY id ASC
----
1 0000-01-01 12:00:00.12346 +0000 UTC
2 0000-01-01 12:00:00.12345 +0000 UTC
3 0000-01-01 12:00:00.1234 +0000 UTC
4 0000-01-01 12:00:00.123 +0000 UTC
5 0000-01-01 12:00:00.12 +0000 UTC
6 0000-01-01 12:00:00.1 +0000 UTC
7 0000-01-01 12:00:00 +0000 UTC

query TT
select column_name, data_type FROM [SHOW COLUMNS FROM time_precision_test] ORDER BY column_name
----
id INT8
t TIME(5)

statement ok
ALTER TABLE time_precision_test ALTER COLUMN t TYPE time(6)

statement ok
INSERT INTO time_precision_test VALUES
(100,'12:00:00.123456+03:00')

query IT
SELECT * FROM time_precision_test ORDER BY id ASC
----
1 0000-01-01 12:00:00.12346 +0000 UTC
2 0000-01-01 12:00:00.12345 +0000 UTC
3 0000-01-01 12:00:00.1234 +0000 UTC
4 0000-01-01 12:00:00.123 +0000 UTC
5 0000-01-01 12:00:00.12 +0000 UTC
6 0000-01-01 12:00:00.1 +0000 UTC
7 0000-01-01 12:00:00 +0000 UTC
100 0000-01-01 12:00:00.123456 +0000 UTC

query TT
select column_name, data_type FROM [SHOW COLUMNS FROM time_precision_test] ORDER BY column_name
----
id INT8
t TIME(6)
