Merge #42668 #42864
42668: sql: add precision support for TIME/TIMETZ r=otan a=otan

Refs: #32565
Refs: #26097 

This adds support for precision for both TIME and TIMETZ. It also
includes threading through some extra parsing syntax for TIMETZ.

ALTER TABLE between TIME and TIMETZ is not supported, as the two types
have different representations.

Release note (sql change): This PR adds support for precision for
TIME/TIMETZ types (e.g. TIME(3) will truncate to milliseconds).
Previously, specifying a precision would raise a syntax error.

42864: rfcs: new RFC on fixing the halloween problem r=knz a=knz

I am splitting this RFC away from the [RFC on savepoints](#41569) because I now recognize it is a different problem with an orthogonal solution.

https://github.com/knz/cockroach/blob/20191129-rfc-halloween/docs/RFCS/20191014_halloween.md

There is no new idea in this text (I copy-pasted it from the other RFC) and the implementation PRs are already out, so if the reviewers are satisfied I would gladly merge the RFC right away as in-progress or completed.

Co-authored-by: Oliver Tan <otan@cockroachlabs.com>
Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net>
3 people committed Dec 3, 2019
3 parents 422de48 + 2c3ade7 + 2f4446b commit a7cf655
Showing 24 changed files with 743 additions and 165 deletions.
252 changes: 252 additions & 0 deletions docs/RFCS/20191014_halloween.md
@@ -0,0 +1,252 @@
- Feature Name: SQL step-wise execution
- Status: in-progress
- Start Date: 2019-10-14
- Authors: andrei knz nathan
- RFC PR: [#42864](https://github.com/cockroachdb/cockroach/pull/42864)
- Cockroach Issue:
  [#28842](https://github.com/cockroachdb/cockroach/issues/28842)
  [#33473](https://github.com/cockroachdb/cockroach/issues/33473)
  [#33475](https://github.com/cockroachdb/cockroach/issues/33475)

# Summary

This RFC proposes to introduce *sequence points* around and inside the
execution of SQL statements, so that all KV reads performed after a
sequence point observe the data at the time the sequence point was
established.

This change intends to solve issue
[#28842](https://github.com/cockroachdb/cockroach/issues/28842), which
is a serious—and classical—semantic error that can cause business
problems in customer applications; see
https://en.wikipedia.org/wiki/Halloween_Problem for details.

It's also an area where CockroachDB diverges from PostgreSQL, so
fixing this would provide a modicum of additional compatibility.

The proposed solution capitalizes on the *KV sequence numbers*
previously introduced at the KV and storage levels for an unrelated
reason (txn performance optimizations) and can be summarized as
"annotate every read request with the current read seqnum" on the
common path, with the new `Step()` API to copy the current write
seqnum as new current read seqnum.

# Motivation

See above.

# Guide-level explanation

After this change, the "read" portion of a SQL mutation operates on a
snapshot of the database taken at the moment the statement started. In
particular, it cannot see its own writes.

This guarantees e.g. that a statement like `INSERT INTO t SELECT *
FROM t` always terminates, or that `UPDATE t SET x = x+2` never
operates on the same row twice.

# Reference-level explanation

The following API is added to `TxnCoordSender` and `*client.Txn`:

```go
// Step creates a sequencing point in the current transaction. A
// sequencing point establishes a snapshot baseline for subsequent
// read operations: until the next sequencing point, read operations
// observe the data at the time the snapshot was established and
// ignore writes performed since.
//
// Before the first step is taken, the transaction operates as if
// there was a step after every write: each read to a key is able to
// see the latest write before it. This makes the step behavior
// opt-in and backward-compatible with existing code which does not
// need it.
// The method is idempotent.
Step() error

// DisableStepping disables the sequencing point behavior and
// ensures that every read can read the latest write. The
// effect remains disabled until the next call to Step().
// The method is idempotent.
DisableStepping() error
```

- `Step()` is called every time a SQL execution step is reached:
  - before the execution of each regular statement;
  - at the end of the mutations, before and in-between the FK and cascading action phases, if any.
- the implementation of `Step()` forwards the call to the RootTxn's
TxnCoordSender `Step()` method (new), which in turn saves the current
write seqnum as reference seqnum for future reads.
- `DisableStepping()` is merely a convenience, but is provided
for use with the execution part of all the DDL statements.
These currently contain many internal steps where each
internal step observes the results of steps prior. Without
`DisableStepping()`, we would need to add calls to `Step()`
throughout, which would be error-prone.

## Detailed design

The design is prototyped here:

https://github.com/cockroachdb/cockroach/pull/42854

https://github.com/cockroachdb/cockroach/pull/42862

Additional topics of interest:

- [Uniqueness violations](#Uniqueness-violations)
- [ON CONFLICT processing](#ON-CONFLICT-processing)
- [FK existence checks under a single mutation](#FK-existence-checks-under-a-single-mutation)
- [FK cascading actions under a single mutation](#FK-cascading-actions-under-a-single-mutation)
- [Multiple mutations with CTEs](#Multiple-mutations-with-CTEs)
- [Schema changes](#Schema-changes)

### Uniqueness violations

There are really two cases:

- we insert/modify a single row, and doing so creates a duplicate of
a row that was modified in a previous statement (or sequencing
step). This case is simple and transparently handled by "read at
seqnum of previous step".

- we insert/modify the same row twice inside the same mutation
statement, or two rows that are duplicates according to
some unique index.
Here the problem is seemingly that the 2nd row update will not
see the first.

However, when looking more closely there is no new problem here.

All writes to a unique index go through a KV `CPut` on the uniqueness key.
By ensuring that `CPuts` read their _condition_ at the current write
seqnum, we can always pick up the latest write and detect duplicates.

(CPut will still skip over ignored / rolled back seqnums like other KV
ops. It's only the target read seqnum that's ratcheted up to the
present for CPut, in contrast to other mvcc ops that will be blocked by
the configured target read seqnum.)

This opens the question of whether we need a variant of CPut that does
not do this. TBD. (Initial analysis says no.)

### ON CONFLICT processing

The question arises of what to do when the result of ON CONFLICT
processing changes a row in a read-modify-write fashion. For example:

```sql
INSERT INTO t(x) VALUES (1), (1) ON CONFLICT(x) DO UPDATE SET x = t.x + excluded.x
-- ^^^^^^^^ notice the dup row
```

Here conceptually the INSERT suggests that the 2nd ON CONFLICT resolution
will observe the row as it was left over by the 1st. This would not work
with "read at seqnum of previous statement".

The answer here is from a previous discussion around mutations that
observed the following:

- postgres does not support updating the same row two times in an ON
CONFLICT clause.

- it is desirable to batch / pre-compute the ON CONFLICT result values
concurrently with the mutation for performance, and we've already
established back in 2018 that the lack of support for updating the
same row twice in pg makes this optimization possible.

- the implementation was updated when bringing this logic under the CBO.

From here, it follows that we don't care about "read at seqnum"
inconsistencies as the current CBO logic already assumes that it's
fine to miss earlier conflict resolutions.

### FK existence checks under a single mutation

FK existence checks must observe the data values post-mutation. For
this we need to introduce a sequence point (call to `Step()`) between
the end of the "run" phase (where results were produced for the
client) and the FK existence checks.

This way the reads for FK existence checks can see all the writes by
the mutation.

Note that today FK existence checks are "interleaved" with mutations
on the common path, which is a useful optimization but incorrect in
some cases. This will need to be adjusted. See the following issue for details:
https://github.com/cockroachdb/cockroach/issues/33475

In 19.2/20.1 there is a new notion of "post-queries" which the CBO is
increasingly using to perform FK checks and cascading actions. These
benefit simply by adding a sequence point before the execution of each
post-query.

### FK cascading actions under a single mutation

Postgres uses post-statement triggers to process FK cascading actions
and existence checks. Cascading actions that result in mutations to
other tables themselves append more triggers to run.

Each subsequent step in this cascade of effects is able to read its
own writes (for further FK checks).

We emulate this in CockroachDB by introducing a step boundary between
iterations of the cascading algorithm.

### Multiple mutations with CTEs

It's possible for a single statement to define multiple mutations, for example:

```sql
WITH
a AS (INSERT ... RETURNING ...),
b AS (INSERT ... RETURNING ...)
SELECT ...
```

PostgreSQL does not guarantee that the effect of one mutation is
visible to another, or even to the later read-only parts of the
statement. In fact, it requires that all mutations operate
on the same data as it was at the beginning of the statement.

More specifically, from https://www.postgresql.org/docs/12/queries-with.html:

> The sub-statements in WITH are executed concurrently with each other
> and with the main query. Therefore, when using data-modifying
> statements in WITH, the order in which the specified updates
> actually happen is unpredictable. **All the statements are executed
> with the same snapshot (see Chapter 13), so they cannot “see” one
> another's effects on the target tables.** This alleviates the effects
> of the unpredictability of the actual order of row updates, and
> means that RETURNING data is the only way to communicate changes
> between different WITH sub-statements and the main query.

So with the logic proposed so far, all the mutations inside the same
statement execute from the same read seqnum.

If there is FK work to be done, the first sequencing step necessary
for FK checks (to advance the read seqnum) will only occur after all
mutations have completed.

(The observations from [Uniqueness violations](#Uniqueness-violations) above apply here as well.)

### Schema changes

Schema changers that run synchronously operate "under the sequence
point" and need no further adjustment.

Schema changers that operate asynchronously already operate under
independent `*client.Txn` instances and are thus unaffected.


## Drawbacks

None known.

## Rationale and Alternatives

No alternative was evaluated.

## Unresolved questions

None known.
3 changes: 3 additions & 0 deletions docs/generated/sql/bnf/stmt_block.bnf
@@ -2091,7 +2091,10 @@ character_without_length ::=

const_datetime ::=
'DATE'
| 'TIME' opt_timezone
| 'TIME' '(' iconst32 ')' opt_timezone
| 'TIMETZ'
| 'TIMETZ' '(' iconst32 ')'
| 'TIMESTAMP' opt_timezone
| 'TIMESTAMP' '(' iconst32 ')' opt_timezone
| 'TIMESTAMPTZ'
2 changes: 1 addition & 1 deletion pkg/ccl/changefeedccl/avro.go
@@ -230,7 +230,7 @@ func columnDescToAvroSchema(colDesc *sqlbase.ColumnDescriptor) (*avroSchemaField
return d.(*tree.DTimeTZ).TimeTZ.String(), nil
}
schema.decodeFn = func(x interface{}) (tree.Datum, error) {
return tree.ParseDTimeTZ(nil, x.(string))
return tree.ParseDTimeTZ(nil, x.(string), time.Microsecond)
}
case types.TimestampFamily:
avroType = avroLogicalType{
63 changes: 63 additions & 0 deletions pkg/sql/logictest/testdata/logic_test/time
@@ -343,3 +343,66 @@ SELECT extract('day' from time '12:00:00')

query error pgcode 22023 extract\(\): unsupported timespan: day
SELECT extract('DAY' from time '12:00:00')

subtest precision_tests

query error precision 7 out of range
select '1:00:00.001':::TIME(7)

statement ok
CREATE TABLE time_precision_test (
id integer PRIMARY KEY,
t TIME(5)
)

statement ok
INSERT INTO time_precision_test VALUES
(1,'12:00:00.123456+03:00'),
(2,'12:00:00.12345+03:00'),
(3,'12:00:00.1234+03:00'),
(4,'12:00:00.123+03:00'),
(5,'12:00:00.12+03:00'),
(6,'12:00:00.1+03:00'),
(7,'12:00:00+03:00')

query IT
SELECT * FROM time_precision_test ORDER BY id ASC
----
1 0000-01-01 12:00:00.12346 +0000 UTC
2 0000-01-01 12:00:00.12345 +0000 UTC
3 0000-01-01 12:00:00.1234 +0000 UTC
4 0000-01-01 12:00:00.123 +0000 UTC
5 0000-01-01 12:00:00.12 +0000 UTC
6 0000-01-01 12:00:00.1 +0000 UTC
7 0000-01-01 12:00:00 +0000 UTC

query TT
select column_name, data_type FROM [SHOW COLUMNS FROM time_precision_test] ORDER BY column_name
----
id INT8
t TIME(5)

statement ok
ALTER TABLE time_precision_test ALTER COLUMN t TYPE time(6)

statement ok
INSERT INTO time_precision_test VALUES
(100,'12:00:00.123456+03:00')

query IT
SELECT * FROM time_precision_test ORDER BY id ASC
----
1 0000-01-01 12:00:00.12346 +0000 UTC
2 0000-01-01 12:00:00.12345 +0000 UTC
3 0000-01-01 12:00:00.1234 +0000 UTC
4 0000-01-01 12:00:00.123 +0000 UTC
5 0000-01-01 12:00:00.12 +0000 UTC
6 0000-01-01 12:00:00.1 +0000 UTC
7 0000-01-01 12:00:00 +0000 UTC
100 0000-01-01 12:00:00.123456 +0000 UTC

query TT
select column_name, data_type FROM [SHOW COLUMNS FROM time_precision_test] ORDER BY column_name
----
id INT8
t TIME(6)
