opt: Project merged Upsert columns #33681

andy-kimball · 2019-01-12T01:48:35Z

Currently, the Upsert operator's insert and update columns are
separately created and maintained. For example:

CREATE TABLE abc (a INT PRIMARY KEY, b INT)
INSERT INTO abc VALUES (1) ON CONFLICT (a) DO UPDATE SET b=5

Here are the projected columns:

SELECT
a AS fetch_a, b AS fetch_b,
1 AS ins_a, NULL AS ins_b,
5 AS upd_b
FROM ...

This commit constructs CASE statements that merge the insert and
update columns, like this:

SELECT
a AS fetch_a, b AS fetch_b,
CASE WHEN fetch_a IS NULL THEN 1 ELSE fetch_a END AS ups_a,
CASE WHEN fetch_a IS NULL THEN NULL::INT ELSE 5 END AS ups_b
FROM ...

The advantages to making this change are:

This sets us up to implement check constraints, since they will now
be able to operate on the combined columns.
This often avoids evaluating the update expression when an insert
needs to occur for a given row.
This simplifies the property derivation, making Upsert act just like
Update, Insert, and Delete. It's now possible to cleanly map each
mutation output column to a single input column.

cockroach-teamcity · 2019-01-12T01:48:43Z

This change is

RaduBerinde

Reviewed 9 of 9 files at r1, 8 of 10 files at r2, 11 of 11 files at r3.
Reviewable status: complete! 1 of 0 LGTMs obtained

pkg/sql/opt/memo/expr.go, line 290 at r3 (raw file):

// (which holds the existing value of the column). And for the Upsert case, use
// either the UpdateCols/FetchCols or the InsertCols value, as long as it is
// non-zero. If both are non-zero, then they must be the same, since they are

[nit] maybe them being the same could be asserted inside the function? (unless it's performance concern)

pkg/sql/opt/optbuilder/insert.go, line 883 at r3 (raw file):

		// column. This will be used by the Upsert operator in place of the
		// original columns.
		if mb.insertColList[i] != 0 {

[nit] Consider doing this inside each case above (it should simplify a lot in each case), I think it would be easier to follow what's going on.

andy-kimball

Reviewable status: complete! 1 of 0 LGTMs obtained

pkg/sql/opt/memo/expr.go, line 290 at r3 (raw file):

Previously, RaduBerinde wrote…

[nit] maybe them being the same could be asserted inside the function? (unless it's performance concern)

Just messier code, but I went ahead and made that change.

pkg/sql/opt/optbuilder/insert.go, line 883 at r3 (raw file):

Previously, RaduBerinde wrote…

[nit] Consider doing this inside each case above (it should simplify a lot in each case), I think it would be easier to follow what's going on.

I tried it, but I still have to do the if statement in case of updateColList, so I think it ends up being messier. I did update the comment a bit in hopes of making it clearer.

RaduBerinde · 2019-01-15T19:45:40Z

pkg/sql/opt/optbuilder/insert.go, line 883 at r3 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

I tried it, but I still have to do the if statement in case of updateColList, so I think it ends up being messier. I did update the comment a bit in hopes of making it clearer.

I don't think you pushed your changes.

It is unnecessary to pass a table descriptor by value to TablesNeededForFKs. This causes the entire struct to be copied by value onto the stack, and also triggers a heap allocation when assigning the address of the stack struct when setting the TableLookup.Table field. Release note: None

When the type check phase returns a NULL value, wrap it with a cast to the desired type (if it's not types.Any). Without this cast, the expression's type is types.Unknown, which can cause typing issues elsewhere in the code. In particular, it was causing a panic after modifying the Upsert operator to use CASE expressions: CREATE TABLE ab (a INT, b INT[] AS (ARRAY[a]) STORED INSERT INTO ab (a) VALUES (NULL) ON CONFLICT (a) DO UPDATE SET b=ARRAY[1] A projection like this is created and causes a panic: SELECT CASE WHEN canary IS NULL THEN ARRAY[NULL] ELSE ARRAY[1] END ... The CASE operator is typed as UNKNOWN[], and DistSQL panics at runtime when it encounters the INT[1] array. The extra type cast ensures that we assign NULL values a static type that will be propagated throughout the expression and prevent cases like this. Release note: None

Currently, the Upsert operator's insert and update columns are separately created and maintained. For example: CREATE TABLE abc (a INT PRIMARY KEY, b INT) INSERT INTO abc VALUES (1) ON CONFLICT (a) DO UPDATE SET b=5 Here are the projected columns: SELECT a AS fetch_a, b AS fetch_b, 1 AS ins_a, NULL AS ins_b, 5 AS upd_b FROM ... This commit constructs CASE statements that merge the insert and update columns, like this: SELECT a AS fetch_a, b AS fetch_b, CASE WHEN fetch_a IS NULL THEN 1 ELSE fetch_a END AS ups_a, CASE WHEN fetch_a IS NULL THEN NULL::INT ELSE 5 END AS ups_b FROM ... The advantages to making this change are: 1. This sets us up to implement check constraints, since they will now be able to operate on the combined columns. 2. This often avoids evaluating the update expression when an insert needs to occur for a given row. 3. This simplifies the property derivation, making Upsert act just like Update, Insert, and Delete. It's now possible to cleanly map each mutation output column to a single input column. Release note: None

andy-kimball

Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale)

pkg/sql/opt/optbuilder/insert.go, line 883 at r3 (raw file):

Previously, RaduBerinde wrote…

I don't think you pushed your changes.

I ended up finding a bug once I added the assert, and had to change a few things. Should be correct now.

RaduBerinde

Reviewed 5 of 24 files at r4, 8 of 10 files at r5, 13 of 13 files at r6.
Reviewable status: complete! 1 of 0 LGTMs obtained

pkg/sql/opt/optbuilder/insert.go, line 883 at r3 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

I ended up finding a bug once I added the assert, and had to change a few things. Should be correct now.

Nice!

andy-kimball · 2019-01-16T20:03:13Z

bors r+

craig · 2019-01-16T20:41:24Z

Build failed (retrying...)

GitHub CI (Cockroach)

craig · 2019-01-16T21:40:30Z

Build failed

GitHub CI (Cockroach)

andy-kimball · 2019-01-16T22:32:20Z

bors r+

craig · 2019-01-16T22:59:03Z

Build failed (retrying...)

GitHub CI (Cockroach)

craig · 2019-01-16T23:58:43Z

Build failed

GitHub CI (Cockroach)

andy-kimball · 2019-01-17T01:41:39Z

bors r+

33681: opt: Project merged Upsert columns r=andy-kimball a=andy-kimball Currently, the Upsert operator's insert and update columns are separately created and maintained. For example: CREATE TABLE abc (a INT PRIMARY KEY, b INT) INSERT INTO abc VALUES (1) ON CONFLICT (a) DO UPDATE SET b=5 Here are the projected columns: SELECT a AS fetch_a, b AS fetch_b, 1 AS ins_a, NULL AS ins_b, 5 AS upd_b FROM ... This commit constructs CASE statements that merge the insert and update columns, like this: SELECT a AS fetch_a, b AS fetch_b, CASE WHEN fetch_a IS NULL THEN 1 ELSE fetch_a END AS ups_a, CASE WHEN fetch_a IS NULL THEN NULL::INT ELSE 5 END AS ups_b FROM ... The advantages to making this change are: 1. This sets us up to implement check constraints, since they will now be able to operate on the combined columns. 2. This often avoids evaluating the update expression when an insert needs to occur for a given row. 3. This simplifies the property derivation, making Upsert act just like Update, Insert, and Delete. It's now possible to cleanly map each mutation output column to a single input column. Co-authored-by: Andrew Kimball <andyk@cockroachlabs.com>

craig · 2019-01-17T02:16:23Z

Build succeeded

GitHub CI (Cockroach)

andy-kimball requested review from justinj, rytaft, RaduBerinde and a team January 12, 2019 01:48

andy-kimball requested a review from a team as a code owner January 12, 2019 01:48

andy-kimball requested review from a team January 12, 2019 01:48

RaduBerinde approved these changes Jan 14, 2019

View reviewed changes

andy-kimball commented Jan 15, 2019

View reviewed changes

andy-kimball added 3 commits January 16, 2019 10:32

andy-kimball force-pushed the merge branch from 9a4598c to 829125c Compare January 16, 2019 18:44

andy-kimball commented Jan 16, 2019

View reviewed changes

RaduBerinde approved these changes Jan 16, 2019

View reviewed changes

craig bot merged commit 829125c into cockroachdb:master Jan 17, 2019

andy-kimball deleted the merge branch January 17, 2019 22:25

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

opt: Project merged Upsert columns #33681

opt: Project merged Upsert columns #33681

andy-kimball commented Jan 12, 2019

cockroach-teamcity commented Jan 12, 2019

RaduBerinde left a comment

andy-kimball left a comment

RaduBerinde commented Jan 15, 2019

andy-kimball left a comment

RaduBerinde left a comment

andy-kimball commented Jan 16, 2019

craig bot commented Jan 16, 2019

craig bot commented Jan 16, 2019

andy-kimball commented Jan 16, 2019

craig bot commented Jan 16, 2019

craig bot commented Jan 16, 2019

andy-kimball commented Jan 17, 2019

craig bot commented Jan 17, 2019

opt: Project merged Upsert columns #33681

opt: Project merged Upsert columns #33681

Conversation

andy-kimball commented Jan 12, 2019

cockroach-teamcity commented Jan 12, 2019

RaduBerinde left a comment

Choose a reason for hiding this comment

andy-kimball left a comment

Choose a reason for hiding this comment

RaduBerinde commented Jan 15, 2019

andy-kimball left a comment

Choose a reason for hiding this comment

RaduBerinde left a comment

Choose a reason for hiding this comment

andy-kimball commented Jan 16, 2019

craig bot commented Jan 16, 2019

Build failed (retrying...)

craig bot commented Jan 16, 2019

Build failed

andy-kimball commented Jan 16, 2019

craig bot commented Jan 16, 2019

Build failed (retrying...)

craig bot commented Jan 16, 2019

Build failed

andy-kimball commented Jan 17, 2019

craig bot commented Jan 17, 2019

Build succeeded