Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

opt: Project merged Upsert columns #33681

Merged
merged 3 commits into from
Jan 17, 2019
Merged

Conversation

andy-kimball
Copy link
Contributor

Currently, the Upsert operator's insert and update columns are
separately created and maintained. For example:

CREATE TABLE abc (a INT PRIMARY KEY, b INT)
INSERT INTO abc VALUES (1) ON CONFLICT (a) DO UPDATE SET b=5

Here are the projected columns:

SELECT
a AS fetch_a, b AS fetch_b,
1 AS ins_a, NULL AS ins_b,
5 AS upd_b
FROM ...

This commit constructs CASE statements that merge the insert and
update columns, like this:

SELECT
a AS fetch_a, b AS fetch_b,
CASE WHEN fetch_a IS NULL THEN 1 ELSE fetch_a END AS ups_a,
CASE WHEN fetch_a IS NULL THEN NULL::INT ELSE 5 END AS ups_b
FROM ...

The advantages to making this change are:

  1. This sets us up to implement check constraints, since they will now
    be able to operate on the combined columns.
  2. This often avoids evaluating the update expression when an insert
    needs to occur for a given row.
  3. This simplifies the property derivation, making Upsert act just like
    Update, Insert, and Delete. It's now possible to cleanly map each
    mutation output column to a single input column.

@andy-kimball andy-kimball requested a review from a team as a code owner January 12, 2019 01:48
@andy-kimball andy-kimball requested review from a team January 12, 2019 01:48
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 9 of 9 files at r1, 8 of 10 files at r2, 11 of 11 files at r3.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained


pkg/sql/opt/memo/expr.go, line 290 at r3 (raw file):

// (which holds the existing value of the column). And for the Upsert case, use
// either the UpdateCols/FetchCols or the InsertCols value, as long as it is
// non-zero. If both are non-zero, then they must be the same, since they are

[nit] maybe them being the same could be asserted inside the function? (unless it's performance concern)


pkg/sql/opt/optbuilder/insert.go, line 883 at r3 (raw file):

		// column. This will be used by the Upsert operator in place of the
		// original columns.
		if mb.insertColList[i] != 0 {

[nit] Consider doing this inside each case above (it should simplify a lot in each case), I think it would be easier to follow what's going on.

Copy link
Contributor Author

@andy-kimball andy-kimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained


pkg/sql/opt/memo/expr.go, line 290 at r3 (raw file):

Previously, RaduBerinde wrote…

[nit] maybe them being the same could be asserted inside the function? (unless it's performance concern)

Just messier code, but I went ahead and made that change.


pkg/sql/opt/optbuilder/insert.go, line 883 at r3 (raw file):

Previously, RaduBerinde wrote…

[nit] Consider doing this inside each case above (it should simplify a lot in each case), I think it would be easier to follow what's going on.

I tried it, but I still have to do the if statement in case of updateColList, so I think it ends up being messier. I did update the comment a bit in hopes of making it clearer.

@RaduBerinde
Copy link
Member


pkg/sql/opt/optbuilder/insert.go, line 883 at r3 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

I tried it, but I still have to do the if statement in case of updateColList, so I think it ends up being messier. I did update the comment a bit in hopes of making it clearer.

I don't think you pushed your changes.

It is unnecessary to pass a table descriptor by value to TablesNeededForFKs.
This causes the entire struct to be copied by value onto the stack, and also
triggers a heap allocation when assigning the address of the stack struct
when setting the TableLookup.Table field.

Release note: None
When the type check phase returns a NULL value, wrap it with a cast
to the desired type (if it's not types.Any). Without this cast, the
expression's type is types.Unknown, which can cause typing issues
elsewhere in the code. In particular, it was causing a panic after
modifying the Upsert operator to use CASE expressions:

  CREATE TABLE ab (a INT, b INT[] AS (ARRAY[a]) STORED
  INSERT INTO ab (a) VALUES (NULL) ON CONFLICT (a) DO UPDATE SET b=ARRAY[1]

A projection like this is created and causes a panic:

  SELECT
    CASE WHEN canary IS NULL THEN ARRAY[NULL] ELSE ARRAY[1] END
  ...

The CASE operator is typed as UNKNOWN[], and DistSQL panics at runtime
when it encounters the INT[1] array.

The extra type cast ensures that we assign NULL values a static type that
will be propagated throughout the expression and prevent cases like this.

Release note: None
Currently, the Upsert operator's insert and update columns are
separately created and maintained. For example:

  CREATE TABLE abc (a INT PRIMARY KEY, b INT)
  INSERT INTO abc VALUES (1) ON CONFLICT (a) DO UPDATE SET b=5

Here are the projected columns:

  SELECT
    a AS fetch_a, b AS fetch_b,
    1 AS ins_a, NULL AS ins_b,
    5 AS upd_b
  FROM ...

This commit constructs CASE statements that merge the insert and
update columns, like this:

  SELECT
    a AS fetch_a, b AS fetch_b,
    CASE WHEN fetch_a IS NULL THEN 1 ELSE fetch_a END AS ups_a,
    CASE WHEN fetch_a IS NULL THEN NULL::INT ELSE 5 END AS ups_b
  FROM ...

The advantages to making this change are:

1. This sets us up to implement check constraints, since they will now
   be able to operate on the combined columns.
2. This often avoids evaluating the update expression when an insert
   needs to occur for a given row.
3. This simplifies the property derivation, making Upsert act just like
   Update, Insert, and Delete. It's now possible to cleanly map each
   mutation output column to a single input column.

Release note: None
Copy link
Contributor Author

@andy-kimball andy-kimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale)


pkg/sql/opt/optbuilder/insert.go, line 883 at r3 (raw file):

Previously, RaduBerinde wrote…

I don't think you pushed your changes.

I ended up finding a bug once I added the assert, and had to change a few things. Should be correct now.

Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 5 of 24 files at r4, 8 of 10 files at r5, 13 of 13 files at r6.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained


pkg/sql/opt/optbuilder/insert.go, line 883 at r3 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

I ended up finding a bug once I added the assert, and had to change a few things. Should be correct now.

Nice!

@andy-kimball
Copy link
Contributor Author

bors r+

@craig
Copy link
Contributor

craig bot commented Jan 16, 2019

Build failed (retrying...)

@craig
Copy link
Contributor

craig bot commented Jan 16, 2019

Build failed

@andy-kimball
Copy link
Contributor Author

bors r+

@craig
Copy link
Contributor

craig bot commented Jan 16, 2019

Build failed (retrying...)

@craig
Copy link
Contributor

craig bot commented Jan 16, 2019

Build failed

@andy-kimball
Copy link
Contributor Author

bors r+

craig bot pushed a commit that referenced this pull request Jan 17, 2019
33681: opt: Project merged Upsert columns r=andy-kimball a=andy-kimball

Currently, the Upsert operator's insert and update columns are
separately created and maintained. For example:

  CREATE TABLE abc (a INT PRIMARY KEY, b INT)
  INSERT INTO abc VALUES (1) ON CONFLICT (a) DO UPDATE SET b=5

Here are the projected columns:

  SELECT
    a AS fetch_a, b AS fetch_b,
    1 AS ins_a, NULL AS ins_b,
    5 AS upd_b
  FROM ...

This commit constructs CASE statements that merge the insert and
update columns, like this:

  SELECT
    a AS fetch_a, b AS fetch_b,
    CASE WHEN fetch_a IS NULL THEN 1 ELSE fetch_a END AS ups_a,
    CASE WHEN fetch_a IS NULL THEN NULL::INT ELSE 5 END AS ups_b
  FROM ...

The advantages to making this change are:

1. This sets us up to implement check constraints, since they will now
   be able to operate on the combined columns.
2. This often avoids evaluating the update expression when an insert
   needs to occur for a given row.
3. This simplifies the property derivation, making Upsert act just like
   Update, Insert, and Delete. It's now possible to cleanly map each
   mutation output column to a single input column.


Co-authored-by: Andrew Kimball <andyk@cockroachlabs.com>
@craig
Copy link
Contributor

craig bot commented Jan 17, 2019

Build succeeded

@craig craig bot merged commit 829125c into cockroachdb:master Jan 17, 2019
@andy-kimball andy-kimball deleted the merge branch January 17, 2019 22:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants