Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: Adding support for integral aliases. #16548

Merged
merged 1 commit into from
Jun 26, 2017

Conversation

m-schneider
Copy link
Contributor

Before this patch we would discard more precise type informations for types
such as INT8 and INT64. SHOW CREATE TABLE would show INT even if the user
specified either of those.

After this patch we'll save this information. SHOW CREATE TABLE will now
display the correct type.

@CLAassistant
Copy link

CLAassistant commented Jun 15, 2017

CLA assistant check
All committers have signed the CLA.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@jordanlewis
Copy link
Member

Nice - this looks good structurally.


Review status: 0 of 5 files reviewed at latest revision, 2 unresolved discussions, all commit checks successful.


pkg/sql/sqlbase/structured.proto, line 71 at r1 (raw file):

    NONE = 0;
    INT8 = 1;
    INT64 = 2;

Hah, I didn't realize we actually had references to INT64 in our codebase. That is very confusing because INT64 is not a standard sql type name - I don't think it exists in any other database. SQL databases seem to refer to integer types with either imprecise english names (smallint, int, bigint, and... mediumint in mysql which is apparently a 3 byte integer?) or precise names that have the number of bytes appended - such as int2, int4, int8.

If I had my druthers we'd purge the INT64 type alias from our database entirely as it's nonstandard and confusing. @knz @petermattis thoughts on that?

In any event, I think this looks good structurally. I believe to achieve our goals of compatibility we will only need 2 tags in here for the int type family - int4 and int2. That's because we don't care to preserve the particular name that the user specified for a 2-byte integer, for example - so whether they typed smallint or int2 is irrelevant - we just care that they specified a 2-byte integer at all.


pkg/sql/sqlbase/table.go, line 112 at r1 (raw file):

		}

		if subtypeMap != nil {

I'm not sure we need to do this nil check dance unless you're anticipating something about protobuf changes that I'm not familiar with. Can you instead use the ColumnType_SubKind_value map that should get generated in structured.pb.go? ISTM that would avoid having to call proto.EnumValueMap above as well.


Comments from Reviewable

@bdarnell
Copy link
Contributor

Review status: 0 of 5 files reviewed at latest revision, 2 unresolved discussions, all commit checks successful.


pkg/sql/sqlbase/structured.proto, line 71 at r1 (raw file):

INT64 is not a standard sql type name - I don't think it exists in any other database

Spanner has it, and it is their only integer type (and FLOAT64 is their only float type).

If we were pre-1.0 I'd say get rid of INT64, but since it was documented in 1.0 I think we ought to keep it (even though its presence makes int8 ambiguous).

I'm not sure that we need both INT8 and INT64 at this level though. We have the intColType enums in the parser package to make sure that we can reproduce the original type as written, even for exactly-equivalent types like INT and INTEGER. I think at this level we only need to know the size of the integers instead of the type as written.


Comments from Reviewable

@maddyblue
Copy link
Contributor

Review status: 0 of 5 files reviewed at latest revision, 3 unresolved discussions, all commit checks successful.


pkg/sql/sqlbase/structured.proto, line 68 at r2 (raw file):

  }

  enum SubKind {

Does it make sense to use the width or precision fields instead of adding a new one? They convey a similar limitation and wouldn't require a proto change.


Comments from Reviewable

@knz
Copy link
Contributor

knz commented Jun 16, 2017

Reviewed 4 of 5 files at r1, 1 of 1 files at r2.
Review status: all files reviewed at latest revision, 5 unresolved discussions, all commit checks successful.


pkg/sql/sqlbase/structured.proto, line 71 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

INT64 is not a standard sql type name - I don't think it exists in any other database

Spanner has it, and it is their only integer type (and FLOAT64 is their only float type).

If we were pre-1.0 I'd say get rid of INT64, but since it was documented in 1.0 I think we ought to keep it (even though its presence makes int8 ambiguous).

I'm not sure that we need both INT8 and INT64 at this level though. We have the intColType enums in the parser package to make sure that we can reproduce the original type as written, even for exactly-equivalent types like INT and INTEGER. I think at this level we only need to know the size of the integers instead of the type as written.

hugh then

  1. can we take this opportunity to remove intColType altogether and use this new mechanism instead

  2. for better or worse int8 is supported by postgres and we support it for postgres compatibility. I think it doesn't hurt to keep it, but I would recommend that a doc issue be filed concomitant to this PR to highlight what is the purpose of all these types, and have the explanation say that "int" is what we recommend, whereas 2.a) "integer" is for ansi sql compat 2.b) "intN" is for pg compat and 2.c) "int64" is for spanner compat


pkg/sql/sqlbase/structured.proto, line 68 at r2 (raw file):

Previously, mjibson (Matt Jibson) wrote…

Does it make sense to use the width or precision fields instead of adding a new one? They convey a similar limitation and wouldn't require a proto change.

Maybe precision then -- it seems to me that width would be a restriction on top of the type, e.g. to use fewer bits. I'm not sure -- perhaps an int should just not have a precision field at all and instead have its "kind"


pkg/sql/sqlbase/structured.proto, line 68 at r2 (raw file):

  }

  enum SubKind {
  1. I am not fond of the name "kind". FYI both Arjun and I are miffed by the existing use of the word "kind" above and elsewhere in the code base -- what we call "kind" is really what should be called "type"

  2. the idea of "sub"-kind (and also "sub"-type) has actually a particular meaning in type theory and it's not clear to me this is what is happening here (one-time hint: that's my code word for "I think it doesn't make sense").

So perhaps here the better name would be "ApparentType" ("Apparent" because that's what the user sees even if the actual implementation type is something else) or maybe just "TypeAlias" so that is better resonates with the name of the struct member you define below.

then

  1. perhaps this enum (and the corresponding field below) deserves a comment to clarify that despite the current fact that it's only used for integral types, the mechanism is rather general and can probably be used to define apparent aliases for any type.

pkg/sql/sqlbase/table.go, line 97 at r2 (raw file):

	colDatumType := parser.CastTargetToDatumType(d.Type)
	col.Type = DatumTypeToColumnType(colDatumType)
	var subtypeMap = proto.EnumValueMap("cockroach.sql.sqlbase.ColumnType_SubKind")

s/subtype/subType/
then
s/subType/apparentType/ or s/subType/typeAlias/


Comments from Reviewable

@knz
Copy link
Contributor

knz commented Jun 16, 2017

Don't forget to squash your commits here since they are related.


Review status: all files reviewed at latest revision, 5 unresolved discussions, all commit checks successful.


Comments from Reviewable

@m-schneider
Copy link
Contributor Author

Review status: 2 of 5 files reviewed at latest revision, 5 unresolved discussions, some commit checks pending.


pkg/sql/sqlbase/structured.proto, line 71 at r1 (raw file):

Previously, knz (kena) wrote…

hugh then

  1. can we take this opportunity to remove intColType altogether and use this new mechanism instead

  2. for better or worse int8 is supported by postgres and we support it for postgres compatibility. I think it doesn't hurt to keep it, but I would recommend that a doc issue be filed concomitant to this PR to highlight what is the purpose of all these types, and have the explanation say that "int" is what we recommend, whereas 2.a) "integer" is for ansi sql compat 2.b) "intN" is for pg compat and 2.c) "int64" is for spanner compat

  1. By getting rid of intColType do you mean creating more integral col types?
  2. I'll file a docs issue! It should probably be a bit more broad that for integral types though because I'm planning on adding support for FLOAT4 and FLOAT8 and other aliased numerical types as well in a followup PR.

pkg/sql/sqlbase/structured.proto, line 68 at r2 (raw file):

Previously, knz (kena) wrote…

Maybe precision then -- it seems to me that width would be a restriction on top of the type, e.g. to use fewer bits. I'm not sure -- perhaps an int should just not have a precision field at all and instead have its "kind"

I was thinking of reusing the precision field, but then we'd still have the issue that we don't echo back the user's precise specifications back at them when they issue a SHOW CREATE TABLE command because int8 would just become an int again.


pkg/sql/sqlbase/structured.proto, line 68 at r2 (raw file):

Previously, knz (kena) wrote…
  1. I am not fond of the name "kind". FYI both Arjun and I are miffed by the existing use of the word "kind" above and elsewhere in the code base -- what we call "kind" is really what should be called "type"

  2. the idea of "sub"-kind (and also "sub"-type) has actually a particular meaning in type theory and it's not clear to me this is what is happening here (one-time hint: that's my code word for "I think it doesn't make sense").

So perhaps here the better name would be "ApparentType" ("Apparent" because that's what the user sees even if the actual implementation type is something else) or maybe just "TypeAlias" so that is better resonates with the name of the struct member you define below.

then

  1. perhaps this enum (and the corresponding field below) deserves a comment to clarify that despite the current fact that it's only used for integral types, the mechanism is rather general and can probably be used to define apparent aliases for any type.
  1. Yes I agree Kind is a bit of a strange word for describing what is essentially a type.
  2. What do you think of AliasType vs TypeAlias which to me has too many connotations in programming languages in general.
  3. Agreed, will make the comment clearer.

pkg/sql/sqlbase/table.go, line 112 at r1 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

I'm not sure we need to do this nil check dance unless you're anticipating something about protobuf changes that I'm not familiar with. Can you instead use the ColumnType_SubKind_value map that should get generated in structured.pb.go? ISTM that would avoid having to call proto.EnumValueMap above as well.

Thanks for the Go pointer!


pkg/sql/sqlbase/table.go, line 97 at r2 (raw file):

Previously, knz (kena) wrote…

s/subtype/subType/
then
s/subType/apparentType/ or s/subType/typeAlias/

What are you thoughts on renaming the proto type to AliasType.


Comments from Reviewable

@knz
Copy link
Contributor

knz commented Jun 18, 2017

Reviewed 3 of 3 files at r3.
Review status: all files reviewed at latest revision, 5 unresolved discussions, all commit checks successful.


pkg/sql/sqlbase/structured.proto, line 71 at r1 (raw file):

Previously, m-schneider (Masha Schneider) wrote…
  1. By getting rid of intColType do you mean creating more integral col types?
  2. I'll file a docs issue! It should probably be a bit more broad that for integral types though because I'm planning on adding support for FLOAT4 and FLOAT8 and other aliased numerical types as well in a followup PR.

1 - yes, absolutely! less code is good code
2 - agreed


pkg/sql/sqlbase/structured.proto, line 68 at r2 (raw file):

Previously, m-schneider (Masha Schneider) wrote…

I was thinking of reusing the precision field, but then we'd still have the issue that we don't echo back the user's precise specifications back at them when they issue a SHOW CREATE TABLE command because int8 would just become an int again.

👍


pkg/sql/sqlbase/structured.proto, line 68 at r2 (raw file):

Previously, m-schneider (Masha Schneider) wrote…
  1. Yes I agree Kind is a bit of a strange word for describing what is essentially a type.
  2. What do you think of AliasType vs TypeAlias which to me has too many connotations in programming languages in general.
  3. Agreed, will make the comment clearer.

I had to think about this more today, especially in the light of your other comment that is suggesting to extend also FLOAT with aliases.

First of all to clarify the rest of this comment and subsequent discussion, let's name things properly: we'll have the semantic type ("SemType") corresponding to the current "kind", which influences how computations are performed and which Go type is storing the value, and the visible type ("VisType" - sorry I'll punt on "alias" for now), what you are introducing, which influences what clients see.

I have two orthogonal proposals which I would like to bring to the table:

A) encode both semtype and vistype into a single field, using e.g. low-order bits for semtype and high-order bits for vistype. This idea stems from the observation it's wasteful to spend 2 times 32 bits in memory (when the struct is loaded - protobuf encodes efficiently) when very few bits would be sufficient. The advantage of this proposal is that it would make the introduction of vistypes backward-compatible: a high-order word set to 0 says that the vistype is the "canonical" vistype. With that proposal you would not introduce a separate enum here but rather extend the list of possible values in the enum above, for example int4 = (1 << 16 | 1) , smallint = (2 << 16 | 1), int8 = (3 << 16 | 1), int64 = (4 << 16 | 1), etc. This then naturally allows us to alias any other semtype the same way.

B) regardless of whether you encode them together (prop A above) or not (your current code), take the opportunity to rename kind -> semtype and subkind -> vistype throughout the code 😸

To come back to the word "alias" I am not too keen on introducing this now, as you said the word "alias" is super-overloaded and also it doesn't completely clarify (in the name) what an alias really means. Hence my idea of semtype vs vistype.


pkg/sql/sqlbase/table.go, line 97 at r2 (raw file):

Previously, m-schneider (Masha Schneider) wrote…

What are you thoughts on renaming the proto type to AliasType.

see above


Comments from Reviewable

@eisenstatdavid
Copy link

Review status: all files reviewed at latest revision, 5 unresolved discussions, all commit checks successful.


pkg/sql/sqlbase/structured.proto, line 68 at r2 (raw file):

Previously, knz (kena) wrote…

👍

The precision field for INT columns is 0 currently, so we could let 0 -> INT and 8 -> INT8 (though distinguishing int64 from int8 could be tough).


Comments from Reviewable

@jordanlewis
Copy link
Member

Review status: all files reviewed at latest revision, 4 unresolved discussions, all commit checks successful.


pkg/sql/sqlbase/structured.proto, line 68 at r2 (raw file):

Previously, eisenstatdavid (David Eisenstat) wrote…

The precision field for INT columns is 0 currently, so we could let 0 -> INT and 8 -> INT8 (though distinguishing int64 from int8 could be tough).

We do not need to distinguish between types of the same length like int64/int8/bigint/int - just between types of different length like int8/int4/int2.


Comments from Reviewable

@knz
Copy link
Contributor

knz commented Jun 19, 2017

@jordanlewis Just curious: do we need to also actually compute differently because pg thinks they have different lengths? Would clients notice? e.g. with regards to overflow/range. (this is actually a question David was asking earlier, but now I'm curious too)

@m-schneider
Copy link
Contributor Author

We're planning on replicating PG behavior by throwing error messages when clients try to insert out of range values. I could see clients running legacy pg code using the error message.


Review status: 1 of 5 files reviewed at latest revision, 4 unresolved discussions, some commit checks failed.


pkg/sql/sqlbase/structured.proto, line 71 at r1 (raw file):

Previously, knz (kena) wrote…

1 - yes, absolutely! less code is good code
2 - agreed

  1. Would it be less code though? Would we want more datum types? I'm thinking that probably not because we would be able to reuse the precision field at that level.
  2. I talked to Jesse and I'll add docs todos in the followup PR.

pkg/sql/sqlbase/structured.proto, line 68 at r2 (raw file):

Previously, knz (kena) wrote…

I had to think about this more today, especially in the light of your other comment that is suggesting to extend also FLOAT with aliases.

First of all to clarify the rest of this comment and subsequent discussion, let's name things properly: we'll have the semantic type ("SemType") corresponding to the current "kind", which influences how computations are performed and which Go type is storing the value, and the visible type ("VisType" - sorry I'll punt on "alias" for now), what you are introducing, which influences what clients see.

I have two orthogonal proposals which I would like to bring to the table:

A) encode both semtype and vistype into a single field, using e.g. low-order bits for semtype and high-order bits for vistype. This idea stems from the observation it's wasteful to spend 2 times 32 bits in memory (when the struct is loaded - protobuf encodes efficiently) when very few bits would be sufficient. The advantage of this proposal is that it would make the introduction of vistypes backward-compatible: a high-order word set to 0 says that the vistype is the "canonical" vistype. With that proposal you would not introduce a separate enum here but rather extend the list of possible values in the enum above, for example int4 = (1 << 16 | 1) , smallint = (2 << 16 | 1), int8 = (3 << 16 | 1), int64 = (4 << 16 | 1), etc. This then naturally allows us to alias any other semtype the same way.

B) regardless of whether you encode them together (prop A above) or not (your current code), take the opportunity to rename kind -> semtype and subkind -> vistype throughout the code 😸

To come back to the word "alias" I am not too keen on introducing this now, as you said the word "alias" is super-overloaded and also it doesn't completely clarify (in the name) what an alias really means. Hence my idea of semtype vs vistype.

So now that we have a reasonable name for Semantic Type, I've renamed AliasType to VisibleType. I think the bit shifting will add extra complexity for a minor space saving in the order of 32 bits per column, so I'm keeping the second field.


pkg/sql/sqlbase/structured.proto, line 68 at r2 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

We do not need to distinguish between types of the same length like int64/int8/bigint/int - just between types of different length like int8/int4/int2.

If we replicate pg behavior then we would distinguish between types of the same length.


Comments from Reviewable

@knz
Copy link
Contributor

knz commented Jun 22, 2017

Reviewed 4 of 4 files at r4.
Review status: all files reviewed at latest revision, 3 unresolved discussions, some commit checks failed.


pkg/sql/sqlbase/structured.proto, line 71 at r1 (raw file):

Previously, m-schneider (Masha Schneider) wrote…
  1. Would it be less code though? Would we want more datum types? I'm thinking that probably not because we would be able to reuse the precision field at that level.
  2. I talked to Jesse and I'll add docs todos in the followup PR.

See my comment above. I think it is probably necessary for semantic correctness to unify them.


pkg/sql/sqlbase/structured.proto, line 68 at r2 (raw file):

Previously, m-schneider (Masha Schneider) wrote…

If we replicate pg behavior then we would distinguish between types of the same length.

Yes not only we need to distinguish between them but also propagate them; for example int4column + 123 should have type int4 not int. That's why I think a mechanism based on OidWrapper or similar for typing is going to be necessary. Perhaps in a later iteration?


pkg/sql/sqlbase/structured.proto, line 68 at r2 (raw file):

Previously, m-schneider (Masha Schneider) wrote…

So now that we have a reasonable name for Semantic Type, I've renamed AliasType to VisibleType. I think the bit shifting will add extra complexity for a minor space saving in the order of 32 bits per column, so I'm keeping the second field.

ack


Comments from Reviewable

@knz
Copy link
Contributor

knz commented Jun 22, 2017

LGTM for this PR assuming you fix the broken tests in show_trace that broke due to the new descriptor format.

However please ensure either in the next PRs or by opening a follow-up issue that the question of what to do with equivalent types but with different names (e.g. int8 vs int4) and how they propagate throughout type checking is properly addressed and taken care of. See discussion below.


Review status: all files reviewed at latest revision, 2 unresolved discussions, some commit checks failed.


Comments from Reviewable

@m-schneider
Copy link
Contributor Author

Review status: 5 of 6 files reviewed at latest revision, 2 unresolved discussions, some commit checks failed.


pkg/sql/sqlbase/structured.proto, line 71 at r1 (raw file):

Previously, knz (kena) wrote…

See my comment above. I think it is probably necessary for semantic correctness to unify them.

I'm answering both this and your comment above here. I'm thinking that changing DInt should be in a follow up PR. I looked at DOidWrapper and that looks like the right way to go. The question is do we want to add that complexity? We don't currently do anything for decimal precision beyond update and insert.


Comments from Reviewable

Before this patch we would discard more precise type informations for types
such as INT8 and INT64. SHOW CREATE TABLE would show INT even if the user
specified either of those.

After this patch we'll save this information. SHOW CREATE TABLE will now
display the correct type.
@m-schneider
Copy link
Contributor Author

Everything is passing! I'll send out the the next pr for type checking.


Review status: 5 of 6 files reviewed at latest revision, 2 unresolved discussions, all commit checks successful.


Comments from Reviewable

@m-schneider m-schneider merged commit ae0872a into cockroachdb:master Jun 26, 2017
@m-schneider m-schneider deleted the int4 branch June 26, 2017 17:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants