sql: Adding support for integral aliases. #16548

m-schneider · 2017-06-15T21:24:06Z

Before this patch we would discard more precise type informations for types
such as INT8 and INT64. SHOW CREATE TABLE would show INT even if the user
specified either of those.

After this patch we'll save this information. SHOW CREATE TABLE will now
display the correct type.

CLAassistant · 2017-06-15T21:24:13Z

All committers have signed the CLA.

cockroach-teamcity · 2017-06-15T21:24:13Z

This change is

jordanlewis · 2017-06-15T21:55:48Z

Nice - this looks good structurally.

Review status: 0 of 5 files reviewed at latest revision, 2 unresolved discussions, all commit checks successful.

pkg/sql/sqlbase/structured.proto, line 71 at r1 (raw file):

    NONE = 0;
    INT8 = 1;
    INT64 = 2;

Hah, I didn't realize we actually had references to INT64 in our codebase. That is very confusing because INT64 is not a standard sql type name - I don't think it exists in any other database. SQL databases seem to refer to integer types with either imprecise english names (smallint, int, bigint, and... mediumint in mysql which is apparently a 3 byte integer?) or precise names that have the number of bytes appended - such as int2, int4, int8.

If I had my druthers we'd purge the INT64 type alias from our database entirely as it's nonstandard and confusing. @knz @petermattis thoughts on that?

In any event, I think this looks good structurally. I believe to achieve our goals of compatibility we will only need 2 tags in here for the int type family - int4 and int2. That's because we don't care to preserve the particular name that the user specified for a 2-byte integer, for example - so whether they typed smallint or int2 is irrelevant - we just care that they specified a 2-byte integer at all.

pkg/sql/sqlbase/table.go, line 112 at r1 (raw file):

		}

		if subtypeMap != nil {

I'm not sure we need to do this nil check dance unless you're anticipating something about protobuf changes that I'm not familiar with. Can you instead use the ColumnType_SubKind_value map that should get generated in structured.pb.go? ISTM that would avoid having to call proto.EnumValueMap above as well.

Comments from Reviewable

bdarnell · 2017-06-15T22:46:16Z

Review status: 0 of 5 files reviewed at latest revision, 2 unresolved discussions, all commit checks successful.

pkg/sql/sqlbase/structured.proto, line 71 at r1 (raw file):

INT64 is not a standard sql type name - I don't think it exists in any other database

Spanner has it, and it is their only integer type (and FLOAT64 is their only float type).

If we were pre-1.0 I'd say get rid of INT64, but since it was documented in 1.0 I think we ought to keep it (even though its presence makes int8 ambiguous).

I'm not sure that we need both INT8 and INT64 at this level though. We have the intColType enums in the parser package to make sure that we can reproduce the original type as written, even for exactly-equivalent types like INT and INTEGER. I think at this level we only need to know the size of the integers instead of the type as written.

Comments from Reviewable

maddyblue · 2017-06-16T05:14:27Z

Review status: 0 of 5 files reviewed at latest revision, 3 unresolved discussions, all commit checks successful.

pkg/sql/sqlbase/structured.proto, line 68 at r2 (raw file):

  }

  enum SubKind {

Does it make sense to use the width or precision fields instead of adding a new one? They convey a similar limitation and wouldn't require a proto change.

Comments from Reviewable

knz · 2017-06-16T09:05:07Z

Reviewed 4 of 5 files at r1, 1 of 1 files at r2.
Review status: all files reviewed at latest revision, 5 unresolved discussions, all commit checks successful.

pkg/sql/sqlbase/structured.proto, line 71 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

INT64 is not a standard sql type name - I don't think it exists in any other database

Spanner has it, and it is their only integer type (and FLOAT64 is their only float type).

If we were pre-1.0 I'd say get rid of INT64, but since it was documented in 1.0 I think we ought to keep it (even though its presence makes int8 ambiguous).

I'm not sure that we need both INT8 and INT64 at this level though. We have the intColType enums in the parser package to make sure that we can reproduce the original type as written, even for exactly-equivalent types like INT and INTEGER. I think at this level we only need to know the size of the integers instead of the type as written.

hugh then

can we take this opportunity to remove intColType altogether and use this new mechanism instead
for better or worse int8 is supported by postgres and we support it for postgres compatibility. I think it doesn't hurt to keep it, but I would recommend that a doc issue be filed concomitant to this PR to highlight what is the purpose of all these types, and have the explanation say that "int" is what we recommend, whereas 2.a) "integer" is for ansi sql compat 2.b) "intN" is for pg compat and 2.c) "int64" is for spanner compat

pkg/sql/sqlbase/structured.proto, line 68 at r2 (raw file):

Previously, mjibson (Matt Jibson) wrote…

Does it make sense to use the width or precision fields instead of adding a new one? They convey a similar limitation and wouldn't require a proto change.

Maybe precision then -- it seems to me that width would be a restriction on top of the type, e.g. to use fewer bits. I'm not sure -- perhaps an int should just not have a precision field at all and instead have its "kind"

pkg/sql/sqlbase/structured.proto, line 68 at r2 (raw file):

  }

  enum SubKind {

I am not fond of the name "kind". FYI both Arjun and I are miffed by the existing use of the word "kind" above and elsewhere in the code base -- what we call "kind" is really what should be called "type"
the idea of "sub"-kind (and also "sub"-type) has actually a particular meaning in type theory and it's not clear to me this is what is happening here (one-time hint: that's my code word for "I think it doesn't make sense").

So perhaps here the better name would be "ApparentType" ("Apparent" because that's what the user sees even if the actual implementation type is something else) or maybe just "TypeAlias" so that is better resonates with the name of the struct member you define below.

then

perhaps this enum (and the corresponding field below) deserves a comment to clarify that despite the current fact that it's only used for integral types, the mechanism is rather general and can probably be used to define apparent aliases for any type.

pkg/sql/sqlbase/table.go, line 97 at r2 (raw file):

	colDatumType := parser.CastTargetToDatumType(d.Type)
	col.Type = DatumTypeToColumnType(colDatumType)
	var subtypeMap = proto.EnumValueMap("cockroach.sql.sqlbase.ColumnType_SubKind")

s/subtype/subType/
then
s/subType/apparentType/ or s/subType/typeAlias/

Comments from Reviewable

knz · 2017-06-16T09:07:05Z

Don't forget to squash your commits here since they are related.

Review status: all files reviewed at latest revision, 5 unresolved discussions, all commit checks successful.

Comments from Reviewable

m-schneider · 2017-06-16T15:46:56Z

Review status: 2 of 5 files reviewed at latest revision, 5 unresolved discussions, some commit checks pending.

pkg/sql/sqlbase/structured.proto, line 71 at r1 (raw file):

Previously, knz (kena) wrote…

hugh then

can we take this opportunity to remove intColType altogether and use this new mechanism instead

for better or worse int8 is supported by postgres and we support it for postgres compatibility. I think it doesn't hurt to keep it, but I would recommend that a doc issue be filed concomitant to this PR to highlight what is the purpose of all these types, and have the explanation say that "int" is what we recommend, whereas 2.a) "integer" is for ansi sql compat 2.b) "intN" is for pg compat and 2.c) "int64" is for spanner compat

By getting rid of intColType do you mean creating more integral col types?
I'll file a docs issue! It should probably be a bit more broad that for integral types though because I'm planning on adding support for FLOAT4 and FLOAT8 and other aliased numerical types as well in a followup PR.

pkg/sql/sqlbase/structured.proto, line 68 at r2 (raw file):

Previously, knz (kena) wrote…

Maybe precision then -- it seems to me that width would be a restriction on top of the type, e.g. to use fewer bits. I'm not sure -- perhaps an int should just not have a precision field at all and instead have its "kind"

I was thinking of reusing the precision field, but then we'd still have the issue that we don't echo back the user's precise specifications back at them when they issue a SHOW CREATE TABLE command because int8 would just become an int again.

pkg/sql/sqlbase/structured.proto, line 68 at r2 (raw file):

Previously, knz (kena) wrote…

I am not fond of the name "kind". FYI both Arjun and I are miffed by the existing use of the word "kind" above and elsewhere in the code base -- what we call "kind" is really what should be called "type"

the idea of "sub"-kind (and also "sub"-type) has actually a particular meaning in type theory and it's not clear to me this is what is happening here (one-time hint: that's my code word for "I think it doesn't make sense").

So perhaps here the better name would be "ApparentType" ("Apparent" because that's what the user sees even if the actual implementation type is something else) or maybe just "TypeAlias" so that is better resonates with the name of the struct member you define below.

then

perhaps this enum (and the corresponding field below) deserves a comment to clarify that despite the current fact that it's only used for integral types, the mechanism is rather general and can probably be used to define apparent aliases for any type.

Yes I agree Kind is a bit of a strange word for describing what is essentially a type.
What do you think of AliasType vs TypeAlias which to me has too many connotations in programming languages in general.
Agreed, will make the comment clearer.

pkg/sql/sqlbase/table.go, line 112 at r1 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

I'm not sure we need to do this nil check dance unless you're anticipating something about protobuf changes that I'm not familiar with. Can you instead use the ColumnType_SubKind_value map that should get generated in structured.pb.go? ISTM that would avoid having to call proto.EnumValueMap above as well.

Thanks for the Go pointer!

pkg/sql/sqlbase/table.go, line 97 at r2 (raw file):

Previously, knz (kena) wrote…

s/subtype/subType/
then
s/subType/apparentType/ or s/subType/typeAlias/

What are you thoughts on renaming the proto type to AliasType.

Comments from Reviewable

knz · 2017-06-18T21:23:02Z

Reviewed 3 of 3 files at r3.
Review status: all files reviewed at latest revision, 5 unresolved discussions, all commit checks successful.

pkg/sql/sqlbase/structured.proto, line 71 at r1 (raw file):

Previously, m-schneider (Masha Schneider) wrote…

By getting rid of intColType do you mean creating more integral col types?

I'll file a docs issue! It should probably be a bit more broad that for integral types though because I'm planning on adding support for FLOAT4 and FLOAT8 and other aliased numerical types as well in a followup PR.

1 - yes, absolutely! less code is good code
2 - agreed

pkg/sql/sqlbase/structured.proto, line 68 at r2 (raw file):

Previously, m-schneider (Masha Schneider) wrote…

I was thinking of reusing the precision field, but then we'd still have the issue that we don't echo back the user's precise specifications back at them when they issue a SHOW CREATE TABLE command because int8 would just become an int again.

👍

pkg/sql/sqlbase/structured.proto, line 68 at r2 (raw file):

Previously, m-schneider (Masha Schneider) wrote…

Yes I agree Kind is a bit of a strange word for describing what is essentially a type.

What do you think of AliasType vs TypeAlias which to me has too many connotations in programming languages in general.

Agreed, will make the comment clearer.

I had to think about this more today, especially in the light of your other comment that is suggesting to extend also FLOAT with aliases.

First of all to clarify the rest of this comment and subsequent discussion, let's name things properly: we'll have the semantic type ("SemType") corresponding to the current "kind", which influences how computations are performed and which Go type is storing the value, and the visible type ("VisType" - sorry I'll punt on "alias" for now), what you are introducing, which influences what clients see.

I have two orthogonal proposals which I would like to bring to the table:

A) encode both semtype and vistype into a single field, using e.g. low-order bits for semtype and high-order bits for vistype. This idea stems from the observation it's wasteful to spend 2 times 32 bits in memory (when the struct is loaded - protobuf encodes efficiently) when very few bits would be sufficient. The advantage of this proposal is that it would make the introduction of vistypes backward-compatible: a high-order word set to 0 says that the vistype is the "canonical" vistype. With that proposal you would not introduce a separate enum here but rather extend the list of possible values in the enum above, for example int4 = (1 << 16 | 1) , smallint = (2 << 16 | 1), int8 = (3 << 16 | 1), int64 = (4 << 16 | 1), etc. This then naturally allows us to alias any other semtype the same way.

B) regardless of whether you encode them together (prop A above) or not (your current code), take the opportunity to rename kind -> semtype and subkind -> vistype throughout the code 😸

To come back to the word "alias" I am not too keen on introducing this now, as you said the word "alias" is super-overloaded and also it doesn't completely clarify (in the name) what an alias really means. Hence my idea of semtype vs vistype.

pkg/sql/sqlbase/table.go, line 97 at r2 (raw file):

Previously, m-schneider (Masha Schneider) wrote…

What are you thoughts on renaming the proto type to AliasType.

see above

Comments from Reviewable

eisenstatdavid · 2017-06-19T16:12:42Z

Review status: all files reviewed at latest revision, 5 unresolved discussions, all commit checks successful.

pkg/sql/sqlbase/structured.proto, line 68 at r2 (raw file):

Previously, knz (kena) wrote…

👍

The precision field for INT columns is 0 currently, so we could let 0 -> INT and 8 -> INT8 (though distinguishing int64 from int8 could be tough).

Comments from Reviewable

jordanlewis · 2017-06-19T17:34:58Z

Review status: all files reviewed at latest revision, 4 unresolved discussions, all commit checks successful.

pkg/sql/sqlbase/structured.proto, line 68 at r2 (raw file):

Previously, eisenstatdavid (David Eisenstat) wrote…

The precision field for INT columns is 0 currently, so we could let 0 -> INT and 8 -> INT8 (though distinguishing int64 from int8 could be tough).

We do not need to distinguish between types of the same length like int64/int8/bigint/int - just between types of different length like int8/int4/int2.

Comments from Reviewable

knz · 2017-06-19T17:37:29Z

@jordanlewis Just curious: do we need to also actually compute differently because pg thinks they have different lengths? Would clients notice? e.g. with regards to overflow/range. (this is actually a question David was asking earlier, but now I'm curious too)

m-schneider · 2017-06-21T15:26:30Z

We're planning on replicating PG behavior by throwing error messages when clients try to insert out of range values. I could see clients running legacy pg code using the error message.

Review status: 1 of 5 files reviewed at latest revision, 4 unresolved discussions, some commit checks failed.

pkg/sql/sqlbase/structured.proto, line 71 at r1 (raw file):

Previously, knz (kena) wrote…

1 - yes, absolutely! less code is good code
2 - agreed

Would it be less code though? Would we want more datum types? I'm thinking that probably not because we would be able to reuse the precision field at that level.
I talked to Jesse and I'll add docs todos in the followup PR.

pkg/sql/sqlbase/structured.proto, line 68 at r2 (raw file):

Previously, knz (kena) wrote…

I had to think about this more today, especially in the light of your other comment that is suggesting to extend also FLOAT with aliases.

First of all to clarify the rest of this comment and subsequent discussion, let's name things properly: we'll have the semantic type ("SemType") corresponding to the current "kind", which influences how computations are performed and which Go type is storing the value, and the visible type ("VisType" - sorry I'll punt on "alias" for now), what you are introducing, which influences what clients see.

I have two orthogonal proposals which I would like to bring to the table:

A) encode both semtype and vistype into a single field, using e.g. low-order bits for semtype and high-order bits for vistype. This idea stems from the observation it's wasteful to spend 2 times 32 bits in memory (when the struct is loaded - protobuf encodes efficiently) when very few bits would be sufficient. The advantage of this proposal is that it would make the introduction of vistypes backward-compatible: a high-order word set to 0 says that the vistype is the "canonical" vistype. With that proposal you would not introduce a separate enum here but rather extend the list of possible values in the enum above, for example int4 = (1 << 16 | 1) , smallint = (2 << 16 | 1), int8 = (3 << 16 | 1), int64 = (4 << 16 | 1), etc. This then naturally allows us to alias any other semtype the same way.

B) regardless of whether you encode them together (prop A above) or not (your current code), take the opportunity to rename kind -> semtype and subkind -> vistype throughout the code 😸

To come back to the word "alias" I am not too keen on introducing this now, as you said the word "alias" is super-overloaded and also it doesn't completely clarify (in the name) what an alias really means. Hence my idea of semtype vs vistype.

So now that we have a reasonable name for Semantic Type, I've renamed AliasType to VisibleType. I think the bit shifting will add extra complexity for a minor space saving in the order of 32 bits per column, so I'm keeping the second field.

pkg/sql/sqlbase/structured.proto, line 68 at r2 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

We do not need to distinguish between types of the same length like int64/int8/bigint/int - just between types of different length like int8/int4/int2.

If we replicate pg behavior then we would distinguish between types of the same length.

Comments from Reviewable

knz · 2017-06-22T14:40:45Z

Reviewed 4 of 4 files at r4.
Review status: all files reviewed at latest revision, 3 unresolved discussions, some commit checks failed.

pkg/sql/sqlbase/structured.proto, line 71 at r1 (raw file):

Previously, m-schneider (Masha Schneider) wrote…

Would it be less code though? Would we want more datum types? I'm thinking that probably not because we would be able to reuse the precision field at that level.

I talked to Jesse and I'll add docs todos in the followup PR.

See my comment above. I think it is probably necessary for semantic correctness to unify them.

pkg/sql/sqlbase/structured.proto, line 68 at r2 (raw file):

Previously, m-schneider (Masha Schneider) wrote…

If we replicate pg behavior then we would distinguish between types of the same length.

Yes not only we need to distinguish between them but also propagate them; for example int4column + 123 should have type int4 not int. That's why I think a mechanism based on OidWrapper or similar for typing is going to be necessary. Perhaps in a later iteration?

pkg/sql/sqlbase/structured.proto, line 68 at r2 (raw file):

Previously, m-schneider (Masha Schneider) wrote…

So now that we have a reasonable name for Semantic Type, I've renamed AliasType to VisibleType. I think the bit shifting will add extra complexity for a minor space saving in the order of 32 bits per column, so I'm keeping the second field.

ack

Comments from Reviewable

knz · 2017-06-22T14:42:01Z

LGTM for this PR assuming you fix the broken tests in show_trace that broke due to the new descriptor format.

However please ensure either in the next PRs or by opening a follow-up issue that the question of what to do with equivalent types but with different names (e.g. int8 vs int4) and how they propagate throughout type checking is properly addressed and taken care of. See discussion below.

Review status: all files reviewed at latest revision, 2 unresolved discussions, some commit checks failed.

Comments from Reviewable

m-schneider · 2017-06-23T22:21:52Z

Review status: 5 of 6 files reviewed at latest revision, 2 unresolved discussions, some commit checks failed.

pkg/sql/sqlbase/structured.proto, line 71 at r1 (raw file):

Previously, knz (kena) wrote…

See my comment above. I think it is probably necessary for semantic correctness to unify them.

I'm answering both this and your comment above here. I'm thinking that changing DInt should be in a follow up PR. I looked at DOidWrapper and that looks like the right way to go. The question is do we want to add that complexity? We don't currently do anything for decimal precision beyond update and insert.

Comments from Reviewable

Before this patch we would discard more precise type informations for types such as INT8 and INT64. SHOW CREATE TABLE would show INT even if the user specified either of those. After this patch we'll save this information. SHOW CREATE TABLE will now display the correct type.

m-schneider · 2017-06-26T15:04:00Z

Everything is passing! I'll send out the the next pr for type checking.

Review status: 5 of 6 files reviewed at latest revision, 2 unresolved discussions, all commit checks successful.

Comments from Reviewable

m-schneider assigned jordanlewis Jun 15, 2017

m-schneider force-pushed the int4 branch from 92d6c33 to 3ae182b Compare June 15, 2017 21:36

m-schneider force-pushed the int4 branch from 3ae182b to f6e4592 Compare June 16, 2017 15:30

m-schneider mentioned this pull request Jun 19, 2017

sql: Refactoring structured.proto. Renaming ColumnType Kind enum to SemanticType. #16606

Merged

m-schneider force-pushed the int4 branch 3 times, most recently from 4f4b9f8 to bb37ca7 Compare June 21, 2017 14:57

m-schneider force-pushed the int4 branch 4 times, most recently from b0c2c34 to c538baf Compare June 23, 2017 22:02

m-schneider force-pushed the int4 branch from c538baf to 91e87da Compare June 23, 2017 23:40

knz approved these changes Jun 26, 2017

View reviewed changes

m-schneider merged commit ae0872a into cockroachdb:master Jun 26, 2017

m-schneider deleted the int4 branch June 26, 2017 17:22

jseldess mentioned this pull request Jun 28, 2017

sql: Adding support for integral aliases cockroachdb/docs#1634

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sql: Adding support for integral aliases. #16548

sql: Adding support for integral aliases. #16548

m-schneider commented Jun 15, 2017

CLAassistant commented Jun 15, 2017 •

edited

Loading

cockroach-teamcity commented Jun 15, 2017

jordanlewis commented Jun 15, 2017

bdarnell commented Jun 15, 2017

maddyblue commented Jun 16, 2017

knz commented Jun 16, 2017

knz commented Jun 16, 2017

m-schneider commented Jun 16, 2017

knz commented Jun 18, 2017

eisenstatdavid commented Jun 19, 2017

jordanlewis commented Jun 19, 2017

knz commented Jun 19, 2017

m-schneider commented Jun 21, 2017

knz commented Jun 22, 2017

knz commented Jun 22, 2017

m-schneider commented Jun 23, 2017

m-schneider commented Jun 26, 2017

sql: Adding support for integral aliases. #16548

sql: Adding support for integral aliases. #16548

Conversation

m-schneider commented Jun 15, 2017

CLAassistant commented Jun 15, 2017 • edited Loading

cockroach-teamcity commented Jun 15, 2017

jordanlewis commented Jun 15, 2017

bdarnell commented Jun 15, 2017

maddyblue commented Jun 16, 2017

knz commented Jun 16, 2017

knz commented Jun 16, 2017

m-schneider commented Jun 16, 2017

knz commented Jun 18, 2017

eisenstatdavid commented Jun 19, 2017

jordanlewis commented Jun 19, 2017

knz commented Jun 19, 2017

m-schneider commented Jun 21, 2017

knz commented Jun 22, 2017

knz commented Jun 22, 2017

m-schneider commented Jun 23, 2017

m-schneider commented Jun 26, 2017

CLAassistant commented Jun 15, 2017 •

edited

Loading