[RFC] Clarify the Summer RFC and create an executive overview
[ci skip]
knz authored and nvanbenschoten committed Apr 28, 2016
1 parent 1c8c73e commit 8879af7
Showing 1 changed file with 92 additions and 9 deletions.
101 changes: 92 additions & 9 deletions docs/RFCS/typing.md
@@ -346,18 +346,92 @@ Where Summer will always pick a type and be able to explain it.

## Proposed typing strategy

### High-level overview

To explain Summer to a newcomer it would be mostly correct to say
"Summer first determines the types of the operands of a complex
expression, then based on the operand types decides the type of the
complex expression", i.e. the intuitive description of bottom-up type
inference.

The reason why Summer is more complex than this in reality (and the principle
underlying its design) is threefold:

- Expressions containing placeholders often contain insufficient
information to determine a proper type in a bottom-up fashion. For
example in the expression `floor($1 * $2)` we cannot type the
placeholders unless we take into account the accepted argument types
of `floor`.

- SQL number literals are usually valid values in multiple types
(`int`, `float`, `decimal`). Users expect a minimum amount of
automatic type coercion, so that expressions like `1.5 + 123`
are not rejected. There is also a conflict of interest between
flexibility for the SQL user (which suggests picking the largest
type) and performance (which suggests picking the smallest type).
Summer does extra work to strike a balance between the two. For example
`greatest(1, 1.2)` will pick `float` whereas
`greatest(1, 1.2e10000)` will pick `decimal`.

- SQL has overloaded functions. If there are multiple candidates and
the operand types do not match the candidates' expected types
"exactly", Summer does extra work to find an acceptable candidate.

So another way to explain Summer, somewhat less incorrect than the
naive explanation above, would be:

1. the types of constant literals (numbers, strings, null) and
placeholders are mostly determined by their parent expression
depending on other rules (especially the expected type at that
position), not by the literals or placeholders themselves. For
example Summer does not "know" (determine) the constant "123" to be
an `int` until it looks at its parent in the syntax tree. For complex
expressions involving number constants, this requires Summer to first
perform constant folding so that the immediate parent of a constant,
often an overloaded operator, has enough information from its other
operand(s) to decide a type for the constant. This constant folding
is performed using exact arithmetic.

2. for functions that require homogeneous types (e.g. `GREATEST`,
`CASE .. THEN` etc.), the type expected by the context, if any, is
used to restrict the operand types (rule 6.2); otherwise the first
operand with a "possibly useful" type is used to restrict the type of
the other operands (rules 6.3 and 6.4).

3. during overload resolution, the candidate list is first restricted
to the candidates that are *compatible* with the arguments (rules
7.1 to 7.3), then filtered down by compatibility between the
candidate return types and the context (7.4), then by minimizing
the number of type conversions for literals (7.5), then by
preferring homogeneous argument lists (7.6).
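
The first two points above can be illustrated with a small Python sketch. It is not the RFC's implementation: the helper names (`fold`, `resolvable_types`, `homogeneous_type`), the tuple-based expression encoding, the 64-bit `int` range and the "cheapest common type" selection are all assumptions made for illustration, and the selection is a simplification of rules 6.2 to 6.4. It only shows why exact folding matters and how `greatest(1, 1.2)` and `greatest(1, 1.2e10000)` could end up with different types.

```python
import sys
from fractions import Fraction

# Assumed preference order, cheapest type first (illustrative only).
PREFERENCE = ("int", "float", "decimal")
INT64_MAX = 2**63 - 1                      # assumed width of `int`
FLOAT_MAX = Fraction(sys.float_info.max)   # largest finite float64, exactly


def fold(expr):
    """Fold a constant expression with exact rational arithmetic.

    `expr` is either a literal given as a string, or a tuple
    (op, left, right) with op in {"+", "-", "*"}.
    """
    if isinstance(expr, str):
        return Fraction(expr)
    op, left, right = expr
    lhs, rhs = fold(left), fold(right)
    if op == "+":
        return lhs + rhs
    if op == "-":
        return lhs - rhs
    if op == "*":
        return lhs * rhs
    raise ValueError(f"unsupported operator: {op}")


def resolvable_types(value):
    """Types that could hold the folded constant (a simplification)."""
    types = {"decimal"}
    if abs(value) <= FLOAT_MAX:
        types.add("float")
    if value.denominator == 1 and abs(value) <= INT64_MAX:
        types.add("int")
    return types


def homogeneous_type(exprs):
    """Cheapest type accepted by every constant operand."""
    common = set(PREFERENCE)
    for expr in exprs:
        common &= resolvable_types(fold(expr))
    return next(t for t in PREFERENCE if t in common)
```

With exact folding, `(0.1 + 0.2) * 10` folds to exactly 3, so `int` stays in its resolvable set; folding with binary floats would give a value slightly different from 3 and lose that option.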

### Language extension

In order to clarify the typing rules below and to exercise
the proposed system, we found it useful to "force" certain
expressions to be of a certain type.

Unfortunately the SQL cast expression (`CAST(... AS ...)` or
`...::...`) is not appropriate for this, because although it
guarantees a type to the surrounding expression it does not constrain
its argument. For example `sign(1.2)::int` does not disambiguate which
overload of `sign` to use.

Therefore we propose the following SQL extension, which is not
required to implement the typing system but offers opportunities to
better exercise it in tests. The examples below also use
this extension for explanatory purposes.

The extension is a new expression node "type annotation".

We also propose the following SQL syntax for this: "E : T".

For example: `1:int` or `1 : int`.

The meaning of this, to a first-order approximation, is "interpret the
expression on the left using the type on the right".

This is different from casts, as explained below.

The need for this type of extension is also implicitly
present/expressed in the alternate proposals Rick and Morty.
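
The difference between a cast and a type annotation can be sketched as follows. This is a toy model, not the proposed implementation: it assumes `sign` has one overload per numeric type with a matching return type, that the literal `1.2` can resolve to `float` or `decimal`, and that an annotation is passed down as the desired type while a cast is not. All function names are hypothetical.

```python
# Hypothetical overloads: sign(T) -> T for each numeric type.
SIGN_OVERLOADS = ("int", "float", "decimal")


def sign_candidates(const_types, desired=None):
    """Overloads of sign that accept the literal and fit the desired type."""
    cands = [t for t in SIGN_OVERLOADS if t in const_types]
    if desired is not None:
        cands = [t for t in cands if t == desired]
    return cands


def check_cast(const_types, target):
    # Cast: the argument is typed with no desired type; only the
    # result of the whole expression is converted to `target`.
    return sign_candidates(const_types, desired=None), target


def check_annotation(const_types, annotated):
    # Annotation (assumed semantics): the annotated type is pushed
    # down as the desired type of the annotated expression.
    return sign_candidates(const_types, desired=annotated), annotated
```

In this model `sign(1.2)::int` still leaves both the `float` and `decimal` overloads in play, whereas `sign(1.2) : float` narrows the choice to a single overload.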
@@ -494,7 +568,7 @@ tree where each node is unable to properly introspect its own return
type into a typed tree which can provide its inferred result type, and as such
can be evaluated later.

#### Implementation Notes

_In an effort to make this distinction clearer in code, a `TypedExpr` interface
will be created, which is a superset of the `Expr` interface, but also has the
@@ -574,18 +648,27 @@ subsequent step, we check the remaining overload set:
Then the overload candidates are filtered based on the resulting types. If any argument of the call
receives type null, then it is not used for filtering.

For example: `select mod(extract(seconds from now()), $1*20)`. There
are 3 candidates for `mod`, on `int`, `float` and `decimal`. The
first argument `extract` is typed without a desired type and
resolves to `int`. This selects the candidate `mod(int, int)`. From then on only one candidate
remains so `$1*20` gets typed using desired type `int` and `$1` gets typed as `int`.

3. (7.3) candidates are filtered based on the resolvable type sets of constant number literals.
Remember that at this point all constant literals already have a resolvable type set, computed during constant folding.

The filtering is done left to right, eliminating at each argument all candidates that do not accept
one of the types in the resolvable set at that position.

Example: `select sign(1.2)`. `sign` has 3 candidates for `int`, `float` and `decimal`. Step 7.3 eliminates
the candidate for `int`.

From this point on, the number of remaining candidates is checked
after this step and after each following step.

4. (7.4) candidates are filtered based on the desired return type, if one is provided

Example: `insert into (str_col) values (left($1, 1))`
With only rules 7.2 and 7.3 above we still have 2 candidates: `left(string, int)` and `left(bytes, int)`.
With rule 7.4 `left(string, int)` is selected.
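
The three filtering steps illustrated by the examples above (7.2, 7.3 and 7.4) can be sketched in Python as successive filters over a candidate list. This is a minimal sketch under stated assumptions: the `Overload` tuple, the function names, and the return types given for `mod` and `sign` are hypothetical; only the parameter lists and the `left` return types follow the examples in the text.

```python
from typing import NamedTuple


class Overload(NamedTuple):
    params: tuple   # parameter type names, one per argument position
    returns: str    # return type name


# Candidate sets taken from the examples; return types of mod and sign are assumed.
MOD = [Overload((t, t), t) for t in ("int", "float", "decimal")]
SIGN = [Overload((t,), t) for t in ("int", "float", "decimal")]
LEFT = [Overload(("string", "int"), "string"),
        Overload(("bytes", "int"), "bytes")]


def by_typed_args(cands, arg_types):
    """Rule 7.2: keep candidates matching the already-typed arguments.

    `arg_types` holds a type name per position, or None when the
    argument is not typed yet (or is null) and must not filter.
    """
    return [c for c in cands
            if all(t is None or t == p for t, p in zip(arg_types, c.params))]


def by_constant_sets(cands, const_sets):
    """Rule 7.3: filter left to right on resolvable sets of constants.

    `const_sets` holds a set of type names per position, or None when
    the argument at that position is not a constant literal.
    """
    for pos, types in enumerate(const_sets):
        if types is not None:
            cands = [c for c in cands if c.params[pos] in types]
    return cands


def by_return_type(cands, desired):
    """Rule 7.4: keep candidates whose return type is the desired one."""
    if desired is None:
        return cands
    return [c for c in cands if c.returns == desired]
```

For `left($1, 1)` the first two filters leave both candidates, and only the desired type `string` coming from `str_col` singles out `left(string, int)`.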

@@ -615,7 +698,7 @@ subsequent step, we check the remaining overload set:
that accepts this type in the yet untyped positions,
choose that candidate.

Example: `select div(1, $1)` still has candidates for `int`, `float` and `decimal`.

Another approach would be to go through each overload and attempt to type check each
argument expression with the parameter's type. If any of these expressions type checked to a