[SPARK-37714][SQL] ANSI mode: allow casting between numeric type and timestamp type

### What changes were proposed in this pull request?

* Allow casting between numeric type and timestamp type under ANSI mode
* Remove the user-facing configuration `spark.sql.ansi.allowCastBetweenDatetimeAndNumeric`

### Why are the changes needed?

Same reason as mentioned in #34459: it makes ANSI SQL mode easier to adopt, since users already rely on this behavior:
- Usage analysis showed that many Spark SQL users actually use `Cast(Timestamp as Numeric)` and `Cast(Numeric as Timestamp)`.
- The Spark SQL connector for Tableau uses this feature for DateTime math; e.g. the following expression adds `%2` days to the timestamp `%1` by way of epoch seconds:
 `CAST(FROM_UNIXTIME(CAST(CAST(%1 AS BIGINT) + (%2 * 86400) AS BIGINT)) AS TIMESTAMP)`

### Does this PR introduce _any_ user-facing change?

Yes. Casting between numeric types and the timestamp type is allowed by default under ANSI SQL mode.
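As a minimal sketch of the new default behavior (illustrative statements, assuming a UTC session time zone):

```sql
-- ANSI mode alone now enables these casts; no extra configuration is needed.
SET spark.sql.ansi.enabled=true;

-- Numeric -> Timestamp: the value is interpreted as seconds since the epoch.
SELECT CAST(86400 AS TIMESTAMP);
-- 1970-01-02 00:00:00

-- Timestamp -> Numeric: the result is the number of seconds since the epoch.
SELECT CAST(TIMESTAMP'1970-01-02 00:00:00' AS BIGINT);
-- 86400
```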

### How was this patch tested?

Unit tests.
Here is a screenshot of the documentation change:
![image](https://user-images.githubusercontent.com/1097932/147194455-b86847f7-59ea-4d29-97ea-9615ec3a758e.png)

Closes #34985 from gengliangwang/changeSubConf.

Authored-by: Gengliang Wang <gengliang@apache.org>
Signed-off-by: Gengliang Wang <gengliang@apache.org>
gengliangwang committed Dec 23, 2021
1 parent e76ae9a commit 198b90c
Showing 6 changed files with 67 additions and 151 deletions.
18 changes: 9 additions & 9 deletions docs/sql-ref-ansi-compliance.md
@@ -70,23 +70,21 @@ SELECT abs(-2147483648);

When `spark.sql.ansi.enabled` is set to `true`, explicit casting by `CAST` syntax throws a runtime exception for illegal cast patterns defined in the standard, e.g. casts from a string to an integer.

The `CAST` clause of Spark ANSI mode follows the syntax rules of section 6.13 "cast specification" in [ISO/IEC 9075-2:2011 Information technology — Database languages - SQL — Part 2: Foundation (SQL/Foundation)](https://www.iso.org/standard/53682.html), except it specially allows the following
straightforward type conversions which are disallowed as per the ANSI standard:
* NumericType <=> BooleanType
* StringType <=> BinaryType
* ArrayType => String
* MapType => String
* StructType => String

Besides, the ANSI SQL mode disallows the following type conversions which are allowed when ANSI mode is off:
* Numeric <=> Binary
* Date <=> Boolean
* Timestamp <=> Boolean
* Date => Numeric

The valid combinations of source and target data type in a `CAST` expression are given by the following table.
“Y” indicates that the combination is syntactically valid without restriction and “N” indicates that the combination is not valid.

| Source\Target | Numeric | String | Date | Timestamp | Interval | Boolean | Binary | Array | Map | Struct |
|-----------|---------|--------|------|-----------|----------|---------|--------|-------|-----|--------|
-| Numeric | <span style="color:red">**Y**</span> | Y | N | N | N | Y | N | N | N | N |
+| Numeric | <span style="color:red">**Y**</span> | Y | N | <span style="color:red">**Y**</span> | N | Y | N | N | N | N |
| String | <span style="color:red">**Y**</span> | Y | <span style="color:red">**Y**</span> | <span style="color:red">**Y**</span> | <span style="color:red">**Y**</span> | <span style="color:red">**Y**</span> | Y | N | N | N |
| Date | N | Y | Y | Y | N | N | N | N | N | N |
-| Timestamp | N | Y | Y | Y | N | N | N | N | N | N |
+| Timestamp | <span style="color:red">**Y**</span> | Y | Y | Y | N | N | N | N | N | N |
| Interval | N | Y | N | N | Y | N | N | N | N | N |
| Boolean | Y | Y | N | N | N | Y | N | N | N | N |
| Binary | N | Y | N | N | N | N | Y | N | N | N |
@@ -97,6 +95,8 @@ The `CAST` clause of Spark ANSI mode follows the syntax rules of section 6.13 "cast specification"
In the table above, all the `CAST`s that can cause runtime exceptions are marked as red <span style="color:red">**Y**</span>:
* CAST(Numeric AS Numeric): raise an overflow exception if the value is out of the target data type's range.
* CAST(String AS (Numeric/Date/Timestamp/Interval/Boolean)): raise a runtime exception if the value can't be parsed as the target data type.
+* CAST(Timestamp AS Numeric): raise an overflow exception if the number of seconds since epoch is out of the target data type's range.
+* CAST(Numeric AS Timestamp): raise an overflow exception if numeric value times 1000000 (microseconds per second) is out of the range of Long type.
* CAST(Array AS Array): raise an exception if there is any on the conversion of the elements.
* CAST(Map AS Map): raise an exception if there is any on the conversion of the keys and the values.
* CAST(Struct AS Struct): raise an exception if there is any on the conversion of the struct fields.
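As a minimal sketch of the two newly documented failure modes (illustrative statements; the exact error messages vary by Spark version):

```sql
SET spark.sql.ansi.enabled=true;

-- CAST(Timestamp AS Numeric): roughly 86400 seconds since the epoch cannot fit
-- in a TINYINT (max 127), so this raises an overflow exception.
SELECT CAST(TIMESTAMP'1970-01-02 00:00:00' AS TINYINT);

-- CAST(Numeric AS Timestamp): 1e16 * 1000000 microseconds exceeds the range of
-- a 64-bit long, so this raises an overflow exception as well.
SELECT CAST(1e16 AS TIMESTAMP);
```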
@@ -2202,7 +2202,7 @@ object AnsiCast {
case (StringType, TimestampType) => true
case (DateType, TimestampType) => true
case (TimestampNTZType, TimestampType) => true
-case (_: NumericType, TimestampType) => SQLConf.get.allowCastBetweenDatetimeAndNumericInAnsi
+case (_: NumericType, TimestampType) => true

case (StringType, TimestampNTZType) => true
case (DateType, TimestampNTZType) => true
@@ -2222,8 +2222,7 @@
case (_: NumericType, _: NumericType) => true
case (StringType, _: NumericType) => true
case (BooleanType, _: NumericType) => true
-case (TimestampType, _: NumericType) => SQLConf.get.allowCastBetweenDatetimeAndNumericInAnsi
-case (DateType, _: NumericType) => SQLConf.get.allowCastBetweenDatetimeAndNumericInAnsi
+case (TimestampType, _: NumericType) => true

case (ArrayType(fromType, fn), ArrayType(toType, tn)) =>
canCast(fromType, toType) &&
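A quick sketch of the analysis-time effect of the two `canCast` changes above (hypothetical session; `Date` to numeric no longer has a matching case, so it is rejected):

```sql
SET spark.sql.ansi.enabled=true;

-- Resolves: Timestamp <=> Numeric is now unconditionally castable in ANSI mode.
SELECT CAST(CURRENT_TIMESTAMP AS BIGINT);

-- Fails analysis: Date <=> Numeric remains an invalid cast in ANSI mode.
SELECT CAST(CURRENT_DATE AS BIGINT);
```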
@@ -2665,15 +2665,6 @@ object SQLConf {
.booleanConf
.createWithDefault(true)

-val ALLOW_CAST_BETWEEN_DATETIME_AND_NUMERIC_IN_ANSI =
-  buildConf("spark.sql.ansi.allowCastBetweenDatetimeAndNumeric")
-    .doc("When true, the data type conversions between datetime types and numeric types are " +
-      "allowed in ANSI SQL mode. This configuration is only effective when " +
-      s"'${ANSI_ENABLED.key}' is true.")
-    .version("3.3.0")
-    .booleanConf
-    .createWithDefault(false)
-
val SORT_BEFORE_REPARTITION =
buildConf("spark.sql.execution.sortBeforeRepartition")
.internal()
@@ -4139,9 +4130,6 @@ class SQLConf extends Serializable with Logging {

def enforceReservedKeywords: Boolean = ansiEnabled && getConf(ENFORCE_RESERVED_KEYWORDS)

-def allowCastBetweenDatetimeAndNumericInAnsi: Boolean =
-  getConf(ALLOW_CAST_BETWEEN_DATETIME_AND_NUMERIC_IN_ANSI)
-
def timestampType: AtomicType = getConf(TIMESTAMP_TYPE) match {
case "TIMESTAMP_LTZ" =>
// For historical reason, the TimestampType maps to TIMESTAMP WITH LOCAL TIME ZONE
@@ -17,15 +17,14 @@

package org.apache.spark.sql.catalyst.expressions

-import java.sql.{Date, Timestamp}
+import java.sql.Timestamp
import java.time.DateTimeException

import org.apache.spark.SparkArithmeticException
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.util.DateTimeConstants.MILLIS_PER_SECOND
import org.apache.spark.sql.catalyst.util.DateTimeTestUtils
-import org.apache.spark.sql.catalyst.util.DateTimeTestUtils.{withDefaultTimeZone, UTC, UTC_OPT}
-import org.apache.spark.sql.catalyst.util.DateTimeUtils.fromJavaTimestamp
+import org.apache.spark.sql.catalyst.util.DateTimeTestUtils.{withDefaultTimeZone, UTC}
import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.types._
import org.apache.spark.unsafe.types.UTF8String
@@ -139,20 +138,6 @@ abstract class AnsiCastSuiteBase extends CastSuiteBase {
cast(Literal(134.12), DecimalType(3, 2)), "cannot be represented")
}

test("ANSI mode: disallow type conversions between Numeric types and Timestamp type") {
import DataTypeTestUtils.numericTypes
checkInvalidCastFromNumericType(TimestampType)
var errorMsg =
"you can use functions TIMESTAMP_SECONDS/TIMESTAMP_MILLIS/TIMESTAMP_MICROS instead"
verifyCastFailure(cast(Literal(0L), TimestampType), Some(errorMsg))

val timestampLiteral = Literal(1L, TimestampType)
errorMsg = "you can use functions UNIX_SECONDS/UNIX_MILLIS/UNIX_MICROS instead."
numericTypes.foreach { numericType =>
verifyCastFailure(cast(timestampLiteral, numericType), Some(errorMsg))
}
}

test("ANSI mode: disallow type conversions between Numeric types and Date type") {
import DataTypeTestUtils.numericTypes
checkInvalidCastFromNumericType(DateType)
@@ -276,95 +261,38 @@ abstract class AnsiCastSuiteBase extends CastSuiteBase {
}

test("cast from timestamp II") {
withSQLConf(SQLConf.ALLOW_CAST_BETWEEN_DATETIME_AND_NUMERIC_IN_ANSI.key -> "true") {
checkCastToTimestampError(Literal(Double.NaN), TimestampType)
checkCastToTimestampError(Literal(1.0 / 0.0), TimestampType)
checkCastToTimestampError(Literal(Float.NaN), TimestampType)
checkCastToTimestampError(Literal(1.0f / 0.0f), TimestampType)
Seq(Long.MinValue.toDouble, Long.MaxValue.toDouble, Long.MinValue.toFloat,
Long.MaxValue.toFloat).foreach { v =>
checkExceptionInExpression[SparkArithmeticException](
cast(Literal(v), TimestampType), "overflow")
}
checkCastToTimestampError(Literal(Double.NaN), TimestampType)
checkCastToTimestampError(Literal(1.0 / 0.0), TimestampType)
checkCastToTimestampError(Literal(Float.NaN), TimestampType)
checkCastToTimestampError(Literal(1.0f / 0.0f), TimestampType)
Seq(Long.MinValue.toDouble, Long.MaxValue.toDouble, Long.MinValue.toFloat,
Long.MaxValue.toFloat).foreach { v =>
checkExceptionInExpression[SparkArithmeticException](
cast(Literal(v), TimestampType), "overflow")
}
}
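Roughly what these assertions correspond to in SQL (illustrative statements; exact error classes and messages vary):

```sql
SET spark.sql.ansi.enabled=true;

-- NaN and infinities cannot be converted to a timestamp: runtime error.
SELECT CAST(CAST('NaN' AS DOUBLE) AS TIMESTAMP);

-- A value whose microsecond representation overflows a 64-bit long: overflow error.
SELECT CAST(9.3e18 AS TIMESTAMP);
```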

test("cast a timestamp before the epoch 1970-01-01 00:00:00Z II") {
withSQLConf(SQLConf.ALLOW_CAST_BETWEEN_DATETIME_AND_NUMERIC_IN_ANSI.key -> "true") {
withDefaultTimeZone(UTC) {
val negativeTs = Timestamp.valueOf("1900-05-05 18:34:56.1")
assert(negativeTs.getTime < 0)
Seq(ByteType, ShortType, IntegerType).foreach { dt =>
checkExceptionInExpression[SparkArithmeticException](
cast(negativeTs, dt), s"to ${dt.catalogString} causes overflow")
}
withDefaultTimeZone(UTC) {
val negativeTs = Timestamp.valueOf("1900-05-05 18:34:56.1")
assert(negativeTs.getTime < 0)
Seq(ByteType, ShortType, IntegerType).foreach { dt =>
checkExceptionInExpression[SparkArithmeticException](
cast(negativeTs, dt), s"to ${dt.catalogString} causes overflow")
}
}
}

test("cast from timestamp") {
withSQLConf(SQLConf.ALLOW_CAST_BETWEEN_DATETIME_AND_NUMERIC_IN_ANSI.key -> "true") {
val millis = 15 * 1000 + 3
val seconds = millis * 1000 + 3
val ts = new Timestamp(millis)
val tss = new Timestamp(seconds)
checkEvaluation(cast(ts, ShortType), 15.toShort)
checkEvaluation(cast(ts, IntegerType), 15)
checkEvaluation(cast(ts, LongType), 15.toLong)
checkEvaluation(cast(ts, FloatType), 15.003f)
checkEvaluation(cast(ts, DoubleType), 15.003)

checkEvaluation(cast(cast(tss, ShortType), TimestampType),
fromJavaTimestamp(ts) * MILLIS_PER_SECOND)
checkEvaluation(cast(cast(tss, IntegerType), TimestampType),
fromJavaTimestamp(ts) * MILLIS_PER_SECOND)
checkEvaluation(cast(cast(tss, LongType), TimestampType),
fromJavaTimestamp(ts) * MILLIS_PER_SECOND)
checkEvaluation(
cast(cast(millis.toFloat / MILLIS_PER_SECOND, TimestampType), FloatType),
millis.toFloat / MILLIS_PER_SECOND)
checkEvaluation(
cast(cast(millis.toDouble / MILLIS_PER_SECOND, TimestampType), DoubleType),
millis.toDouble / MILLIS_PER_SECOND)
checkEvaluation(
cast(cast(Decimal(1), TimestampType), DecimalType.SYSTEM_DEFAULT),
Decimal(1))

// A test for higher precision than millis
checkEvaluation(cast(cast(0.000001, TimestampType), DoubleType), 0.000001)
}
}

test("cast a timestamp before the epoch 1970-01-01 00:00:00Z") {
withSQLConf(SQLConf.ALLOW_CAST_BETWEEN_DATETIME_AND_NUMERIC_IN_ANSI.key -> "true") {
withDefaultTimeZone(UTC) {
val negativeTs = Timestamp.valueOf("1900-05-05 18:34:56.1")
assert(negativeTs.getTime < 0)
Seq(ByteType, ShortType, IntegerType).foreach { dt =>
checkExceptionInExpression[SparkArithmeticException](
cast(negativeTs, dt), s"to ${dt.catalogString} causes overflow")
}
val expectedSecs = Math.floorDiv(negativeTs.getTime, MILLIS_PER_SECOND)
checkEvaluation(cast(negativeTs, LongType), expectedSecs)
withDefaultTimeZone(UTC) {
val negativeTs = Timestamp.valueOf("1900-05-05 18:34:56.1")
assert(negativeTs.getTime < 0)
Seq(ByteType, ShortType, IntegerType).foreach { dt =>
checkExceptionInExpression[SparkArithmeticException](
cast(negativeTs, dt), s"to ${dt.catalogString} causes overflow")
}
}
}

test("cast from date") {
withSQLConf(SQLConf.ALLOW_CAST_BETWEEN_DATETIME_AND_NUMERIC_IN_ANSI.key -> "true") {
val d = Date.valueOf("1970-01-01")
checkEvaluation(cast(d, ShortType), null)
checkEvaluation(cast(d, IntegerType), null)
checkEvaluation(cast(d, LongType), null)
checkEvaluation(cast(d, FloatType), null)
checkEvaluation(cast(d, DoubleType), null)
checkEvaluation(cast(d, DecimalType.SYSTEM_DEFAULT), null)
checkEvaluation(cast(d, DecimalType(10, 2)), null)
checkEvaluation(cast(d, StringType), "1970-01-01")

checkEvaluation(
cast(cast(d, TimestampType, UTC_OPT), StringType, UTC_OPT),
"1970-01-01 00:00:00")
val expectedSecs = Math.floorDiv(negativeTs.getTime, MILLIS_PER_SECOND)
checkEvaluation(cast(negativeTs, LongType), expectedSecs)
}
}
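The `Math.floorDiv` in the expected value matters for pre-epoch timestamps: the second count must round toward negative infinity, as a small illustrative query shows:

```sql
-- 1969-12-31 23:59:59.999 is -1 ms from the epoch; flooring gives -1 s, not 0.
SELECT CAST(TIMESTAMP_MILLIS(-1) AS BIGINT);
-- -1
```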

@@ -27,7 +27,6 @@ import org.apache.spark.sql.catalyst.analysis.TypeCoercionSuite
import org.apache.spark.sql.catalyst.expressions.aggregate.{CollectList, CollectSet}
import org.apache.spark.sql.catalyst.util.DateTimeConstants._
import org.apache.spark.sql.catalyst.util.DateTimeTestUtils._
-import org.apache.spark.sql.catalyst.util.DateTimeUtils._
import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.types._
import org.apache.spark.sql.types.DayTimeIntervalType.{DAY, HOUR, MINUTE, SECOND}
@@ -455,36 +454,7 @@ class CastSuite extends CastSuiteBase {
"1970-01-01 00:00:00")
}

test("cast from timestamp") {
val millis = 15 * 1000 + 3
val seconds = millis * 1000 + 3
val ts = new Timestamp(millis)
val tss = new Timestamp(seconds)
checkEvaluation(cast(ts, ShortType), 15.toShort)
checkEvaluation(cast(ts, IntegerType), 15)
checkEvaluation(cast(ts, LongType), 15.toLong)
checkEvaluation(cast(ts, FloatType), 15.003f)
checkEvaluation(cast(ts, DoubleType), 15.003)

checkEvaluation(cast(cast(tss, ShortType), TimestampType),
fromJavaTimestamp(ts) * MILLIS_PER_SECOND)
checkEvaluation(cast(cast(tss, IntegerType), TimestampType),
fromJavaTimestamp(ts) * MILLIS_PER_SECOND)
checkEvaluation(cast(cast(tss, LongType), TimestampType),
fromJavaTimestamp(ts) * MILLIS_PER_SECOND)
checkEvaluation(
cast(cast(millis.toFloat / MILLIS_PER_SECOND, TimestampType), FloatType),
millis.toFloat / MILLIS_PER_SECOND)
checkEvaluation(
cast(cast(millis.toDouble / MILLIS_PER_SECOND, TimestampType), DoubleType),
millis.toDouble / MILLIS_PER_SECOND)
checkEvaluation(
cast(cast(Decimal(1), TimestampType), DecimalType.SYSTEM_DEFAULT),
Decimal(1))

// A test for higher precision than millis
checkEvaluation(cast(cast(0.000001, TimestampType), DoubleType), 0.000001)

test("cast from timestamp II") {
checkEvaluation(cast(Double.NaN, TimestampType), null)
checkEvaluation(cast(1.0 / 0.0, TimestampType), null)
checkEvaluation(cast(Float.NaN, TimestampType), null)
@@ -295,6 +295,37 @@ abstract class CastSuiteBase extends SparkFunSuite with ExpressionEvalHelper {
assert(cast("abcdef", FloatType).nullable)
}

test("cast from timestamp") {
val millis = 15 * 1000 + 3
val seconds = millis * 1000 + 3
val ts = new Timestamp(millis)
val tss = new Timestamp(seconds)
checkEvaluation(cast(ts, ShortType), 15.toShort)
checkEvaluation(cast(ts, IntegerType), 15)
checkEvaluation(cast(ts, LongType), 15.toLong)
checkEvaluation(cast(ts, FloatType), 15.003f)
checkEvaluation(cast(ts, DoubleType), 15.003)

checkEvaluation(cast(cast(tss, ShortType), TimestampType),
fromJavaTimestamp(ts) * MILLIS_PER_SECOND)
checkEvaluation(cast(cast(tss, IntegerType), TimestampType),
fromJavaTimestamp(ts) * MILLIS_PER_SECOND)
checkEvaluation(cast(cast(tss, LongType), TimestampType),
fromJavaTimestamp(ts) * MILLIS_PER_SECOND)
checkEvaluation(
cast(cast(millis.toFloat / MILLIS_PER_SECOND, TimestampType), FloatType),
millis.toFloat / MILLIS_PER_SECOND)
checkEvaluation(
cast(cast(millis.toDouble / MILLIS_PER_SECOND, TimestampType), DoubleType),
millis.toDouble / MILLIS_PER_SECOND)
checkEvaluation(
cast(cast(Decimal(1), TimestampType), DecimalType.SYSTEM_DEFAULT),
Decimal(1))

// A test for higher precision than millis
checkEvaluation(cast(cast(0.000001, TimestampType), DoubleType), 0.000001)
}
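For reference, the arithmetic these assertions exercise, sketched with illustrative SQL: a timestamp 15003 ms after the epoch is 15.003 s, so integral casts yield the floored second count while fractional casts keep sub-second precision.

```sql
SELECT CAST(TIMESTAMP_MILLIS(15003) AS INT);     -- 15
SELECT CAST(TIMESTAMP_MILLIS(15003) AS DOUBLE);  -- 15.003
```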

test("data type casting") {
val sd = "1970-01-01"
val d = Date.valueOf(sd)
