[BUG] nullability on some decimal operations is wrong #9108

Closed
revans2 opened this issue Aug 24, 2023 · 4 comments · Fixed by #9608
Labels
bug Something isn't working reliability Features to improve reliability or bugs that severly impact the reliability of the plugin

Comments


revans2 commented Aug 24, 2023

Describe the bug
There are some expressions where we set nullable incorrectly.

CheckOverflow always has nullable set to true, but we have it set to child.nullable, as inherited from the unary expression.

SPARK-39316 (https://issues.apache.org/jira/browse/SPARK-39316), which changed how arithmetic operations on decimals are calculated, also changed the default nullable implementation for BinaryArithmetic:

https://github.com/apache/spark/blob/ce12f6dbad2d713c6a2a52ac36d1f7a910399ad4/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala#L253-L263

Ours no longer matches this and still does left.nullable || right.nullable. This can result in crashes if we do a ColumnarToRow operation on a column that is not expected to be nullable.

We should probably audit this for all expressions and see if we are setting it correctly.
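
A minimal sketch of the rule described above (not the plugin's or Spark's actual code, and the failOnError handling is an assumption): after SPARK-39316, a decimal result of Add/Subtract/Multiply has to be reported as nullable whenever overflow can turn into a null, regardless of whether the children are nullable.

// Rough sketch of the nullability rule the issue describes; the failOnError
// branch is an assumption about ANSI mode, not confirmed plugin behavior.
import org.apache.spark.sql.types.{DataType, DecimalType}

def decimalResultNullable(resultType: DataType,
                          leftNullable: Boolean,
                          rightNullable: Boolean,
                          failOnError: Boolean): Boolean = resultType match {
  // In non-ANSI mode an overflowing decimal result becomes null, so the
  // output is nullable even if both inputs are not.
  case _: DecimalType if !failOnError => true
  // Otherwise fall back to the old default the plugin still uses.
  case _ => leftNullable || rightNullable
}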

Steps/Code to reproduce bug
The good news is that it is kind of hard to reproduce. You need to overflow the decimal operation on one of these expressions, and have the input marked as not-nullable. But Spark tends to mark all inputs as nullable unless the input is a constant or comes from spark.range, so it would be good to come up with a way to test this generally, but I don't know of one right now.
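
A quick spark-shell check of why spark.range matters (not part of the original report): the range output is non-nullable, and a cast from LongType to DECIMAL(38,0) cannot overflow, so the decimal column stays non-nullable, which is exactly the precondition this bug needs.

// spark.range produces a non-nullable id column, and the cast cannot
// overflow, so dec_num stays non-nullable.
val df = spark.range(20).selectExpr("CAST(id AS DECIMAL(38,0)) AS dec_num")
df.schema.fields.foreach(f => println(s"${f.name}: nullable = ${f.nullable}"))
// expected output: dec_num: nullable = false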

revans2 added the bug, ? - Needs Triage, and reliability labels on Aug 24, 2023
mattahrens removed the ? - Needs Triage label on Aug 30, 2023

firestarman commented Nov 2, 2023

You need to overflow the decimal operation on one of these, and have the input marked as not-nullable

Hi, does the input mean the input of a ColumnarToRow?


firestarman commented Nov 2, 2023

This can result in crashes if we do a ColumnarToRow operation and it is not expected to be null

Is it because we tell Spark the data is not nullable but there are nulls in the rows? Then Spark will not check the nullability of the data by calling isNullAt when reading it.
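
A hand-written illustration of that failure mode (not Spark's actual generated code): when the declared schema says non-nullable, the reader skips isNullAt and reads the value slot directly, so a null produced by overflow comes back as garbage.

// Illustration only: the null check is performed only when the schema
// admits nulls; with nullable = false the value slot is read blindly.
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.types.Decimal

def readDecimal(row: InternalRow, ordinal: Int, declaredNullable: Boolean): Option[Decimal] = {
  if (declaredNullable && row.isNullAt(ordinal)) {
    None // the null is only noticed if the schema says the column can be null
  } else {
    // an actual null here turns into whatever bytes happen to be in the slot
    Option(row.getDecimal(ordinal, 38, 0))
  }
}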


revans2 commented Nov 2, 2023

does the input mean the input of a ColumnarToRow?

No, it means the input to the query, specifically the input to one of the decimal operations. Turns out this can cause data corruption too :(.

spark.range(20).selectExpr("CAST(id as DECIMAL(38,0)) as dec_num").selectExpr("99999999999999999999999999999999999991 + dec_num as over").show(false)

If you run it on the CPU you get a null as the output for the rows that would overflow the add, but if you run it on the GPU you get an empty string out. And if you do a collect instead of a show you get bad results back instead of nulls.

spark.range(20).selectExpr("CAST(id as DECIMAL(38,0)) as dec_num").selectExpr("99999999999999999999999999999999999991 + dec_num as over").collect.foreach(println)

This happens for all of the binary ops in some form.

spark.range(20).selectExpr("CAST(id as DECIMAL(38,0)) as dec_num").selectExpr("99999999999999999999999999999999999991 - -dec_num as over").collect.foreach(println)
spark.range(20).selectExpr("CAST(id as DECIMAL(38,0)) as dec_num").selectExpr("99999999999999999999999999999999999991 * dec_num as over").collect.foreach(println)
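
To compare against the CPU behavior in the same session, one option (assuming the spark-rapids plugin is loaded) is to toggle spark.rapids.sql.enabled and rerun the same query:

// CPU run for comparison: the overflowing rows should come back as null.
spark.conf.set("spark.rapids.sql.enabled", "false")
spark.range(20).selectExpr("CAST(id as DECIMAL(38,0)) as dec_num").selectExpr("99999999999999999999999999999999999991 + dec_num as over").collect.foreach(println)
spark.conf.set("spark.rapids.sql.enabled", "true") // back on the GPU to see the corrupted values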


firestarman commented Nov 3, 2023

Thanks a lot. I also figured out a repro case, but yours looks simpler. My case here shows data corruption on all Spark versions, and it will also be fixed by my PR.

scala> spark.conf.set("spark.sql.decimalOperations.allowPrecisionLoss", "false")

scala> spark.range(9223372036854775800L, 9223372036854775807L).selectExpr("cast(id as decimal(38, 18)) as dec_col").selectExpr("dec_col * 22 as overflow").show
23/11/03 04:10:02 WARN GpuOverrides: 
*Exec <ProjectExec> will run on GPU
  *Expression <Alias> cast(CheckOverflow((promote_precision(cast(id#0L as decimal(38,18))) * 22.000000000000000000), DecimalType(38,18)) as string) AS overflow#7 will run on GPU
    *Expression <Cast> cast(CheckOverflow((promote_precision(cast(id#0L as decimal(38,18))) * 22.000000000000000000), DecimalType(38,18)) as string) will run on GPU
      *Expression <CheckOverflow> CheckOverflow((promote_precision(cast(id#0L as decimal(38,18))) * 22.000000000000000000), DecimalType(38,18)) will run on GPU
        *Expression <Multiply> (promote_precision(cast(id#0L as decimal(38,18))) * 22.000000000000000000) will run on GPU
          *Expression <PromotePrecision> promote_precision(cast(id#0L as decimal(38,18))) will run on GPU
            *Expression <Cast> cast(id#0L as decimal(38,18)) will run on GPU
  *Exec <RangeExec> will run on GPU

+--------+                                                                      
|overflow|
+--------+
|        |
|        |
|        |
|        |
|        |
|        |
|        |
+--------+
