[BUG] nullability on some decimal operations is wrong #9108

Closed
revans2 opened this issue Aug 24, 2023 · 4 comments · Fixed by #9608
Labels
bug Something isn't working reliability Features to improve reliability or bugs that severly impact the reliability of the plugin

Comments


revans2 commented Aug 24, 2023

Describe the bug
There are some expressions where we set nullable incorrectly.

CheckOverflow always has nullable set to true, but we have it set to child.nullable, as inherited from the unary expression.

SPARK-39316 (https://issues.apache.org/jira/browse/SPARK-39316), which changed how arithmetic operations on decimals are calculated, also changed the default nullable implementation for BinaryArithmetic:

https://github.com/apache/spark/blob/ce12f6dbad2d713c6a2a52ac36d1f7a910399ad4/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala#L253-L263

Ours no longer matches this and still does left.nullable || right.nullable. This can result in crashes if we do a ColumnarToRow operation on a column that is not expected to be nullable.

We should probably audit this for all expressions and see if we are setting it correctly.
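
A minimal sketch of the rule described above (not the plugin's or Spark's actual code, and the failOnError handling is an assumption): after SPARK-39316, a decimal result of Add/Subtract/Multiply has to be reported as nullable whenever overflow can turn into a null, regardless of whether the children are nullable.

// Rough sketch of the nullability rule the issue describes; the failOnError
// branch is an assumption about ANSI mode, not confirmed plugin behavior.
import org.apache.spark.sql.types.{DataType, DecimalType}

def decimalResultNullable(resultType: DataType,
                          leftNullable: Boolean,
                          rightNullable: Boolean,
                          failOnError: Boolean): Boolean = resultType match {
  // In non-ANSI mode an overflowing decimal result becomes null, so the
  // output is nullable even if both inputs are not.
  case _: DecimalType if !failOnError => true
  // Otherwise fall back to the old default the plugin still uses.
  case _ => leftNullable || rightNullable
}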

Steps/Code to reproduce bug
The good news is that it is kind of hard to reproduce. You need to overflow the decimal operation on one of these expressions, and have the input marked as not-nullable. But Spark tends to mark all inputs as nullable unless the input is a constant or comes from spark.range, so it would be good to come up with a way to test this generally, but I don't know of one right now.
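
A quick spark-shell check of why spark.range matters (not part of the original report): the range output is non-nullable, and a cast from LongType to DECIMAL(38,0) cannot overflow, so the decimal column stays non-nullable, which is exactly the precondition this bug needs.

// spark.range produces a non-nullable id column, and the cast cannot
// overflow, so dec_num stays non-nullable.
val df = spark.range(20).selectExpr("CAST(id AS DECIMAL(38,0)) AS dec_num")
df.schema.fields.foreach(f => println(s"${f.name}: nullable = ${f.nullable}"))
// expected output: dec_num: nullable = false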

revans2 added the bug, ? - Needs Triage, and reliability labels on Aug 24, 2023
mattahrens removed the ? - Needs Triage label on Aug 30, 2023

firestarman commented Nov 2, 2023

You need to overflow the decimal operation on one of these, and have the input marked as not-nullable

Hi, does the input mean the input of a ColumnarToRow?


firestarman commented Nov 2, 2023

This can result in crashes if we do a ColumnarToRow operation and it is not expected to be null

Is it because we tell Spark the data is not nullable but there are nulls in the rows? Then Spark will not check the nullability of the data by calling isNullAt when reading it.
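
A hand-written illustration of that failure mode (not Spark's actual generated code): when the declared schema says non-nullable, the reader skips isNullAt and reads the value slot directly, so a null produced by overflow comes back as garbage.

// Illustration only: the null check is performed only when the schema
// admits nulls; with nullable = false the value slot is read blindly.
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.types.Decimal

def readDecimal(row: InternalRow, ordinal: Int, declaredNullable: Boolean): Option[Decimal] = {
  if (declaredNullable && row.isNullAt(ordinal)) {
    None // the null is only noticed if the schema says the column can be null
  } else {
    // an actual null here turns into whatever bytes happen to be in the slot
    Option(row.getDecimal(ordinal, 38, 0))
  }
}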


revans2 commented Nov 2, 2023

does the input mean the input of a ColumnarToRow?

No, it means the input to the query, specifically the input to one of the decimal operations. Turns out this can cause data corruption too :(.

spark.range(20).selectExpr("CAST(id as DECIMAL(38,0)) as dec_num").selectExpr("99999999999999999999999999999999999991 + dec_num as over").show(false)

If you run it on the CPU you get a null as the output for the rows that would overflow the add, but if you run it on the GPU you get an empty string out. And if you do a collect instead of a show you get bad results back instead of nulls.

spark.range(20).selectExpr("CAST(id as DECIMAL(38,0)) as dec_num").selectExpr("99999999999999999999999999999999999991 + dec_num as over").collect.foreach(println)

This happens for all of the binary ops in some form.

spark.range(20).selectExpr("CAST(id as DECIMAL(38,0)) as dec_num").selectExpr("99999999999999999999999999999999999991 - -dec_num as over").collect.foreach(println)
spark.range(20).selectExpr("CAST(id as DECIMAL(38,0)) as dec_num").selectExpr("99999999999999999999999999999999999991 * dec_num as over").collect.foreach(println)
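
To compare against the CPU behavior in the same session, one option (assuming the spark-rapids plugin is loaded) is to toggle spark.rapids.sql.enabled and rerun the same query:

// CPU run for comparison: the overflowing rows should come back as null.
spark.conf.set("spark.rapids.sql.enabled", "false")
spark.range(20).selectExpr("CAST(id as DECIMAL(38,0)) as dec_num").selectExpr("99999999999999999999999999999999999991 + dec_num as over").collect.foreach(println)
spark.conf.set("spark.rapids.sql.enabled", "true") // back on the GPU to see the corrupted values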


firestarman commented Nov 3, 2023

Thanks a lot. I also figured out a repro case, but yours looks simpler. My case here shows data corruption on all Spark versions, and it will also be fixed by my PR.

scala> spark.conf.set("spark.sql.decimalOperations.allowPrecisionLoss", "false")

scala> spark.range(9223372036854775800L, 9223372036854775807L).selectExpr("cast(id as decimal(38, 18)) as dec_col").selectExpr("dec_col * 22 as overflow").show
23/11/03 04:10:02 WARN GpuOverrides: 
*Exec <ProjectExec> will run on GPU
  *Expression <Alias> cast(CheckOverflow((promote_precision(cast(id#0L as decimal(38,18))) * 22.000000000000000000), DecimalType(38,18)) as string) AS overflow#7 will run on GPU
    *Expression <Cast> cast(CheckOverflow((promote_precision(cast(id#0L as decimal(38,18))) * 22.000000000000000000), DecimalType(38,18)) as string) will run on GPU
      *Expression <CheckOverflow> CheckOverflow((promote_precision(cast(id#0L as decimal(38,18))) * 22.000000000000000000), DecimalType(38,18)) will run on GPU
        *Expression <Multiply> (promote_precision(cast(id#0L as decimal(38,18))) * 22.000000000000000000) will run on GPU
          *Expression <PromotePrecision> promote_precision(cast(id#0L as decimal(38,18))) will run on GPU
            *Expression <Cast> cast(id#0L as decimal(38,18)) will run on GPU
  *Exec <RangeExec> will run on GPU

+--------+                                                                      
|overflow|
+--------+
|        |
|        |
|        |
|        |
|        |
|        |
|        |
+--------+
