
[BUG] CastOpSuite and AnsiCastOpSuite failing with ArithmeticException on Spark 3.1 #1271

Closed
jlowe opened this issue Dec 4, 2020 · 2 comments · Fixed by #1402 or #1413
Assignees: andygrove
Labels: bug (Something isn't working), P0 (Must have for release), Spark 3.1+ (Bugs only related to Spark 3.1 or higher)

Comments

jlowe (Member) commented Dec 4, 2020

Describe the bug
Testing against the latest Apache Spark 3.1 SNAPSHOT the CastOpSuite fails like this:

CastOpSuite:
- Test all supported casts with in-range values *** FAILED ***
  Cast from FloatType to IntegerType failed; ansi=true org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1505.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1505.0 (TID 1518) (10.28.9.126 executor driver): java.lang.ArithmeticException: Casting 2.14748365E9 to int causes overflow
  	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
  	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
  	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:755)
  	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
  	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
  	at org.apache.spark.util.random.SamplingUtils$.reservoirSampleAndCount(SamplingUtils.scala:41)
  	at org.apache.spark.RangePartitioner$.$anonfun$sketch$1(Partitioner.scala:306)
  	at org.apache.spark.RangePartitioner$.$anonfun$sketch$1$adapted(Partitioner.scala:304)
  	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2(RDD.scala:915)
  	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2$adapted(RDD.scala:915)
  	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
  	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
  	at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
  	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
  	at org.apache.spark.scheduler.Task.run(Task.scala:131)
  	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
  	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
  	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
  	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  	at java.lang.Thread.run(Thread.java:748)
  
  Driver stacktrace: (CastOpSuite.scala:103)

AnsiCastOpSuite has a similar failure.

Steps/Code to reproduce bug
Build and install the latest Apache Spark 3.1.0-SNAPSHOT, then run the plugin unit tests against Spark 3.1.0 via:

mvn -Pspark310tests test
jlowe added the bug (Something isn't working), ? - Needs Triage (Need team to review and classify), and Spark 3.1+ (Bugs only related to Spark 3.1 or higher) labels on Dec 4, 2020
andygrove self-assigned this on Dec 4, 2020
andygrove (Contributor) commented Dec 4, 2020

This is caused by this change in 3.1.0: apache/spark#30585

When converting from float to int in 3.0.1, the generated Java code is effectively:

Math.floor(x) <= Int.MaxValue.toFloat && Math.ceil(x) >= Int.MinValue.toFloat

In 3.1.0 it changed to:

Math.floor(x) <= Int.MaxValue && Math.ceil(x) >= Int.MinValue

This breaks our tests that cast Int.MaxValue.toFloat to Int: Int.MaxValue (2147483647) is not exactly representable as a Float, so Int.MaxValue.toFloat rounds up to 2147483648.0f, and Math.floor of that value is greater than Int.MaxValue, so the new bounds check rejects it.
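
For illustration, here is a minimal self-contained Scala sketch of the two range checks (the object name CastOverflowDemo is just for the demo; this mimics the checks rather than reproducing Spark's actual generated code):

object CastOverflowDemo {
  def main(args: Array[String]): Unit = {
    // Float has a 24-bit significand, so Int.MaxValue (2147483647) is not
    // exactly representable; toFloat rounds it UP to 2147483648.0f, which
    // prints as 2.14748365E9 -- the value in the stack trace above.
    val x: Float = Int.MaxValue.toFloat

    // Spark 3.0.1-style bound: the right-hand side rounds the same way,
    // so the comparison passes.
    println(math.floor(x) <= Int.MaxValue.toFloat) // true

    // Spark 3.1.0-style bound: math.floor(x) is 2.147483648E9, which is
    // greater than Int.MaxValue, so the check fails and ANSI mode throws
    // "Casting 2.14748365E9 to int causes overflow".
    println(math.floor(x) <= Int.MaxValue) // false
  }
}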

andygrove added this to the Dec 7 - Dec 18 milestone on Dec 7, 2020
sameerz added the P0 (Must have for release) label and removed the ? - Needs Triage (Need team to review and classify) label on Dec 8, 2020
andygrove (Contributor) commented Dec 16, 2020

The tests were re-enabled in #1402.

The remaining work for this issue is to document the supported ranges of floats that can be cast to integer. These ranges differ on the CPU depending on the Spark version, and the GPU behavior also varies in some cases, so we may want to add a config around this as well.
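
As a rough sketch of the boundary values such documentation would need to spell out (the constant 2147483520.0f below follows from Float rounding near 2^31, not from any Spark source; treat it as an illustration):

object FloatToIntRange {
  def main(args: Array[String]): Unit = {
    // Near 2^31, consecutive Floats are 128 apart, so the largest Float
    // strictly below 2^31 is 2147483520.0f (2^31 - 128).
    val maxCastable: Float = 2147483520.0f
    println(math.floor(maxCastable) <= Int.MaxValue)          // true:  accepted on 3.1.0
    println(math.floor(Int.MaxValue.toFloat) <= Int.MaxValue) // false: overflow on 3.1.0

    // Int.MinValue (-2^31) IS exactly representable as a Float, so the
    // lower bound behaves the same on 3.0.1 and 3.1.0.
    println(math.ceil(Int.MinValue.toFloat) >= Int.MinValue)  // true
  }
}

On 3.0.1 the looser bound accepts Int.MaxValue.toFloat, and the JVM's float-to-int conversion then saturates it to Int.MaxValue, which is presumably why these tests previously passed on the CPU.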

andygrove linked a pull request on Dec 16, 2020 that will close this issue
tgravescs pushed a commit to tgravescs/spark-rapids that referenced this issue Nov 30, 2023
…ns that moved. (NVIDIA#1271)

* Merge cudf 23.08 with hash utility function moves.  Fix spark-rapids-jni to compensate.

* Add signoff

Signed-off-by: db <dbaranec@nvidia.com>

---------

Signed-off-by: db <dbaranec@nvidia.com>