
[BUG] CastOpSuite and AnsiCastOpSuite failing with ArithmeticException on Spark 3.1 #1271

Closed
jlowe opened this issue Dec 4, 2020 · 2 comments · Fixed by #1402 or #1413
Assignees: andygrove
Labels: bug (Something isn't working), P0 (Must have for release), Spark 3.1+ (Bugs only related to Spark 3.1 or higher)

Comments

jlowe (Member) commented Dec 4, 2020

Describe the bug
Testing against the latest Apache Spark 3.1 SNAPSHOT the CastOpSuite fails like this:

CastOpSuite:
- Test all supported casts with in-range values *** FAILED ***
  Cast from FloatType to IntegerType failed; ansi=true org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1505.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1505.0 (TID 1518) (10.28.9.126 executor driver): java.lang.ArithmeticException: Casting 2.14748365E9 to int causes overflow
  	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
  	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
  	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:755)
  	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
  	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
  	at org.apache.spark.util.random.SamplingUtils$.reservoirSampleAndCount(SamplingUtils.scala:41)
  	at org.apache.spark.RangePartitioner$.$anonfun$sketch$1(Partitioner.scala:306)
  	at org.apache.spark.RangePartitioner$.$anonfun$sketch$1$adapted(Partitioner.scala:304)
  	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2(RDD.scala:915)
  	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2$adapted(RDD.scala:915)
  	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
  	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
  	at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
  	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
  	at org.apache.spark.scheduler.Task.run(Task.scala:131)
  	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
  	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
  	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
  	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  	at java.lang.Thread.run(Thread.java:748)
  
  Driver stacktrace: (CastOpSuite.scala:103)

AnsiCastOpSuite has a similar failure.

Steps/Code to reproduce bug
Build and install the latest Apache Spark 3.1.0-SNAPSHOT, then run the plugin unit tests against Spark 3.1.0 via:

mvn -Pspark310tests test
jlowe added the bug (Something isn't working), ? - Needs Triage (Need team to review and classify), and Spark 3.1+ (Bugs only related to Spark 3.1 or higher) labels on Dec 4, 2020
andygrove self-assigned this on Dec 4, 2020
andygrove (Contributor) commented Dec 4, 2020

This is caused by this change in 3.1.0: apache/spark#30585

When converting from float to int in 3.0.1, the generated Java code is effectively:

Math.floor(x) <= Int.MaxValue.toFloat && Math.ceil(x) >= Int.MinValue.toFloat

In 3.1.0 it changed to:

Math.floor(x) <= Int.MaxValue && Math.ceil(x) >= Int.MinValue

This breaks our tests that cast Int.MaxValue.toFloat to Int: Int.MaxValue (2147483647) is not exactly representable as a Float, so Int.MaxValue.toFloat rounds up to 2147483648.0f, and Math.floor of that value is greater than Int.MaxValue, so the new bounds check rejects it.
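
For illustration, here is a minimal self-contained Scala sketch of the two range checks (the object name CastOverflowDemo is just for the demo; this mimics the checks rather than reproducing Spark's actual generated code):

object CastOverflowDemo {
  def main(args: Array[String]): Unit = {
    // Float has a 24-bit significand, so Int.MaxValue (2147483647) is not
    // exactly representable; toFloat rounds it UP to 2147483648.0f, which
    // prints as 2.14748365E9 -- the value in the stack trace above.
    val x: Float = Int.MaxValue.toFloat

    // Spark 3.0.1-style bound: the right-hand side rounds the same way,
    // so the comparison passes.
    println(math.floor(x) <= Int.MaxValue.toFloat) // true

    // Spark 3.1.0-style bound: math.floor(x) is 2.147483648E9, which is
    // greater than Int.MaxValue, so the check fails and ANSI mode throws
    // "Casting 2.14748365E9 to int causes overflow".
    println(math.floor(x) <= Int.MaxValue) // false
  }
}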

andygrove added this to the Dec 7 - Dec 18 milestone on Dec 7, 2020
sameerz added the P0 (Must have for release) label and removed the ? - Needs Triage (Need team to review and classify) label on Dec 8, 2020
andygrove (Contributor) commented Dec 16, 2020

The tests were re-enabled in #1402.

The remaining work for this issue is to document the supported ranges of floats that can be cast to integer. These ranges differ on the CPU depending on the Spark version, and the GPU behavior also varies in some cases, so we may want to add a config around this as well.
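
As a rough sketch of the boundary values such documentation would need to spell out (the constant 2147483520.0f below follows from Float rounding near 2^31, not from any Spark source; treat it as an illustration):

object FloatToIntRange {
  def main(args: Array[String]): Unit = {
    // Near 2^31, consecutive Floats are 128 apart, so the largest Float
    // strictly below 2^31 is 2147483520.0f (2^31 - 128).
    val maxCastable: Float = 2147483520.0f
    println(math.floor(maxCastable) <= Int.MaxValue)          // true:  accepted on 3.1.0
    println(math.floor(Int.MaxValue.toFloat) <= Int.MaxValue) // false: overflow on 3.1.0

    // Int.MinValue (-2^31) IS exactly representable as a Float, so the
    // lower bound behaves the same on 3.0.1 and 3.1.0.
    println(math.ceil(Int.MinValue.toFloat) >= Int.MinValue)  // true
  }
}

On 3.0.1 the looser bound accepts Int.MaxValue.toFloat, and the JVM's float-to-int conversion then saturates it to Int.MaxValue, which is presumably why these tests previously passed on the CPU.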

andygrove linked a pull request on Dec 16, 2020 that will close this issue
tgravescs pushed a commit to tgravescs/spark-rapids that referenced this issue Nov 30, 2023
…ns that moved. (NVIDIA#1271)

* Merge cudf 23.08 with hash utility function moves.  Fix spark-rapids-jni to compensate.

* Add signoff

Signed-off-by: db <dbaranec@nvidia.com>

---------

Signed-off-by: db <dbaranec@nvidia.com>