Describe the bug
Running TpcdsLikeSpark.query("q59") at a large scale fails with a join mismatch error:
java.lang.IllegalStateException: Join needs to run on CPU but at least one input query stage ran on GPU
at com.nvidia.spark.rapids.SparkPlanMeta.makeShuffleConsistent(RapidsMeta.scala:572)
at com.nvidia.spark.rapids.SparkPlanMeta.fixUpJoinConsistencyIfNeeded(RapidsMeta.scala:587)
at com.nvidia.spark.rapids.SparkPlanMeta.$anonfun$fixUpJoinConsistencyIfNeeded$1(RapidsMeta.scala:584)
at com.nvidia.spark.rapids.SparkPlanMeta.$anonfun$fixUpJoinConsistencyIfNeeded$1$adapted(RapidsMeta.scala:584)
at scala.collection.immutable.List.foreach(List.scala:392)
at com.nvidia.spark.rapids.SparkPlanMeta.fixUpJoinConsistencyIfNeeded(RapidsMeta.scala:584)
at com.nvidia.spark.rapids.SparkPlanMeta.$anonfun$fixUpJoinConsistencyIfNeeded$1(RapidsMeta.scala:584)
at com.nvidia.spark.rapids.SparkPlanMeta.$anonfun$fixUpJoinConsistencyIfNeeded$1$adapted(RapidsMeta.scala:584)
at scala.collection.immutable.List.foreach(List.scala:392)
at com.nvidia.spark.rapids.SparkPlanMeta.fixUpJoinConsistencyIfNeeded(RapidsMeta.scala:584)
at com.nvidia.spark.rapids.SparkPlanMeta.runAfterTagRules(RapidsMeta.scala:640)
at com.nvidia.spark.rapids.GpuOverrides.apply(GpuOverrides.scala:2633)
at com.nvidia.spark.rapids.GpuQueryStagePrepOverrides.apply(GpuOverrides.scala:2612)
at com.nvidia.spark.rapids.GpuQueryStagePrepOverrides.apply(GpuOverrides.scala:2608)
at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec$.$anonfun$applyPhysicalRules$1(AdaptiveSparkPlanExec.scala:599)
at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
at scala.collection.immutable.List.foldLeft(List.scala:89)
at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec$.applyPhysicalRules(AdaptiveSparkPlanExec.scala:599)
at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.reOptimize(AdaptiveSparkPlanExec.scala:503)
at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.$anonfun$getFinalPhysicalPlan$1(AdaptiveSparkPlanExec.scala:219)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764)
at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.getFinalPhysicalPlan(AdaptiveSparkPlanExec.scala:159)
at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.executeCollect(AdaptiveSparkPlanExec.scala:255)
at org.apache.spark.sql.Dataset.collectFromPlan(Dataset.scala:3627)
at org.apache.spark.sql.Dataset.$anonfun$collect$1(Dataset.scala:2940)
at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3618)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:100)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:160)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:87)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3616)
at org.apache.spark.sql.Dataset.collect(Dataset.scala:2940)
... 49 elided
Steps/Code to reproduce bug
With AQE enabled (i.e. spark.sql.adaptive.enabled=true), run TpcdsLikeSpark.query("q59")(spark).collect
Note that this error can be reproduced at much smaller dataset scales if broadcast joins are effectively disabled (e.g. by setting spark.sql.autoBroadcastJoinThreshold=1).
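For reference, the repro steps above might look like the following in a spark-shell session. This is a sketch, not a verified script: it assumes the RAPIDS Accelerator plugin is active and the TPC-DS-like benchmark classes are on the classpath, and the `TpcdsLikeSpark` import path shown is an assumption.

```scala
// Sketch of a spark-shell reproduction. Assumes the RAPIDS Accelerator jar and
// the TPC-DS-like test classes are already on the classpath; the exact package
// for TpcdsLikeSpark below is assumed.
import com.nvidia.spark.rapids.tests.tpcds.TpcdsLikeSpark

// Enable adaptive query execution so query stages are re-optimized at runtime.
spark.conf.set("spark.sql.adaptive.enabled", "true")

// Optional: shrink the broadcast threshold to 1 byte so broadcast joins are
// effectively disabled, which reproduces the failure at small dataset scales.
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "1")

// Collecting q59 then fails during AQE re-optimization with:
//   java.lang.IllegalStateException: Join needs to run on CPU but at least
//   one input query stage ran on GPU
TpcdsLikeSpark.query("q59")(spark).collect
```

Both settings are standard Spark SQL configs; the second simply forces sort-merge joins where a broadcast join would otherwise hide the issue.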