Fix Failing Test test_multi_table_hash_join for Databricks 13.3 #9491

Closed
razajafri opened this issue Oct 19, 2023 · 0 comments · Fixed by #9637
Assignees: razajafri
Labels: task (Work required that improves the product but is not user facing)

razajafri (Collaborator) commented Oct 19, 2023

Running test_multi_table_hash_join on Databricks 13.3 fails with the following exception:

E                   py4j.protocol.Py4JJavaError: An error occurred while calling o567.collectToPython.
E                   : java.lang.IllegalStateException: the broadcast must be on the GPU too
E                       at com.nvidia.spark.rapids.shims.GpuBroadcastJoinMeta.verifyBuildSideWasReplaced(GpuBroadcastJoinMeta.scala:69)
E                       at org.apache.spark.sql.rapids.execution.GpuBroadcastHashJoinMeta.convertToGpu(GpuBroadcastHashJoinExec.scala:58)
E                       at org.apache.spark.sql.rapids.execution.GpuBroadcastHashJoinMeta.convertToGpu(GpuBroadcastHashJoinExec.scala:39)
E                       at com.nvidia.spark.rapids.SparkPlanMeta.convertIfNeeded(RapidsMeta.scala:799)
E                       at com.nvidia.spark.rapids.GpuOverrides$.com$nvidia$spark$rapids$GpuOverrides$$doConvertPlan(GpuOverrides.scala:4278)
E                       at com.nvidia.spark.rapids.GpuOverrides.applyOverrides(GpuOverrides.scala:4623)
E                       at com.nvidia.spark.rapids.GpuOverrides.$anonfun$applyWithContext$3(GpuOverrides.scala:4483)
E                       at com.nvidia.spark.rapids.GpuOverrides$.logDuration(GpuOverrides.scala:452)
E                       at com.nvidia.spark.rapids.GpuOverrides.$anonfun$applyWithContext$1(GpuOverrides.scala:4480)
E                       at com.nvidia.spark.rapids.GpuOverrideUtil$.$anonfun$tryOverride$1(GpuOverrides.scala:4446)
E                       at com.nvidia.spark.rapids.GpuOverrides.applyWithContext(GpuOverrides.scala:4500)
E                       at com.nvidia.spark.rapids.GpuQueryStagePrepOverrides.$anonfun$apply$1(GpuOverrides.scala:4463)
E                       at com.nvidia.spark.rapids.GpuOverrideUtil$.$anonfun$tryOverride$1(GpuOverrides.scala:4446)
E                       at com.nvidia.spark.rapids.GpuQueryStagePrepOverrides.apply(GpuOverrides.scala:4466)
E                       at com.nvidia.spark.rapids.GpuQueryStagePrepOverrides.apply(GpuOverrides.scala:4459)
E                       at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec$.$anonfun$executePhysicalRules$2(AdaptiveSparkPlanExec.scala:1545)
E                       at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:94)
E                       at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec$.$anonfun$executePhysicalRules$1(AdaptiveSparkPlanExec.scala:1544)
E                       at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
E                       at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
E                       at scala.collection.immutable.List.foldLeft(List.scala:91)
E                       at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec$.executePhysicalRules(AdaptiveSparkPlanExec.scala:1542)
E                       at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec$.$anonfun$applyPhysicalRules$2(AdaptiveSparkPlanExec.scala:1530)
E                       at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:94)
E                       at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:396)
E                       at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec$.executePhase(AdaptiveSparkPlanExec.scala:1510)
E                       at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec$.applyPhysicalRules(AdaptiveSparkPlanExec.scala:1530)
E                       at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.reOptimize(AdaptiveSparkPlanExec.scala:1285)
E                       at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.$anonfun$withFinalPlanUpdate$3(AdaptiveSparkPlanExec.scala:651)
E                       at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
E                       at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:166)
E                       at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.$anonfun$withFinalPlanUpdate$2(AdaptiveSparkPlanExec.scala:565)
E                       at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
E                       at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:1113)
E                       at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.$anonfun$withFinalPlanUpdate$1(AdaptiveSparkPlanExec.scala:563)
E                       at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:94)
E                       at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.withFinalPlanUpdate(AdaptiveSparkPlanExec.scala:558)
E                       at org.apache.spark.sql.execution.qrc.ResultCacheManager.computeResult(ResultCacheManager.scala:563)
E                       at org.apache.spark.sql.execution.qrc.ResultCacheManager.$anonfun$getOrComputeResultInternal$1(ResultCacheManager.scala:426)
E                       at scala.Option.getOrElse(Option.scala:189)
E                       at org.apache.spark.sql.execution.qrc.ResultCacheManager.getOrComputeResultInternal(ResultCacheManager.scala:419)
E                       at org.apache.spark.sql.execution.qrc.ResultCacheManager.getOrComputeResult(ResultCacheManager.scala:313)
E                       at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeCollectResult$1(SparkPlan.scala:519)
E                       at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:94)
E                       at org.apache.spark.sql.execution.SparkPlan.executeCollectResult(SparkPlan.scala:516)
E                       at org.apache.spark.sql.Dataset.$anonfun$collectToPython$1(Dataset.scala:4271)
E                       at org.apache.spark.sql.Dataset.$anonfun$withAction$3(Dataset.scala:4544)
E                       at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:935)
E                       at org.apache.spark.sql.Dataset.$anonfun$withAction$2(Dataset.scala:4542)
E                       at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$8(SQLExecution.scala:274)
E                       at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:498)
E                       at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$1(SQLExecution.scala:201)
E                       at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:1113)
E                       at org.apache.spark.sql.execution.SQLExecution$.withCustomExecutionEnv(SQLExecution.scala:151)
E                       at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:447)
E                       at org.apache.spark.sql.Dataset.withAction(Dataset.scala:4542)
E                       at org.apache.spark.sql.Dataset.collectToPython(Dataset.scala:4269)
E                       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
E                       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
E                       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
E                       at java.lang.reflect.Method.invoke(Method.java:498)
E                       at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
E                       at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:397)
E                       at py4j.Gateway.invoke(Gateway.java:306)
E                       at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
E                       at py4j.commands.CallCommand.execute(CallCommand.java:79)
E                       at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:195)
E                       at py4j.ClientServerConnection.run(ClientServerConnection.java:115)
E                       at java.lang.Thread.run(Thread.java:750)
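
For context, the exception comes from a sanity guard: when GpuBroadcastHashJoinMeta converts a join to the GPU, it verifies that the broadcast exchange feeding the build side was itself replaced with the GPU version. Below is a minimal, hypothetical Scala sketch of that kind of invariant check; the plan types are illustrative stand-ins, not the actual spark-rapids classes.

object BroadcastGuardSketch {
  // Stand-in plan nodes; in reality these are Spark physical plan classes.
  sealed trait Plan
  case object GpuBroadcastExchange extends Plan // broadcast replaced for the GPU
  case object CpuBroadcastExchange extends Plan // broadcast left on the CPU

  // Models the shape of GpuBroadcastJoinMeta.verifyBuildSideWasReplaced: if the
  // join is going to the GPU, its broadcast build side must be GPU-backed too.
  def verifyBuildSideWasReplaced(buildSide: Plan): Unit = buildSide match {
    case GpuBroadcastExchange => () // consistent plan, nothing to do
    case _ =>
      throw new IllegalStateException("the broadcast must be on the GPU too")
  }

  def main(args: Array[String]): Unit =
    verifyBuildSideWasReplaced(CpuBroadcastExchange) // triggers the same message
}

The reOptimize frame in the trace suggests the inconsistency arises during AQE plan re-optimization on Databricks 13.3: the join is tagged for GPU conversion while its broadcast build side is not.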

To reproduce, apply this patch and run jenkins/databricks/test.sh:

diff --git a/jenkins/databricks/test.sh b/jenkins/databricks/test.sh
index 6c96e45ff..75130fae2 100755
--- a/jenkins/databricks/test.sh
+++ b/jenkins/databricks/test.sh
@@ -84,28 +84,9 @@ rapids_shuffle_smoke_test() {
 }
 
 ## limit parallelism to avoid OOM kill
-export TEST_PARALLEL=${TEST_PARALLEL:-4}
+export TEST_PARALLEL=${TEST_PARALLEL:-1}
 
 if [[ $TEST_MODE == "DEFAULT" ]]; then
-    bash integration_tests/run_pyspark_from_build.sh --runtime_env="databricks" --test_type=$TEST_TYPE
+    bash integration_tests/run_pyspark_from_build.sh --runtime_env="databricks" --test_type=$TEST_TYPE -k test_multi_table_hash_join 
 
-    ## Run cache tests
-    if [[ "$IS_SPARK_321_OR_LATER" -eq "1" ]]; then
-        PYSP_TEST_spark_sql_cache_serializer=${PCBS_CONF} \
-            bash integration_tests/run_pyspark_from_build.sh --runtime_env="databricks" --test_type=$TEST_TYPE -k cache_test
-    fi
-fi
-
-## Run tests with jars building from the spark-rapids source code
-if [ "$(pwd)" == "$SOURCE_PATH" ]; then
-    if [[ "$TEST_MODE" == "DEFAULT" || "$TEST_MODE" == "DELTA_LAKE_ONLY" ]]; then
-        ## Run Delta Lake tests
-        SPARK_SUBMIT_FLAGS="$SPARK_CONF $DELTA_LAKE_CONFS" TEST_PARALLEL=1 \
-            bash integration_tests/run_pyspark_from_build.sh --runtime_env="databricks"  -m "delta_lake" --delta_lake --test_type=$TEST_TYPE
-    fi
-
-    if [[ "$TEST_MODE" == "DEFAULT" || "$TEST_MODE" == "MULTITHREADED_SHUFFLE" ]]; then
-        ## Mutithreaded Shuffle test
-        rapids_shuffle_smoke_test
-    fi
 fi
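
For reference, here is a hypothetical sketch of the query shape a multi-table hash join test exercises: one larger table joined against several small tables, so the plan contains multiple broadcast hash joins. This is not the actual test_multi_table_hash_join source; it assumes a SparkSession named spark running with the RAPIDS Accelerator enabled.

// Hypothetical multi-table broadcast join shape, not the real integration test.
val fact = spark.range(1000000L).withColumnRenamed("id", "key")
val dim1 = spark.range(100L).withColumnRenamed("id", "k1")
val dim2 = spark.range(100L).withColumnRenamed("id", "k2")
val joined = fact
  .join(dim1, fact("key") === dim1("k1")) // small side eligible for broadcast
  .join(dim2, fact("key") === dim2("k2")) // a second broadcast hash join
joined.collect() // forces execution, where AQE plan conversion (and the guard) runs

Because Spark evaluates lazily, the guard only fires at action time during AQE re-optimization (per the stack trace), not when the query is defined.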
razajafri changed the title from "Fix Integration Test Failures related to join_test.py" to "Fix Integration Test Failures related to join_test.py for Databricks 13.3" on Oct 19, 2023
razajafri changed the title from "Fix Integration Test Failures related to join_test.py for Databricks 13.3" to "Fix Failing Test test_multi_table_hash_join for Databricks 13.3" on Oct 19, 2023
sameerz added the task label (Work required that improves the product but is not user facing) on Oct 24, 2023
razajafri self-assigned this on Oct 27, 2023