
Enabling AQE on [databricks] #6461

Merged Sep 12, 2022 (30 commits)

Conversation

@NVnavkumar (Collaborator) commented Aug 31, 2022

Fixes #1059

This branch fixes several issues with enabling Spark Adaptive Query Execution (AQE) in the Databricks Spark environment. This is currently marked WIP while I investigate whether there are any remaining issues with enabling adaptive execution on Databricks. Here are the issues fixed so far:

  1. Implemented certain method calls that are currently required by the Databricks environment.

  2. Implemented handling of a logical plan window optimization that occurs in Databricks distributions but not currently in Apache Spark.

  3. Fixed the Databricks 10.4 shim for GpuShuffleExchangeExec to avoid a duplicate submission of the map stage, which caused a missing job ID reference.

  4. Added 9.1 handling to the AQEUtils shim so that ShuffleExchangeExec falls back to the CPU when AQE is enabled.
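
For context, enabling AQE alongside the plugin comes down to standard Spark settings along these lines (a minimal sketch; exact defaults and supported keys can differ by Databricks runtime):

```
spark.plugins                com.nvidia.spark.SQLPlugin
spark.rapids.sql.enabled     true
spark.sql.adaptive.enabled   true
```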

…rs next

Signed-off-by: Navin Kumar <navink@nvidia.com>
…Stats fix

Signed-off-by: Navin Kumar <navink@nvidia.com>
… to AQE optimizations

Signed-off-by: Navin Kumar <navink@nvidia.com>
@NVnavkumar NVnavkumar self-assigned this Aug 31, 2022
@sameerz sameerz added the task Work required that improves the product but is not user facing label Aug 31, 2022
@NVnavkumar (Collaborator, Author)

build

@NVnavkumar (Collaborator, Author)

Looks like I found a new failure when using SQL UNION; it seems to crash the system:

E                   py4j.protocol.Py4JJavaError: An error occurred while calling o471.collectToPython.
E                   : org.apache.spark.SparkException: Job 3 cancelled as part of cancellation of all jobs
E                   	at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:3030)
E                   	at org.apache.spark.scheduler.DAGScheduler.handleJobCancellation(DAGScheduler.scala:2918)
E                   	at org.apache.spark.scheduler.DAGScheduler.$anonfun$doCancelAllJobs$2(DAGScheduler.scala:1283)
E                   	at scala.runtime.java8.JFunction1$mcVI$sp.apply(JFunction1$mcVI$sp.java:23)
E                   	at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
E                   	at org.apache.spark.scheduler.DAGScheduler.doCancelAllJobs(DAGScheduler.scala:1282)
E                   	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onError(DAGScheduler.scala:3253)
E                   	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:53)

Might be caused by the NPE here:

22/08/31 22:12:18 ERROR GpuOverrideUtil: Encountered an exception applying GPU overrides java.lang.NullPointerException
java.lang.NullPointerException
	at org.apache.spark.sql.rapids.GpuShuffleEnv$.isRapidsShuffleAvailable(GpuShuffleEnv.scala:119)
	at org.apache.spark.sql.rapids.GpuShuffleEnv$.useGPUShuffle(GpuShuffleEnv.scala:136)
	at com.nvidia.spark.rapids.GpuTransitionOverrides.$anonfun$apply$3(GpuTransitionOverrides.scala:570)
	at com.nvidia.spark.rapids.GpuOverrides$.logDuration(GpuOverrides.scala:474)
	at com.nvidia.spark.rapids.GpuTransitionOverrides.$anonfun$apply$1(GpuTransitionOverrides.scala:564)
	at com.nvidia.spark.rapids.GpuOverrideUtil$.$anonfun$tryOverride$1(GpuOverrides.scala:4376)
	at com.nvidia.spark.rapids.GpuTransitionOverrides.apply(GpuTransitionOverrides.scala:604)
	at com.nvidia.spark.rapids.GpuTransitionOverrides.apply(GpuTransitionOverrides.scala:39)
	at org.apache.spark.sql.execution.ApplyColumnarRulesAndInsertTransitions.$anonfun$apply$2(Columnar.scala:566)
	at org.apache.spark.sql.execution.ApplyColumnarRulesAndInsertTransitions.$anonfun$apply$2$adapted(Columnar.scala:566)
	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
	at org.apache.spark.sql.execution.ApplyColumnarRulesAndInsertTransitions.apply(Columnar.scala:566)
	at org.apache.spark.sql.execution.ApplyColumnarRulesAndInsertTransitions.apply(Columnar.scala:523)
	at org.apache.spark.sql.execution.QueryExecution$.$anonfun$prepareForExecution$2(QueryExecution.scala:596)
	at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:80)
	at org.apache.spark.sql.execution.QueryExecution$.$anonfun$prepareForExecution$1(QueryExecution.scala:596)
	at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
	at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
	at scala.collection.immutable.List.foldLeft(List.scala:91)
	at org.apache.spark.sql.execution.QueryExecution$.prepareForExecution(QueryExecution.scala:595)
	at org.apache.spark.sql.execution.QueryExecution.$anonfun$executedPlan$2(QueryExecution.scala:232)
	at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:80)
	at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:151)
	at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:265)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:968)
	at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:265)
	at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:228)
	at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:222)
	at org.apache.spark.sql.execution.QueryExecution.simpleString(QueryExecution.scala:298)
	at org.apache.spark.sql.execution.QueryExecution.org$apache$spark$sql$execution$QueryExecution$$explainString(QueryExecution.scala:361)
	at org.apache.spark.sql.execution.QueryExecution.explainStringLocal(QueryExecution.scala:325)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$8(SQLExecution.scala:202)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:386)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$1(SQLExecution.scala:186)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:968)
	at org.apache.spark.sql.execution.SQLExecution$.withCustomExecutionEnv(SQLExecution.scala:141)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:336)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.applyOrElse(QueryExecution.scala:160)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.applyOrElse(QueryExecution.scala:156)
	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:575)
	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:167)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:575)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:268)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:264)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:551)
	at org.apache.spark.sql.execution.QueryExecution.$anonfun$eagerlyExecuteCommands$1(QueryExecution.scala:156)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:324)
	at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:156)
	at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:141)
	at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:132)
	at org.apache.spark.sql.Dataset.<init>(Dataset.scala:225)
	at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:104)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:968)
	at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:101)
	at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:803)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:968)
	at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:798)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
	at py4j.Gateway.invoke(Gateway.java:295)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:195)
	at py4j.ClientServerConnection.run(ClientServerConnection.java:115)
	at java.lang.Thread.run(Thread.java:748)

@NVnavkumar (Collaborator, Author)

Looks like I found a new failure when using SQL UNION; it seems to crash the system:

This is actually an issue in GpuShuffleExchangeExec sending an incorrect job ID (maybe stage ID?) to the scheduler. The current workaround is to fall back ShuffleExchangeExec to the CPU when using Databricks with AQE.

…s to using original Spark implementation to fix concurrency bug in shim

Signed-off-by: Navin Kumar <navink@nvidia.com>
Signed-off-by: Navin Kumar <navink@nvidia.com>
@NVnavkumar (Collaborator, Author)

build

@NVnavkumar (Collaborator, Author)

Fixed the issue in GpuShuffleExchangeExec on Databricks 10.4. On Databricks 9.1, ShuffleExchangeExec will fall back to the CPU when AQE is enabled, due to the complexities of how that environment handles the submission of the map stage. Updated the lead comment to reflect these changes.

@revans2 (Collaborator) commented Sep 2, 2022

Fixed the issue in GpuShuffleExchangeExec on Databricks 10.4. On Databricks 9.1, ShuffleExchangeExec will fall back to the CPU when AQE is enabled, due to the complexities of how that environment handles the submission of the map stage. Updated the lead comment to reflect these changes.

Have we measured the performance impact of this? In many cases AQE is not that big of a performance win, but sending the data to the CPU and back again is a really big performance hit.

@NVnavkumar (Collaborator, Author) commented Sep 2, 2022 via email

@tgravescs (Collaborator) left a comment

Overall this seems fine. I thought at one point you had mentioned implementing computeStats per exec and trying to get a realistic size; did that not work out?

@NVnavkumar (Collaborator, Author)

overall seems fine, I thought at one point you had mentioned implementing computeStats per exec and trying to get a realistic size, did that not work out?

If we wanted to do this correctly and most accurately, it is a bit of an undertaking to implement the PlanVisitor for not just the leaf execs but also some of the operations. In particular, we would potentially have to implement our own join estimation. Also, when I started looking into it, I got some strangely different numbers in our Parquet reading execs versus what Databricks reports. I figure that can be done separately as a future task (if there is any upside to the computation).
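
To make the discussion concrete, the kind of size estimation a stats visitor performs can be sketched as below. This is not the plugin's or Spark's actual API; the function names, type widths, and selectivity factor here are all illustrative assumptions (Spark's real SizeInBytesOnlyStatsPlanVisitor and join estimation are far more involved):

```python
# Rough per-row byte widths for a few SQL types (illustrative values only).
TYPE_WIDTHS = {"int": 4, "long": 8, "double": 8, "string": 20}

def estimate_output_bytes(row_count, schema):
    """Estimate a plan node's output size as rows * sum of column widths.

    `schema` is a list of (name, type) pairs; unknown types get a default
    width of 16 bytes.
    """
    row_width = sum(TYPE_WIDTHS.get(col_type, 16) for _, col_type in schema)
    return row_count * row_width

def estimate_join_rows(left_rows, right_rows, selectivity=0.1):
    """A crude join cardinality guess: cross-product scaled by a
    made-up selectivity factor."""
    return int(left_rows * right_rows * selectivity)
```

The hard part alluded to above is that every operator between the leaves needs its own rule like these, and the estimates compound as they propagate up the plan.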

Signed-off-by: Navin Kumar <navink@nvidia.com>
Signed-off-by: Navin Kumar <navink@nvidia.com>
@NVnavkumar (Collaborator, Author)

Fixed the issue in GpuShuffleExchangeExec on Databricks 10.4. On Databricks 9.1, ShuffleExchangeExec will fall back to the CPU when AQE is enabled, due to the complexities of how that environment handles the submission of the map stage. Updated the lead comment to reflect these changes.

Have we measured the performance impact of this? In many cases AQE is not that big of a performance win, but sending the data to the CPU and back again is a really big performance hit.

Actually, I figured out how to get GPU shuffle back on 9.1; will push an update shortly.

…logic

Signed-off-by: Navin Kumar <navink@nvidia.com>
@NVnavkumar (Collaborator, Author)

build

@NVnavkumar NVnavkumar marked this pull request as ready for review September 6, 2022 23:52
@NVnavkumar NVnavkumar changed the title WIP: Enabling AQE on [databricks] Enabling AQE on [databricks] Sep 6, 2022
Review threads on integration_tests/src/main/python/aqe_test.py
// stage, replacing them with aliases. Also, sometimes children are not provided in the
// initial list of expressions after optimizations, so we add them here, and they will
// be deduped anyways in the other passes
val newChildren = wf.children.map(ce =>
Collaborator:
@mythrocks mind taking a look at this

Collaborator:
Sorry for the delay.

I'm not particularly familiar with the logic behind isPreNeeded, etc. But on discussion with @NVnavkumar, one wonders if we should check why isPreNeeded is turning up false on Databricks, with AQE turned on. We might adjust how isPreNeeded is calculated.

@NVnavkumar (Collaborator, Author)
So, isPreNeeded is false on Databricks, but it's a bit of a red herring in this case. We actually need the window function children in the GpuWindowExec itself; it looks like the extra GpuProject does not help in this case.
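
The dedup behavior described in the code comment above ("they will be deduped anyways in the other passes") can be sketched as follows. This is only an illustration of the idea, not the plugin's actual Scala code; the function name and list-of-strings representation are assumptions:

```python
def add_and_dedupe(exprs, extra_children):
    """Append children missing from the initial expression list, then
    dedupe while preserving first-seen order.

    Mirrors the idea in the GpuWindowExec snippet: children that the
    optimizer dropped from the initial list are re-added, and any
    duplicates introduced by doing so are removed in a later pass.
    """
    seen = set()
    out = []
    for expr in list(exprs) + list(extra_children):
        if expr not in seen:
            seen.add(expr)
            out.append(expr)
    return out
```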

@NVnavkumar (Collaborator, Author)

After some exploration, I have determined that the bug this fixes is not AQE-specific and I'm not 100% confident that this fix is currently the right approach. I have reverted this fix and filed a new issue #6531 to track the bug here.

Signed-off-by: Navin Kumar <navink@nvidia.com>
Signed-off-by: Navin Kumar <navink@nvidia.com>
@NVnavkumar (Collaborator, Author)

build

@NVnavkumar (Collaborator, Author)

build

@NVnavkumar (Collaborator, Author)

build

…k in CI

Signed-off-by: Navin Kumar <navink@nvidia.com>
@NVnavkumar (Collaborator, Author)

build

@NVnavkumar (Collaborator, Author)

build

Successfully merging this pull request may close these issues.

[BUG] adaptive query executor and delta optimized table writes don't work on databricks