
Support GpuSubqueryBroadcast for DPP [databricks] #4150

Merged: 17 commits into NVIDIA:branch-22.02 from gpu_subquery_broadcast, Dec 17, 2021

Conversation

@sperlingxx (Collaborator) commented Nov 18, 2021

Signed-off-by: sperlingxx <lovedreamf@gmail.com>

Closes #4027

This PR supports reusing the broadcast exchange for SubqueryBroadcast (which is inserted by DPP) on the GPU. To achieve this, the following steps are essential:

  1. Transform the dynamic partition filters of FileSourceScanExec. Capture, tag, and convert SubqueryBroadcastExec inside DynamicPruningExpression. We build independent RapidsMeta instances for SubqueryBroadcastExec inside the dynamic partition filters rather than adding them as children of the scan meta, because the FileSourceScan may be on the CPU while the dynamic partitionFilters are on the GPU, and vice versa.
  2. Convert SubqueryBroadcastExec and the underlying exchange to the GPU if possible. The rule PlanDynamicPruningFilters inserts SubqueryBroadcastExec when there is a broadcast exchange available for reuse. The plan stack looks like:
    SubqueryBroadcast -> BroadcastExchange -> executedPlan
    Since the GPU overrides rule has already been applied to executedPlan, if the wrapped subquery can run on the GPU, the plan stack becomes:
    SubqueryBroadcast -> BroadcastExchange -> GpuColumnarToRow -> GpuPlanStack...
    To reuse the BroadcastExchange on the GPU, we transform the above pattern into (a sketch of this rewrite follows the description below):
    GpuSubqueryBroadcast -> GpuBroadcastExchange -> GpuPlanStack...
  3. Run GpuSubqueryBroadcastExec, which is similar to GpuBroadcastToCpuExec. The major difference is whether an existing GpuBroadcastExec is reused.

In addition, this PR can only reuse the GpuBroadcast when AQE is off. We need to modify GpuBroadcastToCpuExec to reuse the GpuBroadcast with AQE on.
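
For illustration, here is a minimal sketch of the step-2 rewrite. The helper name rewriteForGpuReuse is hypothetical, the constructor shapes of SubqueryBroadcastExec, GpuSubqueryBroadcastExec and GpuBroadcastExchangeExec are simplified, and the plugin-side imports are omitted; the real rule also builds RapidsMeta wrappers and tags the plan before converting it.

import org.apache.spark.sql.execution.{SparkPlan, SubqueryBroadcastExec}
import org.apache.spark.sql.execution.exchange.BroadcastExchangeExec

def rewriteForGpuReuse(plan: SparkPlan): SparkPlan = plan match {
  // A CPU SubqueryBroadcast sitting on a BroadcastExchange whose child already runs on
  // the GPU (surfaced through a GpuColumnarToRow transition).
  case SubqueryBroadcastExec(name, index, keys,
      BroadcastExchangeExec(mode, c2r: GpuColumnarToRowExecParent)) =>
    // Drop the columnar-to-row transition and rebuild both nodes on the GPU, so the
    // exchange can be matched against, and reused by, the GPU join side.
    GpuSubqueryBroadcastExec(name, index, keys,
      GpuBroadcastExchangeExec(mode, c2r.child))
  case other => other
}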

Signed-off-by: sperlingxx <lovedreamf@gmail.com>
Signed-off-by: sperlingxx <lovedreamf@gmail.com>
@sperlingxx
Collaborator Author

build

@sperlingxx changed the base branch from branch-21.12 to branch-22.02 on November 24, 2021 04:12
@sperlingxx
Collaborator Author

build

Signed-off-by: sperlingxx <lovedreamf@gmail.com>
@sperlingxx
Collaborator Author

build

@pxLi changed the title from "Support GpuSubqueryBroadcast for DPP" to "Support GpuSubqueryBroadcast for DPP [databricks]" on Nov 26, 2021
@pxLi
Collaborator

pxLi commented Nov 26, 2021

build

@pxLi
Collaborator

pxLi commented Nov 26, 2021

add [databricks] to enable CI stages

Signed-off-by: sperlingxx <lovedreamf@gmail.com>
@sperlingxx
Collaborator Author

build

Signed-off-by: sperlingxx <lovedreamf@gmail.com>
@sperlingxx
Collaborator Author

build

@pytest.mark.parametrize('store_format', ['parquet', 'orc'], ids=idfn)
@pytest.mark.parametrize('s_index', list(range(len(_statements))), ids=idfn)
@pytest.mark.skipif(is_before_spark_320(), reason="Only in Spark 3.2.0+ AQE and DPP can be both enabled")
-def test_dpp_reuse_broadcast_exchange(aqe_on, store_format, s_index, spark_tmp_table_factory):
+def test_dpp_reuse_broadcast_exchange_aqe_on(store_format, s_index, spark_tmp_table_factory):
Collaborator

Did we miss the corresponding test with AQE off, or is that covered in some other existing test and was not really needed?

Collaborator Author

The test with AQE off is named test_dpp_reuse_broadcast_exchange; I appended the suffix _aqe_off to it to clarify the intention of the tests.

# When AQE is enabled, the broadcast exchange cannot currently be reused, because spark-rapids
# plans GpuBroadcastToCpu for the exchange reuse, while the original broadcast exchange is
# simply replaced by GpuBroadcastExchange. Therefore, the reuse cannot work, since
# GpuBroadcastToCpu is not semantically equal to GpuBroadcastExchange.
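
To illustrate the reuse mechanism this comment refers to: Spark's exchange reuse keeps the first exchange it sees for a given result and substitutes later, equivalent exchanges with a ReusedExchangeExec. The sketch below is only a rough approximation (a map keyed by canonicalized plans, rather than the real rule), meant to show why a GpuBroadcastToCpu node on the DPP side never matches the GpuBroadcastExchange used by the join side.

import scala.collection.mutable
import org.apache.spark.sql.execution.SparkPlan
import org.apache.spark.sql.execution.exchange.{Exchange, ReusedExchangeExec}

// First exchange seen for each canonicalized plan.
val seen = mutable.Map.empty[SparkPlan, Exchange]

def reuseIfPossible(exchange: Exchange): SparkPlan =
  seen.get(exchange.canonicalized) match {
    // An equivalent exchange already exists: point at it instead of re-executing it.
    case Some(existing) => ReusedExchangeExec(exchange.output, existing)
    case None =>
      seen(exchange.canonicalized) = exchange
      exchange
  }

// GpuBroadcastToCpu(child) and GpuBroadcastExchange(child) canonicalize to different plans,
// so with AQE on, the DPP filter and the join end up with two separate broadcasts.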
Collaborator

Is this something that we should fix? Should we combine the two classes together so that they are the same thing and it does not matter if you are reading the data on the CPU or the GPU?

Collaborator Author

I think so. IMO, with the help of the new method SerializeConcatHostBuffersDeserializeBatch.hostBatches, we can change the role of GpuBroadcastToCpu, making it a wrapper of GpuBroadcastExchangeExec. Then we can reuse the GpuBroadcast in terms of the serialized host buffers. I tried it in my local environment and it works. I would like to create a separate PR for this change.
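
A rough sketch of that follow-up idea, assuming hostBatches returns the host-side ColumnarBatches (its exact return type and the surrounding node/imports are assumptions here):

// Inside a GpuBroadcastToCpuExec-like node whose child is the reused GpuBroadcastExchangeExec:
// reuse the broadcast that the GPU exchange already produced instead of building a new one ...
val broadcasted = child.executeBroadcast[SerializeConcatHostBuffersDeserializeBatch]()
// ... then read it back purely on the host (return type assumed), so the same exchange can
// serve both the GPU join and any CPU-side consumer without a second broadcast.
val hostBatches: Array[ColumnarBatch] = broadcasted.value.hostBatches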

Collaborator

Sounds like a good plan.

willNotWorkOnGpu("underlying BroadcastExchange can not run on the GPU.")
}
case _ =>
willNotWorkOnGpu("no available BroadcastExchange for reuse.")
Collaborator

I am really confused by this. We cannot run the SubqueryBroadcastExec on the GPU because "no available BroadcastExchange for reuse."? Can we have a better explanation? Our end users will read this and get confused. I am also a little concerned that they will think it is an issue that they need to try and fix.

Collaborator Author

Yes, I refined the reason here.

override val childPlans: Seq[SparkPlanMeta[SparkPlan]] = Nil

override def tagPlanForGpu(): Unit = s.child match {
case ex @ BroadcastExchangeExec(_, c2r: GpuColumnarToRowExecParent) =>
Collaborator

I am more than a little confused. When exactly does this happen? Moving the comments from below up closer to the top would be good.

Collaborator Author

Done.

// GpuSubqueryBroadcast -> GpuBroadcastExchange -> GpuPlanStack...
override def convertToGpu(): GpuExec = s.child match {
case ex @ BroadcastExchangeExec(_, c2r: GpuColumnarToRowExecParent) =>
val exMeta = new GpuBroadcastMeta(ex.copy(child = c2r.child), conf, p, r)
Collaborator

nit: we do this twice. Once to tag and once here. It would be nice if we could cache it so we are not wasting work in the common case.
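
For instance (a hedged sketch; the field name cachedExchangeMeta is hypothetical, and the constructor call mirrors the snippet above), the meta built from the wrapped exchange could be computed once and shared by tagPlanForGpu and convertToGpu:

// Build the GPU broadcast meta once; tagging and conversion both read this field.
private lazy val cachedExchangeMeta: Option[GpuBroadcastMeta] = s.child match {
  case ex @ BroadcastExchangeExec(_, c2r: GpuColumnarToRowExecParent) =>
    Some(new GpuBroadcastMeta(ex.copy(child = c2r.child), conf, p, r))
  case _ => None
}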

Collaborator Author

Refined.

SQLExecution.withExecutionId(sparkSession, executionId) {
  withResource(new NvtxWithMetrics("broadcast collect", NvtxColor.GREEN,
      collectTime)) { _ =>
    val serBatch = child.executeBroadcast[SerializeConcatHostBuffersDeserializeBatch]().value
Collaborator

This is running on the driver, but assumes that it has access to a GPU. It does not. We have to do any/all transformation on the CPU.

Collaborator Author

Hi @revans2, I overlooked the GPU access issue the first time. I refactored the implementation; for now, GpuSubqueryBroadcast runs entirely on the host.

Collaborator

Could we please try to test this on the YARN cluster or some place where there is no GPU and ideally no CUDA on the nodes that are running the driver? I just want to be sure that we don't accidentally initialize the CUDA context when we try to touch the HostColumnVectors. I think in the other places we only touched buffers.

Collaborator Author

@sperlingxx commented Dec 13, 2021

Hi @revans2, I ran a Spark-on-YARN test with a driver image that contained neither the NVIDIA driver nor CUDA. GpuSubqueryBroadcast did not throw any exceptions.

@sperlingxx
Collaborator Author

build

@sperlingxx
Collaborator Author

build

@revans2 (Collaborator) previously approved these changes Dec 7, 2021

I think this is better than we have today, but because this is related to DPP and AQE I really would like more eyes on this. @jlowe and @andygrove could you also take a look?


@sameerz added and removed the performance (A performance related task/issue) label Dec 10, 2021
Signed-off-by: sperlingxx <lovedreamf@gmail.com>
@sperlingxx
Collaborator Author

build

@andygrove
Contributor

> I think this is better than we have today, but because this is related to DPP and AQE I really would like more eyes on this. @jlowe and @andygrove could you also take a look?

Sorry, I missed this notification. I am going to review this today.

Contributor

@andygrove left a comment

LGTM. I would like to review the follow-on PR related to refactoring how we handle GpuBroadcastToCpu as well.

@sperlingxx merged commit 0953911 into NVIDIA:branch-22.02 on Dec 17, 2021
@sperlingxx deleted the gpu_subquery_broadcast branch on December 17, 2021 01:45
@tgravescs mentioned this pull request on Dec 17, 2021