
Transition to v2 shims [Databricks] #4857

Merged 35 commits on Mar 8, 2022
All 35 commits are by razajafri:

- 6137ef7 (Feb 2, 2022): 301 compiling
- b4ceda0 (Feb 3, 2022): Merged the shims
- 68c9a3b (Feb 18, 2022): db build errors
- 06c98dd (Feb 18, 2022): renamed folder
- bb5c3b8 (Feb 19, 2022): modify binary-dedupe.sh to reflect the new package and remove call to…
- c003b06 (Feb 23, 2022): some more changes to return the static classname for shuffle managers
- cc671b5 (Feb 23, 2022): init shim when getting shufflemanager
- ca5c0be (Feb 23, 2022): getVersion changes
- e4dd885 (Feb 23, 2022): add hypot back
- 2282417 (Feb 24, 2022): removed buildShim
- 69dc48d (Feb 24, 2022): clean up
- 2c23552 (Feb 24, 2022): removed package v2
- de8a56b (Feb 24, 2022): reference the correct package
- 980eba3 (Feb 25, 2022): removed duplicate versions of RapidsShuffleManager
- a2a6b7e (Feb 25, 2022): addressed review comments
- cf5bd29 (Feb 26, 2022): fix db build
- 6dc37c3 (Feb 28, 2022): Revert "fix db build"
- d1c0fd8 (Feb 28, 2022): Revert "addressed review comments"
- e485926 (Feb 28, 2022): Revert "removed duplicate versions of RapidsShuffleManager"
- 3014c21 (Feb 28, 2022): removed the non-existent folder
- 38aca27 (Feb 28, 2022): removed unused import
- 1c6b7f8 (Feb 28, 2022): reverted shuffle manager and internal manager change
- f1bbbed (Mar 1, 2022): revert spark2diffs changes
- 26f1368 (Mar 1, 2022): Fix 301db build
- 3b3ed5a (Mar 1, 2022): removed reference of ShimLoader.getSparkShims from doc
- d941b83 (Mar 1, 2022): Revert 312db build fix
- 1846c58 (Mar 7, 2022): merge
- 568e4f6 (Mar 7, 2022): merge conflicts
- b873c53 (Mar 7, 2022): fix db build
- 02bd9e5 (Mar 7, 2022): fix 301db
- 3f8fb9b (Mar 7, 2022): fixed 304
- 8f6dce4 (Mar 8, 2022): fixed 330 build errors
- 57b5dbd (Mar 8, 2022): Merge remote-tracking branch 'origin/branch-22.04' into shim-work-2
- 6469765 (Mar 8, 2022): Merge remote-tracking branch 'origin/branch-22.04' into shim-work-2
- e36248d (Mar 8, 2022): fixed imports
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2020-2021, NVIDIA CORPORATION.
* Copyright (c) 2020-2022, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -20,6 +20,7 @@ import scala.reflect.api
import scala.reflect.runtime.universe._

import com.nvidia.spark.rapids._
import com.nvidia.spark.rapids.shims.SparkShimImpl

import org.apache.spark.internal.Logging

@@ -70,7 +71,7 @@ object ApiValidation extends Logging {
var printNewline = false

val sparkToShimMap = Map("3.0.1" -> "spark301", "3.1.1" -> "spark311")
val sparkVersion = ShimLoader.getSparkShims.getSparkShimVersion.toString
val sparkVersion = SparkShimImpl.getSparkShimVersion.toString
val shimVersion = sparkToShimMap(sparkVersion)

gpuKeys.foreach { e =>
7 changes: 2 additions & 5 deletions dist/scripts/binary-dedupe.sh
@@ -1,6 +1,6 @@
#!/bin/bash

# Copyright (c) 2021, NVIDIA CORPORATION.
# Copyright (c) 2021-2022, NVIDIA CORPORATION.
tgravescs marked this conversation as resolved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -170,10 +170,7 @@ function verify_same_sha_for_unshimmed() {

class_file_quoted=$(printf '%q' "$class_file")

# TODO currently RapidsShuffleManager is "removed" from /spark3* by construction in
# dist pom.xml via ant. We could delegate this logic to this script
# and make both simmpler
if [[ ! "$class_file_quoted" =~ (com/nvidia/spark/rapids/spark3.*/.*ShuffleManager.class|org/apache/spark/sql/rapids/shims/spark3.*/ProxyRapidsShuffleInternalManager.class) ]]; then
if [[ ! "$class_file_quoted" =~ (com/nvidia/spark/rapids/.*ShuffleManager.class|org/apache/spark/sql/rapids/shims/ProxyRapidsShuffleInternalManager.class) ]]; then

if ! grep -q "/spark.\+/$class_file_quoted" "$SPARK3XX_COMMON_TXT"; then
echo >&2 "$class_file is not bitwise-identical across shims"
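The change above widens the dedupe-script exemption from per-shim packages (`com/nvidia/spark/rapids/spark3*`) to the unversioned package. As an illustrative sketch (not the actual script, which iterates over a file list), the updated bash `=~` regex check behaves like this:

```shell
#!/bin/bash
# Sketch of the updated exemption check in binary-dedupe.sh: shuffle-manager
# classes in the unversioned package are skipped by the bitwise-identical
# verification. The helper name is illustrative only.
is_exempt() {
    local class_file_quoted
    class_file_quoted=$(printf '%q' "$1")
    # [[ =~ ]] performs POSIX extended regex matching in bash
    [[ "$class_file_quoted" =~ (com/nvidia/spark/rapids/.*ShuffleManager.class|org/apache/spark/sql/rapids/shims/ProxyRapidsShuffleInternalManager.class) ]]
}

is_exempt "com/nvidia/spark/rapids/RapidsShuffleManager.class" && echo "exempt"
is_exempt "com/nvidia/spark/rapids/SomeOtherClass.class" || echo "checked for bitwise identity"
```

Note the regex no longer contains a `spark3.*` path segment, which is what lets a single unversioned `RapidsShuffleManager` class pass through.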
4 changes: 2 additions & 2 deletions dist/unshimmed-from-each-spark3xx.txt
@@ -1,3 +1,3 @@
com/nvidia/spark/rapids/*/RapidsShuffleManager*
org/apache/spark/sql/rapids/shims/*/ProxyRapidsShuffleInternalManager*
com/nvidia/spark/rapids/RapidsShuffleManager*
org/apache/spark/sql/rapids/shims/ProxyRapidsShuffleInternalManager*
spark-*-info.properties
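The `unshimmed-from-each-spark3xx.txt` file lists shell-style glob patterns for classes that are taken unshimmed from each Spark 3.x build; the PR drops the per-shim wildcard directory from the patterns. A hedged sketch of how such globs can be matched in bash (the helper and the exact matching logic are illustrative, not taken from the build):

```shell
#!/bin/bash
# Illustrative glob matcher for patterns like those in
# unshimmed-from-each-spark3xx.txt after this PR.
matches_unshimmed() {
    local path=$1 pattern
    for pattern in \
        "com/nvidia/spark/rapids/RapidsShuffleManager*" \
        "org/apache/spark/sql/rapids/shims/ProxyRapidsShuffleInternalManager*" \
        "spark-*-info.properties"; do
        # unquoted $pattern makes [[ == ]] do glob matching, not literal compare
        [[ "$path" == $pattern ]] && return 0
    done
    return 1
}

matches_unshimmed "spark-301-info.properties" && echo "copied unshimmed"
```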
30 changes: 5 additions & 25 deletions docs/additional-functionality/rapids-shuffle.md
@@ -282,30 +282,13 @@ In this section, we are using a docker container built using the sample dockerfile

### Spark App Configuration

1. Choose the version of the shuffle manager that matches your Spark version.
   Currently we support:

   | Spark Shim    | spark.shuffle.manager value                               |
   | --------------| --------------------------------------------------------- |
   | 3.0.1         | com.nvidia.spark.rapids.spark301.RapidsShuffleManager     |
   | 3.0.2         | com.nvidia.spark.rapids.spark302.RapidsShuffleManager     |
   | 3.0.3         | com.nvidia.spark.rapids.spark303.RapidsShuffleManager     |
   | 3.1.1         | com.nvidia.spark.rapids.spark311.RapidsShuffleManager     |
   | 3.1.1 CDH     | com.nvidia.spark.rapids.spark311cdh.RapidsShuffleManager  |
   | 3.1.2         | com.nvidia.spark.rapids.spark312.RapidsShuffleManager     |
   | 3.1.3         | com.nvidia.spark.rapids.spark313.RapidsShuffleManager     |
   | 3.2.0         | com.nvidia.spark.rapids.spark320.RapidsShuffleManager     |
   | 3.2.1         | com.nvidia.spark.rapids.spark321.RapidsShuffleManager     |
   | Databricks 7.3| com.nvidia.spark.rapids.spark301db.RapidsShuffleManager   |
   | Databricks 9.1| com.nvidia.spark.rapids.spark312db.RapidsShuffleManager   |

@gerashegalov (Collaborator) commented on Feb 28, 2022:

How did we end up solving the incompatible ShuffleManager base across Apache Spark versions? I thought having a single facade class across shims required dynamic bytecode generation at runtime, depending on the shim, because compile-time method signatures cannot be consistent for each Spark version.

I believe one of the issues is 30x vs 32x: there is a reference to ShuffleBlockResolver (https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/shuffle/ShuffleManager.scala#L92), which in 32x has a transitive reference to MergedBlockMeta (https://github.com/apache/spark/blob/branch-3.2/core/src/main/scala/org/apache/spark/shuffle/ShuffleBlockResolver.scala#L56). However, there is no MergedBlockMeta in branch-3.0, so we should get a class verification exception.

@razajafri (Collaborator, Author) replied:

I tested everything on 3.1.1+ and it worked. I have reverted the change, as it wasn't my original intention to remove the duplicate ShuffleManagers; the original intention was to get rid of the dynamic lookup of the shims at runtime.
2. Settings for UCX 1.11.2+:

Minimum configuration:
Settings for UCX 1.11.2+:

Minimum configuration:

```shell
...
--conf spark.shuffle.manager=com.nvidia.spark.rapids.[shim package].RapidsShuffleManager \
--conf spark.shuffle.manager=com.nvidia.spark.rapids.RapidsShuffleManager \
--conf spark.shuffle.service.enabled=false \
--conf spark.dynamicAllocation.enabled=false \
--conf spark.executor.extraClassPath=${SPARK_CUDF_JAR}:${SPARK_RAPIDS_PLUGIN_JAR} \
Expand All @@ -317,7 +300,7 @@ In this section, we are using a docker container built using the sample dockerfi

```shell
...
--conf spark.shuffle.manager=com.nvidia.spark.rapids.[shim package].RapidsShuffleManager \
--conf spark.shuffle.manager=com.nvidia.spark.rapids.RapidsShuffleManager \
--conf spark.shuffle.service.enabled=false \
--conf spark.dynamicAllocation.enabled=false \
--conf spark.executor.extraClassPath=${SPARK_CUDF_JAR}:${SPARK_RAPIDS_PLUGIN_JAR} \
Expand All @@ -329,9 +312,6 @@ In this section, we are using a docker container built using the sample dockerfi
--conf spark.executorEnv.UCX_MAX_RNDV_RAILS=1
```
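With the shuffle manager class no longer versioned per shim, a full launch command might look like the following sketch. The jar path and application jar are placeholders; only the `--conf` settings shown in the documentation above are taken from the source, and the command is echoed rather than executed so the sketch stands alone:

```shell
#!/bin/bash
# Illustrative spark-submit invocation using the unversioned shuffle manager.
# SPARK_RAPIDS_PLUGIN_JAR and app.jar are placeholder values.
SPARK_RAPIDS_PLUGIN_JAR=/opt/rapids/rapids-4-spark.jar

cmd=(spark-submit
  --conf spark.shuffle.manager=com.nvidia.spark.rapids.RapidsShuffleManager
  --conf spark.shuffle.service.enabled=false
  --conf spark.dynamicAllocation.enabled=false
  --conf spark.executor.extraClassPath="$SPARK_RAPIDS_PLUGIN_JAR"
  app.jar)

# Print the command instead of running it, so the sketch is runnable anywhere
printf '%s\n' "${cmd[@]}"
```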

Please replace `[shim package]` with the appropriate value. For example, the full class name for
Apache Spark 3.1.3 is: `com.nvidia.spark.rapids.spark313.RapidsShuffleManager`.

Please note `LD_LIBRARY_PATH` should optionally be set if the UCX library is installed in a
non-standard location.

2 changes: 1 addition & 1 deletion docs/dev/shims.md
@@ -40,7 +40,7 @@ Upstream base classes we derive from might be incompatible in the sense that one
requires us to implement/override the method `M` whereas the other prohibits it by marking
the base implementation `final`, E.g. `org.apache.spark.sql.catalyst.trees.TreeNode` changes
between Spark 3.1.x and Spark 3.2.x. So instead of deriving from such classes directly we
inject an intermediate trait e.g. `com.nvidia.spark.rapids.shims.v2.ShimExpression` that
inject an intermediate trait e.g. `com.nvidia.spark.rapids.shims.ShimExpression` that
has a varying source code depending on the Spark version we compile against to overcome this
issue as you can see e.g., comparing TreeNode:
1. [ShimExpression For 3.0.x and 3.1.x](https://github.com/NVIDIA/spark-rapids/blob/main/sql-plugin/src/main/post320-treenode/scala/com/nvidia/spark/rapids/shims/v2/TreeNode.scala#L23)
2 changes: 1 addition & 1 deletion jenkins/spark-premerge-build.sh
@@ -95,7 +95,7 @@ rapids_shuffle_smoke_test() {
PYSP_TEST_spark_cores_max=2 \
PYSP_TEST_spark_executor_cores=1 \
SPARK_SUBMIT_FLAGS="--conf spark.executorEnv.UCX_ERROR_SIGNALS=" \
PYSP_TEST_spark_shuffle_manager=com.nvidia.spark.rapids.$SHUFFLE_SPARK_SHIM.RapidsShuffleManager \
PYSP_TEST_spark_shuffle_manager=com.nvidia.spark.rapids.RapidsShuffleManager \
PYSP_TEST_spark_rapids_memory_gpu_minAllocFraction=0 \
PYSP_TEST_spark_rapids_memory_gpu_maxAllocFraction=0.1 \
PYSP_TEST_spark_rapids_memory_gpu_allocFraction=0.1 \
25 changes: 22 additions & 3 deletions pom.xml
@@ -115,6 +115,8 @@
<sources>
<source>${project.basedir}/src/main/301+-nondb/scala</source>
<source>${project.basedir}/src/main/301/scala</source>
<source>${project.basedir}/src/main/301until304/scala</source>
tgravescs marked this conversation as resolved.
<source>${project.basedir}/src/main/30X/scala</source>
<source>${project.basedir}/src/main/301until310-all/scala</source>
<source>${project.basedir}/src/main/301until310-nondb/scala</source>
<source>${project.basedir}/src/main/301until320-all/scala</source>
@@ -164,6 +166,8 @@
<sources>
<source>${project.basedir}/src/main/301+-nondb/scala</source>
<source>${project.basedir}/src/main/302/scala</source>
<source>${project.basedir}/src/main/301until304/scala</source>
<source>${project.basedir}/src/main/30X/scala</source>
<source>${project.basedir}/src/main/301until310-all/scala</source>
<source>${project.basedir}/src/main/301until310-nondb/scala</source>
<source>${project.basedir}/src/main/301until320-all/scala</source>
@@ -222,6 +226,8 @@
<sources>
<source>${project.basedir}/src/main/301+-nondb/scala</source>
<source>${project.basedir}/src/main/303/scala</source>
<source>${project.basedir}/src/main/301until304/scala</source>
<source>${project.basedir}/src/main/30X/scala</source>
<source>${project.basedir}/src/main/301until310-all/scala</source>
<source>${project.basedir}/src/main/301until310-nondb/scala</source>
<source>${project.basedir}/src/main/301until320-all/scala</source>
@@ -275,6 +281,7 @@
<sources>
<source>${project.basedir}/src/main/301+-nondb/scala</source>
<source>${project.basedir}/src/main/304/scala</source>
<source>${project.basedir}/src/main/30X/scala</source>
<source>${project.basedir}/src/main/301until310-all/scala</source>
<source>${project.basedir}/src/main/301until310-nondb/scala</source>
<source>${project.basedir}/src/main/301until320-all/scala</source>
@@ -327,7 +334,8 @@
<configuration>
<sources>
<source>${project.basedir}/src/main/301+-nondb/scala</source>
<source>${project.basedir}/src/main/311/scala</source>
<source>${project.basedir}/src/main/311-nondb/scala</source>
<source>${project.basedir}/src/main/31X-33X/scala</source>
<source>${project.basedir}/src/main/301until320-all/scala</source>
<source>${project.basedir}/src/main/301until320-noncdh/scala</source>
<source>${project.basedir}/src/main/301until320-nondb/scala</source>
@@ -457,6 +465,7 @@
<configuration>
<sources>
<source>${project.basedir}/src/main/301until320-all/scala</source>
<source>${project.basedir}/src/main/30X-33X/scala</source>
<source>${project.basedir}/src/main/301until320-noncdh/scala</source>
<source>${project.basedir}/src/main/312db/scala</source>
<source>${project.basedir}/src/main/301until330-all/scala</source>
@@ -509,7 +518,8 @@
<configuration>
<sources>
<source>${project.basedir}/src/main/301+-nondb/scala</source>
<source>${project.basedir}/src/main/312/scala</source>
<source>${project.basedir}/src/main/312-nondb/scala</source>
<source>${project.basedir}/src/main/31X-33X/scala</source>
<source>${project.basedir}/src/main/301until320-all/scala</source>
<source>${project.basedir}/src/main/301until320-noncdh/scala</source>
<source>${project.basedir}/src/main/301until320-nondb/scala</source>
@@ -568,6 +578,7 @@
<sources>
<source>${project.basedir}/src/main/301+-nondb/scala</source>
<source>${project.basedir}/src/main/313/scala</source>
<source>${project.basedir}/src/main/31X-33X/scala</source>
<source>${project.basedir}/src/main/301until320-all/scala</source>
<source>${project.basedir}/src/main/301until320-noncdh/scala</source>
<source>${project.basedir}/src/main/301until320-nondb/scala</source>
@@ -625,6 +636,7 @@
<configuration>
<sources>
<source>${project.basedir}/src/main/301+-nondb/scala</source>
<source>${project.basedir}/src/main/31X-33X/scala</source>
<source>${project.basedir}/src/main/320/scala</source>
<source>${project.basedir}/src/main/301until330-all/scala</source>
<source>${project.basedir}/src/main/311+-all/scala</source>
@@ -688,11 +700,11 @@
<configuration>
<sources>
<source>${project.basedir}/src/main/301+-nondb/scala</source>
<source>${project.basedir}/src/main/31X-33X/scala</source>
<source>${project.basedir}/src/main/321/scala</source>
<source>${project.basedir}/src/main/301until330-all/scala</source>
<source>${project.basedir}/src/main/311+-all/scala</source>
<source>${project.basedir}/src/main/311+-nondb/scala</source>
<source>${project.basedir}/src/main/311until330-all/scala</source>
<source>${project.basedir}/src/main/320+/scala</source>
<source>${project.basedir}/src/main/321+/scala</source>
<source>${project.basedir}/src/main/post320-treenode/scala</source>
@@ -751,13 +763,15 @@
<configuration>
<sources>
<source>${project.basedir}/src/main/301+-nondb/scala</source>
<source>${project.basedir}/src/main/31X-33X/scala</source>
<source>${project.basedir}/src/main/322/scala</source>
<source>${project.basedir}/src/main/301until330-all/scala</source>
<source>${project.basedir}/src/main/311+-all/scala</source>
<source>${project.basedir}/src/main/311+-nondb/scala</source>
<source>${project.basedir}/src/main/311until330-all/scala</source>
<source>${project.basedir}/src/main/320+/scala</source>
<source>${project.basedir}/src/main/321+/scala</source>
<source>${project.basedir}/src/main/322+/scala</source>
<source>${project.basedir}/src/main/post320-treenode/scala</source>
</sources>
</configuration>
@@ -814,11 +828,13 @@
<configuration>
<sources>
<source>${project.basedir}/src/main/301+-nondb/scala</source>
<source>${project.basedir}/src/main/31X-33X/scala</source>
<source>${project.basedir}/src/main/330/scala</source>
<source>${project.basedir}/src/main/311+-all/scala</source>
<source>${project.basedir}/src/main/311+-nondb/scala</source>
<source>${project.basedir}/src/main/320+/scala</source>
<source>${project.basedir}/src/main/321+/scala</source>
<source>${project.basedir}/src/main/322+/scala</source>
<source>${project.basedir}/src/main/330+/scala</source>
<source>${project.basedir}/src/main/post320-treenode/scala</source>
</sources>
@@ -876,6 +892,9 @@
<configuration>
<sources>
<source>${project.basedir}/src/main/301+-nondb/scala</source>
<source>${project.basedir}/src/main/31X-33X/scala</source>
<source>${project.basedir}/src/main/311-nondb/scala</source>
<source>${project.basedir}/src/main/311cdh/scala</source>
<source>${project.basedir}/src/main/301until320-all/scala</source>
<source>${project.basedir}/src/main/301until320-nondb/scala</source>
<source>${project.basedir}/src/main/301until330-all/scala</source>
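The pom.xml changes above add shared source directories whose names encode Spark-version ranges, such as `301until304` (covering 3.0.1 through 3.0.3 but not 3.0.4) and `31X-33X`. A hypothetical helper, not part of the actual build, illustrating the half-open `<lo>until<hi>` convention:

```shell
#!/bin/bash
# Illustrative check of the "<lo>until<hi>" directory naming convention:
# a shim id belongs to the range when lo <= shim < hi (hi is exclusive),
# which is why 304 gets its own directory and does not include 301until304.
in_until_range() {
    local shim=$1 range=$2 lo hi
    lo=${range%until*}   # text before "until"
    hi=${range#*until}   # text after "until"
    (( shim >= lo && shim < hi ))
}

in_until_range 302 301until304 && echo "302 builds from src/main/301until304"
in_until_range 304 301until304 || echo "304 does not"
```

This matches the profiles above: the 301, 302, and 303 profiles list `src/main/301until304/scala`, while the 304 profile omits it.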
@@ -106,7 +106,7 @@ final class CastExprMeta[INPUT <: Cast](
// NOOP for anything prior to 3.2.0
case (_: StringType, dt:DecimalType) =>
// Spark 2.x: removed check for
// !ShimLoader.getSparkShims.isCastingStringToNegDecimalScaleSupported
// !SparkShimImpl.isCastingStringToNegDecimalScaleSupported
// this dealt with handling a bug fix that is only in newer versions of Spark
// (https://issues.apache.org/jira/browse/SPARK-37451)
// Since we don't know what version of Spark 3 they will be using