
Commit

Remove cudf jar from more docs/notebooks
Signed-off-by: Hao Zhu <hazhu@nvidia.com>
viadea committed May 31, 2022
1 parent be496cd · commit eda4762
Showing 6 changed files with 8 additions and 24 deletions.
4 changes: 2 additions & 2 deletions docs/additional-functionality/rapids-shuffle.md
@@ -298,7 +298,7 @@ In this section, we are using a docker container built using the sample dockerfi
 --conf spark.shuffle.manager=com.nvidia.spark.rapids.[shim package].RapidsShuffleManager \
 --conf spark.shuffle.service.enabled=false \
 --conf spark.dynamicAllocation.enabled=false \
---conf spark.executor.extraClassPath=${SPARK_CUDF_JAR}:${SPARK_RAPIDS_PLUGIN_JAR} \
+--conf spark.executor.extraClassPath=${SPARK_RAPIDS_PLUGIN_JAR} \
 --conf spark.executorEnv.UCX_ERROR_SIGNALS= \
 --conf spark.executorEnv.UCX_MEMTYPE_CACHE=n
 ```
@@ -310,7 +310,7 @@ In this section, we are using a docker container built using the sample dockerfi
 --conf spark.shuffle.manager=com.nvidia.spark.rapids.[shim package].RapidsShuffleManager \
 --conf spark.shuffle.service.enabled=false \
 --conf spark.dynamicAllocation.enabled=false \
---conf spark.executor.extraClassPath=${SPARK_CUDF_JAR}:${SPARK_RAPIDS_PLUGIN_JAR} \
+--conf spark.executor.extraClassPath=${SPARK_RAPIDS_PLUGIN_JAR} \
 --conf spark.executorEnv.UCX_ERROR_SIGNALS= \
 --conf spark.executorEnv.UCX_MEMTYPE_CACHE=n \
 --conf spark.executorEnv.UCX_IB_RX_QUEUE_LEN=1024 \
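Both hunks above drop `${SPARK_CUDF_JAR}` and keep only `${SPARK_RAPIDS_PLUGIN_JAR}` on the executor classpath. As a hedged sketch only (the path and version are placeholders, not taken from this commit), the remaining environment variable now simply points at the single RAPIDS Accelerator jar:

```
# Placeholder path/version; the cudf functionality is bundled inside this one jar,
# so no separate cudf jar needs to be listed on the classpath.
export SPARK_RAPIDS_PLUGIN_JAR=/opt/sparkRapidsPlugin/rapids-4-spark_2.12-XX.XX.X.jar
```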
2 changes: 1 addition & 1 deletion docs/demo/GCP/mortgage-xgboost4j-gpu-scala.ipynb
@@ -62,7 +62,7 @@
 {
 "cell_type": "markdown",
 "metadata": {},
-"source": "## Create a new spark session and load data\n\nA new spark session should be created to continue all the following spark operations.\n\nNOTE: in this notebook, the dependency jars have been loaded when installing toree kernel. Alternatively the jars can be loaded into notebook by [%AddJar magic](https://toree.incubator.apache.org/docs/current/user/faq/). However, there\u0027s one restriction for `%AddJar`: the jar uploaded can only be available when `AddJar` is called just after a new spark session is created. Do it as below:\n\n```scala\nimport org.apache.spark.sql.SparkSession\nval spark \u003d SparkSession.builder().appName(\"mortgage-GPU\").getOrCreate\n%AddJar file:/data/libs/cudf-XXX-cuda10.jar\n%AddJar file:/data/libs/rapids-4-spark-XXX.jar\n%AddJar file:/data/libs/xgboost4j_3.0-XXX.jar\n%AddJar file:/data/libs/xgboost4j-spark_3.0-XXX.jar\n// ...\n```\n\n##### Please note the new jar \"rapids-4-spark-XXX.jar\" is only needed for GPU version, you can not add it to dependence list for CPU version."
+"source": "## Create a new spark session and load data\n\nA new spark session should be created to continue all the following spark operations.\n\nNOTE: in this notebook, the dependency jars have been loaded when installing toree kernel. Alternatively the jars can be loaded into notebook by [%AddJar magic](https://toree.incubator.apache.org/docs/current/user/faq/). However, there\u0027s one restriction for `%AddJar`: the jar uploaded can only be available when `AddJar` is called just after a new spark session is created. Do it as below:\n\n```scala\nimport org.apache.spark.sql.SparkSession\nval spark \u003d SparkSession.builder().appName(\"mortgage-GPU\").getOrCreate\n%AddJar file:/data/libs/rapids-4-spark-XXX.jar\n%AddJar file:/data/libs/xgboost4j_3.0-XXX.jar\n%AddJar file:/data/libs/xgboost4j-spark_3.0-XXX.jar\n// ...\n```\n\n##### Please note the new jar \"rapids-4-spark-XXX.jar\" is only needed for GPU version, you can not add it to dependence list for CPU version."
 },
 {
 "cell_type": "code",
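The notebook cell above notes that the dependency jars were loaded when the Toree kernel was installed. A rough sketch of what such an install could look like (the command options, paths, and versions are assumptions for illustration, not part of this commit) — the standalone cudf jar is no longer in the list:

```
# Hypothetical Toree kernel install; jar paths and versions are placeholders.
jupyter toree install --user \
  --spark_opts='--jars /data/libs/rapids-4-spark-XXX.jar,/data/libs/xgboost4j_3.0-XXX.jar,/data/libs/xgboost4j-spark_3.0-XXX.jar'
```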
4 changes: 2 additions & 2 deletions docs/demo/GCP/mortgage-xgboost4j-gpu-scala.zpln
@@ -250,7 +250,7 @@
 "$$hashKey": "object:11091"
 },
 {
-"text": "%md\n## Create a new spark session and load data\n\nA new spark session should be created to continue all the following spark operations.\n\nNOTE: in this notebook, the dependency jars have been loaded when installing toree kernel. Alternatively the jars can be loaded into notebook by [%AddJar magic](https://toree.incubator.apache.org/docs/current/user/faq/). However, there's one restriction for `%AddJar`: the jar uploaded can only be available when `AddJar` is called just after a new spark session is created. Do it as below:\n\n```scala\nimport org.apache.spark.sql.SparkSession\nval spark = SparkSession.builder().appName(\"mortgage-GPU\").getOrCreate\n%AddJar file:/data/libs/cudf-XXX-cuda10.jar\n%AddJar file:/data/libs/rapids-4-spark-XXX.jar\n%AddJar file:/data/libs/xgboost4j_3.0-XXX.jar\n%AddJar file:/data/libs/xgboost4j-spark_3.0-XXX.jar\n// ...\n```\n\n##### Please note the new jar \"rapids-4-spark-XXX.jar\" is only needed for GPU version, you can not add it to dependence list for CPU version.",
+"text": "%md\n## Create a new spark session and load data\n\nA new spark session should be created to continue all the following spark operations.\n\nNOTE: in this notebook, the dependency jars have been loaded when installing toree kernel. Alternatively the jars can be loaded into notebook by [%AddJar magic](https://toree.incubator.apache.org/docs/current/user/faq/). However, there's one restriction for `%AddJar`: the jar uploaded can only be available when `AddJar` is called just after a new spark session is created. Do it as below:\n\n```scala\nimport org.apache.spark.sql.SparkSession\nval spark = SparkSession.builder().appName(\"mortgage-GPU\").getOrCreate\n%AddJar file:/data/libs/rapids-4-spark-XXX.jar\n%AddJar file:/data/libs/xgboost4j_3.0-XXX.jar\n%AddJar file:/data/libs/xgboost4j-spark_3.0-XXX.jar\n// ...\n```\n\n##### Please note the new jar \"rapids-4-spark-XXX.jar\" is only needed for GPU version, you can not add it to dependence list for CPU version.",
 "user": "anonymous",
 "dateUpdated": "2020-07-13T02:18:47+0000",
 "config": {
@@ -274,7 +274,7 @@
 "msg": [
 {
 "type": "HTML",
-"data": "<div class=\"markdown-body\">\n<h2>Create a new spark session and load data</h2>\n<p>A new spark session should be created to continue all the following spark operations.</p>\n<p>NOTE: in this notebook, the dependency jars have been loaded when installing toree kernel. Alternatively the jars can be loaded into notebook by <a href=\"https://toree.incubator.apache.org/docs/current/user/faq/\">%AddJar magic</a>. However, there&rsquo;s one restriction for <code>%AddJar</code>: the jar uploaded can only be available when <code>AddJar</code> is called just after a new spark session is created. Do it as below:</p>\n<pre><code class=\"language-scala\">import org.apache.spark.sql.SparkSession\nval spark = SparkSession.builder().appName(&quot;mortgage-GPU&quot;).getOrCreate\n%AddJar file:/data/libs/cudf-XXX-cuda10.jar\n%AddJar file:/data/libs/rapids-4-spark-XXX.jar\n%AddJar file:/data/libs/xgboost4j_3.0-XXX.jar\n%AddJar file:/data/libs/xgboost4j-spark_3.0-XXX.jar\n// ...\n</code></pre>\n<h5>Please note the new jar &ldquo;rapids-4-spark-XXX.jar&rdquo; is only needed for GPU version, you can not add it to dependence list for CPU version.</h5>\n\n</div>"
+"data": "<div class=\"markdown-body\">\n<h2>Create a new spark session and load data</h2>\n<p>A new spark session should be created to continue all the following spark operations.</p>\n<p>NOTE: in this notebook, the dependency jars have been loaded when installing toree kernel. Alternatively the jars can be loaded into notebook by <a href=\"https://toree.incubator.apache.org/docs/current/user/faq/\">%AddJar magic</a>. However, there&rsquo;s one restriction for <code>%AddJar</code>: the jar uploaded can only be available when <code>AddJar</code> is called just after a new spark session is created. Do it as below:</p>\n<pre><code class=\"language-scala\">import org.apache.spark.sql.SparkSession\nval spark = SparkSession.builder().appName(&quot;mortgage-GPU&quot;).getOrCreate\n%AddJar file:/data/libs/rapids-4-spark-XXX.jar\n%AddJar file:/data/libs/xgboost4j_3.0-XXX.jar\n%AddJar file:/data/libs/xgboost4j-spark_3.0-XXX.jar\n// ...\n</code></pre>\n<h5>Please note the new jar &ldquo;rapids-4-spark-XXX.jar&rdquo; is only needed for GPU version, you can not add it to dependence list for CPU version.</h5>\n\n</div>"
 }
 ]
 },
17 changes: 1 addition & 16 deletions docs/dev/nvtx_profiling.md
@@ -10,22 +10,7 @@ once captured can be visually analyzed using
 [NVIDIA NSight Systems](https://developer.nvidia.com/nsight-systems).
 This document is specific to the RAPIDS Spark Plugin profiling.
 
-### STEP 1:
-
-In order to get NVTX ranges to work you need to recompile your cuDF with NVTX flag enabled:
-
-```
-//from the cpp/build directory
-cmake .. -DCMAKE_INSTALL_PREFIX=$CONDA_PREFIX -DCMAKE_CXX11_ABI=ON -DUSE_NVTX=1
-make -j <num_threads>
-```
-If you are using the java cuDF layer, recompile your jar as usual using maven.
-```
-mvn clean package -DskipTests
-```
-### STEP 2:
+### STEPS:
 
 We need to pass a flag to the spark executors / driver in order to enable NVTX collection.
 This can be done for spark shell by adding the following configuration keys:
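The configuration keys referenced in the last context line above are not expanded in this diff. As an illustration only (the exact property name is an assumption, not shown in this commit), NVTX collection is typically switched on through the driver and executor JVM options, along these lines:

```
# Assumed property name; passed to both the driver and the executors.
--conf spark.driver.extraJavaOptions=-Dai.rapids.cudf.nvtx.enabled=true \
--conf spark.executor.extraJavaOptions=-Dai.rapids.cudf.nvtx.enabled=true
```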
2 changes: 1 addition & 1 deletion docs/get-started/getting-started-databricks.md
@@ -160,7 +160,7 @@ cluster.
 ```
 
 7. Once you’ve added the Spark config, click “Confirm and Restart”.
-8. Once the cluster comes back up, it is now enabled for GPU-accelerated Spark with RAPIDS and cuDF.
+8. Once the cluster comes back up, it is now enabled for GPU-accelerated Spark.
 
 ## Import the GPU Mortgage Example Notebook
 Import the example [notebook](../demo/gpu-mortgage_accelerated.ipynb) from the repo into your
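Step 7 refers to a Spark config entered earlier in the guide, which this hunk does not show. Purely as an illustrative sketch (keys and values are assumptions, not taken from this commit), a RAPIDS-enabled Databricks cluster config usually carries entries along these lines:

```
# Illustrative only; the actual keys/values come from the guide, not this commit.
spark.plugins com.nvidia.spark.SQLPlugin
spark.task.resource.gpu.amount 0.1
spark.rapids.sql.concurrentGpuTasks 2
```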
3 changes: 1 addition & 2 deletions docs/get-started/getting-started-gcp.md
@@ -108,8 +108,7 @@ rest as a training set, saving to respective GCS locations. Using the default n
 configuration the first stage should take ~110 seconds (1/3 of CPU execution time with same config)
 and the second stage takes ~170 seconds (1/7 of CPU execution time with same config). The notebook
 depends on the pre-compiled [Spark RAPIDS SQL
-plugin](https://mvnrepository.com/artifact/com.nvidia/rapids-4-spark) and
-[cuDF](https://mvnrepository.com/artifact/ai.rapids/cudf), which are pre-downloaded by the GCP
+plugin](https://mvnrepository.com/artifact/com.nvidia/rapids-4-spark) which is pre-downloaded by the GCP
 Dataproc [RAPIDS init
 script](https://github.com/GoogleCloudDataproc/initialization-actions/tree/master/rapids).
 
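For context, the Dataproc RAPIDS init action linked above is attached when the cluster is created. A hedged sketch (cluster name, region, accelerator type, and metadata values are placeholder assumptions, not part of this commit):

```
# Placeholder names/region; the init actions install the GPU driver and the RAPIDS plugin.
gcloud dataproc clusters create my-gpu-cluster \
  --region=us-central1 \
  --worker-accelerator=type=nvidia-tesla-t4,count=1 \
  --metadata=rapids-runtime=SPARK \
  --initialization-actions=gs://goog-dataproc-initialization-actions-us-central1/gpu/install_gpu_driver.sh,gs://goog-dataproc-initialization-actions-us-central1/rapids/rapids.sh
```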
