
Commit

Remove cudf jar from more docs/notebooks
Signed-off-by: Hao Zhu <hazhu@nvidia.com>
viadea committed May 31, 2022
1 parent be496cd · commit eda4762
Showing 6 changed files with 8 additions and 24 deletions.
4 changes: 2 additions & 2 deletions docs/additional-functionality/rapids-shuffle.md
@@ -298,7 +298,7 @@ In this section, we are using a docker container built using the sample dockerfi
 --conf spark.shuffle.manager=com.nvidia.spark.rapids.[shim package].RapidsShuffleManager \
 --conf spark.shuffle.service.enabled=false \
 --conf spark.dynamicAllocation.enabled=false \
---conf spark.executor.extraClassPath=${SPARK_CUDF_JAR}:${SPARK_RAPIDS_PLUGIN_JAR} \
+--conf spark.executor.extraClassPath=${SPARK_RAPIDS_PLUGIN_JAR} \
 --conf spark.executorEnv.UCX_ERROR_SIGNALS= \
 --conf spark.executorEnv.UCX_MEMTYPE_CACHE=n
 ```
@@ -310,7 +310,7 @@ In this section, we are using a docker container built using the sample dockerfi
 --conf spark.shuffle.manager=com.nvidia.spark.rapids.[shim package].RapidsShuffleManager \
 --conf spark.shuffle.service.enabled=false \
 --conf spark.dynamicAllocation.enabled=false \
---conf spark.executor.extraClassPath=${SPARK_CUDF_JAR}:${SPARK_RAPIDS_PLUGIN_JAR} \
+--conf spark.executor.extraClassPath=${SPARK_RAPIDS_PLUGIN_JAR} \
 --conf spark.executorEnv.UCX_ERROR_SIGNALS= \
 --conf spark.executorEnv.UCX_MEMTYPE_CACHE=n \
 --conf spark.executorEnv.UCX_IB_RX_QUEUE_LEN=1024 \
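Both hunks above drop `${SPARK_CUDF_JAR}` and keep only `${SPARK_RAPIDS_PLUGIN_JAR}` on the executor classpath. As a hedged sketch only (the path and version are placeholders, not taken from this commit), the remaining environment variable now simply points at the single RAPIDS Accelerator jar:

```
# Placeholder path/version; the cudf functionality is bundled inside this one jar,
# so no separate cudf jar needs to be listed on the classpath.
export SPARK_RAPIDS_PLUGIN_JAR=/opt/sparkRapidsPlugin/rapids-4-spark_2.12-XX.XX.X.jar
```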
2 changes: 1 addition & 1 deletion docs/demo/GCP/mortgage-xgboost4j-gpu-scala.ipynb
@@ -62,7 +62,7 @@
 {
 "cell_type": "markdown",
 "metadata": {},
-"source": "## Create a new spark session and load data\n\nA new spark session should be created to continue all the following spark operations.\n\nNOTE: in this notebook, the dependency jars have been loaded when installing toree kernel. Alternatively the jars can be loaded into notebook by [%AddJar magic](https://toree.incubator.apache.org/docs/current/user/faq/). However, there\u0027s one restriction for `%AddJar`: the jar uploaded can only be available when `AddJar` is called just after a new spark session is created. Do it as below:\n\n```scala\nimport org.apache.spark.sql.SparkSession\nval spark \u003d SparkSession.builder().appName(\"mortgage-GPU\").getOrCreate\n%AddJar file:/data/libs/cudf-XXX-cuda10.jar\n%AddJar file:/data/libs/rapids-4-spark-XXX.jar\n%AddJar file:/data/libs/xgboost4j_3.0-XXX.jar\n%AddJar file:/data/libs/xgboost4j-spark_3.0-XXX.jar\n// ...\n```\n\n##### Please note the new jar \"rapids-4-spark-XXX.jar\" is only needed for GPU version, you can not add it to dependence list for CPU version."
+"source": "## Create a new spark session and load data\n\nA new spark session should be created to continue all the following spark operations.\n\nNOTE: in this notebook, the dependency jars have been loaded when installing toree kernel. Alternatively the jars can be loaded into notebook by [%AddJar magic](https://toree.incubator.apache.org/docs/current/user/faq/). However, there\u0027s one restriction for `%AddJar`: the jar uploaded can only be available when `AddJar` is called just after a new spark session is created. Do it as below:\n\n```scala\nimport org.apache.spark.sql.SparkSession\nval spark \u003d SparkSession.builder().appName(\"mortgage-GPU\").getOrCreate\n%AddJar file:/data/libs/rapids-4-spark-XXX.jar\n%AddJar file:/data/libs/xgboost4j_3.0-XXX.jar\n%AddJar file:/data/libs/xgboost4j-spark_3.0-XXX.jar\n// ...\n```\n\n##### Please note the new jar \"rapids-4-spark-XXX.jar\" is only needed for GPU version, you can not add it to dependence list for CPU version."
 },
 {
 "cell_type": "code",
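The notebook cell above notes that the dependency jars were loaded when the Toree kernel was installed. A rough sketch of what such an install could look like (the command options, paths, and versions are assumptions for illustration, not part of this commit) — the standalone cudf jar is no longer in the list:

```
# Hypothetical Toree kernel install; jar paths and versions are placeholders.
jupyter toree install --user \
  --spark_opts='--jars /data/libs/rapids-4-spark-XXX.jar,/data/libs/xgboost4j_3.0-XXX.jar,/data/libs/xgboost4j-spark_3.0-XXX.jar'
```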
4 changes: 2 additions & 2 deletions docs/demo/GCP/mortgage-xgboost4j-gpu-scala.zpln
@@ -250,7 +250,7 @@
 "$$hashKey": "object:11091"
 },
 {
-"text": "%md\n## Create a new spark session and load data\n\nA new spark session should be created to continue all the following spark operations.\n\nNOTE: in this notebook, the dependency jars have been loaded when installing toree kernel. Alternatively the jars can be loaded into notebook by [%AddJar magic](https://toree.incubator.apache.org/docs/current/user/faq/). However, there's one restriction for `%AddJar`: the jar uploaded can only be available when `AddJar` is called just after a new spark session is created. Do it as below:\n\n```scala\nimport org.apache.spark.sql.SparkSession\nval spark = SparkSession.builder().appName(\"mortgage-GPU\").getOrCreate\n%AddJar file:/data/libs/cudf-XXX-cuda10.jar\n%AddJar file:/data/libs/rapids-4-spark-XXX.jar\n%AddJar file:/data/libs/xgboost4j_3.0-XXX.jar\n%AddJar file:/data/libs/xgboost4j-spark_3.0-XXX.jar\n// ...\n```\n\n##### Please note the new jar \"rapids-4-spark-XXX.jar\" is only needed for GPU version, you can not add it to dependence list for CPU version.",
+"text": "%md\n## Create a new spark session and load data\n\nA new spark session should be created to continue all the following spark operations.\n\nNOTE: in this notebook, the dependency jars have been loaded when installing toree kernel. Alternatively the jars can be loaded into notebook by [%AddJar magic](https://toree.incubator.apache.org/docs/current/user/faq/). However, there's one restriction for `%AddJar`: the jar uploaded can only be available when `AddJar` is called just after a new spark session is created. Do it as below:\n\n```scala\nimport org.apache.spark.sql.SparkSession\nval spark = SparkSession.builder().appName(\"mortgage-GPU\").getOrCreate\n%AddJar file:/data/libs/rapids-4-spark-XXX.jar\n%AddJar file:/data/libs/xgboost4j_3.0-XXX.jar\n%AddJar file:/data/libs/xgboost4j-spark_3.0-XXX.jar\n// ...\n```\n\n##### Please note the new jar \"rapids-4-spark-XXX.jar\" is only needed for GPU version, you can not add it to dependence list for CPU version.",
 "user": "anonymous",
 "dateUpdated": "2020-07-13T02:18:47+0000",
 "config": {
@@ -274,7 +274,7 @@
 "msg": [
 {
 "type": "HTML",
-"data": "<div class=\"markdown-body\">\n<h2>Create a new spark session and load data</h2>\n<p>A new spark session should be created to continue all the following spark operations.</p>\n<p>NOTE: in this notebook, the dependency jars have been loaded when installing toree kernel. Alternatively the jars can be loaded into notebook by <a href=\"https://toree.incubator.apache.org/docs/current/user/faq/\">%AddJar magic</a>. However, there&rsquo;s one restriction for <code>%AddJar</code>: the jar uploaded can only be available when <code>AddJar</code> is called just after a new spark session is created. Do it as below:</p>\n<pre><code class=\"language-scala\">import org.apache.spark.sql.SparkSession\nval spark = SparkSession.builder().appName(&quot;mortgage-GPU&quot;).getOrCreate\n%AddJar file:/data/libs/cudf-XXX-cuda10.jar\n%AddJar file:/data/libs/rapids-4-spark-XXX.jar\n%AddJar file:/data/libs/xgboost4j_3.0-XXX.jar\n%AddJar file:/data/libs/xgboost4j-spark_3.0-XXX.jar\n// ...\n</code></pre>\n<h5>Please note the new jar &ldquo;rapids-4-spark-XXX.jar&rdquo; is only needed for GPU version, you can not add it to dependence list for CPU version.</h5>\n\n</div>"
+"data": "<div class=\"markdown-body\">\n<h2>Create a new spark session and load data</h2>\n<p>A new spark session should be created to continue all the following spark operations.</p>\n<p>NOTE: in this notebook, the dependency jars have been loaded when installing toree kernel. Alternatively the jars can be loaded into notebook by <a href=\"https://toree.incubator.apache.org/docs/current/user/faq/\">%AddJar magic</a>. However, there&rsquo;s one restriction for <code>%AddJar</code>: the jar uploaded can only be available when <code>AddJar</code> is called just after a new spark session is created. Do it as below:</p>\n<pre><code class=\"language-scala\">import org.apache.spark.sql.SparkSession\nval spark = SparkSession.builder().appName(&quot;mortgage-GPU&quot;).getOrCreate\n%AddJar file:/data/libs/rapids-4-spark-XXX.jar\n%AddJar file:/data/libs/xgboost4j_3.0-XXX.jar\n%AddJar file:/data/libs/xgboost4j-spark_3.0-XXX.jar\n// ...\n</code></pre>\n<h5>Please note the new jar &ldquo;rapids-4-spark-XXX.jar&rdquo; is only needed for GPU version, you can not add it to dependence list for CPU version.</h5>\n\n</div>"
 }
 ]
 },
17 changes: 1 addition & 16 deletions docs/dev/nvtx_profiling.md
@@ -10,22 +10,7 @@ once captured can be visually analyzed using
 [NVIDIA NSight Systems](https://developer.nvidia.com/nsight-systems).
 This document is specific to the RAPIDS Spark Plugin profiling.
 
-### STEP 1:
-
-In order to get NVTX ranges to work you need to recompile your cuDF with NVTX flag enabled:
-
-```
-//from the cpp/build directory
-cmake .. -DCMAKE_INSTALL_PREFIX=$CONDA_PREFIX -DCMAKE_CXX11_ABI=ON -DUSE_NVTX=1
-make -j <num_threads>
-```
-If you are using the java cuDF layer, recompile your jar as usual using maven.
-```
-mvn clean package -DskipTests
-```
-### STEP 2:
+### STEPS:
 
 We need to pass a flag to the spark executors / driver in order to enable NVTX collection.
 This can be done for spark shell by adding the following configuration keys:
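The configuration keys referenced in the last context line above are not expanded in this diff. As an illustration only (the exact property name is an assumption, not shown in this commit), NVTX collection is typically switched on through the driver and executor JVM options, along these lines:

```
# Assumed property name; passed to both the driver and the executors.
--conf spark.driver.extraJavaOptions=-Dai.rapids.cudf.nvtx.enabled=true \
--conf spark.executor.extraJavaOptions=-Dai.rapids.cudf.nvtx.enabled=true
```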
2 changes: 1 addition & 1 deletion docs/get-started/getting-started-databricks.md
@@ -160,7 +160,7 @@ cluster.
 ```
 
 7. Once you’ve added the Spark config, click “Confirm and Restart”.
-8. Once the cluster comes back up, it is now enabled for GPU-accelerated Spark with RAPIDS and cuDF.
+8. Once the cluster comes back up, it is now enabled for GPU-accelerated Spark.
 
 ## Import the GPU Mortgage Example Notebook
 Import the example [notebook](../demo/gpu-mortgage_accelerated.ipynb) from the repo into your
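Step 7 refers to a Spark config entered earlier in the guide, which this hunk does not show. Purely as an illustrative sketch (keys and values are assumptions, not taken from this commit), a RAPIDS-enabled Databricks cluster config usually carries entries along these lines:

```
# Illustrative only; the actual keys/values come from the guide, not this commit.
spark.plugins com.nvidia.spark.SQLPlugin
spark.task.resource.gpu.amount 0.1
spark.rapids.sql.concurrentGpuTasks 2
```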
3 changes: 1 addition & 2 deletions docs/get-started/getting-started-gcp.md
@@ -108,8 +108,7 @@ rest as a training set, saving to respective GCS locations. Using the default n
 configuration the first stage should take ~110 seconds (1/3 of CPU execution time with same config)
 and the second stage takes ~170 seconds (1/7 of CPU execution time with same config). The notebook
 depends on the pre-compiled [Spark RAPIDS SQL
-plugin](https://mvnrepository.com/artifact/com.nvidia/rapids-4-spark) and
-[cuDF](https://mvnrepository.com/artifact/ai.rapids/cudf), which are pre-downloaded by the GCP
+plugin](https://mvnrepository.com/artifact/com.nvidia/rapids-4-spark) which is pre-downloaded by the GCP
 Dataproc [RAPIDS init
 script](https://github.com/GoogleCloudDataproc/initialization-actions/tree/master/rapids).
 
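For context, the Dataproc RAPIDS init action linked above is attached when the cluster is created. A hedged sketch (cluster name, region, accelerator type, and metadata values are placeholder assumptions, not part of this commit):

```
# Placeholder names/region; the init actions install the GPU driver and the RAPIDS plugin.
gcloud dataproc clusters create my-gpu-cluster \
  --region=us-central1 \
  --worker-accelerator=type=nvidia-tesla-t4,count=1 \
  --metadata=rapids-runtime=SPARK \
  --initialization-actions=gs://goog-dataproc-initialization-actions-us-central1/gpu/install_gpu_driver.sh,gs://goog-dataproc-initialization-actions-us-central1/rapids/rapids.sh
```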
