Commit 36dc282

Update cudf dependency to 0.16-SNAPSHOT (NVIDIA#727)
* Update cudf dependency to 0.16-SNAPSHOT

Signed-off-by: Jason Lowe <jlowe@nvidia.com>

* Update docs to reflect new artifact versions

* Exclude rmm_log.txt
jlowe authored Sep 15, 2020
1 parent d40cd75 commit 36dc282
Showing 13 changed files with 20 additions and 18 deletions.
2 changes: 1 addition & 1 deletion docs/configs.md
@@ -10,7 +10,7 @@ The following is the list of options that `rapids-plugin-4-spark` supports.
On startup use: `--conf [conf key]=[conf value]`. For example:

```
-${SPARK_HOME}/bin/spark --jars 'rapids-4-spark_2.12-0.2.0.jar,cudf-0.15-cuda10-1.jar' \
+${SPARK_HOME}/bin/spark --jars 'rapids-4-spark_2.12-0.3.0-SNAPSHOT.jar,cudf-0.16-SNAPSHOT-cuda10-1.jar' \
--conf spark.plugins=com.nvidia.spark.SQLPlugin \
--conf spark.rapids.sql.incompatibleOps.enabled=true
```
4 changes: 2 additions & 2 deletions docs/get-started/Dockerfile.cuda
@@ -53,8 +53,8 @@ COPY spark-3.0.1-bin-hadoop3.2/examples /opt/spark/examples
COPY spark-3.0.1-bin-hadoop3.2/kubernetes/tests /opt/spark/tests
COPY spark-3.0.1-bin-hadoop3.2/data /opt/spark/data

-COPY cudf-0.15-cuda10-1.jar /opt/sparkRapidsPlugin
-COPY rapids-4-spark_2.12-0.2.0.jar /opt/sparkRapidsPlugin
+COPY cudf-0.16-SNAPSHOT-cuda10-1.jar /opt/sparkRapidsPlugin
+COPY rapids-4-spark_2.12-0.3.0-SNAPSHOT.jar /opt/sparkRapidsPlugin
COPY getGpusResources.sh /opt/sparkRapidsPlugin

RUN mkdir /opt/spark/python
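
Note: the hunk above only swaps which jars get baked into the image; building it is otherwise unchanged. A minimal usage sketch, assuming the two renamed jars and getGpusResources.sh sit next to Dockerfile.cuda and that the `spark-rapids` tag is an arbitrary choice:

```shell
# Build the CUDA-enabled Spark image; the jar names staged in the build
# context must match the COPY lines above.
docker build -t spark-rapids -f Dockerfile.cuda .

# Smoke test: confirm the plugin jars landed where Spark expects them
# (override the entrypoint so the image runs a plain command).
docker run --rm --entrypoint ls spark-rapids /opt/sparkRapidsPlugin
```
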
10 changes: 5 additions & 5 deletions docs/get-started/getting-started-on-prem.md
@@ -55,16 +55,16 @@ CUDA and will not run on other versions. The jars use a maven classifier to keep
- CUDA 11.0 => classifier cuda11

For example, here is a sample version of the jars and cudf with CUDA 10.1 support:
-- cudf-0.15-cuda10-1.jar
-- rapids-4-spark_2.12-0.2.0.jar
+- cudf-0.16-SNAPSHOT-cuda10-1.jar
+- rapids-4-spark_2.12-0.3.0-SNAPSHOT.jar


For simplicity export the location to these jars. This example assumes the sample jars above have
been placed in the `/opt/sparkRapidsPlugin` directory:
```shell
export SPARK_RAPIDS_DIR=/opt/sparkRapidsPlugin
-export SPARK_CUDF_JAR=${SPARK_RAPIDS_DIR}/cudf-0.15-cuda10-1.jar
-export SPARK_RAPIDS_PLUGIN_JAR=${SPARK_RAPIDS_DIR}/rapids-4-spark_2.12-0.2.0.jar
+export SPARK_CUDF_JAR=${SPARK_RAPIDS_DIR}/cudf-0.16-SNAPSHOT-cuda10-1.jar
+export SPARK_RAPIDS_PLUGIN_JAR=${SPARK_RAPIDS_DIR}/rapids-4-spark_2.12-0.3.0-SNAPSHOT.jar
```

## Install the GPU Discovery Script
@@ -512,7 +512,7 @@ To enable _GPU Scheduling for Pandas UDF_, you need to configure your spark job
On Standalone, you need to add
```shell
...
---conf spark.executorEnv.PYTHONPATH=rapids-4-spark_2.12-0.2.0.jar \
+--conf spark.executorEnv.PYTHONPATH=rapids-4-spark_2.12-0.3.0-SNAPSHOT.jar \
--py-files ${SPARK_RAPIDS_PLUGIN_JAR}
```

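
Note: the two exported variables above are what the launch commands later in this guide consume, so a version bump here propagates everywhere. An illustrative launch, assuming the exports above and an otherwise default Spark configuration:

```shell
# Start a GPU-enabled shell using the exported jar locations.
${SPARK_HOME}/bin/spark-shell \
  --jars ${SPARK_CUDF_JAR},${SPARK_RAPIDS_PLUGIN_JAR} \
  --conf spark.plugins=com.nvidia.spark.SQLPlugin
```
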
4 changes: 2 additions & 2 deletions docs/testing.md
@@ -20,7 +20,7 @@ we typically run with the default options and only increase the scale factor dep
dbgen -b dists.dss -s 10
```

-You can include the test jar `rapids-4-spark-integration-tests_2.12-0.2.0.jar` with the
+You can include the test jar `rapids-4-spark-integration-tests_2.12-0.3.0-SNAPSHOT.jar` with the
Spark --jars option to get the TPCH tests. To setup for the queries you can run
`TpchLikeSpark.setupAllCSV` for CSV formatted data or `TpchLikeSpark.setupAllParquet`
for parquet formatted data. Both of those take the Spark session, and a path to the dbgen
@@ -83,7 +83,7 @@ individually, so you don't risk running unit tests along with the integration te
http://www.scalatest.org/user_guide/using_the_scalatest_shell

```shell
-spark-shell --jars rapids-4-spark-tests_2.12-0.2.0-tests.jar,rapids-4-spark-integration-tests_2.12-0.2.0-tests.jar,scalatest_2.12-3.0.5.jar,scalactic_2.12-3.0.5.jar
+spark-shell --jars rapids-4-spark-tests_2.12-0.3.0-SNAPSHOT-tests.jar,rapids-4-spark-integration-tests_2.12-0.3.0-SNAPSHOT-tests.jar,scalatest_2.12-3.0.5.jar,scalactic_2.12-3.0.5.jar
```

First you import the `scalatest_shell` and tell the tests where they can find the test files you
4 changes: 2 additions & 2 deletions integration_tests/README.md
@@ -49,7 +49,7 @@ Most clusters probably will not have the RAPIDS plugin installed in the cluster
If just want to verify the SQL replacement is working you will need to add the `rapids-4-spark` and `cudf` jars to your `spark-submit` command.

```
-$SPARK_HOME/bin/spark-submit --jars "rapids-4-spark_2.12-0.2.0-SNAPSHOT.jar,cudf-0.15.jar" ./runtests.py
+$SPARK_HOME/bin/spark-submit --jars "rapids-4-spark_2.12-0.3.0-SNAPSHOT.jar,cudf-0.16-SNAPSHOT.jar" ./runtests.py
```

You don't have to enable the plugin for this to work, the test framework will do that for you.
@@ -80,7 +80,7 @@ The TPCxBB, TPCH, and Mortgage tests in this framework can be enabled by providi
As an example, here is the `spark-submit` command with the TPCxBB parameters:

```
-$SPARK_HOME/bin/spark-submit --jars "rapids-4-spark_2.12-0.2.0-SNAPSHOT.jar,cudf-0.15.jar,rapids-4-spark-tests_2.12-0.2.0-SNAPSHOT.jar" ./runtests.py --tpcxbb_format="csv" --tpcxbb_path="/path/to/tpcxbb/csv"
+$SPARK_HOME/bin/spark-submit --jars "rapids-4-spark_2.12-0.3.0-SNAPSHOT.jar,cudf-0.16-SNAPSHOT.jar,rapids-4-spark-tests_2.12-0.3.0-SNAPSHOT.jar" ./runtests.py --tpcxbb_format="csv" --tpcxbb_path="/path/to/tpcxbb/csv"
```

## Writing tests
1 change: 1 addition & 0 deletions integration_tests/pom.xml
@@ -172,6 +172,7 @@
<exclude>src/test/resources/**</exclude>
<exclude>**/*.md</exclude>
<exclude>.pytest_cache/**</exclude>
+<exclude>rmm_log.txt</exclude>
</excludes>
</configuration>
</plugin>
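
Note on the new exclude: RMM (the RAPIDS Memory Manager) can drop an rmm_log.txt into the module directory while tests run, and a stray unlicensed file fails the build's license-header audit. Assuming the `<excludes>` block above configures the Apache RAT plugin (the neighboring entries suggest it does), the audit can be reproduced locally:

```shell
# Re-run the license/header audit the new exclude is meant to satisfy
# (assumes apache-rat-plugin is configured in this module's pom).
mvn -pl integration_tests org.apache.rat:apache-rat-plugin:check
```
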
2 changes: 1 addition & 1 deletion jenkins/Jenkinsfile.databricksnightly
@@ -46,7 +46,7 @@ pipeline {
string(name: 'DATABRICKS_VERSION',
defaultValue: '0.3.0-SNAPSHOT', description: 'Version to set')
string(name: 'CUDF_VERSION',
-defaultValue: '0.15', description: 'Cudf version to use')
+defaultValue: '0.16-SNAPSHOT', description: 'Cudf version to use')
string(name: 'CUDA_VERSION',
defaultValue: 'cuda10-1', description: 'cuda version to use')
string(name: 'CLUSTER_ID',
2 changes: 1 addition & 1 deletion jenkins/databricks/run-tests.py
@@ -49,7 +49,7 @@ def main():
db_version = '0.1-databricks-SNAPSHOT'
scala_version = '2.12'
spark_version = '3.0.0'
-cudf_version = '0.15'
+cudf_version = '0.16-SNAPSHOT'
cuda_version = 'cuda10-1'
ci_cudf_jar = 'cudf-0.14-cuda10-1.jar'
base_spark_pom_version = '3.0.0'
2 changes: 1 addition & 1 deletion jenkins/spark-tests.sh
@@ -73,7 +73,7 @@ MORTGAGE_SPARK_SUBMIT_ARGS=" --conf spark.plugins=com.nvidia.spark.SQLPlugin \
# need to disable pooling for udf test to prevent cudaErrorMemoryAllocation
CUDF_UDF_TEST_ARGS="--conf spark.rapids.python.memory.gpu.pooling.enabled=false \
--conf spark.rapids.memory.gpu.pooling.enabled=false \
---conf spark.executorEnv.PYTHONPATH=rapids-4-spark_2.12-0.2.0-SNAPSHOT.jar \
+--conf spark.executorEnv.PYTHONPATH=rapids-4-spark_2.12-0.3.0-SNAPSHOT.jar \
--py-files ${RAPIDS_PLUGIN_JAR}"

TEST_PARAMS="$SPARK_VER $PARQUET_PERF $PARQUET_ACQ $OUTPUT"
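
Note: `spark.executorEnv.PYTHONPATH` and `--py-files` both reference the plugin jar because the jar doubles as a Python archive for the Pandas UDF support, which is why this hard-coded name must track every version bump. A quick sanity check, assuming the jar is in the current directory and does bundle Python sources:

```shell
# A jar on PYTHONPATH is importable like a zip archive; list any bundled
# Python files to confirm the assumption.
unzip -l rapids-4-spark_2.12-0.3.0-SNAPSHOT.jar | grep '\.py$'
```
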
2 changes: 1 addition & 1 deletion jenkins/version-def.sh
@@ -26,7 +26,7 @@ for VAR in $OVERWRITE_PARAMS;do
done
IFS=$PRE_IFS

-CUDF_VER=${CUDF_VER:-"0.15"}
+CUDF_VER=${CUDF_VER:-"0.16-SNAPSHOT"}
CUDA_CLASSIFIER=${CUDA_CLASSIFIER:-"cuda10-1"}
PROJECT_VER=${PROJECT_VER:-"0.3.0-SNAPSHOT"}
SPARK_VER=${SPARK_VER:-"3.0.0"}
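
Note: each `${VAR:-default}` expansion is only a fallback, so the new 0.16-SNAPSHOT default yields to any value already set in the environment or injected via OVERWRITE_PARAMS in the loop above. A minimal sketch of that shell behavior:

```shell
# The default applies only when the variable is unset or empty.
unset CUDF_VER
CUDF_VER=${CUDF_VER:-"0.16-SNAPSHOT"}
echo "$CUDF_VER"   # prints: 0.16-SNAPSHOT

# A caller-supplied value passes through the same expansion untouched.
CUDF_VER="0.17-SNAPSHOT"
CUDF_VER=${CUDF_VER:-"0.16-SNAPSHOT"}
echo "$CUDF_VER"   # prints: 0.17-SNAPSHOT
```
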
2 changes: 1 addition & 1 deletion pom.xml
@@ -149,7 +149,7 @@
<maven.compiler.target>1.8</maven.compiler.target>
<spark.version>3.0.0</spark.version>
<cuda.version>cuda10-1</cuda.version>
-<cudf.version>0.15</cudf.version>
+<cudf.version>0.16-SNAPSHOT</cudf.version>
<scala.binary.version>2.12</scala.binary.version>
<scala.version>2.12.8</scala.version>
<orc.version>1.5.8</orc.version>
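
Note: since `cudf.version` is a plain Maven property, the new default can still be overridden per invocation without editing the pom, which is handy when testing against a different cudf build. Property names are taken from the hunk above; the goal shown is illustrative:

```shell
# Build against a different cudf artifact without touching pom.xml.
mvn -Dcudf.version=0.16-SNAPSHOT -Dcuda.version=cuda10-1 package
```
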
2 changes: 1 addition & 1 deletion sql-plugin/src/main/scala/com/nvidia/spark/rapids/RapidsConf.scala
@@ -729,7 +729,7 @@ object RapidsConf {
|On startup use: `--conf [conf key]=[conf value]`. For example:
|
|```
-|${SPARK_HOME}/bin/spark --jars 'rapids-4-spark_2.12-0.2.0.jar,cudf-0.15-cuda10-1.jar' \
+|${SPARK_HOME}/bin/spark --jars 'rapids-4-spark_2.12-0.3.0-SNAPSHOT.jar,cudf-0.16-SNAPSHOT-cuda10-1.jar' \
|--conf spark.plugins=com.nvidia.spark.SQLPlugin \
|--conf spark.rapids.sql.incompatibleOps.enabled=true
|```
1 change: 1 addition & 0 deletions tests/pom.xml
@@ -134,6 +134,7 @@
<configuration>
<excludes>
<exclude>src/test/resources/**</exclude>
+<exclude>rmm_log.txt</exclude>
</excludes>
</configuration>
</plugin>
