Update cudf version to 0.17 (#1387)
Signed-off-by: Jason Lowe <jlowe@nvidia.com>
jlowe authored Dec 14, 2020
1 parent 703b643 commit dc6fbcf
Showing 7 changed files with 15 additions and 13 deletions.
docs/configs.md (2 changes: 1 addition & 1 deletion)
@@ -10,7 +10,7 @@ The following is the list of options that `rapids-plugin-4-spark` supports.
On startup use: `--conf [conf key]=[conf value]`. For example:

```
-${SPARK_HOME}/bin/spark --jars 'rapids-4-spark_2.12-0.3.0-SNAPSHOT.jar,cudf-0.17-SNAPSHOT-cuda10-1.jar' \
+${SPARK_HOME}/bin/spark --jars 'rapids-4-spark_2.12-0.3.0-SNAPSHOT.jar,cudf-0.17-cuda10-1.jar' \
--conf spark.plugins=com.nvidia.spark.SQLPlugin \
--conf spark.rapids.sql.incompatibleOps.enabled=true
```
docs/get-started/Dockerfile.cuda (2 changes: 1 addition & 1 deletion)
@@ -53,7 +53,7 @@ COPY spark-3.0.1-bin-hadoop3.2/examples /opt/spark/examples
COPY spark-3.0.1-bin-hadoop3.2/kubernetes/tests /opt/spark/tests
COPY spark-3.0.1-bin-hadoop3.2/data /opt/spark/data

-COPY cudf-0.17-SNAPSHOT-cuda10-1.jar /opt/sparkRapidsPlugin
+COPY cudf-0.17-cuda10-1.jar /opt/sparkRapidsPlugin
COPY rapids-4-spark_2.12-0.3.0-SNAPSHOT.jar /opt/sparkRapidsPlugin
COPY getGpusResources.sh /opt/sparkRapidsPlugin

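The Dockerfile change above only swaps the cudf jar name; the surrounding build flow is unchanged. As a minimal build-and-run sketch, assuming the Dockerfile and the jars sit in the working directory (the image tag is illustrative, and `--gpus` requires the NVIDIA container toolkit):

```shell
# Build the CUDA-enabled Spark image with the renamed cudf jar in the context.
docker build -f Dockerfile.cuda -t spark-rapids:cuda10-1 .
# Launch it with GPU access and list the plugin directory to sanity-check the COPYs.
docker run --rm --gpus all -it spark-rapids:cuda10-1 \
  ls /opt/sparkRapidsPlugin
```

If the jars copied correctly, the listing should show cudf-0.17-cuda10-1.jar alongside the plugin jar and getGpusResources.sh.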
docs/get-started/getting-started-on-prem.md (4 changes: 2 additions & 2 deletions)
@@ -55,15 +55,15 @@ CUDA and will not run on other versions. The jars use a maven classifier to keep
- CUDA 11.0 => classifier cuda11

For example, here are sample versions of the cudf and plugin jars with CUDA 10.1 support:
-- cudf-0.17-SNAPSHOT-cuda10-1.jar
+- cudf-0.17-cuda10-1.jar
- rapids-4-spark_2.12-0.3.0-SNAPSHOT.jar


For simplicity, export the locations of these jars. This example assumes the sample jars above have
been placed in the `/opt/sparkRapidsPlugin` directory:
```shell
export SPARK_RAPIDS_DIR=/opt/sparkRapidsPlugin
-export SPARK_CUDF_JAR=${SPARK_RAPIDS_DIR}/cudf-0.17-SNAPSHOT-cuda10-1.jar
+export SPARK_CUDF_JAR=${SPARK_RAPIDS_DIR}/cudf-0.17-cuda10-1.jar
export SPARK_RAPIDS_PLUGIN_JAR=${SPARK_RAPIDS_DIR}/rapids-4-spark_2.12-0.3.0-SNAPSHOT.jar
```

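Since the classifier must match the local CUDA toolkit, the export above can be derived rather than hard-coded. This is a sketch only, assuming `nvcc` is on the PATH and GNU grep is available; the cuda10-2 case is inferred by analogy with the classifiers listed in the docs:

```shell
# Hypothetical helper: pick the cudf classifier from the installed CUDA toolkit.
CUDA_VER=$(nvcc --version | grep -oP 'release \K[0-9]+\.[0-9]+')
case "$CUDA_VER" in
  10.1) CUDA_CLASSIFIER=cuda10-1 ;;
  10.2) CUDA_CLASSIFIER=cuda10-2 ;;
  11.*) CUDA_CLASSIFIER=cuda11   ;;
  *) echo "unsupported CUDA version: $CUDA_VER" >&2; exit 1 ;;
esac
export SPARK_CUDF_JAR=${SPARK_RAPIDS_DIR}/cudf-0.17-${CUDA_CLASSIFIER}.jar
```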
integration_tests/README.md (14 changes: 8 additions & 6 deletions)
@@ -125,10 +125,12 @@ durations.run(new com.nvidia.spark.rapids.JoinsSuite)
```

Most clusters probably will not have the RAPIDS plugin installed yet.
-If you just want to verify the SQL replacement is working you will need to add the `rapids-4-spark` and `cudf` jars to your `spark-submit` command.
+If you just want to verify the SQL replacement is working you will need to add the
+`rapids-4-spark` and `cudf` jars to your `spark-submit` command. Note the following
+example assumes CUDA 10.1 is being used.

```
-$SPARK_HOME/bin/spark-submit --jars "rapids-4-spark_2.12-0.3.0-SNAPSHOT.jar,cudf-0.17-SNAPSHOT.jar" ./runtests.py
+$SPARK_HOME/bin/spark-submit --jars "rapids-4-spark_2.12-0.3.0-SNAPSHOT.jar,cudf-0.17-cuda10-1.jar" ./runtests.py
```

You don't have to enable the plugin for this to work; the test framework will do that for you.
@@ -177,10 +179,10 @@ The TPCxBB, TPCH, TPCDS, and Mortgage tests in this framework can be enabled by
* TPCDS `tpcds-format` (optional, defaults to "parquet"), and `tpcds-path` (required, path to the TPCDS data).
* Mortgage `mortgage-format` (optional, defaults to "parquet"), and `mortgage-path` (required, path to the Mortgage data).

-As an example, here is the `spark-submit` command with the TPCxBB parameters:
+As an example, here is the `spark-submit` command with the TPCxBB parameters on CUDA 10.1:

```
-$SPARK_HOME/bin/spark-submit --jars "rapids-4-spark_2.12-0.3.0-SNAPSHOT.jar,cudf-0.17-SNAPSHOT.jar,rapids-4-spark-tests_2.12-0.3.0-SNAPSHOT.jar" ./runtests.py --tpcxbb_format="csv" --tpcxbb_path="/path/to/tpcxbb/csv"
+$SPARK_HOME/bin/spark-submit --jars "rapids-4-spark_2.12-0.3.0-SNAPSHOT.jar,cudf-0.17-cuda10-1.jar,rapids-4-spark-tests_2.12-0.3.0-SNAPSHOT.jar" ./runtests.py --tpcxbb_format="csv" --tpcxbb_path="/path/to/tpcxbb/csv"
```

Be aware that running these tests with real data requires at least an entire GPU, and preferably several GPUs/executors
@@ -206,10 +208,10 @@ To run the cudf_udf tests, the following configuration changes are needed:
* Decrease `spark.rapids.memory.gpu.allocFraction` so enough GPU memory is left over for the Python processes, avoiding out-of-memory errors.
* Add `spark.rapids.python.concurrentPythonWorkers` and `spark.rapids.python.memory.gpu.allocFraction` to limit and size the GPU memory reserved for those Python processes.

-As an example, here is the `spark-submit` command with the cudf_udf parameter:
+As an example, here is the `spark-submit` command with the cudf_udf parameter on CUDA 10.1:

```
-$SPARK_HOME/bin/spark-submit --jars "rapids-4-spark_2.12-0.3.0-SNAPSHOT.jar,cudf-0.17-SNAPSHOT.jar,rapids-4-spark-tests_2.12-0.3.0-SNAPSHOT.jar" --conf spark.rapids.memory.gpu.allocFraction=0.3 --conf spark.rapids.python.memory.gpu.allocFraction=0.3 --conf spark.rapids.python.concurrentPythonWorkers=2 --py-files "rapids-4-spark_2.12-0.3.0-SNAPSHOT.jar" --conf spark.executorEnv.PYTHONPATH="rapids-4-spark_2.12-0.2.0-SNAPSHOT.jar" ./runtests.py --cudf_udf
+$SPARK_HOME/bin/spark-submit --jars "rapids-4-spark_2.12-0.3.0-SNAPSHOT.jar,cudf-0.17-cuda10-1.jar,rapids-4-spark-tests_2.12-0.3.0-SNAPSHOT.jar" --conf spark.rapids.memory.gpu.allocFraction=0.3 --conf spark.rapids.python.memory.gpu.allocFraction=0.3 --conf spark.rapids.python.concurrentPythonWorkers=2 --py-files "rapids-4-spark_2.12-0.3.0-SNAPSHOT.jar" --conf spark.executorEnv.PYTHONPATH="rapids-4-spark_2.12-0.3.0-SNAPSHOT.jar" ./runtests.py --cudf_udf
```

## Writing tests
jenkins/version-def.sh (2 changes: 1 addition & 1 deletion)
@@ -26,7 +26,7 @@ for VAR in $OVERWRITE_PARAMS;do
done
IFS=$PRE_IFS

-CUDF_VER=${CUDF_VER:-"0.17-SNAPSHOT"}
+CUDF_VER=${CUDF_VER:-"0.17"}
CUDA_CLASSIFIER=${CUDA_CLASSIFIER:-"cuda10-1"}
PROJECT_VER=${PROJECT_VER:-"0.3.0-SNAPSHOT"}
SPARK_VER=${SPARK_VER:-"3.0.0"}
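The `${VAR:-default}` expansions in this hunk are what make each version overridable from the environment, so the hard-coded `0.17` is only a fallback. A quick illustration of the idiom (values are examples):

```shell
# ${VAR:-default} yields VAR when it is set and non-empty, else the default.
unset CUDF_VER
echo "${CUDF_VER:-0.17}"    # prints 0.17
CUDF_VER=0.18
echo "${CUDF_VER:-0.17}"    # prints 0.18
```

A CI job can therefore pin a different cudf build simply by exporting `CUDF_VER` and `CUDA_CLASSIFIER` before this script runs, without editing version-def.sh.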
pom.xml (2 changes: 1 addition & 1 deletion)
@@ -143,7 +143,7 @@
<maven.compiler.target>1.8</maven.compiler.target>
<spark.version>3.0.0</spark.version>
<cuda.version>cuda10-1</cuda.version>
-<cudf.version>0.17-SNAPSHOT</cudf.version>
+<cudf.version>0.17</cudf.version>
<scala.binary.version>2.12</scala.binary.version>
<scala.version>2.12.8</scala.version>
<orc.version>1.5.8</orc.version>
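For reference, the `<cudf.version>` and `<cuda.version>` properties above combine into a single artifact coordinate at build time. A sketch of resolving that artifact by hand, assuming the `ai.rapids` groupId that cudf releases are published under:

```shell
# Fetch ai.rapids:cudf 0.17 with the cuda10-1 classifier into the local Maven repo.
# Coordinate layout: groupId:artifactId:version:packaging:classifier
mvn dependency:get -Dartifact=ai.rapids:cudf:0.17:jar:cuda10-1
```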
sql-plugin/src/main/scala/com/nvidia/spark/rapids/RapidsConf.scala (2 changes: 1 addition & 1 deletion)
@@ -824,7 +824,7 @@ object RapidsConf {
|On startup use: `--conf [conf key]=[conf value]`. For example:
|
|```
-|${SPARK_HOME}/bin/spark --jars 'rapids-4-spark_2.12-0.3.0-SNAPSHOT.jar,cudf-0.17-SNAPSHOT-cuda10-1.jar' \
+|${SPARK_HOME}/bin/spark --jars 'rapids-4-spark_2.12-0.3.0-SNAPSHOT.jar,cudf-0.17-cuda10-1.jar' \
|--conf spark.plugins=com.nvidia.spark.SQLPlugin \
|--conf spark.rapids.sql.incompatibleOps.enabled=true
|```
