Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update cudf version to 0.17 #1387

Merged
merged 1 commit into from
Dec 14, 2020
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion docs/configs.md
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@ The following is the list of options that `rapids-plugin-4-spark` supports.
On startup use: `--conf [conf key]=[conf value]`. For example:

```
${SPARK_HOME}/bin/spark --jars 'rapids-4-spark_2.12-0.3.0-SNAPSHOT.jar,cudf-0.17-SNAPSHOT-cuda10-1.jar' \
${SPARK_HOME}/bin/spark --jars 'rapids-4-spark_2.12-0.3.0-SNAPSHOT.jar,cudf-0.17-cuda10-1.jar' \
--conf spark.plugins=com.nvidia.spark.SQLPlugin \
--conf spark.rapids.sql.incompatibleOps.enabled=true
```
Expand Down
2 changes: 1 addition & 1 deletion docs/get-started/Dockerfile.cuda
Original file line number Diff line number Diff line change
Expand Up @@ -53,7 +53,7 @@ COPY spark-3.0.1-bin-hadoop3.2/examples /opt/spark/examples
COPY spark-3.0.1-bin-hadoop3.2/kubernetes/tests /opt/spark/tests
COPY spark-3.0.1-bin-hadoop3.2/data /opt/spark/data

COPY cudf-0.17-SNAPSHOT-cuda10-1.jar /opt/sparkRapidsPlugin
COPY cudf-0.17-cuda10-1.jar /opt/sparkRapidsPlugin
COPY rapids-4-spark_2.12-0.3.0-SNAPSHOT.jar /opt/sparkRapidsPlugin
COPY getGpusResources.sh /opt/sparkRapidsPlugin

Expand Down
4 changes: 2 additions & 2 deletions docs/get-started/getting-started-on-prem.md
Original file line number Diff line number Diff line change
Expand Up @@ -55,15 +55,15 @@ CUDA and will not run on other versions. The jars use a maven classifier to keep
- CUDA 11.0 => classifier cuda11

For example, here is a sample version of the jars and cudf with CUDA 10.1 support:
- cudf-0.17-SNAPSHOT-cuda10-1.jar
- cudf-0.17-cuda10-1.jar
- rapids-4-spark_2.12-0.3.0-SNAPSHOT.jar


For simplicity export the location to these jars. This example assumes the sample jars above have
been placed in the `/opt/sparkRapidsPlugin` directory:
```shell
export SPARK_RAPIDS_DIR=/opt/sparkRapidsPlugin
export SPARK_CUDF_JAR=${SPARK_RAPIDS_DIR}/cudf-0.17-SNAPSHOT-cuda10-1.jar
export SPARK_CUDF_JAR=${SPARK_RAPIDS_DIR}/cudf-0.17-cuda10-1.jar
export SPARK_RAPIDS_PLUGIN_JAR=${SPARK_RAPIDS_DIR}/rapids-4-spark_2.12-0.3.0-SNAPSHOT.jar
```

Expand Down
14 changes: 8 additions & 6 deletions integration_tests/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -125,10 +125,12 @@ durations.run(new com.nvidia.spark.rapids.JoinsSuite)
```

Most clusters probably will not have the RAPIDS plugin installed in the cluster yet.
If you just want to verify the SQL replacement is working you will need to add the `rapids-4-spark` and `cudf` jars to your `spark-submit` command.
If you just want to verify the SQL replacement is working you will need to add the
`rapids-4-spark` and `cudf` jars to your `spark-submit` command. Note the following
example assumes CUDA 10.1 is being used.

```
$SPARK_HOME/bin/spark-submit --jars "rapids-4-spark_2.12-0.3.0-SNAPSHOT.jar,cudf-0.17-SNAPSHOT.jar" ./runtests.py
$SPARK_HOME/bin/spark-submit --jars "rapids-4-spark_2.12-0.3.0-SNAPSHOT.jar,cudf-0.17-cuda10-1.jar" ./runtests.py
```

You don't have to enable the plugin for this to work, the test framework will do that for you.
Expand Down Expand Up @@ -177,10 +179,10 @@ The TPCxBB, TPCH, TPCDS, and Mortgage tests in this framework can be enabled by
* TPCDS `tpcds-format` (optional, defaults to "parquet"), and `tpcds-path` (required, path to the TPCDS data).
* Mortgage `mortgage-format` (optional, defaults to "parquet"), and `mortgage-path` (required, path to the Mortgage data).

As an example, here is the `spark-submit` command with the TPCxBB parameters:
As an example, here is the `spark-submit` command with the TPCxBB parameters on CUDA 10.1:

```
$SPARK_HOME/bin/spark-submit --jars "rapids-4-spark_2.12-0.3.0-SNAPSHOT.jar,cudf-0.17-SNAPSHOT.jar,rapids-4-spark-tests_2.12-0.3.0-SNAPSHOT.jar" ./runtests.py --tpcxbb_format="csv" --tpcxbb_path="/path/to/tpcxbb/csv"
$SPARK_HOME/bin/spark-submit --jars "rapids-4-spark_2.12-0.3.0-SNAPSHOT.jar,cudf-0.17-cuda10-1.jar,rapids-4-spark-tests_2.12-0.3.0-SNAPSHOT.jar" ./runtests.py --tpcxbb_format="csv" --tpcxbb_path="/path/to/tpcxbb/csv"
```

Be aware that running these tests with read data requires at least an entire GPU, and preferable several GPUs/executors
Expand All @@ -206,10 +208,10 @@ To run cudf_udf tests, need following configuration changes:
* Decrease `spark.rapids.memory.gpu.allocFraction` to reserve enough GPU memory for Python processes in case of out-of-memory.
* Add `spark.rapids.python.concurrentPythonWorkers` and `spark.rapids.python.memory.gpu.allocFraction` to reserve enough GPU memory for Python processes in case of out-of-memory.

As an example, here is the `spark-submit` command with the cudf_udf parameter:
As an example, here is the `spark-submit` command with the cudf_udf parameter on CUDA 10.1:

```
$SPARK_HOME/bin/spark-submit --jars "rapids-4-spark_2.12-0.3.0-SNAPSHOT.jar,cudf-0.17-SNAPSHOT.jar,rapids-4-spark-tests_2.12-0.3.0-SNAPSHOT.jar" --conf spark.rapids.memory.gpu.allocFraction=0.3 --conf spark.rapids.python.memory.gpu.allocFraction=0.3 --conf spark.rapids.python.concurrentPythonWorkers=2 --py-files "rapids-4-spark_2.12-0.3.0-SNAPSHOT.jar" --conf spark.executorEnv.PYTHONPATH="rapids-4-spark_2.12-0.2.0-SNAPSHOT.jar" ./runtests.py --cudf_udf
$SPARK_HOME/bin/spark-submit --jars "rapids-4-spark_2.12-0.3.0-SNAPSHOT.jar,cudf-0.17-cuda10-1.jar,rapids-4-spark-tests_2.12-0.3.0-SNAPSHOT.jar" --conf spark.rapids.memory.gpu.allocFraction=0.3 --conf spark.rapids.python.memory.gpu.allocFraction=0.3 --conf spark.rapids.python.concurrentPythonWorkers=2 --py-files "rapids-4-spark_2.12-0.3.0-SNAPSHOT.jar" --conf spark.executorEnv.PYTHONPATH="rapids-4-spark_2.12-0.2.0-SNAPSHOT.jar" ./runtests.py --cudf_udf
```

## Writing tests
Expand Down
2 changes: 1 addition & 1 deletion jenkins/version-def.sh
Original file line number Diff line number Diff line change
Expand Up @@ -26,7 +26,7 @@ for VAR in $OVERWRITE_PARAMS;do
done
IFS=$PRE_IFS

CUDF_VER=${CUDF_VER:-"0.17-SNAPSHOT"}
CUDF_VER=${CUDF_VER:-"0.17"}
CUDA_CLASSIFIER=${CUDA_CLASSIFIER:-"cuda10-1"}
PROJECT_VER=${PROJECT_VER:-"0.3.0-SNAPSHOT"}
SPARK_VER=${SPARK_VER:-"3.0.0"}
Expand Down
2 changes: 1 addition & 1 deletion pom.xml
Original file line number Diff line number Diff line change
Expand Up @@ -143,7 +143,7 @@
<maven.compiler.target>1.8</maven.compiler.target>
<spark.version>3.0.0</spark.version>
<cuda.version>cuda10-1</cuda.version>
<cudf.version>0.17-SNAPSHOT</cudf.version>
<cudf.version>0.17</cudf.version>
<scala.binary.version>2.12</scala.binary.version>
<scala.version>2.12.8</scala.version>
<orc.version>1.5.8</orc.version>
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -824,7 +824,7 @@ object RapidsConf {
|On startup use: `--conf [conf key]=[conf value]`. For example:
|
|```
|${SPARK_HOME}/bin/spark --jars 'rapids-4-spark_2.12-0.3.0-SNAPSHOT.jar,cudf-0.17-SNAPSHOT-cuda10-1.jar' \
|${SPARK_HOME}/bin/spark --jars 'rapids-4-spark_2.12-0.3.0-SNAPSHOT.jar,cudf-0.17-cuda10-1.jar' \
|--conf spark.plugins=com.nvidia.spark.SQLPlugin \
|--conf spark.rapids.sql.incompatibleOps.enabled=true
|```
Expand Down