Documentation changes for 0.2 release #637

Merged
merged 10 commits on Sep 11, 2020
14 changes: 7 additions & 7 deletions docs/FAQ.md
@@ -19,10 +19,10 @@ shows stale results.

### What versions of Apache Spark does the RAPIDS Accelerator for Apache Spark support?

The RAPIDS Accelerator for Apache Spark requires version 3.0.0 of Apache Spark. Because the plugin
replaces parts of the physical plan that Apache Spark considers to be internal the code for those
plans can change even between bug fix releases. As a part of our process, we try to stay on top of
these changes and release updates as quickly as possible.
The RAPIDS Accelerator for Apache Spark requires version 3.0.0 or 3.0.1 of Apache Spark. Because the
plugin replaces parts of the physical plan that Apache Spark considers to be internal, the code for
those plans can change even between bug-fix releases. As a part of our process, we try to stay on
top of these changes and release updates as quickly as possible.
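
For example, one quick, non-authoritative way to confirm which Spark version a cluster is running
(assuming `SPARK_HOME` points at the installation) is:

```shell
# Prints the Spark version banner; the reported version should be 3.0.0 or 3.0.1
${SPARK_HOME}/bin/spark-submit --version
```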

### Which distributions are supported?

@@ -41,9 +41,9 @@ Reference architectures should be available around Q4 2020.

### What CUDA versions are supported?

CUDA 10.1 and 10.2 are currently supported, but you need to download the cudf jar that corresponds
to the version you are using. Please look [here][version/stable-release.md] for download links
for the stable release.
CUDA 10.1, 10.2 and 11.0 are currently supported, but you need to download the cudf jar that
corresponds to the version you are using. Please look [here](version/stable-release.md) for download
links for the stable release.
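
As an illustration, one way to check which CUDA version a node has before picking a cudf jar
(output details vary by driver and toolkit version) is:

```shell
# The driver-reported CUDA version appears in the nvidia-smi banner
nvidia-smi
# If the CUDA toolkit is installed, nvcc reports the toolkit version
/usr/local/cuda/bin/nvcc --version
```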

### What parts of Apache Spark are accelerated?

2 changes: 1 addition & 1 deletion docs/configs.md
@@ -10,7 +10,7 @@ The following is the list of options that `rapids-plugin-4-spark` supports.
On startup use: `--conf [conf key]=[conf value]`. For example:

```
${SPARK_HOME}/bin/spark --jars 'rapids-4-spark_2.12-0.2.0-SNAPSHOT.jar,cudf-0.15-cuda10-1.jar' \
${SPARK_HOME}/bin/spark --jars 'rapids-4-spark_2.12-0.2.0.jar,cudf-0.15-cuda10-1.jar' \
--conf spark.plugins=com.nvidia.spark.SQLPlugin \
--conf spark.rapids.sql.incompatibleOps.enabled=true
```
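
The same settings can instead be placed once in `spark-defaults.conf`; a minimal sketch, assuming
the jars live in `/opt/sparkRapidsPlugin`:

```shell
cat >> ${SPARK_HOME}/conf/spark-defaults.conf <<'EOF'
spark.plugins                             com.nvidia.spark.SQLPlugin
spark.rapids.sql.incompatibleOps.enabled  true
spark.jars                                /opt/sparkRapidsPlugin/rapids-4-spark_2.12-0.2.0.jar,/opt/sparkRapidsPlugin/cudf-0.15-cuda10-1.jar
EOF
```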
2 changes: 1 addition & 1 deletion docs/demo/Databricks/generate-init-script.ipynb
@@ -1 +1 @@
{"cells":[{"cell_type":"code","source":["dbutils.fs.mkdirs(\"dbfs:/databricks/init_scripts/\")\n \ndbutils.fs.put(\"/databricks/init_scripts/init.sh\",\"\"\"\n#!/bin/bash\nsudo wget -O /databricks/jars/rapids-4-spark_2.12-0.1.0-databricks.jar https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/0.1.0-databricks/rapids-4-spark_2.12-0.1.0-databricks.jar\nsudo wget -O /databricks/jars/cudf-0.14-cuda10-1.jar https://repo1.maven.org/maven2/ai/rapids/cudf/0.14/cudf-0.14-cuda10-1.jar\"\"\", True)"],"metadata":{},"outputs":[],"execution_count":1},{"cell_type":"code","source":["%sh\ncd ../../dbfs/databricks/init_scripts\npwd\nls -ltr\ncat init.sh"],"metadata":{},"outputs":[],"execution_count":2},{"cell_type":"code","source":[""],"metadata":{},"outputs":[],"execution_count":3}],"metadata":{"name":"generate-init-script","notebookId":2645746662301564},"nbformat":4,"nbformat_minor":0}
{"cells":[{"cell_type":"code","source":["dbutils.fs.mkdirs(\"dbfs:/databricks/init_scripts/\")\n \ndbutils.fs.put(\"/databricks/init_scripts/init.sh\",\"\"\"\n#!/bin/bash\nsudo wget -O /databricks/jars/rapids-4-spark_2.12-0.2.0-databricks.jar https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/0.2.0-databricks/rapids-4-spark_2.12-0.2.0-databricks.jar\nsudo wget -O /databricks/jars/cudf-0.15-cuda10-1.jar https://repo1.maven.org/maven2/ai/rapids/cudf/0.15/cudf-0.15-cuda10-1.jar\"\"\", True)"],"metadata":{},"outputs":[],"execution_count":1},{"cell_type":"code","source":["%sh\ncd ../../dbfs/databricks/init_scripts\npwd\nls -ltr\ncat init.sh"],"metadata":{},"outputs":[],"execution_count":2},{"cell_type":"code","source":[""],"metadata":{},"outputs":[],"execution_count":3}],"metadata":{"name":"generate-init-script","notebookId":2645746662301564},"nbformat":4,"nbformat_minor":0}
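
For readability, the init script that the updated notebook cell writes to
`/databricks/init_scripts/init.sh` is equivalent to the following shell (same jar versions and
Maven URLs as in the cell above):

```shell
#!/bin/bash
# Download the RAPIDS Accelerator and cudf jars into the Databricks jars directory
sudo wget -O /databricks/jars/rapids-4-spark_2.12-0.2.0-databricks.jar \
  https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/0.2.0-databricks/rapids-4-spark_2.12-0.2.0-databricks.jar
sudo wget -O /databricks/jars/cudf-0.15-cuda10-1.jar \
  https://repo1.maven.org/maven2/ai/rapids/cudf/0.15/cudf-0.15-cuda10-1.jar
```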
88 changes: 88 additions & 0 deletions docs/get-started/Dockerfile.cuda
@@ -0,0 +1,88 @@
#
# Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

FROM nvidia/cuda:10.1-devel-ubuntu18.04
ARG spark_uid=185

# Install java dependencies
RUN apt-get update && apt-get install -y --no-install-recommends openjdk-8-jdk openjdk-8-jre
ENV JAVA_HOME /usr/lib/jvm/java-1.8.0-openjdk-amd64
ENV PATH $PATH:/usr/lib/jvm/java-1.8.0-openjdk-amd64/jre/bin:/usr/lib/jvm/java-1.8.0-openjdk-amd64/bin

# Before building the docker image, first either download Apache Spark 3.0+ from
# http://spark.apache.org/downloads.html or build and make a Spark distribution following
# the instructions in http://spark.apache.org/docs/3.0.1/building-spark.html (3.0.0 can
# be used as well).
# If this docker file is being used in the context of building your images from a Spark
# distribution, the docker build command should be invoked from the top level directory
# of the Spark distribution. E.g.:
# docker build -t spark:3.0.1 -f kubernetes/dockerfiles/spark/Dockerfile .

RUN set -ex && \
ln -s /lib /lib64 && \
mkdir -p /opt/spark && \
mkdir -p /opt/spark/jars && \
mkdir -p /opt/tpch && \
mkdir -p /opt/spark/examples && \
mkdir -p /opt/spark/work-dir && \
mkdir -p /opt/sparkRapidsPlugin && \
touch /opt/spark/RELEASE && \
rm /bin/sh && \
ln -sv /bin/bash /bin/sh && \
echo "auth required pam_wheel.so use_uid" >> /etc/pam.d/su && \
chgrp root /etc/passwd && chmod ug+rw /etc/passwd

COPY spark-3.0.1-bin-hadoop3.2/jars /opt/spark/jars
COPY spark-3.0.1-bin-hadoop3.2/bin /opt/spark/bin
COPY spark-3.0.1-bin-hadoop3.2/sbin /opt/spark/sbin
COPY spark-3.0.1-bin-hadoop3.2/kubernetes/dockerfiles/spark/entrypoint.sh /opt/
COPY spark-3.0.1-bin-hadoop3.2/examples /opt/spark/examples
COPY spark-3.0.1-bin-hadoop3.2/kubernetes/tests /opt/spark/tests
COPY spark-3.0.1-bin-hadoop3.2/data /opt/spark/data

COPY cudf-0.15-cuda10-1.jar /opt/sparkRapidsPlugin
COPY rapids-4-spark_2.12-0.2.0.jar /opt/sparkRapidsPlugin
COPY getGpusResources.sh /opt/sparkRapidsPlugin

RUN mkdir /opt/spark/python
# TODO: Investigate running both pip and pip3 via virtualenvs
RUN apt-get update && \
apt install -y python python-pip && \
apt install -y python3 python3-pip && \
# Remove ensurepip since pip is already installed on the image and
# ensurepip just takes up about 1.6MB of space
rm -r /usr/lib/python*/ensurepip && \
pip install --upgrade pip setuptools && \
# python3 packages can be installed using pip3
# Remove the pip .cache to save space
rm -r /root/.cache && rm -rf /var/cache/apt/*

COPY spark-3.0.1-bin-hadoop3.2/python/pyspark /opt/spark/python/pyspark
COPY spark-3.0.1-bin-hadoop3.2/python/lib /opt/spark/python/lib

ENV SPARK_HOME /opt/spark

WORKDIR /opt/spark/work-dir
RUN chmod g+w /opt/spark/work-dir

ENV TINI_VERSION v0.18.0
ADD https://github.com/krallin/tini/releases/download/${TINI_VERSION}/tini /usr/bin/tini
RUN chmod +rx /usr/bin/tini

ENTRYPOINT [ "/opt/entrypoint.sh" ]

# Specify the User that the actual main process will run as
USER ${spark_uid}
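
As a sketch, this Dockerfile might be built and published as follows; the image name and registry
are placeholders, and the command assumes the Spark distribution, the plugin and cudf jars, and
`getGpusResources.sh` sit next to the Dockerfile:

```shell
# Build from the directory containing the Spark distribution and the jars copied above
docker build -t <registry>/spark-rapids:0.2.0-cuda10-1 -f Dockerfile.cuda .
# Push so that Kubernetes nodes can pull the image
docker push <registry>/spark-rapids:0.2.0-cuda10-1
```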
2 changes: 1 addition & 1 deletion docs/get-started/getting-started-gcp.md
@@ -66,7 +66,7 @@ To use notebooks with a Dataproc cluster, click on the cluster name under the Da

![Dataproc Web Interfaces](../img/dataproc-service.png)

The notebook will first transcode CSV files into Parquet files and then run an ETL query to prepare the dataset for training. In the sample notebook, we use 2016 data as the evaluation set and the rest as a training set, saving to respective GCS locations. Using the default notebook configuration the first stage should take ~110 seconds (1/3 of CPU execution time with same config) and the second stage takes ~170 seconds (1/7 of CPU execution time with same config). The notebook depends on the pre-compiled [Spark RAPIDS SQL plugin](https://mvnrepository.com/artifact/com.nvidia/rapids-4-spark-parent) and [cuDF](https://mvnrepository.com/artifact/ai.rapids/cudf/0.14), which are pre-downloaded by the GCP Dataproc [RAPIDS init script]().
The notebook will first transcode CSV files into Parquet files and then run an ETL query to prepare the dataset for training. In the sample notebook, we use 2016 data as the evaluation set and the rest as a training set, saving to respective GCS locations. Using the default notebook configuration the first stage should take ~110 seconds (1/3 of CPU execution time with same config) and the second stage takes ~170 seconds (1/7 of CPU execution time with same config). The notebook depends on the pre-compiled [Spark RAPIDS SQL plugin](https://mvnrepository.com/artifact/com.nvidia/rapids-4-spark) and [cuDF](https://mvnrepository.com/artifact/ai.rapids/cudf/0.15), which are pre-downloaded by the GCP Dataproc [RAPIDS init script](https://github.com/GoogleCloudDataproc/initialization-actions/tree/master/rapids).

Once data is prepared, we use the [Mortgage XGBoost4j Scala Notebook](../demo/GCP/mortgage-xgboost4j-gpu-scala.zpln) in Dataproc's Zeppelin service to execute the training job on the GPU. NVIDIA also ships [Spark XGBoost4j](https://github.com/NVIDIA/spark-xgboost) which is based on [DMLC xgboost](https://github.com/dmlc/xgboost). Precompiled [XGBoost4j](https://repo1.maven.org/maven2/com/nvidia/xgboost4j_3.0/) and [XGBoost4j Spark](https://repo1.maven.org/maven2/com/nvidia/xgboost4j-spark_3.0/1.0.0-0.1.0/) libraries can be downloaded from Maven. They are pre-downloaded by the GCP [RAPIDS init action](https://github.com/GoogleCloudDataproc/initialization-actions/tree/master/rapids). Since GitHub cannot render a Zeppelin notebook, we prepared a [Jupyter Notebook with Scala code](../demo/GCP/mortgage-xgboost4j-gpu-scala.ipynb) for you to view the code content.

24 changes: 15 additions & 9 deletions docs/get-started/getting-started-on-prem.md
@@ -37,7 +37,8 @@ to read the deployment method sections before doing any installations.

## Install Spark
To install Apache Spark please follow the official
[instructions](https://spark.apache.org/docs/latest/#launching-on-a-cluster). Please note that only
[instructions](https://spark.apache.org/docs/latest/#launching-on-a-cluster). Supported versions of
Spark are listed on the [stable release](stable-release.md) page. Please note that only
scala version 2.12 is currently supported by the accelerator.
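
As an illustration (not part of the official instructions), a Spark 3.0.1 distribution can be
fetched and unpacked roughly as follows; the URL follows the Apache archive layout, so substitute a
mirror or a different Hadoop build as needed:

```shell
wget https://archive.apache.org/dist/spark/spark-3.0.1/spark-3.0.1-bin-hadoop3.2.tgz
tar -xzf spark-3.0.1-bin-hadoop3.2.tgz
export SPARK_HOME="$(pwd)/spark-3.0.1-bin-hadoop3.2"
```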

## Download the RAPIDS jars
@@ -51,18 +52,19 @@ CUDA and will not run on other versions. The jars use a maven classifier to keep

- CUDA 10.1 => classifier cuda10-1
- CUDA 10.2 => classifier cuda10-2
- CUDA 11.0 => classifier cuda11-0
- CUDA 11.0 => classifier cuda11

For example, here is a sample version of the jars and cudf with CUDA 10.1 support:
- cudf-0.15-cuda10-1.jar
- rapids-4-spark_2.12-0.1.0.jar
- rapids-4-spark_2.12-0.2.0.jar
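
These can be pulled straight from Maven Central, for example (URLs shown for the 0.2.0 release and
the CUDA 10.1 classifier; swap the classifier for CUDA 10.2 or 11.0):

```shell
wget https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/0.2.0/rapids-4-spark_2.12-0.2.0.jar
wget https://repo1.maven.org/maven2/ai/rapids/cudf/0.15/cudf-0.15-cuda10-1.jar
```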


For simplicity, export the location of these jars. This example assumes the sample jars above have
been placed in the `/opt/sparkRapidsPlugin` directory:
```shell
export SPARK_RAPIDS_DIR=/opt/sparkRapidsPlugin
export SPARK_CUDF_JAR=${SPARK_RAPIDS_DIR}/cudf-0.15-cuda10-1.jar
export SPARK_RAPIDS_PLUGIN_JAR=${SPARK_RAPIDS_DIR}/rapids-4-spark_2.12-0.2.0-SNAPSHOT.jar
export SPARK_RAPIDS_PLUGIN_JAR=${SPARK_RAPIDS_DIR}/rapids-4-spark_2.12-0.2.0.jar
```

## Install the GPU Discovery Script
@@ -289,10 +291,13 @@ $SPARK_HOME/bin/spark-shell \
```

## Running on Kubernetes
Kubernetes requires a Docker image to run Spark. Generally you put everything you need in
that Docker image - Spark, the RAPIDS Accelerator for Spark jars, and the discovery script.
Alternatively they would need to be on a drive that is mounted when your Spark application runs.
Here we will assume you have created a Docker image that contains all of them.
Kubernetes requires a Docker image to run Spark. Generally everything needed is in the Docker
image - Spark, the RAPIDS Accelerator for Spark jars, and the discovery script. See this
[Dockerfile.cuda](Dockerfile.cuda) example.

Alternatively the jars and discovery script would need to be on a drive that is mounted when your
Spark application runs. Here we will assume you have created a Docker image that contains the
RAPIDS jars, cudf jars and discovery script.

This assumes you have Kubernetes already installed and set up. These instructions do not cover how
to set up a Kubernetes cluster.
[GPU discovery script](#install-the-gpu-discovery-script) on the node from which you are
going to build your Docker image. Note that you can download these into a local directory and
untar the Spark `.tar.gz` rather than installing into a location on the machine.
- Include the RAPIDS Accelerator for Spark jars in the Spark /jars directory
- Download the sample
[Dockerfile.cuda](https://drive.google.com/open?id=1ah7I1DQEB4Wqz5t2KK2UsctGrxDwWpeJ) or create
[Dockerfile.cuda](Dockerfile.cuda) or create
your own.
- Update the Dockerfile with the filenames for Spark and the RAPIDS Accelerator for Spark jars
that you downloaded. Include anything else application-specific that you need.
4 changes: 2 additions & 2 deletions docs/testing.md
@@ -20,7 +20,7 @@ we typically run with the default options and only increase the scale factor dep
dbgen -b dists.dss -s 10
```

You can include the test jar `rapids-4-spark-integration-tests_2.12-0.2.0-SNAPSHOT.jar` with the
You can include the test jar `rapids-4-spark-integration-tests_2.12-0.2.0.jar` with the
Spark --jars option to get the TPCH tests. To setup for the queries you can run
`TpchLikeSpark.setupAllCSV` for CSV formatted data or `TpchLikeSpark.setupAllParquet`
for parquet formatted data. Both of those take the Spark session, and a path to the dbgen
@@ -83,7 +83,7 @@ individually, so you don't risk running unit tests along with the integration te
http://www.scalatest.org/user_guide/using_the_scalatest_shell

```shell
spark-shell --jars rapids-4-spark-tests_2.12-0.2.0-SNAPSHOT-tests.jar,rapids-4-spark-integration-tests_2.12-0.2.0-SNAPSHOT-tests.jar,scalatest_2.12-3.0.5.jar,scalactic_2.12-3.0.5.jar
spark-shell --jars rapids-4-spark-tests_2.12-0.2.0-tests.jar,rapids-4-spark-integration-tests_2.12-0.2.0-tests.jar,scalatest_2.12-3.0.5.jar,scalactic_2.12-3.0.5.jar
```

First you import the `scalatest_shell` and tell the tests where they can find the test files you
30 changes: 28 additions & 2 deletions docs/version/stable-release.md
@@ -5,9 +5,35 @@ nav_order: 1
parent: Version
---

## Stable Release - v0.2.0
This is the second public release of the RAPIDS Accelerator for Apache Spark.
The list of supported operations is provided [here](../configs.md#supported-gpu-operators-and-fine-tuning).

Hardware Requirements:

GPU Architecture: NVIDIA Pascal™ or better (Tested on V100, T4 and A100 GPU)

Software Requirements:

OS: Ubuntu 16.04 & gcc 5.4 OR Ubuntu 18.04/CentOS 7 & gcc 7.3

CUDA & Nvidia Drivers: 10.1.2 & v418.87+, 10.2 & v440.33+ or 11.0 & v450.36+

Apache Spark 3.0.0 or 3.0.1

Apache Hadoop 2.10+ or 3.1.1+ (3.1.1 for nvidia-docker version 2)

Python 3.x, Scala 2.12, Java 8
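
A quick way to sanity-check that a node's driver meets these requirements (a sketch; exact output
varies by driver version):

```shell
# Reports the GPU name and installed driver version for each device
nvidia-smi --query-gpu=name,driver_version --format=csv
```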

## Download - v0.2.0
* [RAPIDS Spark Package](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/0.2.0/rapids-4-spark_2.12-0.2.0.jar)
* [cuDF 11.0 Package](https://repo1.maven.org/maven2/ai/rapids/cudf/0.15/cudf-0.15-cuda11.jar)
* [cuDF 10.2 Package](https://repo1.maven.org/maven2/ai/rapids/cudf/0.15/cudf-0.15-cuda10-2.jar)
* [cuDF 10.1 Package](https://repo1.maven.org/maven2/ai/rapids/cudf/0.15/cudf-0.15-cuda10-1.jar)

## Stable Release - v0.1.0
This is the first public release of the RAPIDS Accelerator for Apache Spark.
The list of supported operations is provided [here](../configs.html#supported-gpu-operators-and-fine-tuning)
The list of supported operations is provided [here](../configs.md#supported-gpu-operators-and-fine-tuning).

Hardware Requirements:

@@ -27,7 +53,7 @@ Software Requirements:
Python 3.x, Scala 2.12, Java 8


## Download
## Download - v0.1.0
* [RAPIDS Spark Package](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/0.1.0/rapids-4-spark_2.12-0.1.0.jar)
* [cuDF 10.2 Package](https://repo1.maven.org/maven2/ai/rapids/cudf/0.14/cudf-0.14-cuda10-2.jar)
* [cuDF 10.1 Package](https://repo1.maven.org/maven2/ai/rapids/cudf/0.14/cudf-0.14-cuda10-1.jar)
@@ -719,7 +719,7 @@ object RapidsConf {
|On startup use: `--conf [conf key]=[conf value]`. For example:
|
|```
|${SPARK_HOME}/bin/spark --jars 'rapids-4-spark_2.12-0.2.0-SNAPSHOT.jar,cudf-0.15-cuda10-1.jar' \
|${SPARK_HOME}/bin/spark --jars 'rapids-4-spark_2.12-0.2.0.jar,cudf-0.15-cuda10-1.jar' \
|--conf spark.plugins=com.nvidia.spark.SQLPlugin \
|--conf spark.rapids.sql.incompatibleOps.enabled=true
|```