
[Doc]Update 22.06 documentation[skip ci] #5641

Merged 22 commits on Jun 3, 2022

Changes from 4 commits
12 changes: 11 additions & 1 deletion docs/FAQ.md
@@ -307,7 +307,9 @@ Yes

### Are the R APIs for Spark supported?

Yes, but we don't actively test them.
Yes, but we don't actively test them. It is because the RAPIDS Accelerator hooks into Spark not at
Collaborator:

Suggestion for this text and the Java API text below.

Suggested change:
- Yes, but we don't actively test them. It is because the RAPIDS Accelerator hooks into Spark not at
+ Yes, but we don't actively test them, because the RAPIDS Accelerator hooks into Spark not at

Collaborator Author:

Changed both.

the various language APIs but at the Catalyst level after all the various APIs have converged into
the DataFrame API.
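
As an illustrative aside (not part of this documentation change), one way to see that the plugin operates on the Catalyst physical plan rather than on any particular language API is to inspect the plan of a DataFrame query; the session name `spark` and the toy query below are assumptions:

```scala
// Whichever language API built the DataFrame, the Catalyst physical plan is what the
// RAPIDS Accelerator rewrites. When the plugin is active, replaced operators typically
// appear with a "Gpu" prefix (for example GpuProject, GpuFilter) in the plan output.
val df = spark.range(1000).selectExpr("id", "id * 2 AS doubled").filter("doubled > 10")
df.explain()
```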

### Are the Java APIs for Spark supported?

@@ -410,6 +412,14 @@ The Scala UDF byte-code analyzer is disabled by default and must be enabled by the
[`spark.rapids.sql.udfCompiler.enabled`](configs.md#sql.udfCompiler.enabled) configuration
setting.
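
As a minimal, hedged sketch (assuming an existing `SparkSession` named `spark`; the UDF itself is only an illustration), enabling the compiler and defining a simple Scala UDF it could analyze might look like:

```scala
import org.apache.spark.sql.functions.{col, udf}

// Turn on the Scala UDF byte-code analyzer (disabled by default).
spark.conf.set("spark.rapids.sql.udfCompiler.enabled", "true")

// A trivial Scala UDF; simple byte code like this is the kind of UDF the analyzer
// may be able to translate into equivalent Catalyst expressions.
val plusOne = udf((x: Long) => x + 1L)
spark.range(10).select(plusOne(col("id")).as("id_plus_one")).show()
```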

#### Optimize a row-based UDF in a GPU operation

If the UDF cannot be implemented with RAPIDS Accelerated UDFs or automatically translated to
Apache Spark operations, the RAPIDS Accelerator has an experimental feature that transfers only the
data it needs between the GPU and CPU inside a query operation, instead of falling the whole
operation back to the CPU. This feature can be enabled by setting `spark.rapids.sql.rowBasedUDF.enabled` to true.
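
A minimal sketch of enabling it for a session (assuming an existing `SparkSession` named `spark`):

```scala
// Experimental: move only the columns a row-based UDF needs between GPU and CPU
// inside the operation, instead of falling the whole operation back to the CPU.
spark.conf.set("spark.rapids.sql.rowBasedUDF.enabled", "true")
```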


### Why is the size of my output Parquet/ORC file different?

This can come down to a number of factors. The GPU version often compresses data in smaller chunks
59 changes: 59 additions & 0 deletions docs/download.md
@@ -18,6 +18,65 @@ cuDF jar, that is either preinstalled in the Spark classpath on all nodes or submitted with each job
that uses the RAPIDS Accelerator For Apache Spark. See the [getting-started
guide](https://nvidia.github.io/spark-rapids/Getting-Started/) for more details.

## Release v22.06.0
Hardware Requirements:

The plugin is tested on the following architectures:

GPU Models: NVIDIA V100, T4 and A2/A10/A30/A100 GPUs

Software Requirements:

OS: Ubuntu 18.04, Ubuntu 20.04 or CentOS 7, CentOS 8

CUDA & NVIDIA Drivers*: 11.x & v450.80.02+

Apache Spark 3.1.1, 3.1.2, 3.1.3, 3.2.0, 3.2.1, Databricks 9.1 ML LTS or 10.4 ML LTS Runtime and GCP Dataproc 2.0

Python 3.6+, Scala 2.12, Java 8

*Some hardware may have a minimum driver version greater than v450.80.02+. Check the GPU spec sheet
for your hardware's minimum driver version.

*For Cloudera and EMR support, please refer to the
[Distributions](./FAQ.md#which-distributions-are-supported) section of the FAQ.

### Download v22.06.0
* Download the [RAPIDS
Accelerator for Apache Spark 22.06.0 jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/22.06.0/rapids-4-spark_2.12-22.06.0.jar)
Member:

Because of this currently bad link, I'd like to see this checked in as late as possible. Otherwise we end up with every PR in the meantime being flagged for a bad link because it's checked in that way.

Collaborator Author:

Yes, we can wait for some time to merge this PR. My plan is to merge it before the merge request to main, so that future gh-pages update PRs can take it from there.


This package is built against CUDA 11.5 and has [CUDA forward
compatibility](https://docs.nvidia.com/deploy/cuda-compatibility/index.html) enabled. It is tested
on V100, T4, A2, A10, A30 and A100 GPUs with CUDA 11.0-11.5. For those using other types of GPUs which
do not have CUDA forward compatibility (for example, GeForce), CUDA 11.5 is required. Users will
Member:

Should say "CUDA 11.5 or later is required" here, as CUDA backward compatibility will allow us to run on CUDA versions > 11.5.

Collaborator Author:

Changed.

need to ensure the minimum driver (450.80.02) and CUDA toolkit are installed on each Spark node.

### Verify signature
* Download the [RAPIDS Accelerator for Apache Spark 22.06.0 jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/22.06.0/rapids-4-spark_2.12-22.06.0.jar)
and [RAPIDS Accelerator for Apache Spark 22.06.0 jars.asc](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/22.06.0/rapids-4-spark_2.12-22.06.0.jar.asc)
* Download the [PUB_KEY](https://keys.openpgp.org/search?q=sw-spark@nvidia.com).
* Import the public key: `gpg --import PUB_KEY`
* Verify the signature: `gpg --verify rapids-4-spark_2.12-22.06.0.jar.asc rapids-4-spark_2.12-22.06.0.jar`

The output if the signature verifies successfully:

gpg: Good signature from "NVIDIA Spark (For the signature of spark-rapids release jars) <sw-spark@nvidia.com>"

### Release Notes
New functionality and performance improvements for this release include:
* Combined the cuDF jar and the rapids-4-spark jar into a single rapids-4-spark jar
* Add UI for Qualification tool
* Support function map_filter
* Support spark.sql.mapKeyDedupPolicy=LAST_WIN for function transform_keys
* Enable MIG with YARN on Dataproc 2.0
* Enable CSV read by default
* Enable regular expression by default
* Enable some float related configurations by default
Comment on lines +71 to +72:

Collaborator:

Enabling CSV reads, regular expressions, and floating point operations by default ought to be higher on the list of new features. spark.sql.mapKeyDedupPolicy=LAST_WIN is probably not that important to highlight. Rather, we can highlight features like improved ANSI support and support for Avro reading of primitive types.

Collaborator Author:

Refactored the release notes. BTW: "Avro reading of primitive types" was already added in 22.04.

Collaborator:

Got it, thanks.

* Changed the default allocator from ARENA to ASYNC

For a detailed list of changes, please refer to the
[CHANGELOG](https://github.com/NVIDIA/spark-rapids/blob/main/CHANGELOG.md).

## Release v22.04.0
Hardware Requirements:

2 changes: 1 addition & 1 deletion docs/get-started/getting-started-databricks.md
@@ -156,7 +156,7 @@ cluster.
```bash
spark.rapids.sql.python.gpu.enabled true
spark.python.daemon.module rapids.daemon_databricks
spark.executorEnv.PYTHONPATH /databricks/jars/rapids-4-spark_2.12-22.04.0.jar:/databricks/spark/python
spark.executorEnv.PYTHONPATH /databricks/jars/rapids-4-spark_2.12-22.06.0.jar:/databricks/spark/python
```

7. Once you’ve added the Spark config, click “Confirm and Restart”.
4 changes: 4 additions & 0 deletions docs/get-started/getting-started-kubernetes.md
@@ -17,6 +17,10 @@ Kubernetes requires a Docker image to run Spark. Generally everything needed is in the
image - Spark, the RAPIDS Accelerator for Spark jars, and the discovery script. See this
[Dockerfile.cuda](Dockerfile.cuda) example.

You can find other supported base CUDA images on
[CUDA Docker Hub](https://hub.docker.com/r/nvidia/cuda). Their source Dockerfiles are in this
[GitLab repository](https://gitlab.com/nvidia/container-images/cuda/), which can be used to build
the Docker images from an OS base image from scratch.

## Prerequisites
* Kubernetes cluster is up and running with NVIDIA GPU support
7 changes: 1 addition & 6 deletions docs/get-started/gpu_dataproc_packages_ubuntu_sample.sh
@@ -139,14 +139,12 @@ EOF
systemctl start dataproc-cgroup-device-permissions
}

readonly DEFAULT_SPARK_RAPIDS_VERSION="22.04.0"
readonly DEFAULT_SPARK_RAPIDS_VERSION="22.06.0"
readonly DEFAULT_CUDA_VERSION="11.0"
readonly DEFAULT_CUDF_VERSION="22.04.0"
readonly DEFAULT_XGBOOST_VERSION="1.4.2"
readonly DEFAULT_XGBOOST_GPU_SUB_VERSION="0.3.0"
readonly SPARK_VERSION="3.0"

readonly CUDF_VERSION=${DEFAULT_CUDF_VERSION}
# SPARK config
readonly SPARK_RAPIDS_VERSION=${DEFAULT_SPARK_RAPIDS_VERSION}
readonly XGBOOST_VERSION=${DEFAULT_XGBOOST_VERSION}
@@ -174,9 +172,6 @@ function install_spark_rapids() {
wget -nv --timeout=30 --tries=5 --retry-connrefused \
"${nvidia_repo_url}/rapids-4-spark_2.12/${SPARK_RAPIDS_VERSION}/rapids-4-spark_2.12-${SPARK_RAPIDS_VERSION}.jar" \
-P /usr/lib/spark/jars/
wget -nv --timeout=30 --tries=5 --retry-connrefused \
"${rapids_repo_url}/cudf/${CUDF_VERSION}/cudf-${CUDF_VERSION}-cuda${cudf_cuda_version}.jar" \
-P /usr/lib/spark/jars/
}

function configure_spark() {
2 changes: 1 addition & 1 deletion docs/spark-profiling-tool.md
@@ -31,7 +31,7 @@ more information.
The Profiling tool requires the Spark 3.x jars to be able to run but does not need an Apache Spark runtime.
If you do not already have Spark 3.x installed,
you can download the Spark distribution to any machine and include the jars in the classpath.
- Download the jar file from [Maven repository](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark-tools_2.12/22.04.0/)
- Download the jar file from [Maven repository](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark-tools_2.12/22.06.0/)
- [Download Apache Spark 3.x](http://spark.apache.org/downloads.html) - Spark 3.1.1 for Apache Hadoop is recommended
If you want to compile the jars, please refer to the instructions [here](./spark-qualification-tool.md#How-to-compile-the-tools-jar).

5 changes: 2 additions & 3 deletions docs/spark-qualification-tool.md
@@ -3,7 +3,6 @@ layout: page
title: Qualification Tool
nav_order: 8
---

# Qualification Tool

The Qualification tool analyzes Spark events generated from CPU based Spark applications to determine
@@ -41,7 +40,7 @@ more information.
The Qualification tool requires the Spark 3.x jars to be able to run but does not need an Apache Spark runtime.
If you do not already have Spark 3.x installed, you can download the Spark distribution to
any machine and include the jars in the classpath.
- Download the jar file from [Maven repository](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark-tools_2.12/22.04.0/)
- Download the jar file from [Maven repository](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark-tools_2.12/22.06.0/)
- [Download Apache Spark 3.x](http://spark.apache.org/downloads.html) - Spark 3.1.1 for Apache Hadoop is recommended

### Step 2 Run the Qualification tool
@@ -236,7 +235,7 @@ below for the description of output fields.
- Java 8 or above, Spark 3.0.1+

### Download the tools jar
- Download the jar file from [Maven repository](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark-tools_2.12/22.04.0/)
- Download the jar file from [Maven repository](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark-tools_2.12/22.06.0/)

### Modify your application code to call the APIs

8 changes: 8 additions & 0 deletions docs/tuning-guide.md
@@ -194,6 +194,14 @@ rather than megabytes or smaller.
Note that the GPU can encode Parquet and ORC data much faster than the CPU, so the costs of
writing large files can be significantly lower.

## Input Files' Column Order
When there are a large number of columns for file formats like Parquet and ORC, the size of the
contiguous data for each individual column can be very small. This can result in doing lots of very
small random reads to the file system to read the data for the subset of columns that are needed.

We suggest reordering the columns needed by the queries and then rewriting the files to make those
columns adjacent. This could help Spark on both the CPU and the GPU.
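
As an illustrative sketch only (the paths and column names are hypothetical), rewriting a wide Parquet table so that the frequently queried columns come first could look like:

```scala
import org.apache.spark.sql.functions.col

// Hypothetical example: put the columns the queries actually read first so their
// data is stored adjacently, then rewrite the table to a new location.
val df = spark.read.parquet("/data/events")
val hotColumns = Seq("event_time", "user_id", "event_type")
val ordered = hotColumns ++ df.columns.filterNot(hotColumns.contains)
df.select(ordered.map(col): _*)
  .write.mode("overwrite").parquet("/data/events_reordered")
```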

Collaborator:

Should we add a comment here about using spark.rapids.sql.format.parquet.reader.footer.type=NATIVE if there are a large number of columns and the data format is Parquet?

Member:

The feature is experimental. Not sure we're ready to widely advertise it yet, but I'd defer to @revans2 on this.

Collaborator:

Fair enough, we can add the note about it in the tuning guide after it is no longer experimental.

## Input Partition Size

Similar to the discussion on [input file size](#input-files), many queries can benefit from using