Commit

Merge pull request NVIDIA#1495 from pxLi/fix-merge-conflict
Fix merge conflict, merge branch 0.3 to branch 0.4 [skip ci]
pxLi authored Jan 12, 2021
2 parents 935a1c4 + acca222 commit d870a58
Showing 16 changed files with 168 additions and 93 deletions.
5 changes: 5 additions & 0 deletions README.md
@@ -37,6 +37,11 @@ The plugin has a set of Spark configs that control its behavior and are document
We use github issues to track bugs, feature requests, and to try and answer questions. You
may file one [here](https://github.com/NVIDIA/spark-rapids/issues/new/choose).

## Download

The jar files for the most recent release can be retrieved from the [download](docs/download.md)
page.

## Build

There are two types of branches in this repository:
8 changes: 4 additions & 4 deletions docs/FAQ.md
@@ -1,7 +1,7 @@
---
layout: page
title: Frequently Asked Questions
nav_order: 8
nav_order: 9
---
# Frequently Asked Questions

@@ -37,13 +37,13 @@ with most cloud service providers to set up testing and validation on their dist

### What is the right hardware setup to run GPU accelerated Spark?

Reference architectures should be available around Q4 2020.
Reference architectures should be available around Q1 2021.

### What CUDA versions are supported?

CUDA 10.1, 10.2 and 11.0 are currently supported, but you need to download the cudf jar that
corresponds to the version you are using. Please look [here][version/stable-release.md] for download
links for the stable release.
corresponds to the version you are using. Please look [here](download.md) for download
links for the latest release.

### What parts of Apache Spark are accelerated?

5 changes: 5 additions & 0 deletions docs/benchmarks.md
@@ -1,3 +1,8 @@
---
layout: page
title: Benchmarks
nav_exclude: true
---
# Benchmarks

The `integration_test` module contains benchmarks derived from the
2 changes: 1 addition & 1 deletion docs/compatibility.md
@@ -1,7 +1,7 @@
---
layout: page
title: Compatibility
nav_order: 4
nav_order: 5
---

# RAPIDS Accelerator for Apache Spark Compatibility with Apache Spark
4 changes: 2 additions & 2 deletions docs/configs.md
@@ -1,7 +1,7 @@
---
layout: page
title: Configuration
nav_order: 3
nav_order: 4
---
<!-- Generated by RapidsConf.help. DO NOT EDIT! -->
# RAPIDS Accelerator for Apache Spark Configuration
@@ -68,7 +68,7 @@ Name | Description | Default Value
<a name="sql.format.parquet.multiThreadedRead.maxNumFilesParallel"></a>spark.rapids.sql.format.parquet.multiThreadedRead.maxNumFilesParallel|A limit on the maximum number of files per task processed in parallel on the CPU side before the file is sent to the GPU. This affects the amount of host memory used when reading the files in parallel. Used with MULTITHREADED reader, see spark.rapids.sql.format.parquet.reader.type|2147483647
<a name="sql.format.parquet.multiThreadedRead.numThreads"></a>spark.rapids.sql.format.parquet.multiThreadedRead.numThreads|The maximum number of threads, on the executor, to use for reading small parquet files in parallel. This can not be changed at runtime after the executor has started. Used with COALESCING and MULTITHREADED reader, see spark.rapids.sql.format.parquet.reader.type.|20
<a name="sql.format.parquet.read.enabled"></a>spark.rapids.sql.format.parquet.read.enabled|When set to false disables parquet input acceleration|true
<a name="sql.format.parquet.reader.type"></a>spark.rapids.sql.format.parquet.reader.type|Sets the parquet reader type. We support different types that are optimized for different environments. The original Spark style reader can be selected by setting this to PERFILE which individually reads and copies files to the GPU. Loading many small files individually has high overhead, and using either COALESCING or MULTITHREADED is recommended instead. The COALESCING reader is good when using a local file system where the executors are on the same nodes or close to the nodes the data is being read on. This reader coalesces all the files assigned to a task into a single host buffer before sending it down to the GPU. It copies blocks from a single file into a host buffer in separate threads in parallel, see spark.rapids.sql.format.parquet.multiThreadedRead.numThreads. MULTITHREADED is good for cloud environments where you are reading from a blobstore that is totally separate and likely has a higher I/O read cost. Many times the cloud environments also get better throughput when you have multiple readers in parallel. This reader uses multiple threads to read each file in parallel and each file is sent to the GPU separately. This allows the CPU to keep reading while GPU is also doing work. See spark.rapids.sql.format.parquet.multiThreadedRead.numThreads and spark.rapids.sql.format.parquet.multiThreadedRead.maxNumFilesParallel to control the number of threads and amount of memory used. By default this is set to AUTO so we select the reader we think is best. This will either be the COALESCING or the MULTITHREADED based on whether we think the file is in the cloud. See spark.rapids.cloudSchemes.|AUTO
<a name="sql.format.parquet.reader.type"></a>spark.rapids.sql.format.parquet.reader.type|Sets the parquet reader type. We support different types that are optimized for different environments. The original Spark style reader can be selected by setting this to PERFILE which individually reads and copies files to the GPU. Loading many small files individually has high overhead, and using either COALESCING or MULTITHREADED is recommended instead. The COALESCING reader is good when using a local file system where the executors are on the same nodes or close to the nodes the data is being read on. This reader coalesces all the files assigned to a task into a single host buffer before sending it down to the GPU. It copies blocks from a single file into a host buffer in separate threads in parallel, see spark.rapids.sql.format.parquet.multiThreadedRead.numThreads. MULTITHREADED is good for cloud environments where you are reading from a blobstore that is totally separate and likely has a higher I/O read cost. Many times the cloud environments also get better throughput when you have multiple readers in parallel. This reader uses multiple threads to read each file in parallel and each file is sent to the GPU separately. This allows the CPU to keep reading while GPU is also doing work. See spark.rapids.sql.format.parquet.multiThreadedRead.numThreads and spark.rapids.sql.format.parquet.multiThreadedRead.maxNumFilesParallel to control the number of threads and amount of memory used. This can be set to AUTO to select the reader we think is best. This will either be the COALESCING or the MULTITHREADED based on whether we think the file is in the cloud. See spark.rapids.cloudSchemes. The default is currently set to MULTITHREADED because the COALESCING reader does not handle partitioned data efficiently. If you aren't using partitioned data in a non cloud environment, the COALESCING reader would be a good choice.|MULTITHREADED
<a name="sql.format.parquet.write.enabled"></a>spark.rapids.sql.format.parquet.write.enabled|When set to false disables parquet output acceleration|true
<a name="sql.hasNans"></a>spark.rapids.sql.hasNans|Config to indicate if your data has NaN's. Cudf doesn't currently support NaN's properly so you can get corrupt data if you have NaN's in your data and it runs on the GPU.|true
<a name="sql.hashOptimizeSort.enabled"></a>spark.rapids.sql.hashOptimizeSort.enabled|Whether sorts should be inserted after some hashed operations to improve output ordering. This can improve output file sizes when saving to columnar formats.|false
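
The parquet reader settings above are ordinary Spark configs. As a minimal, hedged sketch (the values are illustrative, not recommendations), they can be set on a `SparkSession` like this:

```scala
import org.apache.spark.sql.SparkSession

// Illustrative values only -- tune for your environment.
val spark = SparkSession.builder()
  .appName("rapids-parquet-reader-config")
  // PERFILE, COALESCING, MULTITHREADED or AUTO, as described above.
  .config("spark.rapids.sql.format.parquet.reader.type", "MULTITHREADED")
  // Threads per executor used to read small parquet files in parallel.
  .config("spark.rapids.sql.format.parquet.multiThreadedRead.numThreads", "20")
  // Limit on files processed in parallel on the CPU side before being sent to the GPU.
  .config("spark.rapids.sql.format.parquet.multiThreadedRead.maxNumFilesParallel", "1000")
  .getOrCreate()
```
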
2 changes: 1 addition & 1 deletion docs/dev/README.md
@@ -1,7 +1,7 @@
---
layout: page
title: Developer Overview
nav_order: 9
nav_order: 10
has_children: true
permalink: /developer-overview/
---
2 changes: 1 addition & 1 deletion docs/dev/idea-code-style-settings.md
@@ -1,7 +1,7 @@
---
layout: page
title: IDEA Code Style Settings
nav_order: 3
nav_order: 4
parent: Developer Overview
---
```xml
10 changes: 10 additions & 0 deletions docs/dev/testing.md
@@ -0,0 +1,10 @@
---
layout: page
title: Testing
nav_order: 1
parent: Developer Overview
---
An overview of testing can be found within the repository at:
* [Unit tests](https://github.com/NVIDIA/spark-rapids/tree/branch-0.3/tests)
* [Integration testing](https://github.com/NVIDIA/spark-rapids/tree/branch-0.3/integration_tests)
* [Benchmarks](../benchmarks.md)
122 changes: 122 additions & 0 deletions docs/download.md
@@ -0,0 +1,122 @@
---
layout: page
title: Download
nav_order: 3
---

## Release v0.3.0
This release includes additional performance improvements, including
* Use of per thread default stream to make more efficient use of the GPU
* Further supporting Spark's adaptive query execution, with more rewritten query plans now able to
run on the GPU
* Performance improvements for reading small Parquet files
* RAPIDS Shuffle with UCX updated to UCX 1.9.0

New functionality for the release includes
* Parquet reading for lists and structs,
* Lead/lag for windows, and
* Greatest/least operators (see the short example below)

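As a hedged illustration of the new window and greatest/least support (the data and column names below are made up), the corresponding Spark APIs look like:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, greatest, lag, lead, least}

val spark = SparkSession.builder().appName("window-sketch").getOrCreate()
import spark.implicits._

// Made-up data: (key, ts, a, b)
val df = Seq(("x", 1, 10.0, 12.0), ("x", 2, 11.0, 9.0), ("y", 1, 3.0, 4.0))
  .toDF("key", "ts", "a", "b")

val w = Window.partitionBy("key").orderBy("ts")
df.select(
  col("key"), col("ts"),
  lag(col("a"), 1).over(w).as("prev_a"),   // lag over a window
  lead(col("a"), 1).over(w).as("next_a"),  // lead over a window
  greatest(col("a"), col("b")).as("hi"),   // greatest of two columns
  least(col("a"), col("b")).as("lo")       // least of two columns
).show()
```
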
The release is supported on Apache Spark 3.0.0, 3.0.1, Databricks 7.3 ML LTS and Google Cloud
Platform Dataproc 2.0.

The list of all supported operations is provided [here](supported_ops.md).

For a detailed list of changes, please refer to the
[CHANGELOG](https://github.com/NVIDIA/spark-rapids/blob/main/CHANGELOG.md).

Hardware Requirements:

GPU Architecture: NVIDIA Pascal™ or better (Tested on V100, T4 and A100 GPU)

Software Requirements:

OS: Ubuntu 16.04, Ubuntu 18.04 or CentOS 7

CUDA & Nvidia Drivers: 10.1.2 & v418.87+, 10.2 & v440.33+ or 11.0 & v450.36+

Apache Spark 3.0, 3.0.1, Databricks 7.3 ML LTS Runtime, or GCP Dataproc 2.0

Apache Hadoop 2.10+ or 3.1.1+ (3.1.1 for nvidia-docker version 2)

Python 3.6+, Scala 2.12, Java 8

### Download v0.3.0
* [RAPIDS Spark Package](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/0.3.0/rapids-4-spark_2.12-0.3.0.jar)
* [cuDF 11.0 Package](https://repo1.maven.org/maven2/ai/rapids/cudf/0.17/cudf-0.17-cuda11.jar)
* [cuDF 10.2 Package](https://repo1.maven.org/maven2/ai/rapids/cudf/0.17/cudf-0.17-cuda10-2.jar)
* [cuDF 10.1 Package](https://repo1.maven.org/maven2/ai/rapids/cudf/0.17/cudf-0.17-cuda10-1.jar)

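In practice the jars above are placed on the Spark classpath and the plugin is enabled through standard Spark configs (see the [getting started guide](get-started/getting-started-on-prem.md) for the full steps). A hedged sketch with placeholder paths follows; in real deployments these settings are usually supplied via `spark-submit`/`spark-shell` or `spark-defaults.conf` so the jars are on the driver classpath at startup:

```scala
import org.apache.spark.sql.SparkSession

// Placeholder paths -- point these at wherever you saved the downloaded jars.
val spark = SparkSession.builder()
  .appName("rapids-0.3.0-sketch")
  .config("spark.plugins", "com.nvidia.spark.SQLPlugin")  // enable the RAPIDS Accelerator
  .config("spark.jars",
    "/opt/sparkRapidsPlugin/rapids-4-spark_2.12-0.3.0.jar," +
    "/opt/sparkRapidsPlugin/cudf-0.17-cuda10-2.jar")      // use the cudf jar matching your CUDA version
  .getOrCreate()
```
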
## Release v0.2.0
This is the second release of the RAPIDS Accelerator for Apache Spark. Adaptive Query Execution
[SPARK-31412](https://issues.apache.org/jira/browse/SPARK-31412) is a new enhancement that was
included in Spark 3.0 that alters the physical execution plan dynamically to improve the performance
of the query. The RAPIDS Accelerator v0.2 introduces Adaptive Query Execution (AQE) for GPUs and
leverages columnar processing [SPARK-32332](https://issues.apache.org/jira/browse/SPARK-32332)
starting from Spark 3.0.1.

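As a small, hedged sketch (the table name is made up): AQE is controlled by a standard Spark config and can be turned on per session alongside the plugin:

```scala
// Assumes `spark` is a SparkSession with the RAPIDS plugin already enabled.
spark.conf.set("spark.sql.adaptive.enabled", "true")  // enable Adaptive Query Execution
spark.sql("SELECT key, COUNT(*) AS n FROM events GROUP BY key").show()  // `events` is a made-up table
```
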
Another enhancement in v0.2 is improvement in reading small Parquet files. This feature takes into
account the scenario where input data can be stored across many small files. By leveraging multiple
CPU threads v0.2 delivers up to 6x performance improvement over the previous release for small
Parquet file reads.

The RAPIDS Accelerator introduces a beta feature that accelerates [Spark shuffle for
GPUs](get-started/getting-started-on-prem.md#enabling-rapidsshufflemanager). Accelerated
shuffle makes use of high bandwidth transfers between GPUs (NVLink or p2p over PCIe) and leverages
RDMA (RoCE or Infiniband) for remote transfers.

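The exact configuration lives in the linked getting-started section; as a rough, hedged sketch, the accelerated shuffle is enabled by swapping in the RAPIDS shuffle manager via standard Spark configs. The shuffle manager class name below is illustrative and version specific, so copy the exact value from the linked guide:

```scala
import org.apache.spark.sql.SparkSession

// Illustrative class name -- the RapidsShuffleManager package is tied to your Spark/plugin version.
val spark = SparkSession.builder()
  .appName("rapids-shuffle-sketch")
  .config("spark.plugins", "com.nvidia.spark.SQLPlugin")
  .config("spark.shuffle.manager", "com.nvidia.spark.rapids.spark301.RapidsShuffleManager")
  .getOrCreate()
```
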
The list of all supported operations is provided
[here](configs.md#supported-gpu-operators-and-fine-tuning).

For a detailed list of changes, please refer to the
[CHANGELOG](https://github.com/NVIDIA/spark-rapids/blob/main/CHANGELOG.md).

Hardware Requirements:

GPU Architecture: NVIDIA Pascal™ or better (Tested on V100, T4 and A100 GPU)

Software Requirements:

OS: Ubuntu 16.04, Ubuntu 18.04 or CentOS 7

CUDA & Nvidia Drivers: 10.1.2 & v418.87+, 10.2 & v440.33+ or 11.0 & v450.36+

Apache Spark 3.0, 3.0.1

Apache Hadoop 2.10+ or 3.1.1+ (3.1.1 for nvidia-docker version 2)

Python 3.x, Scala 2.12, Java 8

### Download v0.2.0
* [RAPIDS Spark Package](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/0.2.0/rapids-4-spark_2.12-0.2.0.jar)
* [cuDF 11.0 Package](https://repo1.maven.org/maven2/ai/rapids/cudf/0.15/cudf-0.15-cuda11.jar)
* [cuDF 10.2 Package](https://repo1.maven.org/maven2/ai/rapids/cudf/0.15/cudf-0.15-cuda10-2.jar)
* [cuDF 10.1 Package](https://repo1.maven.org/maven2/ai/rapids/cudf/0.15/cudf-0.15-cuda10-1.jar)

## Release v0.1.0

Hardware Requirements:

GPU Architecture: NVIDIA Pascal™ or better (Tested on V100 and T4 GPU)

Software Requirements:

OS: Ubuntu 16.04, Ubuntu 18.04 or CentOS 7
(RHEL 7 support is provided through CentOS 7 builds/installs)

CUDA & NVIDIA Drivers: 10.1.2 & v418.87+ or 10.2 & v440.33+

Apache Spark 3.0

Apache Hadoop 2.10+ or 3.1.1+ (3.1.1 for nvidia-docker version 2)

Python 3.x, Scala 2.12, Java 8


### Download v0.1.0
* [RAPIDS Spark Package](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/0.1.0/rapids-4-spark_2.12-0.1.0.jar)
* [cuDF 10.2 Package](https://repo1.maven.org/maven2/ai/rapids/cudf/0.14/cudf-0.14-cuda10-2.jar)
* [cuDF 10.1 Package](https://repo1.maven.org/maven2/ai/rapids/cudf/0.14/cudf-0.14-cuda10-1.jar)

2 changes: 1 addition & 1 deletion docs/examples.md
@@ -1,7 +1,7 @@
---
layout: page
title: Demos
nav_order: 4
nav_order: 11
---
# Demos

4 changes: 2 additions & 2 deletions docs/get-started/getting-started-on-prem.md
@@ -38,13 +38,13 @@ to read the deployment method sections before doing any installations.
## Install Spark
To install Apache Spark please follow the official
[instructions](https://spark.apache.org/docs/latest/#launching-on-a-cluster). Supported versions of
Spark are listed on the [stable release](../version/stable-release.md) page. Please note that only
Spark are listed on the [download](../download.md) page. Please note that only
scala version 2.12 is currently supported by the accelerator.

## Download the RAPIDS jars
The [accelerator](https://mvnrepository.com/artifact/com.nvidia/rapids-4-spark_2.12) and
[cudf](https://mvnrepository.com/artifact/ai.rapids/cudf) jars are available in the
[download](../version/stable-release.md) section.
[download](../download.md) section.

Download the RAPIDS Accelerator for Apache Spark plugin jar. Then download the version of the cudf
jar that your version of the accelerator depends on. Each cudf jar is for a specific version of
2 changes: 1 addition & 1 deletion docs/ml-integration.md
@@ -1,7 +1,7 @@
---
layout: page
title: ML Integration
nav_order: 7
nav_order: 8
---
# RAPIDS Accelerator for Apache Spark ML Library Integration

22 changes: 12 additions & 10 deletions docs/supported_ops.md
@@ -1,13 +1,13 @@
---
layout: page
title: Operators
nav_order: 4
title: Supported Operators
nav_order: 6
---
<!-- Generated by SupportedOpsDocs.help. DO NOT EDIT! -->
Apache Spark supports processing various types of data. Not all expressions
support all data types. The RAPIDS Accelerator for Apache Spark has further
restrictions on what types are supported for processing. This tries
to document what operations are supported and what data types they support.
to document what operations are supported and what data types each operation supports.
Apache Spark is itself under active development, and this document was generated
against version 3.0.0 of Spark. Most of this should still
apply to other versions of Spark, but there may be slight changes.
@@ -28,7 +28,7 @@ The RAPIDS Accelerator only supports UTC as the time zone for timestamps.

## `CalendarInterval`
In Spark `CalendarInterval`s store three values, months, days, and microseconds.
Support for this types is still very limited in the accelerator. In some cases
Support for this type is still very limited in the accelerator. In some cases
only a subset of the type is supported; for example, window ranges currently only support days.

## Configuration
@@ -44,6 +44,7 @@ the reasons why this particular operator or expression is on the CPU or GPU.

# Key
## Types

|Type Name|Type Description|
|---------|----------------|
|BOOLEAN|Holds true or false values.|
@@ -53,19 +54,20 @@ the reasons why this particular operator or expression is on the CPU or GPU.
|LONG|Signed 64-bit integer value.|
|FLOAT|32-bit floating point value.|
|DOUBLE|64-bit floating point value.|
|DATE|A date with no time component. Stored as 32-bit integer with days since Jan 1, 1970|
|DATE|A date with no time component. Stored as 32-bit integer with days since Jan 1, 1970.|
|TIMESTAMP|A date and time. Stored as 64-bit integer with microseconds since Jan 1, 1970 in the current time zone.|
|STRING|A text string. Stored as UTF-8 encoded bytes.|
|DECIMAL|A fixed point decimal value with configurable precision and scale.|
|NULL|Only stores null values and is typically only used when no other type can be determined from the SQL.|
|BINARY|An array of non-nullable bytes|
|BINARY|An array of non-nullable bytes.|
|CALENDAR|Represents a period of time. Stored as months, days and microseconds.|
|ARRAY|A sequence of elements.|
|MAP|A set of key value pairs, the keys cannot be null.|
|STRUCT|A series of named fields.|
|UDT|User defined types and java Objects. These are not standard SQL types|
|UDT|User defined types and java Objects. These are not standard SQL types.|

## Support

|Value|Description|
|---------|----------------|
|S| (Supported) Both Apache Spark and the RAPIDS Accelerator support this type.|
@@ -75,7 +77,7 @@ the reasons why this particular operator or expression is on the CPU or GPU.
|_PS*_| (Partial Support with limitations) Like regular Partial Support but with general limitations on `Timestamp` or `Decimal` types.|
|**NS**| (Not Supported) Apache Spark supports this type but the RAPIDS Accelerator does not.|

# `SparkPlan` or Executor Nodes
# SparkPlan or Executor Nodes
Apache Spark uses a Directed Acyclic Graph(DAG) of processing to build a query.
The nodes in this graph are instances of `SparkPlan` and represent various high
level operations like doing a filter or project. The operations that the RAPIDS
@@ -726,12 +728,12 @@ Accelerator supports are described below.
<td><b>NS</b></td>
</tr>
</table>
* as was state previously Decimal is only supported up to a precision of
* as was stated previously Decimal is only supported up to a precision of
18 and Timestamp is only supported in the
UTC time zone. Decimals are off by default due to performance impact in
some cases.

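Returning to the `SparkPlan` discussion above, a quick hedged way to see which nodes were actually placed on the GPU is to print the physical plan; with the plugin enabled, GPU operators typically show up with a `Gpu` prefix (the query below is made up):

```scala
// Assumes `spark` is a SparkSession with the RAPIDS plugin enabled.
val q = spark.range(1000).selectExpr("id % 10 AS k", "id AS v").groupBy("k").count()
q.explain()  // look for Gpu* nodes (e.g. GpuHashAggregate) versus CPU fallbacks in the printed plan
```
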
# `Expression` and SQL Functions
# Expression and SQL Functions
Inside each node in the DAG there can be one or more trees of expressions
that describe various types of processing that happens in that part of the plan.
These can be things like adding two numbers together or checking for null.
2 changes: 1 addition & 1 deletion docs/tuning-guide.md
@@ -1,7 +1,7 @@
---
layout: page
title: Tuning
nav_order: 5
nav_order: 7
---
# RAPIDS Accelerator for Apache Spark Tuning Guide
Tuning a Spark job's configuration settings from the defaults can often improve job performance,