Merge branch 'branch-22.02' into fix-merge
jlowe committed Feb 11, 2022
2 parents c2ba7b3 + 7586051 commit e8f44f1
Showing 9 changed files with 30 additions and 30 deletions.
12 changes: 6 additions & 6 deletions docs/additional-functionality/rapids-udfs.md
@@ -141,19 +141,19 @@ in the [udf-examples](../../udf-examples) project.

- [URLDecode](../../udf-examples/src/main/scala/com/nvidia/spark/rapids/udf/scala/URLDecode.scala)
decodes URL-encoded strings using the
-[Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/stable)
+[Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/legacy)
- [URLEncode](../../udf-examples/src/main/scala/com/nvidia/spark/rapids/udf/scala/URLEncode.scala)
URL-encodes strings using the
-[Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/stable)
+[Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/legacy)

### Spark Java UDF Examples

- [URLDecode](../../udf-examples/src/main/java/com/nvidia/spark/rapids/udf/java/URLDecode.java)
decodes URL-encoded strings using the
-[Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/stable)
+[Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/legacy)
- [URLEncode](../../udf-examples/src/main/java/com/nvidia/spark/rapids/udf/java/URLEncode.java)
URL-encodes strings using the
-[Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/stable)
+[Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/legacy)
- [CosineSimilarity](../../udf-examples/src/main/java/com/nvidia/spark/rapids/udf/java/CosineSimilarity.java)
computes the [cosine similarity](https://en.wikipedia.org/wiki/Cosine_similarity)
between two float vectors using [native code](../../udf-examples/src/main/cpp/src)
@@ -162,11 +162,11 @@ between two float vectors using [native code](../../udf-examples/src/main/cpp/sr

- [URLDecode](../../udf-examples/src/main/java/com/nvidia/spark/rapids/udf/hive/URLDecode.java)
implements a Hive simple UDF using the
-[Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/stable)
+[Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/legacy)
to decode URL-encoded strings
- [URLEncode](../../udf-examples/src/main/java/com/nvidia/spark/rapids/udf/hive/URLEncode.java)
implements a Hive generic UDF using the
-[Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/stable)
+[Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/legacy)
to URL-encode strings
- [StringWordCount](../../udf-examples/src/main/java/com/nvidia/spark/rapids/udf/hive/StringWordCount.java)
implements a Hive simple UDF using
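For context, a minimal sketch (not part of this commit) of how the Scala `URLDecode` example above might be registered and called from a Spark session. The package name comes from the linked example paths; that `URLDecode` extends `Function1[String, String]` and that the udf-examples jar is on the classpath are assumptions.

```scala
import org.apache.spark.sql.SparkSession
import com.nvidia.spark.rapids.udf.scala.URLDecode

object UrlDecodeDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("urldecode-demo")
      .getOrCreate()

    // Register the example UDF. When the RAPIDS Accelerator is enabled, a UDF
    // that also implements the RapidsUDF interface can run columnar on the GPU
    // instead of row-by-row on the CPU.
    spark.udf.register("urldecode", new URLDecode())

    // '%20' decodes to a space and '%21' to '!'.
    spark.sql("SELECT urldecode('Hello%20RAPIDS%21') AS decoded").show()

    spark.stop()
  }
}
```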
2 changes: 1 addition & 1 deletion docs/demo/AWS-EMR/Mortgage-ETL-GPU-EMR.ipynb
@@ -12,7 +12,7 @@
"\n",
"Dataset is derived from Fannie Mae’s [Single-Family Loan Performance Data](http://www.fanniemae.com/portal/funding-the-market/data/loan-performance-data.html) with all rights reserved by Fannie Mae. This processed dataset is redistributed with permission and consent from Fannie Mae. For the full raw dataset visit [Fannie Mae]() to register for an account and to download\n",
"\n",
"Instruction is available at NVIDIA [RAPIDS demo site](https://rapidsai.github.io/demos/datasets/mortgage-data).\n",
"Instruction is available at NVIDIA [RAPIDS demo site](https://docs.rapids.ai/datasets/mortgage-data).\n",
"\n",
"## Prerequisite\n",
"\n",
2 changes: 1 addition & 1 deletion docs/demo/GCP/Mortgage-ETL-CPU.ipynb
@@ -8,7 +8,7 @@
"\n",
"Dataset is derived from Fannie Mae’s [Single-Family Loan Performance Data](http://www.fanniemae.com/portal/funding-the-market/data/loan-performance-data.html) with all rights reserved by Fannie Mae. This processed dataset is redistributed with permission and consent from Fannie Mae. For the full raw dataset visit [Fannie Mae]() to register for an account and to download\n",
"\n",
"Instruction is available at NVIDIA [RAPIDS demo site](https://rapidsai.github.io/demos/datasets/mortgage-data).\n",
"Instruction is available at NVIDIA [RAPIDS demo site](https://docs.rapids.ai/datasets/mortgage-data).\n",
"\n",
"### Prerequisite\n",
"\n",
2 changes: 1 addition & 1 deletion docs/demo/GCP/Mortgage-ETL-GPU.ipynb
@@ -12,7 +12,7 @@
"\n",
"Dataset is derived from Fannie Mae’s [Single-Family Loan Performance Data](http://www.fanniemae.com/portal/funding-the-market/data/loan-performance-data.html) with all rights reserved by Fannie Mae. This processed dataset is redistributed with permission and consent from Fannie Mae. For the full raw dataset visit [Fannie Mae]() to register for an account and to download\n",
"\n",
"Instruction is available at NVIDIA [RAPIDS demo site](https://rapidsai.github.io/demos/datasets/mortgage-data).\n",
"Instruction is available at NVIDIA [RAPIDS demo site](https://docs.rapids.ai/datasets/mortgage-data).\n",
"\n",
"### Prerequisite\n",
"\n",
4 changes: 2 additions & 2 deletions docs/download.md
@@ -619,8 +619,8 @@ account the scenario where input data can be stored across many small files. By
CPU threads v0.2 delivers up to 6x performance improvement over the previous release for small
Parquet file reads.

-The RAPIDS Accelerator introduces a beta feature that accelerates [Spark shuffle for
-GPUs](get-started/getting-started-on-prem.md#enabling-rapidsshufflemanager). Accelerated
+The RAPIDS Accelerator introduces a beta feature that accelerates
+[Spark shuffle for GPUs](get-started/getting-started-on-prem.md#enabling-rapids-shuffle-manager). Accelerated
shuffle makes use of high bandwidth transfers between GPUs (NVLink or p2p over PCIe) and leverages
RDMA (RoCE or Infiniband) for remote transfers.

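As background (not part of this diff), enabling the accelerated shuffle generally means pointing Spark's `spark.shuffle.manager` at the RAPIDS implementation. The shim class name below is an assumption (it is versioned per Spark release), so take the exact value from the linked getting-started page.

```scala
import org.apache.spark.sql.SparkSession

// Sketch only: the "spark312" segment of the class name is a placeholder for
// whichever Spark-version shim you are running; see the getting-started docs.
val spark = SparkSession.builder()
  .appName("rapids-shuffle-sketch")
  .config("spark.shuffle.manager",
    "com.nvidia.spark.rapids.spark312.RapidsShuffleManager")
  .getOrCreate()
```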
18 changes: 9 additions & 9 deletions docs/get-started/getting-started-databricks.md
@@ -26,12 +26,12 @@ The number of GPUs per node dictates the number of Spark executors that can run
1. Adaptive query execution (AQE) and Delta optimized write do not work. These should be disabled
when using the plugin. Queries may still see significant speedups even with AQE disabled.

```bash
spark.databricks.delta.optimizeWrite.enabled false
spark.sql.adaptive.enabled false
```

See [issue-1059](https://github.com/NVIDIA/spark-rapids/issues/1059) for more detail.

2. Dynamic partition pruning (DPP) does not work. This results in poor performance for queries which
would normally benefit from DPP. See
@@ -42,10 +42,10 @@

4. Cannot spin up multiple executors on a multi-GPU node.

-Even though it is possible to set `spark.executor.resource.gpu.amount=N` (where N is the number
-of GPUs per node) in the Spark Configuration tab, Databricks overrides this to
-`spark.executor.resource.gpu.amount=1`. This will result in failed executors when starting the
-cluster.
+Even though it is possible to set `spark.executor.resource.gpu.amount=1` in the Spark
+Configuration tab, Databricks overrides this to `spark.executor.resource.gpu.amount=N`
+(where N is the number of GPUs per node). This will result in failed executors when starting the
+cluster.

5. Databricks makes changes to the runtime without notification.

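A sketch of applying the limitation-1 settings above from a notebook cell instead of the cluster Spark config UI (not part of this commit; whether Databricks honors a session-level override of the Delta write setting is an assumption, and the cluster-level config shown above is the documented route).

```scala
// `spark` is the session Databricks predefines in notebooks.
// Config keys are taken from the documentation block above.
spark.conf.set("spark.sql.adaptive.enabled", "false")
spark.conf.set("spark.databricks.delta.optimizeWrite.enabled", "false")
```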
6 changes: 3 additions & 3 deletions docs/get-started/getting-started-gcp.md
@@ -85,9 +85,9 @@ If you'd like to further accelerate init time to 4-5 minutes, create a custom Da
## Run PySpark or Scala Notebook on a Dataproc Cluster Accelerated by GPUs
To use notebooks with a Dataproc cluster, click on the cluster name under the Dataproc cluster tab
and navigate to the "Web Interfaces" tab. Under "Web Interfaces", click on the JupyterLab or
-Jupyter link to start using the sample [Mortgage ETL on GPU Jupyter
-Notebook](../demo/GCP/Mortgage-ETL-GPU.ipynb) to process the full 17 years of [Mortgage
-data](https://rapidsai.github.io/demos/datasets/mortgage-data).
+Jupyter link to start using the sample
+[Mortgage ETL on GPU Jupyter Notebook](../demo/GCP/Mortgage-ETL-GPU.ipynb) to process the full 17 years of
+[Mortgage data](https://docs.rapids.ai/datasets/mortgage-data).

![Dataproc Web Interfaces](../img/GCP/dataproc-service.png)

12 changes: 6 additions & 6 deletions docs/get-started/getting-started-workload-qualification.md
@@ -30,8 +30,8 @@ This article describes the tools we provide and how to do gap analysis and workl
### How to use

If you have Spark event logs from prior runs of the applications on Spark 2.x or 3.x, you can use
-the [Qualification tool](../spark-qualification-tool.md) and [Profiling
-tool](../spark-profiling-tool.md) to analyze them. The qualification tool outputs the score, rank
+the [Qualification tool](../spark-qualification-tool.md) and
+[Profiling tool](../spark-profiling-tool.md) to analyze them. The qualification tool outputs the score, rank
and some of the potentially unsupported features for each Spark application. For example, the CSV
output can print `Unsupported Read File Formats and Types`, `Unsupported Write Data Format` and
`Potential Problems`, which indicate unsupported features. Its output can help
@@ -119,8 +119,8 @@ the driver logs with `spark.rapids.sql.explain=all`.

This log can show you which operators (on which data types) cannot run on the GPU, and why.
If it shows a specific RAPIDS Accelerator parameter that can be turned on to enable that feature,
-you should first understand the risk and applicability of that parameter based on [configs
-doc](../configs.md) and then enable that parameter and try the tool again.
+you should first understand the risk and applicability of that parameter based on
+[configs doc](../configs.md) and then enable that parameter and try the tool again.

Since its output is directly based on a specific version of the `rapids-4-spark` jar, the gap
analysis is quite accurate.
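For reference, a minimal sketch of turning on the explain output discussed above from a Spark shell or notebook; `spark.rapids.sql.explain` is the config named in these docs, and treating it as settable at runtime is an assumption.

```scala
// Ask the plugin to log, for every operator, whether it can run on the GPU
// and why not when it cannot.
spark.conf.set("spark.rapids.sql.explain", "all")

// Run the workload; the driver log then lists the CPU-bound operators,
// which is the input to the gap analysis. `my_table` is a placeholder name.
spark.sql("SELECT COUNT(*) FROM my_table").collect()
```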
@@ -213,8 +213,8 @@ which is the same as the driver logs with `spark.rapids.sql.explain=all`.

This log can show you which operators (on which data types) cannot run on the GPU, and why.
If it shows a specific RAPIDS Accelerator parameter that can be turned on to enable that feature,
-you should first understand the risk and applicability of that parameter based on [configs
-doc](../configs.md) and then enable that parameter and try the tool again.
+you should first understand the risk and applicability of that parameter based on
+[configs doc](../configs.md) and then enable that parameter and try the tool again.

Since its output is directly based on a specific version of the `rapids-4-spark` jar, the gap
analysis is quite accurate.
2 changes: 1 addition & 1 deletion docs/tuning-guide.md
@@ -337,7 +337,7 @@ Custom Spark SQL Metrics are available which can help identify performance bottl

Not all metrics are enabled by default. The configuration setting `spark.rapids.sql.metrics.level` can be set
to `DEBUG`, `MODERATE`, or `ESSENTIAL`, with `MODERATE` being the default value. More information about this
-configuration option is available in the <a href="configs.md#sql.metrics.level">configuration</a> documentation.
+configuration option is available in the [configuration documentation](configs.md#sql.metrics.level).

Output row and batch counts show up for operators where the number of output rows or batches is
expected to change. For example, a filter operation would show the number of rows that passed the
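As an aside (not from this commit), a sketch of raising the metrics level for an application. The key is the one named above; setting it through the session builder is an assumption, and passing it with `--conf` at submit time is equivalent.

```scala
import org.apache.spark.sql.SparkSession

// DEBUG emits the most metrics; MODERATE is the documented default.
val spark = SparkSession.builder()
  .appName("rapids-metrics-sketch")
  .config("spark.rapids.sql.metrics.level", "DEBUG")
  .getOrCreate()
```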
