Merge branch 'branch-22.02' into fix-merge
jlowe committed Feb 11, 2022
2 parents c2ba7b3 + 7586051 commit e8f44f1
Showing 9 changed files with 30 additions and 30 deletions.
12 changes: 6 additions & 6 deletions docs/additional-functionality/rapids-udfs.md
@@ -141,19 +141,19 @@ in the [udf-examples](../../udf-examples) project.

- [URLDecode](../../udf-examples/src/main/scala/com/nvidia/spark/rapids/udf/scala/URLDecode.scala)
decodes URL-encoded strings using the
-[Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/stable)
+[Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/legacy)
- [URLEncode](../../udf-examples/src/main/scala/com/nvidia/spark/rapids/udf/scala/URLEncode.scala)
URL-encodes strings using the
-[Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/stable)
+[Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/legacy)

### Spark Java UDF Examples

- [URLDecode](../../udf-examples/src/main/java/com/nvidia/spark/rapids/udf/java/URLDecode.java)
decodes URL-encoded strings using the
-[Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/stable)
+[Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/legacy)
- [URLEncode](../../udf-examples/src/main/java/com/nvidia/spark/rapids/udf/java/URLEncode.java)
URL-encodes strings using the
-[Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/stable)
+[Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/legacy)
- [CosineSimilarity](../../udf-examples/src/main/java/com/nvidia/spark/rapids/udf/java/CosineSimilarity.java)
computes the [cosine similarity](https://en.wikipedia.org/wiki/Cosine_similarity)
between two float vectors using [native code](../../udf-examples/src/main/cpp/src)
@@ -162,11 +162,11 @@ between two float vectors using [native code](../../udf-examples/src/main/cpp/sr

- [URLDecode](../../udf-examples/src/main/java/com/nvidia/spark/rapids/udf/hive/URLDecode.java)
implements a Hive simple UDF using the
-[Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/stable)
+[Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/legacy)
to decode URL-encoded strings
- [URLEncode](../../udf-examples/src/main/java/com/nvidia/spark/rapids/udf/hive/URLEncode.java)
implements a Hive generic UDF using the
-[Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/stable)
+[Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/legacy)
to URL-encode strings
- [StringWordCount](../../udf-examples/src/main/java/com/nvidia/spark/rapids/udf/hive/StringWordCount.java)
implements a Hive simple UDF using
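For context, a minimal sketch (not part of this commit) of how the Scala `URLDecode` example above might be registered and called from a Spark session. The package name comes from the linked example paths; that `URLDecode` extends `Function1[String, String]` and that the udf-examples jar is on the classpath are assumptions.

```scala
import org.apache.spark.sql.SparkSession
import com.nvidia.spark.rapids.udf.scala.URLDecode

object UrlDecodeDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("urldecode-demo")
      .getOrCreate()

    // Register the example UDF. When the RAPIDS Accelerator is enabled, a UDF
    // that also implements the RapidsUDF interface can run columnar on the GPU
    // instead of row-by-row on the CPU.
    spark.udf.register("urldecode", new URLDecode())

    // '%20' decodes to a space and '%21' to '!'.
    spark.sql("SELECT urldecode('Hello%20RAPIDS%21') AS decoded").show()

    spark.stop()
  }
}
```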
2 changes: 1 addition & 1 deletion docs/demo/AWS-EMR/Mortgage-ETL-GPU-EMR.ipynb
@@ -12,7 +12,7 @@
"\n",
"Dataset is derived from Fannie Mae’s [Single-Family Loan Performance Data](http://www.fanniemae.com/portal/funding-the-market/data/loan-performance-data.html) with all rights reserved by Fannie Mae. This processed dataset is redistributed with permission and consent from Fannie Mae. For the full raw dataset visit [Fannie Mae]() to register for an account and to download\n",
"\n",
"Instruction is available at NVIDIA [RAPIDS demo site](https://rapidsai.github.io/demos/datasets/mortgage-data).\n",
"Instruction is available at NVIDIA [RAPIDS demo site](https://docs.rapids.ai/datasets/mortgage-data).\n",
"\n",
"## Prerequisite\n",
"\n",
2 changes: 1 addition & 1 deletion docs/demo/GCP/Mortgage-ETL-CPU.ipynb
@@ -8,7 +8,7 @@
"\n",
"Dataset is derived from Fannie Mae’s [Single-Family Loan Performance Data](http://www.fanniemae.com/portal/funding-the-market/data/loan-performance-data.html) with all rights reserved by Fannie Mae. This processed dataset is redistributed with permission and consent from Fannie Mae. For the full raw dataset visit [Fannie Mae]() to register for an account and to download\n",
"\n",
"Instruction is available at NVIDIA [RAPIDS demo site](https://rapidsai.github.io/demos/datasets/mortgage-data).\n",
"Instruction is available at NVIDIA [RAPIDS demo site](https://docs.rapids.ai/datasets/mortgage-data).\n",
"\n",
"### Prerequisite\n",
"\n",
2 changes: 1 addition & 1 deletion docs/demo/GCP/Mortgage-ETL-GPU.ipynb
@@ -12,7 +12,7 @@
"\n",
"Dataset is derived from Fannie Mae’s [Single-Family Loan Performance Data](http://www.fanniemae.com/portal/funding-the-market/data/loan-performance-data.html) with all rights reserved by Fannie Mae. This processed dataset is redistributed with permission and consent from Fannie Mae. For the full raw dataset visit [Fannie Mae]() to register for an account and to download\n",
"\n",
"Instruction is available at NVIDIA [RAPIDS demo site](https://rapidsai.github.io/demos/datasets/mortgage-data).\n",
"Instruction is available at NVIDIA [RAPIDS demo site](https://docs.rapids.ai/datasets/mortgage-data).\n",
"\n",
"### Prerequisite\n",
"\n",
4 changes: 2 additions & 2 deletions docs/download.md
@@ -619,8 +619,8 @@ account the scenario where input data can be stored across many small files. By
CPU threads v0.2 delivers up to 6x performance improvement over the previous release for small
Parquet file reads.

-The RAPIDS Accelerator introduces a beta feature that accelerates [Spark shuffle for
-GPUs](get-started/getting-started-on-prem.md#enabling-rapidsshufflemanager). Accelerated
+The RAPIDS Accelerator introduces a beta feature that accelerates
+[Spark shuffle for GPUs](get-started/getting-started-on-prem.md#enabling-rapids-shuffle-manager). Accelerated
shuffle makes use of high bandwidth transfers between GPUs (NVLink or p2p over PCIe) and leverages
RDMA (RoCE or Infiniband) for remote transfers.

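As background (not part of this diff), enabling the accelerated shuffle generally means pointing Spark's `spark.shuffle.manager` at the RAPIDS implementation. The shim class name below is an assumption (it is versioned per Spark release), so take the exact value from the linked getting-started page.

```scala
import org.apache.spark.sql.SparkSession

// Sketch only: the "spark312" segment of the class name is a placeholder for
// whichever Spark-version shim you are running; see the getting-started docs.
val spark = SparkSession.builder()
  .appName("rapids-shuffle-sketch")
  .config("spark.shuffle.manager",
    "com.nvidia.spark.rapids.spark312.RapidsShuffleManager")
  .getOrCreate()
```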
18 changes: 9 additions & 9 deletions docs/get-started/getting-started-databricks.md
@@ -26,12 +26,12 @@ The number of GPUs per node dictates the number of Spark executors that can run
1. Adaptive query execution (AQE) and Delta optimized write do not work. These should be disabled
when using the plugin. Queries may still see significant speedups even with AQE disabled.

```bash
spark.databricks.delta.optimizeWrite.enabled false
spark.sql.adaptive.enabled false
```

See [issue-1059](https://github.com/NVIDIA/spark-rapids/issues/1059) for more detail.

2. Dynamic partition pruning (DPP) does not work. This results in poor performance for queries which
would normally benefit from DPP. See
@@ -42,10 +42,10 @@

4. Cannot spin up multiple executors on a multi-GPU node.

-Even though it is possible to set `spark.executor.resource.gpu.amount=N` (where N is the number
-of GPUs per node) in the Spark Configuration tab, Databricks overrides this to
-`spark.executor.resource.gpu.amount=1`. This will result in failed executors when starting the
-cluster.
+Even though it is possible to set `spark.executor.resource.gpu.amount=1` in the Spark
+Configuration tab, Databricks overrides this to `spark.executor.resource.gpu.amount=N`
+(where N is the number of GPUs per node). This will result in failed executors when starting the
+cluster.

5. Databricks makes changes to the runtime without notification.

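A sketch of applying the limitation-1 settings above from a notebook cell instead of the cluster Spark config UI (not part of this commit; whether Databricks honors a session-level override of the Delta write setting is an assumption, and the cluster-level config shown above is the documented route).

```scala
// `spark` is the session Databricks predefines in notebooks.
// Config keys are taken from the documentation block above.
spark.conf.set("spark.sql.adaptive.enabled", "false")
spark.conf.set("spark.databricks.delta.optimizeWrite.enabled", "false")
```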
6 changes: 3 additions & 3 deletions docs/get-started/getting-started-gcp.md
@@ -85,9 +85,9 @@ If you'd like to further accelerate init time to 4-5 minutes, create a custom Da
## Run PySpark or Scala Notebook on a Dataproc Cluster Accelerated by GPUs
To use notebooks with a Dataproc cluster, click on the cluster name under the Dataproc cluster tab
and navigate to the "Web Interfaces" tab. Under "Web Interfaces", click on the JupyterLab or
-Jupyter link to start using the sample [Mortgage ETL on GPU Jupyter
-Notebook](../demo/GCP/Mortgage-ETL-GPU.ipynb) to process the full 17 years of [Mortgage
-data](https://rapidsai.github.io/demos/datasets/mortgage-data).
+Jupyter link to start using the sample
+[Mortgage ETL on GPU Jupyter Notebook](../demo/GCP/Mortgage-ETL-GPU.ipynb) to process the full 17 years of
+[Mortgage data](https://docs.rapids.ai/datasets/mortgage-data).

![Dataproc Web Interfaces](../img/GCP/dataproc-service.png)

12 changes: 6 additions & 6 deletions docs/get-started/getting-started-workload-qualification.md
@@ -30,8 +30,8 @@ This article describes the tools we provide and how to do gap analysis and workl
### How to use

If you have Spark event logs from prior runs of the applications on Spark 2.x or 3.x, you can use
-the [Qualification tool](../spark-qualification-tool.md) and [Profiling
-tool](../spark-profiling-tool.md) to analyze them. The qualification tool outputs the score, rank
+the [Qualification tool](../spark-qualification-tool.md) and
+[Profiling tool](../spark-profiling-tool.md) to analyze them. The qualification tool outputs the score, rank
and some of the potentially unsupported features for each Spark application. For example, the CSV
output can print `Unsupported Read File Formats and Types`, `Unsupported Write Data Format` and
`Potential Problems`, which indicate unsupported features. Its output can help
@@ -119,8 +119,8 @@ the driver logs with `spark.rapids.sql.explain=all`.

This log can show you which operators (on which data types) cannot run on the GPU, and why.
If it shows a specific RAPIDS Accelerator parameter that can be turned on to enable that feature,
-you should first understand the risk and applicability of that parameter based on [configs
-doc](../configs.md) and then enable that parameter and try the tool again.
+you should first understand the risk and applicability of that parameter based on
+[configs doc](../configs.md) and then enable that parameter and try the tool again.

Since its output is directly based on a specific version of the `rapids-4-spark` jar, the gap
analysis is quite accurate.
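For reference, a minimal sketch of turning on the explain output discussed above from a Spark shell or notebook; `spark.rapids.sql.explain` is the config named in these docs, and treating it as settable at runtime is an assumption.

```scala
// Ask the plugin to log, for every operator, whether it can run on the GPU
// and why not when it cannot.
spark.conf.set("spark.rapids.sql.explain", "all")

// Run the workload; the driver log then lists the CPU-bound operators,
// which is the input to the gap analysis. `my_table` is a placeholder name.
spark.sql("SELECT COUNT(*) FROM my_table").collect()
```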
@@ -213,8 +213,8 @@ which is the same as the driver logs with `spark.rapids.sql.explain=all`.

This log can show you which operators (on which data types) cannot run on the GPU, and why.
If it shows a specific RAPIDS Accelerator parameter that can be turned on to enable that feature,
-you should first understand the risk and applicability of that parameter based on [configs
-doc](../configs.md) and then enable that parameter and try the tool again.
+you should first understand the risk and applicability of that parameter based on
+[configs doc](../configs.md) and then enable that parameter and try the tool again.

Since its output is directly based on a specific version of the `rapids-4-spark` jar, the gap
analysis is quite accurate.
2 changes: 1 addition & 1 deletion docs/tuning-guide.md
@@ -337,7 +337,7 @@ Custom Spark SQL Metrics are available which can help identify performance bottl

Not all metrics are enabled by default. The configuration setting `spark.rapids.sql.metrics.level` can be set
to `DEBUG`, `MODERATE`, or `ESSENTIAL`, with `MODERATE` being the default value. More information about this
-configuration option is available in the <a href="configs.md#sql.metrics.level">configuration</a> documentation.
+configuration option is available in the [configuration documentation](configs.md#sql.metrics.level).

Output row and batch counts show up for operators where the number of output rows or batches is
expected to change. For example, a filter operation would show the number of rows that passed the
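As an aside (not from this commit), a sketch of raising the metrics level for an application. The key is the one named above; setting it through the session builder is an assumption, and passing it with `--conf` at submit time is equivalent.

```scala
import org.apache.spark.sql.SparkSession

// DEBUG emits the most metrics; MODERATE is the documented default.
val spark = SparkSession.builder()
  .appName("rapids-metrics-sketch")
  .config("spark.rapids.sql.metrics.level", "DEBUG")
  .getOrCreate()
```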
