From 78e7fcc1e8dc94bf8940d53e2a198986ebd32b2f Mon Sep 17 00:00:00 2001
From: Allen Xu
Date: Fri, 13 Nov 2020 22:29:11 +0800
Subject: [PATCH] Update udf-compiler descriptions in related docs

Signed-off-by: Allen Xu
---
 docs/compatibility.md  | 4 ++--
 udf-compiler/README.md | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/compatibility.md b/docs/compatibility.md
index 80720517904..eae7cd3344a 100644
--- a/docs/compatibility.md
+++ b/docs/compatibility.md
@@ -290,10 +290,10 @@ Casting from string to timestamp currently has the following limitations.
 Only timezone 'Z' (UTC) is supported. Casting unsupported formats will result in
 null values.
 
 ## UDF to Catalyst Expressions
-To speedup the process of UDF, spark-rapids introduces a udf-compiler extension to translate UDFs to Catalyst expressions.
+To speed up UDF processing, spark-rapids introduces a udf-compiler extension to translate UDFs to Catalyst expressions. This compiler is injected into the Spark extensions automatically when `spark.plugins=com.nvidia.spark.SQLPlugin` is set, but it is disabled by default.
 To enable this operation on the GPU, set
-[`spark.rapids.sql.udfCompiler.enabled`](configs.md#sql.udfCompiler.enabled) to `true`, and `spark.sql.extensions=com.nvidia.spark.udf.Plugin`.
+[`spark.rapids.sql.udfCompiler.enabled`](configs.md#sql.udfCompiler.enabled) to `true`.
 
 However, Spark may produce different results for a compiled udf and the non-compiled. For example: a udf of `x/y` where `y` happens to be `0`, the compiled catalyst
 expressions will return `NULL` while the original udf would fail the entire job with a `java.lang.ArithmeticException: / by zero`
diff --git a/udf-compiler/README.md b/udf-compiler/README.md
index fe76bd273e2..0a8ca4e9ba7 100644
--- a/udf-compiler/README.md
+++ b/udf-compiler/README.md
@@ -14,6 +14,6 @@ How to run
 ----------
 The UDF compiler is included in the rapids-4-spark jar that is produced by the `dist` maven project.
 Set up your cluster to run the RAPIDS Accelerator for Apache Spark
-and set the spark config `spark.sql.extensions` to include `com.nvidia.spark.udf.Plugin`.
+and this UDF plugin will be injected into the Spark extensions automatically when `com.nvidia.spark.SQLPlugin` is set.
 The plugin is still disabled by default and you will need to set `spark.rapids.sql.udfCompiler.enabled` to `true` to enable it.
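
For readers applying the updated instructions, here is a minimal Scala sketch (not part of the patch) of how the two settings described above fit together in a standalone application. The application name and master URL are placeholders, and the rapids-4-spark jar is assumed to be on the driver and executor classpath.

```scala
import org.apache.spark.sql.SparkSession

// Both settings must be in place before the SparkContext starts; in spark-shell
// or spark-submit, pass them with --conf instead of setting them here.
val spark = SparkSession.builder()
  .appName("udf-compiler-example")                      // placeholder name
  .master("local[*]")                                   // placeholder master
  // Loads the RAPIDS Accelerator; the udf-compiler extension is injected with it.
  .config("spark.plugins", "com.nvidia.spark.SQLPlugin")
  // The udf-compiler is disabled by default; opt in explicitly.
  .config("spark.rapids.sql.udfCompiler.enabled", "true")
  .getOrCreate()
```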
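
The compatibility note about differing results can be made concrete with a short sketch. The column names and values below are made up for illustration, and `spark` is assumed to be an active SparkSession configured with the settings from the patch.

```scala
import org.apache.spark.sql.functions.{col, udf}

// Assumes an active SparkSession named `spark`.
// A UDF that divides two integers.
val div = udf((x: Int, y: Int) => x / y)

// One row where the divisor is zero.
val df = spark.range(1).selectExpr("10 AS x", "0 AS y")

// With spark.rapids.sql.udfCompiler.enabled=true, the UDF is translated to a
// Catalyst divide expression, which follows SQL semantics and yields NULL here.
// Without the compiler, the JVM lambda runs as-is and the job fails with
// java.lang.ArithmeticException: / by zero.
df.select(div(col("x"), col("y")).alias("result")).show()
```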