RAPIDS accelerated Spark Scala UDF support (#1636)
Signed-off-by: Jason Lowe <jlowe@nvidia.com>
jlowe authored Feb 1, 2021
1 parent ef6d3a7 commit d6be108
Showing 25 changed files with 507 additions and 118 deletions.
6 changes: 3 additions & 3 deletions docs/FAQ.md
@@ -252,10 +252,10 @@ can throw at it.
 The RAPIDS Accelerator provides the following solutions for running
 user-defined functions on the GPU:
 
-#### RAPIDS-Accelerated UDFs
+#### RAPIDS Accelerated UDFs
 
-UDFs can provide a RAPIDS-accelerated implementation which allows the RAPIDS Accelerator to perform
-the operation on the GPU. See the [RAPIDS-accelerated UDF documentation](additional-functionality/rapids-udfs.md)
+UDFs can provide a RAPIDS accelerated implementation which allows the RAPIDS Accelerator to perform
+the operation on the GPU. See the [RAPIDS accelerated UDF documentation](additional-functionality/rapids-udfs.md)
 for details.
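The dual-path idea behind this FAQ item can be sketched without Spark or cudf: a UDF keeps its ordinary row-wise body and additionally exposes a columnar entry point the accelerator can call once per batch. In the sketch below, plain Java arrays stand in for cudf `ColumnVector`s and all class/method names are illustrative, not taken from the plugin.

```java
// Illustration only: a UDF with a row-wise (CPU) path and a columnar
// path. In the real plugin the columnar path receives cudf
// ColumnVectors; int[] is a stand-in so this compiles without any
// GPU libraries.
public class DualPathUdf {
    // CPU path: called once per row by Spark.
    public int apply(int x) {
        return x + 1;
    }

    // Columnar path: called once per batch with a whole column of input.
    public int[] evaluateColumnar(int[] col) {
        int[] out = new int[col.length];
        for (int i = 0; i < col.length; i++) {
            out[i] = apply(col[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        DualPathUdf udf = new DualPathUdf();
        // prints [2, 3, 4]
        System.out.println(java.util.Arrays.toString(
            udf.evaluateColumnar(new int[]{1, 2, 3})));
    }
}
```

Both paths must produce the same result row for row; the accelerator simply picks the columnar one when the query step runs on the GPU.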

#### Automatic Translation of Scala UDFs to Apache Spark Operations
59 changes: 35 additions & 24 deletions docs/additional-functionality/rapids-udfs.md
@@ -1,12 +1,12 @@
 ---
 layout: page
-title: RAPIDS-Accelerated User-Defined Functions
+title: RAPIDS Accelerated User-Defined Functions
 parent: Additional Functionality
 nav_order: 3
 ---
-# RAPIDS-Accelerated User-Defined Functions
+# RAPIDS Accelerated User-Defined Functions
 
-This document describes how UDFs can provide a RAPIDS-accelerated
+This document describes how UDFs can provide a RAPIDS accelerated
 implementation alongside the CPU implementation, enabling the
 RAPIDS Accelerator to perform the user-defined operation on the GPU.
@@ -28,26 +28,26 @@ with the RAPIDS Accelerator. This implementation can then be invoked by the
 RAPIDS Accelerator when a corresponding query step using the UDF executes
 on the GPU.
 
-## Limitations of RAPIDS-Accelerated UDFs
+## Limitations of RAPIDS Accelerated UDFs
 
-The RAPIDS Accelerator only supports RAPIDS-accelerated forms of regular
-Hive UDFs. Other forms of Spark UDFs are not supported, such as:
+The RAPIDS Accelerator only supports RAPIDS accelerated forms of the
+following UDF types:
+- Scala UDFs implementing a `Function` interface and registered via `SparkSession.udf.register`
+- [Simple](https://github.com/apache/hive/blob/cb213d88304034393d68cc31a95be24f5aac62b6/ql/src/java/org/apache/hadoop/hive/ql/exec/UDF.java)
+  or
+  [Generic](https://github.com/apache/hive/blob/cb213d88304034393d68cc31a95be24f5aac62b6/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDF.java)
+  Hive UDFs
+
+Other forms of Spark UDFs are not supported, such as:
 - Spark Java UDFs (i.e.: derived from `org.apache.spark.sql.api.java.UDF`* interfaces)
 - Hive Aggregate Function (UDAF)
 - Hive Tabular Function (UDTF)
-- Lambda functions and others registered via `SparkSession.udf`
-- Functions created with `org.apache.spark.sql.functions.udf`
+- Lambda functions
 
-## Adding GPU Implementations to Hive UDFs
+## Adding GPU Implementations to UDFs
 
-As mentioned in the [Limitations](#limitations-of-rapids-accelerated-udfs)
-section, the RAPIDS Accelerator only detects GPU implementations for Hive
-regular UDFs. The Hive UDF can be either
-[simple](https://github.com/apache/hive/blob/cb213d88304034393d68cc31a95be24f5aac62b6/ql/src/java/org/apache/hadoop/hive/ql/exec/UDF.java)
-or
-[generic](https://github.com/apache/hive/blob/cb213d88304034393d68cc31a95be24f5aac62b6/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDF.java).
-
-The RAPIDS Accelerator will detect a GPU implementation if the UDF class
-implements the
+For supported UDFs, the RAPIDS Accelerator will detect a GPU implementation
+if the UDF class implements the
 [RapidsUDF](../sql-plugin/src/main/java/com/nvidia/spark/RapidsUDF.java)
 interface. This interface requires implementing the following method:
@@ -71,7 +71,7 @@ must not make any assumptions on the number of input rows.
 
 #### Scalar Inputs
 
-Passing scalar inputs to a RAPIDS-accelerated UDF is supported with
+Passing scalar inputs to a RAPIDS accelerated UDF is supported with
 limitations. The scalar value will be replicated into a full column before
 being passed to `evaluateColumnar`. Therefore the UDF implementation cannot
 easily detect the difference between a scalar input and a columnar input.
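The replication behavior described in this hunk can be sketched without cudf: a scalar argument is expanded into a column with one copy per input row before the columnar call runs, which is exactly why the UDF body cannot distinguish it from a genuine column. Arrays stand in for cudf `ColumnVector`s here, and all names are illustrative.

```java
import java.util.Arrays;

// Illustration of scalar-input handling: the scalar is replicated into
// a full column before the columnar evaluation, so the UDF only ever
// sees columns. String[] is a stand-in for a cudf ColumnVector.
public class ScalarReplication {
    // Stand-in columnar UDF body: element-wise string concatenation.
    public static String[] evaluateColumnar(String[] lhs, String[] rhs) {
        String[] out = new String[lhs.length];
        for (int i = 0; i < lhs.length; i++) {
            out[i] = lhs[i] + rhs[i];
        }
        return out;
    }

    // A scalar argument is expanded to match the batch's row count.
    public static String[] replicate(String scalar, int numRows) {
        String[] col = new String[numRows];
        Arrays.fill(col, scalar);
        return col;
    }

    public static void main(String[] args) {
        String[] col = {"a", "b", "c"};
        // prints [a!, b!, c!]
        System.out.println(Arrays.toString(
            evaluateColumnar(col, replicate("!", col.length))));
    }
}
```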
@@ -92,19 +92,30 @@ cudf type to match the result type of the original UDF. For example, if the
 CPU UDF returns a `double` then `evaluateColumnar` must return a column of
 type `FLOAT64`.
 
-### RAPIDS-Accelerated Hive UDF Examples
+## RAPIDS Accelerated UDF Examples
 
-Source code for examples of RAPIDS-accelerated Hive UDFs is provided
+Source code for examples of RAPIDS accelerated Hive UDFs is provided
 in the [udf-examples](../udf-examples) project.
 
-- [URLDecode](../udf-examples/src/main/java/com/nvidia/spark/rapids/udf/URLDecode.java)
+### Spark Scala UDF Examples
+
+- [URLDecode](../udf-examples/src/main/scala/com/nvidia/spark/rapids/udf/scala/URLDecode.scala)
+  decodes URL-encoded strings using the
+  [Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/stable)
+- [URLEncode](../udf-examples/src/main/scala/com/nvidia/spark/rapids/udf/scala/URLEncode.scala)
+  URL-encodes strings using the
+  [Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/stable)
+
+### Hive UDF Examples
+
+- [URLDecode](../udf-examples/src/main/java/com/nvidia/spark/rapids/udf/hive/URLDecode.java)
   implements a Hive simple UDF using the
   [Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/stable)
   to decode URL-encoded strings
-- [URLEncode](../udf-examples/src/main/java/com/nvidia/spark/rapids/udf/URLEncode.java)
+- [URLEncode](../udf-examples/src/main/java/com/nvidia/spark/rapids/udf/hive/URLEncode.java)
   implements a Hive generic UDF using the
   [Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/stable)
   to URL-encode strings
-- [StringWordCount](../udf-examples/src/main/java/com/nvidia/spark/rapids/udf/StringWordCount.java)
+- [StringWordCount](../udf-examples/src/main/java/com/nvidia/spark/rapids/udf/hive/StringWordCount.java)
   implements a Hive simple UDF using
   [native code](../udf-examples/src/main/cpp/src) to count words in strings
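The per-row logic behind these examples is ordinary URL coding and word counting; the GPU versions do the same work with cudf column operations. As a CPU-side reference, the core operations can be sketched with the JDK codecs. The method names below are illustrative (not from the udf-examples project), and the JDK's `URLEncoder` uses form encoding (`+` for space), which may differ in detail from the cudf implementations.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// CPU reference for the per-row behavior of the URLDecode, URLEncode,
// and StringWordCount examples. Requires Java 10+ for the Charset
// overloads (which avoid checked exceptions).
public class UrlCodecSketch {
    public static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static String decode(String s) {
        return URLDecoder.decode(s, StandardCharsets.UTF_8);
    }

    // Count whitespace-separated words, as StringWordCount does.
    public static int wordCount(String s) {
        String t = s.trim();
        return t.isEmpty() ? 0 : t.split("\\s+").length;
    }

    public static void main(String[] args) {
        System.out.println(encode("a b"));   // a+b
        System.out.println(decode("a%20b")); // a b
        System.out.println(wordCount(" one two  three ")); // 3
    }
}
```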
5 changes: 3 additions & 2 deletions docs/configs.md
@@ -204,6 +204,7 @@ Name | SQL Function(s) | Description | Default Value | Notes
<a name="sql.expression.Rint"></a>spark.rapids.sql.expression.Rint|`rint`|Rounds up a double value to the nearest double equal to an integer|true|None|
<a name="sql.expression.Round"></a>spark.rapids.sql.expression.Round|`round`|Round an expression to d decimal places using HALF_UP rounding mode|true|None|
<a name="sql.expression.RowNumber"></a>spark.rapids.sql.expression.RowNumber|`row_number`|Window function that returns the index for the row within the aggregation window|true|None|
+<a name="sql.expression.ScalaUDF"></a>spark.rapids.sql.expression.ScalaUDF| |User Defined Function, support requires the UDF to implement a RAPIDS accelerated interface|true|None|
<a name="sql.expression.Second"></a>spark.rapids.sql.expression.Second|`second`|Returns the second component of the string/timestamp|true|None|
<a name="sql.expression.ShiftLeft"></a>spark.rapids.sql.expression.ShiftLeft|`shiftleft`|Bitwise shift left (<<)|true|None|
<a name="sql.expression.ShiftRight"></a>spark.rapids.sql.expression.ShiftRight|`shiftright`|Bitwise shift right (>>)|true|None|
@@ -254,8 +255,8 @@ Name | SQL Function(s) | Description | Default Value | Notes
<a name="sql.expression.Min"></a>spark.rapids.sql.expression.Min|`min`|Min aggregate operator|true|None|
<a name="sql.expression.Sum"></a>spark.rapids.sql.expression.Sum|`sum`|Sum aggregate operator|true|None|
<a name="sql.expression.NormalizeNaNAndZero"></a>spark.rapids.sql.expression.NormalizeNaNAndZero| |Normalize NaN and zero|true|None|
-<a name="sql.expression.HiveGenericUDF"></a>spark.rapids.sql.expression.HiveGenericUDF| |Hive Generic UDF, support requires the UDF to implement a RAPIDS-accelerated interface|true|None|
-<a name="sql.expression.HiveSimpleUDF"></a>spark.rapids.sql.expression.HiveSimpleUDF| |Hive UDF, support requires the UDF to implement a RAPIDS-accelerated interface|true|None|
+<a name="sql.expression.HiveGenericUDF"></a>spark.rapids.sql.expression.HiveGenericUDF| |Hive Generic UDF, support requires the UDF to implement a RAPIDS accelerated interface|true|None|
+<a name="sql.expression.HiveSimpleUDF"></a>spark.rapids.sql.expression.HiveSimpleUDF| |Hive UDF, support requires the UDF to implement a RAPIDS accelerated interface|true|None|
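Each expression flag in the table above can be toggled at submit time. For example, GPU handling of Scala UDFs could be disabled to force CPU fallback (the config key comes from the table above; the flag placement on a `spark-submit`/`spark-shell` command line is illustrative):

```
--conf spark.rapids.sql.expression.ScalaUDF=false
```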

### Execution

94 changes: 92 additions & 2 deletions docs/supported_ops.md
@@ -10973,6 +10973,96 @@ Accelerator support is described below.
<td> </td>
</tr>
<tr>
<td rowSpan="4">ScalaUDF</td>
<td rowSpan="4"> </td>
<td rowSpan="4">User Defined Function, support requires the UDF to implement a RAPIDS accelerated interface</td>
<td rowSpan="4">None</td>
<td rowSpan="2">project</td>
<td>param</td>
<td>S</td>
<td>S</td>
<td>S</td>
<td>S</td>
<td>S</td>
<td>S</td>
<td>S</td>
<td>S</td>
<td>S*</td>
<td>S</td>
<td>S*</td>
<td>S</td>
<td>S</td>
<td>S</td>
<td><em>PS* (missing nested UDT)</em></td>
<td><em>PS* (missing nested UDT)</em></td>
<td><em>PS* (missing nested UDT)</em></td>
<td><b>NS</b></td>
</tr>
<tr>
<td>result</td>
<td>S</td>
<td>S</td>
<td>S</td>
<td>S</td>
<td>S</td>
<td>S</td>
<td>S</td>
<td>S</td>
<td>S*</td>
<td>S</td>
<td>S*</td>
<td>S</td>
<td>S</td>
<td>S</td>
<td><em>PS* (missing nested UDT)</em></td>
<td><em>PS* (missing nested UDT)</em></td>
<td><em>PS* (missing nested UDT)</em></td>
<td><b>NS</b></td>
</tr>
<tr>
<td rowSpan="2">lambda</td>
<td>param</td>
<td><b>NS</b></td>
<td><b>NS</b></td>
<td><b>NS</b></td>
<td><b>NS</b></td>
<td><b>NS</b></td>
<td><b>NS</b></td>
<td><b>NS</b></td>
<td><b>NS</b></td>
<td><b>NS</b></td>
<td><b>NS</b></td>
<td><b>NS</b></td>
<td><b>NS</b></td>
<td><b>NS</b></td>
<td><b>NS</b></td>
<td><b>NS</b></td>
<td><b>NS</b></td>
<td><b>NS</b></td>
<td><b>NS</b></td>
</tr>
<tr>
<td>result</td>
<td><b>NS</b></td>
<td><b>NS</b></td>
<td><b>NS</b></td>
<td><b>NS</b></td>
<td><b>NS</b></td>
<td><b>NS</b></td>
<td><b>NS</b></td>
<td><b>NS</b></td>
<td><b>NS</b></td>
<td><b>NS</b></td>
<td><b>NS</b></td>
<td><b>NS</b></td>
<td><b>NS</b></td>
<td><b>NS</b></td>
<td><b>NS</b></td>
<td><b>NS</b></td>
<td><b>NS</b></td>
<td><b>NS</b></td>
</tr>
<tr>
<td rowSpan="4">Second</td>
<td rowSpan="4">`second`</td>
<td rowSpan="4">Returns the second component of the string/timestamp</td>
@@ -16778,7 +16868,7 @@ Accelerator support is described below.
<tr>
<td rowSpan="4">HiveGenericUDF</td>
<td rowSpan="4"> </td>
-<td rowSpan="4">Hive Generic UDF, support requires the UDF to implement a RAPIDS-accelerated interface</td>
+<td rowSpan="4">Hive Generic UDF, support requires the UDF to implement a RAPIDS accelerated interface</td>
<td rowSpan="4">None</td>
<td rowSpan="2">project</td>
<td>param</td>
@@ -16868,7 +16958,7 @@ Accelerator support is described below.
<tr>
<td rowSpan="4">HiveSimpleUDF</td>
<td rowSpan="4"> </td>
-<td rowSpan="4">Hive UDF, support requires the UDF to implement a RAPIDS-accelerated interface</td>
+<td rowSpan="4">Hive UDF, support requires the UDF to implement a RAPIDS accelerated interface</td>
<td rowSpan="4">None</td>
<td rowSpan="2">project</td>
<td>param</td>
8 changes: 4 additions & 4 deletions integration_tests/README.md
@@ -107,7 +107,7 @@ individually, so you don't risk running unit tests along with the integration te
http://www.scalatest.org/user_guide/using_the_scalatest_shell

 ```shell
-spark-shell --jars rapids-4-spark-tests_2.12-0.4.0-SNAPSHOT-tests.jar,rapids-4-spark-udf-examples-0.4.0-SNAPSHOT,rapids-4-spark-integration-tests_2.12-0.4.0-SNAPSHOT-tests.jar,scalatest_2.12-3.0.5.jar,scalactic_2.12-3.0.5.jar
+spark-shell --jars rapids-4-spark-tests_2.12-0.4.0-SNAPSHOT-tests.jar,rapids-4-spark-udf-examples_2.12-0.4.0-SNAPSHOT,rapids-4-spark-integration-tests_2.12-0.4.0-SNAPSHOT-tests.jar,scalatest_2.12-3.0.5.jar,scalactic_2.12-3.0.5.jar
 ```

First you import the `scalatest_shell` and tell the tests where they can find the test files you
@@ -131,7 +131,7 @@ If you just want to verify the SQL replacement is working you will need to add t
example assumes CUDA 10.1 is being used.

 ```
-$SPARK_HOME/bin/spark-submit --jars "rapids-4-spark_2.12-0.4.0-SNAPSHOT.jar,rapids-4-spark-udf-examples-0.4.0-SNAPSHOT.jar,cudf-0.18-SNAPSHOT-cuda10-1.jar" ./runtests.py
+$SPARK_HOME/bin/spark-submit --jars "rapids-4-spark_2.12-0.4.0-SNAPSHOT.jar,rapids-4-spark-udf-examples_2.12-0.4.0-SNAPSHOT.jar,cudf-0.18-SNAPSHOT-cuda10-1.jar" ./runtests.py
 ```

You don't have to enable the plugin for this to work, the test framework will do that for you.
@@ -183,7 +183,7 @@ The TPCxBB, TPCH, TPCDS, and Mortgage tests in this framework can be enabled by
As an example, here is the `spark-submit` command with the TPCxBB parameters on CUDA 10.1:

 ```
-$SPARK_HOME/bin/spark-submit --jars "rapids-4-spark_2.12-0.4.0-SNAPSHOT.jar,rapids-4-spark-udf-examples-0.4.0-SNAPSHOT.jar,cudf-0.18-SNAPSHOT-cuda10-1.jar,rapids-4-spark-tests_2.12-0.4.0-SNAPSHOT.jar" ./runtests.py --tpcxbb_format="csv" --tpcxbb_path="/path/to/tpcxbb/csv"
+$SPARK_HOME/bin/spark-submit --jars "rapids-4-spark_2.12-0.4.0-SNAPSHOT.jar,rapids-4-spark-udf-examples_2.12-0.4.0-SNAPSHOT.jar,cudf-0.18-SNAPSHOT-cuda10-1.jar,rapids-4-spark-tests_2.12-0.4.0-SNAPSHOT.jar" ./runtests.py --tpcxbb_format="csv" --tpcxbb_path="/path/to/tpcxbb/csv"
 ```

Be aware that running these tests with read data requires at least an entire GPU, and preferable several GPUs/executors
@@ -212,7 +212,7 @@ To run cudf_udf tests, need following configuration changes:
As an example, here is the `spark-submit` command with the cudf_udf parameter on CUDA 10.1:

 ```
-$SPARK_HOME/bin/spark-submit --jars "rapids-4-spark_2.12-0.4.0-SNAPSHOT.jar,rapids-4-spark-udf-examples-0.4.0-SNAPSHOT.jar,cudf-0.18-SNAPSHOT-cuda10-1.jar,rapids-4-spark-tests_2.12-0.4.0-SNAPSHOT.jar" --conf spark.rapids.memory.gpu.allocFraction=0.3 --conf spark.rapids.python.memory.gpu.allocFraction=0.3 --conf spark.rapids.python.concurrentPythonWorkers=2 --py-files "rapids-4-spark_2.12-0.4.0-SNAPSHOT.jar" --conf spark.executorEnv.PYTHONPATH="rapids-4-spark_2.12-0.4.0-SNAPSHOT.jar" ./runtests.py --cudf_udf
+$SPARK_HOME/bin/spark-submit --jars "rapids-4-spark_2.12-0.4.0-SNAPSHOT.jar,rapids-4-spark-udf-examples_2.12-0.4.0-SNAPSHOT.jar,cudf-0.18-SNAPSHOT-cuda10-1.jar,rapids-4-spark-tests_2.12-0.4.0-SNAPSHOT.jar" --conf spark.rapids.memory.gpu.allocFraction=0.3 --conf spark.rapids.python.memory.gpu.allocFraction=0.3 --conf spark.rapids.python.concurrentPythonWorkers=2 --py-files "rapids-4-spark_2.12-0.4.0-SNAPSHOT.jar" --conf spark.executorEnv.PYTHONPATH="rapids-4-spark_2.12-0.4.0-SNAPSHOT.jar" ./runtests.py --cudf_udf
 ```

## Writing tests
2 changes: 1 addition & 1 deletion integration_tests/pom.xml
@@ -107,7 +107,7 @@
</dependency>
<dependency>
<groupId>com.nvidia</groupId>
-<artifactId>rapids-4-spark-udf-examples</artifactId>
+<artifactId>rapids-4-spark-udf-examples_${scala.binary.version}</artifactId>
<version>${project.version}</version>
<scope>test</scope>
</dependency>
6 changes: 3 additions & 3 deletions integration_tests/src/main/python/rapids_udf_test.py
@@ -36,7 +36,7 @@ def test_hive_simple_udf():
with_spark_session(skip_if_no_hive)
data_gens = [["i", int_gen], ["s", StringGen('([^%]{0,1}(%[0-9A-F][0-9A-F]){0,1}){0,30}')]]
def evalfn(spark):
-        load_udf_or_skip_test(spark, "urldecode", "com.nvidia.spark.rapids.udf.URLDecode")
+        load_udf_or_skip_test(spark, "urldecode", "com.nvidia.spark.rapids.udf.hive.URLDecode")
return gen_df(spark, data_gens)
assert_gpu_and_cpu_are_equal_sql(
evalfn,
@@ -47,7 +47,7 @@ def test_hive_generic_udf():
with_spark_session(skip_if_no_hive)
data_gens = [["s", StringGen('.{0,30}')]]
def evalfn(spark):
-        load_udf_or_skip_test(spark, "urlencode", "com.nvidia.spark.rapids.udf.URLEncode")
+        load_udf_or_skip_test(spark, "urlencode", "com.nvidia.spark.rapids.udf.hive.URLEncode")
return gen_df(spark, data_gens)
assert_gpu_and_cpu_are_equal_sql(
evalfn,
@@ -59,7 +59,7 @@ def test_hive_simple_udf_native(enable_rapids_udf_example_native):
with_spark_session(skip_if_no_hive)
data_gens = [["s", StringGen('.{0,30}')]]
def evalfn(spark):
-        load_udf_or_skip_test(spark, "wordcount", "com.nvidia.spark.rapids.udf.StringWordCount")
+        load_udf_or_skip_test(spark, "wordcount", "com.nvidia.spark.rapids.udf.hive.StringWordCount")
return gen_df(spark, data_gens)
assert_gpu_and_cpu_are_equal_sql(
evalfn,
4 changes: 2 additions & 2 deletions jenkins/databricks/build.sh
@@ -1,6 +1,6 @@
#!/bin/bash
#
-# Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved.
+# Copyright (c) 2020-2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -46,7 +46,7 @@ CUDA_VERSION=`mvn help:evaluate -q -pl dist -Dexpression=cuda.version -DforceStd
# the version of spark used when we install the databricks jars in .m2
SPARK_VERSION_TO_INSTALL_DATABRICKS_JARS=$BASE_SPARK_VERSION-databricks
RAPIDS_BUILT_JAR=rapids-4-spark_$SCALA_VERSION-$SPARK_PLUGIN_JAR_VERSION.jar
-RAPIDS_UDF_JAR=rapids-4-spark-udf-examples-$SPARK_PLUGIN_JAR_VERSION.jar
+RAPIDS_UDF_JAR=rapids-4-spark-udf-examples_$SCALA_VERSION-$SPARK_PLUGIN_JAR_VERSION.jar

echo "Scala version is: $SCALA_VERSION"
mvn -B -P${BUILD_PROFILES} clean package -DskipTests || true
6 changes: 3 additions & 3 deletions jenkins/spark-tests.sh
@@ -1,6 +1,6 @@
#!/bin/bash
#
-# Copyright (c) 2019-2020, NVIDIA CORPORATION. All rights reserved.
+# Copyright (c) 2019-2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -33,7 +33,7 @@ $MVN_GET_CMD -DremoteRepositories=$CUDF_REPO \
$MVN_GET_CMD -DremoteRepositories=$PROJECT_REPO \
-DgroupId=com.nvidia -DartifactId=rapids-4-spark_$SCALA_BINARY_VER -Dversion=$PROJECT_VER
$MVN_GET_CMD -DremoteRepositories=$PROJECT_TEST_REPO \
-    -DgroupId=com.nvidia -DartifactId=rapids-4-spark-udf-examples -Dversion=$PROJECT_TEST_VER
+    -DgroupId=com.nvidia -DartifactId=rapids-4-spark-udf-examples_$SCALA_BINARY_VER -Dversion=$PROJECT_TEST_VER
$MVN_GET_CMD -DremoteRepositories=$PROJECT_TEST_REPO \
-DgroupId=com.nvidia -DartifactId=rapids-4-spark-integration-tests_$SCALA_BINARY_VER -Dversion=$PROJECT_TEST_VER
if [ "$CUDA_CLASSIFIER"x == x ];then
Expand All @@ -42,7 +42,7 @@ else
CUDF_JAR="$ARTF_ROOT/cudf-$CUDF_VER-$CUDA_CLASSIFIER.jar"
fi
RAPIDS_PLUGIN_JAR="$ARTF_ROOT/rapids-4-spark_${SCALA_BINARY_VER}-$PROJECT_VER.jar"
-RAPIDS_UDF_JAR="$ARTF_ROOT/rapids-4-spark-udf-examples-$PROJECT_TEST_VER.jar"
+RAPIDS_UDF_JAR="$ARTF_ROOT/rapids-4-spark-udf-examples_${SCALA_BINARY_VER}-$PROJECT_TEST_VER.jar"
RAPIDS_TEST_JAR="$ARTF_ROOT/rapids-4-spark-integration-tests_${SCALA_BINARY_VER}-$PROJECT_TEST_VER.jar"

$MVN_GET_CMD -DremoteRepositories=$PROJECT_TEST_REPO \
4 changes: 2 additions & 2 deletions sql-plugin/src/main/java/com/nvidia/spark/RapidsUDF.java
@@ -1,5 +1,5 @@
/*
- * Copyright (c) 2020, NVIDIA CORPORATION.
+ * Copyright (c) 2020-2021, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -18,7 +18,7 @@

import ai.rapids.cudf.ColumnVector;

-/** A RAPIDS-accelerated version of a user-defined function (UDF). */
+/** A RAPIDS accelerated version of a user-defined function (UDF). */
public interface RapidsUDF {
/**
* Evaluate a user-defined function with RAPIDS cuDF columnar inputs
@@ -2192,11 +2192,13 @@ object GpuOverrides {
     (a, conf, p, r) => new UnaryExprMeta[MakeDecimal](a, conf, p, r) {
       override def convertToGpu(child: Expression): GpuExpression =
         GpuMakeDecimal(child, a.precision, a.scale, a.nullOnOverflow)
-      })
+      }),
+    GpuScalaUDF.exprMeta
   ).map(r => (r.getClassFor.asSubclass(classOf[Expression]), r)).toMap
 
+  // Shim expressions should be last to allow overrides with shim-specific versions
   val expressions: Map[Class[_ <: Expression], ExprRule[_ <: Expression]] =
-    commonExpressions ++ ShimLoader.getSparkShims.getExprs ++ GpuHiveOverrides.exprs
+    commonExpressions ++ GpuHiveOverrides.exprs ++ ShimLoader.getSparkShims.getExprs
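The reordering in this hunk matters because the expression-rule maps are merged left to right and later entries win on duplicate keys, so shim-specific rules placed last can override both the common and Hive rules. A small stand-in illustrating the merge semantics (Scala's `Map ++` behaves like the repeated `putAll` below; all key and value names are illustrative):

```java
import java.util.HashMap;
import java.util.Map;

// Merging rule maps left to right: entries added later overwrite
// earlier ones, mirroring commonExpressions ++ GpuHiveOverrides.exprs
// ++ ShimLoader.getSparkShims.getExprs in the hunk above.
public class OverridePrecedence {
    public static Map<String, String> merge(Map<String, String> common,
                                            Map<String, String> hive,
                                            Map<String, String> shim) {
        Map<String, String> merged = new HashMap<>(common);
        merged.putAll(hive);
        merged.putAll(shim); // shim entries win on key collisions
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> common = new HashMap<>();
        common.put("Abs", "common rule");
        Map<String, String> hive = new HashMap<>();
        hive.put("HiveSimpleUDF", "hive rule");
        Map<String, String> shim = new HashMap<>();
        shim.put("Abs", "shim rule"); // shim-specific replacement
        // prints: shim rule
        System.out.println(merge(common, hive, shim).get("Abs"));
    }
}
```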


def wrapScan[INPUT <: Scan](
Expand Down
