RAPIDS accelerated Spark Scala UDF support #1636
Merged
Closes #1594.
This implements a GPU version of `ScalaUDF`, the class Spark uses to track Scala UDFs. Note that this class is also used for Java UDFs, which are NOT supported by this change because in that case a lambda wrapper obscures the user's class. Adding support for Spark Java UDFs is tracked by #1635.

A working example of a RAPIDS accelerated Spark Scala UDF is also provided, which required adding the Scala version to the udf-examples jar (and everywhere it was referenced). A unit test was added to exercise it. It was not implemented as a Python test, as was done for the Hive UDFs, because PySpark does not support registering Scala UDFs (it uses the Java UDF interface instead).
The RAPIDS accelerated UDF documentation has also been updated to reflect the new functionality.
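
To illustrate why the lambda wrapper matters, here is a minimal, self-contained sketch. It is not the code in this PR: `ColumnarUdf` is a hypothetical stand-in for the plugin's columnar UDF interface, and plain string arrays stand in for GPU columns. The sketch only shows that a type check on the function object finds the user's class when it is passed directly (the Scala UDF case) and fails once the object is wrapped in a lambda (the Java UDF case).

```scala
// Hypothetical stand-in for the plugin's columnar UDF interface.
// The real interface and its signature are not shown in this description.
trait ColumnarUdf {
  def evaluateColumnar(args: Seq[Array[String]]): Array[String]
}

// A Scala UDF: the user's class is itself the function object.
class Upper extends Function1[String, String] with ColumnarUdf with Serializable {
  // Row-by-row (CPU) implementation.
  override def apply(s: String): String = if (s == null) null else s.toUpperCase

  // Columnar implementation; here it simply maps over an in-memory array.
  override def evaluateColumnar(args: Seq[Array[String]]): Array[String] =
    args.head.map(apply)
}

object UdfDetection {
  // True when the function object itself exposes the columnar interface.
  def hasColumnarImpl(function: AnyRef): Boolean = function match {
    case _: ColumnarUdf => true
    case _              => false
  }

  def main(args: Array[String]): Unit = {
    val direct: String => String = new Upper
    // Wrapping in a lambda hides the Upper instance behind an anonymous function.
    val wrapped: String => String = (s: String) => direct(s)

    println(hasColumnarImpl(direct))   // prints "true"
    println(hasColumnarImpl(wrapped))  // prints "false"
  }
}
```

In this sketch the wrapped function still computes the same result row by row, but nothing in its type reveals that a columnar implementation exists.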