[BUG][Databricks 12.2] GpuRowBasedHiveGenericUDF ClassCastException #8318

Closed
andygrove opened this issue May 18, 2023 · 4 comments · Fixed by #8893
Labels: bug

@andygrove

Describe the bug

#8282 adds support for Databricks 12.2 but skips the test row-based_udf_test.py::test_hive_empty_generic_udf, which needs further investigation.

E                   Caused by: java.lang.ClassCastException: org.apache.spark.sql.hive.rapids.GpuRowBasedHiveGenericUDF$$Lambda$5077/1906229681 cannot be cast to org.apache.spark.unsafe.types.UTF8String
E                   	at org.apache.spark.sql.hive.HiveInspectors.$anonfun$wrapperFor$3(HiveInspectors.scala:280)
E                   	at org.apache.spark.sql.hive.HiveInspectors.$anonfun$withNullSafe$1(HiveInspectors.scala:262)
E                   	at org.apache.spark.sql.hive.DeferredObjectAdapter.get(hiveUDFs.scala:129)
E                   	at com.nvidia.spark.rapids.tests.udf.hive.EmptyHiveGenericUDF.evaluate(EmptyHiveGenericUDF.java:48)
E                   	at org.apache.spark.sql.hive.rapids.GpuRowBasedHiveGenericUDF.evaluateRow(rowBasedHiveUDFs.scala:183)
E                   	at com.nvidia.spark.rapids.GpuRowBasedUserDefinedFunction.$anonfun$columnarEval$8(GpuUserDefinedFunction.scala:141)
E                   	at com.nvidia.spark.rapids.GpuRowBasedUserDefinedFunction.$anonfun$columnarEval$8$adapted(GpuUserDefinedFunction.scala:140)
E                   	at scala.collection.Iterator.foreach(Iterator.scala:943)
E                   	at scala.collection.Iterator.foreach$(Iterator.scala:943)
E                   	at com.nvidia.spark.rapids.ColumnarToRowIterator.foreach(GpuColumnarToRowExec.scala:198)
E                   	at com.nvidia.spark.rapids.GpuRowBasedUserDefinedFunction.$anonfun$columnarEval$7(GpuUserDefinedFunction.scala:140)
E                   	at com.nvidia.spark.rapids.Arm$.closeOnExcept(Arm.scala:88)
E                   	at com.nvidia.spark.rapids.GpuRowBasedUserDefinedFunction.columnarEval(GpuUserDefinedFunction.scala:124)

Steps/Code to reproduce bug
Run row-based_udf_test.py::test_hive_empty_generic_udf on Databricks 12.2.

Expected behavior
The test should pass, or the expression should fall back to the CPU.

Environment details (please complete the following information)
N/A

Additional context

andygrove added the "bug" and "? - Needs Triage" labels on May 18, 2023
mattahrens removed the "? - Needs Triage" label on May 23, 2023

andygrove commented Jul 27, 2023

This is the key part of the stack trace:

Caused by: java.lang.ClassCastException: org.apache.spark.sql.hive.rapids.GpuRowBasedHiveGenericUDF$$Lambda$5077/1906229681 cannot be cast to org.apache.spark.unsafe.types.UTF8String
  at org.apache.spark.sql.hive.HiveInspectors.$anonfun$wrapperFor$3(HiveInspectors.scala:280)
  at org.apache.spark.sql.hive.HiveInspectors.$anonfun$withNullSafe$1(HiveInspectors.scala:262)
  at org.apache.spark.sql.hive.DeferredObjectAdapter.get(hiveUDFs.scala:129)
  at com.nvidia.spark.rapids.tests.udf.hive.EmptyHiveGenericUDF.evaluate(EmptyHiveGenericUDF.java:48)
  at org.apache.spark.sql.hive.rapids.GpuRowBasedHiveGenericUDF.evaluateRow(rowBasedHiveUDFs.scala:183)

DeferredObjectAdapter contains a function. The call to DeferredObjectAdapter.get in open source Spark and earlier Databricks versions will invoke the function and then cast the result to a specific data type. What appears to be happening in Databricks 12.2 is that DeferredObjectAdapter.get tries to apply the cast to the function itself, rather than the result of invoking the function.

This appears to be a bug in Databricks' Spark code.
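The difference can be reproduced in isolation with a simplified model. The sketch below is not Spark's or Databricks' actual code: LazyAdapter and EagerAdapter are hypothetical stand-ins for the two observed behaviors of DeferredObjectAdapter, and String stands in for UTF8String.

```scala
// Simplified, hypothetical stand-ins for DeferredObjectAdapter.
// These are NOT the real Spark or Databricks classes.
object DeferredModel {
  // Behavior in open-source Spark and earlier Databricks versions:
  // set() stores a function, get() invokes it and casts the result.
  final class LazyAdapter {
    private var func: () => Any = null
    def set(f: () => Any): Unit = { func = f }
    def get(): String = func().asInstanceOf[String]
  }

  // Behavior observed on Databricks 12.2:
  // set() stores a value, get() casts that value directly.
  final class EagerAdapter {
    private var value: Any = null
    def set(v: Any): Unit = { value = v }
    def get(): String = value.asInstanceOf[String]
  }

  def main(args: Array[String]): Unit = {
    val lazyAdapter = new LazyAdapter
    lazyAdapter.set(() => "hello")
    println(lazyAdapter.get()) // prints "hello"

    val eagerAdapter = new EagerAdapter
    // Compiles because the parameter type is Any, so the lambda itself is stored.
    eagerAdapter.set(() => "hello")
    try {
      eagerAdapter.get() // the cast is applied to the lambda, not to its result
    } catch {
      case e: ClassCastException => println(s"ClassCastException: ${e.getMessage}")
    }
  }
}
```

Because the eager variant accepts Any, passing a function still compiles, and the mistake only surfaces at runtime as a ClassCastException that names the lambda class, which matches the shape of the failure in the stack trace.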


andygrove commented Jul 31, 2023

After more experimentation, I have a fix for the issue, but it changes the semantics of how Hive generic UDFs work.

In GpuRowBasedHiveGenericUDF, we supply a function for accessing UDF parameters:

deferredObjects(i).set(() => childRowAccessors(idx)(childrenRow))

In open-source Spark, when the UDF calls get() on one of these deferred objects, the function is invoked, and the resulting value is cast to the expected type. As of Databricks 12.2, the behavior has changed, and the value passed to set is accessed directly instead of being invoked as a function. It seems that Databricks no longer supports deferred invocation here.

I can get the test passing on DBR 12.2 if I change the code to just set the value rather than providing a function:

deferredObjects(i).set(childRowAccessors(idx)(childrenRow))

If I try the same code change with open-source Spark, it fails to compile, further confirming that this is a change in functionality in Databricks 12.2:

[ERROR] [Error] /home/andy/git/nvidia/spark-rapids/sql-plugin/src/main/scala/org/apache/spark/sql/hive/rapids/rowBasedHiveUDFs.scala:180: type mismatch;
 found   : Any
 required: () => Any
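One possible way to keep a single call site while supporting both signatures is a small per-version shim. The sketch below is only an illustration under assumptions: ThunkAdapter, ValueAdapter, DeferredSetter, ThunkSetter, and ValueSetter are hypothetical names, and this is not necessarily how the issue was eventually fixed.

```scala
object DeferredShimSketch {
  // Hypothetical stand-in for an adapter whose set() takes a function.
  final class ThunkAdapter {
    private var func: () => Any = null
    def set(f: () => Any): Unit = { func = f }
    def get(): Any = func()
  }

  // Hypothetical stand-in for an adapter whose set() takes the value itself.
  final class ValueAdapter {
    private var value: Any = null
    def set(v: Any): Unit = { value = v }
    def get(): Any = value
  }

  // Hypothetical shim: the caller always supplies a thunk, and each
  // implementation decides whether to pass it through or evaluate it first.
  trait DeferredSetter[A] {
    def setDeferred(adapter: A, thunk: () => Any): Unit
  }

  object ThunkSetter extends DeferredSetter[ThunkAdapter] {
    // Evaluation stays deferred until the UDF calls get().
    def setDeferred(adapter: ThunkAdapter, thunk: () => Any): Unit = adapter.set(thunk)
  }

  object ValueSetter extends DeferredSetter[ValueAdapter] {
    // The thunk is evaluated immediately, whether or not the UDF reads it.
    def setDeferred(adapter: ValueAdapter, thunk: () => Any): Unit = adapter.set(thunk())
  }
}
```

The trade-off is the semantic change described above: with the value-based variant, every argument is computed up front, even when the UDF never calls get() on it.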

@andygrove

@firestarman fyi

@andygrove

Spark master (3.5.0) has the same new behavior, introduced in apache/spark#39555.
