Figure out why MapFromArrays appears in the tests for hive parquet write #10948
Comments
This also happens on Spark 351. See #10956
Added back in needs triage because we really need to understand what is happening. If we cannot do something simple with DB like this, it is either a bug in our code or theirs, and we need to know which.
@firestarman this needs to be investigated to figure out the root cause, given that we'll have an unneeded fallback with this feature on Databricks.
This However this is not a Plugin bug, I think. Because Spark 350+ generates a different plan for the Hive style write ( Spark 341
Spark 350 and 351
But the CTAS command (
This appears to be coming from https://github.com/apache/spark/blob/fd86f85e181fc2dc0f50a096855acf83a6cc5d9c/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala#L381-L421 It appears that https://issues.apache.org/jira/browse/SPARK-42151 apache/spark#40308 So technically this is a regression, or more accurately a performance regression, in that we could run the query fully on the GPU before, but now we cannot.
@sameerz and @mattahrens we now know why the regression has happened and we need to decide what the next steps are. Implementing this is not too difficult. We mainly need to verify that the array lengths are the same everywhere and then pull out the data column from each of the arrays and turn them into a struct.
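The two steps described in the comment above can be sketched in plain Python. This is only an illustration of the idea, not the plugin's implementation: the function name `map_from_arrays_sketch`, the offsets-plus-child-data layout, and the assumption that offsets start at 0 are all hypothetical, and null handling and duplicate keys are ignored.

```python
from typing import Any, List, Sequence, Tuple


def map_from_arrays_sketch(
    key_offsets: Sequence[int],
    key_data: Sequence[Any],
    value_offsets: Sequence[int],
    value_data: Sequence[Any],
) -> Tuple[List[int], List[Tuple[Any, Any]]]:
    """Combine a keys list column and a values list column into a map column.

    Each list column is modeled (hypothetically) as row offsets plus a flat
    child data column. The result is the shared offsets plus a flat column
    of (key, value) structs.
    """
    if len(key_offsets) != len(value_offsets):
        raise ValueError("key and value columns have different row counts")

    # Step 1: verify that the array lengths are the same in every row.
    for row in range(len(key_offsets) - 1):
        n_keys = key_offsets[row + 1] - key_offsets[row]
        n_values = value_offsets[row + 1] - value_offsets[row]
        if n_keys != n_values:
            raise ValueError(
                f"row {row}: {n_keys} keys but {n_values} values"
            )

    # Step 2: pull the data column out of each array column and pair the
    # entries up into structs. Equal per-row lengths (with both offsets
    # starting at 0) mean the key offsets can be reused for the map.
    entries = list(zip(key_data, value_data))
    return list(key_offsets), entries


# Two rows: {"a": 1, "b": 2} and {"c": 3}
offsets, entries = map_from_arrays_sketch(
    [0, 2, 3], ["a", "b", "c"],
    [0, 2, 3], [1, 2, 3],
)
```

Because the per-row lengths match, the output can share the input offsets and no per-row copying is needed; only the two child columns are combined.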
Describe the bug
PR #10912 introduces the parquet support for GpuInsertIntoHiveTable, along with the relevant tests. In some of the tests on Databricks, the ProjectExec will fall back to CPU due to missing the GPU version of the MapFromArrays expression.
It is better to find out the root cause of why this expression appears only in these tests on Databricks.