-
Notifications
You must be signed in to change notification settings - Fork 232
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fallback to CPU for Parquet reads with _databricks_internal
columns [databricks]
#6257
Fallback to CPU for Parquet reads with _databricks_internal
columns [databricks]
#6257
Conversation
Signed-off-by: Andy Grove <andygrove@nvidia.com>
case f: FileSourceScanExec if f.requiredSchema.fields | ||
.exists(_.name.startsWith("_databricks_internal")) => | ||
logDebug(s"Fallback for FileSourceScanExec with _databricks_internal: $f") | ||
true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will look at adding a shim so we only apply this check on Databricks
👍 I have verified that this change allows for the I'm inclined to call this "Plan A", instead of it being the fallback position for 22.08. The We have verified that run-of-the-mill DELTA table reads are not affected: only the |
_databricks_internal
columns_databricks_internal
columns [databricks]
build |
build |
_databricks_internal
columns [databricks]_databricks_internal
columns [databricks]
I still plan on making the shim change, but tests are now passing, and this could be merged as is for 22.08 with a follow-up issue to use shims in 22,10. |
I filed #6279 for the follow-up issue |
When performing a Delta Lake MERGE operation on Databricks, a query runs that relies on receiving valid data from computed columns such as
_databricks_internal_edge_computed_column_row_index
and_databricks_internal_edge_computed_column_skip_row
in the Parquet read.For example:
These columns obviously do not exist if we are reading the files on GPU so we need to fall back to CPU in this case.
This PR is still WIP because there are no tests yet.