Update getFileScanRDD shim for recent changes in Spark 3.3.0 [databricks] #4427

Merged 2 commits on Dec 23, 2021


jlowe (Member) commented Dec 22, 2021

Fixes #4423. Updated the getFileScanRDD shim interface to take the read data schema and metadata columns, and updated the only place this is called to pass the read data schema. I filed #4426 to track handling the new hidden metadata column functionality added in 3.3.0.
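To illustrate the kind of interface change described above, here is a minimal sketch of a version shim whose `getFileScanRDD` accepts the read data schema and metadata columns. It is not the code from this PR. `Field`, `Schema`, `ScanRdd`, `FileScanShim`, `PreSpark330Shim`, and `Spark330Shim` are hypothetical stand-ins so the example runs without Spark on the classpath, and the behavior of the older shim (dropping the extra arguments) is an assumption.

```scala
// Hypothetical stand-ins for Spark types such as StructType; not the real API.
final case class Field(name: String, dataType: String)
final case class Schema(fields: Seq[Field])

// Records what a shim forwarded to the (imaginary) underlying scan RDD.
final case class ScanRdd(
    files: Seq[String],
    forwardedSchema: Option[Schema],
    forwardedMetadata: Seq[Field])

// One signature shared by every supported Spark version.
trait FileScanShim {
  def getFileScanRDD(
      files: Seq[String],
      readDataSchema: Schema,
      metadataColumns: Seq[Field] = Seq.empty): ScanRdd
}

// Assumption: older Spark versions have no use for the extra arguments,
// so this shim accepts them and drops them.
object PreSpark330Shim extends FileScanShim {
  override def getFileScanRDD(
      files: Seq[String],
      readDataSchema: Schema,
      metadataColumns: Seq[Field]): ScanRdd =
    ScanRdd(files, None, Seq.empty)
}

// Assumption: the 3.3.0 shim passes both arguments through.
object Spark330Shim extends FileScanShim {
  override def getFileScanRDD(
      files: Seq[String],
      readDataSchema: Schema,
      metadataColumns: Seq[Field]): ScanRdd =
    ScanRdd(files, Some(readDataSchema), metadataColumns)
}

object ShimDemo {
  def main(args: Array[String]): Unit = {
    val schema = Schema(Seq(Field("id", "long")))
    // The caller passes only the read data schema; metadata columns default to empty.
    println(Spark330Shim.getFileScanRDD(Seq("part-0.parquet"), schema))
    println(PreSpark330Shim.getFileScanRDD(Seq("part-0.parquet"), schema))
  }
}
```

Keeping a single signature across shims means the call site does not need version checks; only the shim implementations differ.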

Signed-off-by: Jason Lowe <jlowe@nvidia.com>
jlowe added the "build" label (Related to CI / CD or cleanly building) on Dec 22, 2021
jlowe added this to the Dec 13 - Jan 7 milestone on Dec 22, 2021
jlowe self-assigned this on Dec 22, 2021
jlowe (Member Author) commented Dec 22, 2021

build

jlowe changed the title from "Update getFileScanRDD shim for recent changes in Spark 3.3.0" to "Update getFileScanRDD shim for recent changes in Spark 3.3.0 [databricks]" on Dec 22, 2021
jlowe (Member Author) commented Dec 22, 2021

build

jlowe (Member Author) commented Dec 22, 2021

build

revans2 (Collaborator) commented Dec 22, 2021

Do you know what the change was for?

jlowe (Member Author) commented Dec 22, 2021

> Do you know what the change was for?

Yes, details are in #4426 and the linked Spark JIRA. Spark 3.3 added support for loading "hidden" metadata columns when they are requested explicitly. This PR only gets us building again. Before we officially support Spark 3.3.0, we'll need to handle this feature properly, either by implementing the metadata columns on the GPU or by falling back to the CPU when they are detected in the read schema.
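As a rough sketch of the second option (falling back to the CPU when metadata columns show up in the read schema), the check could look something like the following. None of this is taken from the plugin or from Spark: `ReadField`, `Placement`, `RunOnGpu`, `FallBackToCpu`, and `MetadataFallback.planScan` are hypothetical names, and marking hidden columns with a boolean flag is an assumption made only to keep the example self-contained.

```scala
// Hypothetical stand-in for a column in the read schema; the flag marks a
// hidden metadata column. Real code would inspect Spark's own schema types.
final case class ReadField(name: String, isHiddenMetadata: Boolean = false)

// Where the scan should run, with a reason when the GPU is not used.
sealed trait Placement
case object RunOnGpu extends Placement
final case class FallBackToCpu(reason: String) extends Placement

object MetadataFallback {
  // Falls back to the CPU if any hidden metadata column is requested,
  // either in the read schema or in the separate metadata column list.
  def planScan(
      readSchema: Seq[ReadField],
      metadataColumns: Seq[ReadField]): Placement = {
    val hidden = (readSchema ++ metadataColumns)
      .filter(_.isHiddenMetadata)
      .map(_.name)
      .distinct
    if (hidden.isEmpty) {
      RunOnGpu
    } else {
      FallBackToCpu(
        s"hidden metadata columns are not supported on the GPU: ${hidden.mkString(", ")}")
    }
  }
}

object FallbackDemo {
  def main(args: Array[String]): Unit = {
    val plain = Seq(ReadField("id"), ReadField("value"))
    val hiddenCols = Seq(ReadField("_metadata", isHiddenMetadata = true))
    println(MetadataFallback.planScan(plain, Seq.empty))
    println(MetadataFallback.planScan(plain, hiddenCols))
  }
}
```

Returning a reason with the fallback makes it easier to report why a scan stayed on the CPU.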

GaryShen2008 merged commit c47fc88 into NVIDIA:branch-22.02 on Dec 23, 2021
Linked issue: [BUG] Build is failing due to FileScanRDD changes in Spark 3.3.0-SNAPSHOT