[BUG] Parquet read fails when table created with more columns than file #60

Closed
revans2 opened this issue May 29, 2020 · 2 comments
Labels: bug (Something isn't working), SQL (part of the SQL/Dataframe plugin)

revans2 commented May 29, 2020

Describe the bug
If you run a CREATE TABLE with more columns than the underlying Parquet file has, the CPU version works and fills the missing columns with nulls, but the GPU version blows up.

Steps/Code to reproduce bug

CREATE TABLE IF NOT EXISTS f_five_minutes (`five_minutes` TIMESTAMP, `day` DATE, `DateTime` TIMESTAMP)
USING PARQUET
OPTIONS (
path 's3://.../five_minutes'
)
PARTITIONED BY (day)

The parquet file itself only has five_minutes in it.
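
The expected (CPU) behavior described above can be modeled in a few lines of plain Python. This is only a sketch of the semantics, not Spark's or the plugin's reader code: `read_with_schema` and the sample timestamp values are hypothetical, and the partition column `day` is left out because its values come from the directory layout rather than from the file.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Optional[Any]]


def read_with_schema(file_rows: List[Row], table_columns: List[str]) -> List[Row]:
    """Project file rows onto the table schema.

    Columns that the table declares but the file lacks come back as None,
    which stands in for SQL NULL in this sketch.
    """
    return [{col: row.get(col) for col in table_columns} for row in file_rows]


# Hypothetical file contents: only five_minutes is present in the file.
file_rows = [
    {"five_minutes": "2020-05-29 00:00:00"},
    {"five_minutes": "2020-05-29 00:05:00"},
]

# Non-partition columns declared by the CREATE TABLE statement.
table_columns = ["five_minutes", "DateTime"]

result = read_with_schema(file_rows, table_columns)
for row in result:
    print(row)
```

Every row keeps its `five_minutes` value and gains a `DateTime` entry of `None`; a reader that instead raises on the missing column reproduces the failure reported here.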

revans2 added the bug, "? - Needs Triage", and SQL labels May 29, 2020
revans2 removed the "? - Needs Triage" label Jun 2, 2020

revans2 commented Jun 9, 2020

This also shows up when doing schema evolution with the 'mergeSchema' option.

An integration test has been added for this.
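
The 'mergeSchema' case reaches the same situation by a different route: files written at different times carry different column sets, and older files must be read against the wider merged schema. A rough pure-Python sketch of that idea follows. The helper names and sample data are hypothetical, and the sketch ignores column types, which a real schema merge also has to reconcile.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Optional[Any]]


def merge_columns(file_schemas: List[List[str]]) -> List[str]:
    """Union of column names across files, keeping first-seen order."""
    merged: List[str] = []
    for schema in file_schemas:
        for col in schema:
            if col not in merged:
                merged.append(col)
    return merged


def read_merged(files: List[List[Row]], merged: List[str]) -> List[Row]:
    """Read every file against the merged schema, null-filling absent columns."""
    return [{col: row.get(col) for col in merged} for rows in files for row in rows]


# Hypothetical data: the older file predates column "b".
old_file = [{"a": 1}]
new_file = [{"a": 2, "b": "x"}]

merged = merge_columns([list(old_file[0]), list(new_file[0])])
rows = read_merged([old_file, new_file], merged)
print(merged)
print(rows)
```

Rows from the older file come back with `b` set to `None`, which is the same null-filling a reader has to perform when a table declares more columns than the file holds.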

revans2 commented Jun 11, 2020

This was fixed as a part of 7294b1a

revans2 closed this as completed Jun 11, 2020
revans2 added this to the Release 0.1 milestone Jun 12, 2020