[FEA] parquet and orc corner case tests #5462

Open
revans2 opened this issue May 11, 2022 · 0 comments
Labels
reliability: Features to improve reliability or bugs that severely impact the reliability of the plugin
test: Only impacts tests

revans2 commented May 11, 2022

We have recently run into a number of corner cases involving old Parquet data, or odd mixtures of things, that are causing issues.

The goal of this issue is to try hard to find corner cases for us to test for Parquet and ORC. This will likely require us to understand the file formats themselves and to write out data in ways that Spark cannot, as with #5445.
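
As a starting point for working at the file-format level, here is a minimal sketch of locating the footer in a Parquet file using only the standard library. It relies on the documented layout (a `PAR1` magic at both ends, with a 4-byte little-endian footer length just before the trailing magic). The function name `parquet_footer_span` is made up for illustration, and the sketch does not handle files with an encrypted footer or decode the Thrift metadata itself.

```python
import struct
from typing import Tuple

PARQUET_MAGIC = b"PAR1"


def parquet_footer_span(data: bytes) -> Tuple[int, int]:
    """Return (start, end) byte offsets of the serialized footer in `data`.

    Hypothetical helper: it only checks the magic bytes and the length
    field, it does not parse the Thrift FileMetaData.
    """
    # Smallest possible file: leading magic + footer length + trailing magic.
    if len(data) < 12:
        raise ValueError("too short to be a Parquet file")
    if data[:4] != PARQUET_MAGIC or data[-4:] != PARQUET_MAGIC:
        raise ValueError("missing PAR1 magic (or footer is encrypted)")
    # The footer length is a 4-byte little-endian integer before the magic.
    (footer_len,) = struct.unpack("<I", data[-8:-4])
    end = len(data) - 8
    start = end - footer_len
    if start < 4:
        raise ValueError("footer length points before the start of the file")
    return start, end
```

Once the footer bytes can be located, a test could corrupt or rewrite them to produce files that Spark itself would never write.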

We should also look deeply at schema evolution: what happens if I add new files that have a modified schema, for example moving a column from an int to a long? What does the CPU do and how do we handle it? We have implemented some of this for Parquet, but ORC is still really lacking (#135).
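
One way to make sure the schema evolution cases are covered systematically is to enumerate every (type in file, type in read schema) pair up front. The sketch below does this for the integral types only. The type names, the `INTEGRAL_WIDTH` table and both functions are assumptions made for illustration, and the sketch says nothing about what Spark on the CPU actually does for each pair. That has to come from running the CPU and comparing.

```python
from itertools import product

# Hypothetical table: bit width of each integral type under test.
INTEGRAL_WIDTH = {"byte": 8, "short": 16, "int": 32, "long": 64}


def classify(file_type: str, read_type: str) -> str:
    """Label a (file type, read type) pair as same, widening or narrowing."""
    if file_type == read_type:
        return "same"
    if INTEGRAL_WIDTH[file_type] < INTEGRAL_WIDTH[read_type]:
        return "widening"
    return "narrowing"


def evolution_cases():
    """List every pair of distinct integral types with its label."""
    return [
        (file_type, read_type, classify(file_type, read_type))
        for file_type, read_type in product(INTEGRAL_WIDTH, repeat=2)
        if file_type != read_type
    ]
```

Each tuple could then drive one parameterized test that writes a file with the first type, reads it with the second, and compares CPU and GPU results.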

We should also look at features such as Parquet allowing the data to be stored in a different file from the footer. Does anyone use this? If so, does Spark work with this?

To be clear, not all of this work needs to be done in one issue. We can split this up into multiple issues, and if we find bugs we need to make sure to file them against us.

revans2 added the "feature request", "? - Needs Triage", "test" and "reliability" labels on May 11, 2022
sameerz removed the "? - Needs Triage" and "feature request" labels on May 17, 2022