Parquet Statistics Pruning Ignores ColumnOrder, resulting in potentially incorrect statistics #8342

Open
tustvold opened this issue Nov 28, 2023 · 5 comments
Labels
bug Something isn't working

Comments

tustvold commented Nov 28, 2023

Describe the bug

Parquet statistics are only valid when interpreted in the context of the ColumnOrder. If it is ignored, pruning based on those statistics is not necessarily correct.

The ColumnOrder field in the Parquet file metadata records which ordering was used to compute the min/max statistics. It does not seem to be widely used or populated in the ecosystem. However, ignoring it when it is present is probably wrong.
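To illustrate why the ordering matters, here is a minimal sketch in plain Python. It is not DataFusion or parquet-rs code, and every name in it (`signed_key`, `unsigned_key`, `compute_stats`, `may_contain`) is hypothetical. It assumes a writer that computed min/max by comparing bytes as signed integers and a reader that prunes by comparing them as unsigned.

```python
def signed_key(value: bytes) -> list:
    """Sort key that treats each byte as a signed 8-bit integer."""
    return [b - 256 if b >= 128 else b for b in value]


def unsigned_key(value: bytes) -> bytes:
    """Sort key that treats each byte as unsigned (Python's default for bytes)."""
    return value


def compute_stats(values, key):
    """Return (min, max) of the values under the given ordering."""
    return min(values, key=key), max(values, key=key)


def may_contain(stats, target, key):
    """Pruning check: can a row group with these stats contain the target?"""
    lo, hi = stats
    return key(lo) <= key(target) <= key(hi)


# Hypothetical row group; the writer is assumed to use signed comparison.
values = [b"\x01", b"\x7f", b"\x80", b"\xff"]
writer_stats = compute_stats(values, signed_key)

# Reading with the same ordering keeps the row group for a value it contains.
kept_with_matching_order = may_contain(writer_stats, b"\xff", signed_key)

# Reading with a different ordering wrongly prunes the row group.
kept_with_mismatched_order = may_contain(writer_stats, b"\xff", unsigned_key)
```

In this sketch the row group really contains `b"\xff"`, yet the mismatched reader concludes it cannot, which is the kind of incorrect pruning this issue is concerned with.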

To Reproduce

No response

Expected behavior

No response

Additional context

No response

@tustvold tustvold added the bug Something isn't working label Nov 28, 2023
@tustvold tustvold changed the title Parquet Statistics Ignores ColumnOrder Parquet Statistics Pruning Ignores ColumnOrder Nov 28, 2023
@alamb alamb changed the title Parquet Statistics Pruning Ignores ColumnOrder Parquet Statistics Pruning Ignores ColumnOrder, resulting in potentially incorrect statistics Nov 28, 2023

alamb commented Nov 28, 2023

alamb commented Dec 1, 2023

I believe we fixed this in #8294, but I am not 100% sure given the dearth of information on this ticket. Please reopen it if I am misunderstanding.

@alamb alamb closed this as completed Dec 1, 2023
tustvold commented Dec 1, 2023

I'm afraid this is tracking something different, which that PR didn't address: we aren't even populating this correctly in parquet-rs currently (see apache/arrow-rs#5152).

@tustvold tustvold reopened this Dec 1, 2023
alamb commented Dec 3, 2023

Thanks. I updated the description to hopefully provide a little more background.

edmondop commented

I can pick this one up, if that's OK.
