[FEA] Support for missing ORC column statistics #7087

Closed
1 of 4 tasks
vuule opened this issue Jan 6, 2021 · 2 comments · Fixed by #13848
Labels: cuIO, feature request, libcudf

Comments

@vuule
Contributor

vuule commented Jan 6, 2021

The column statistics encoding in the ORC writer is missing support for a few fields:

  • hasNull
  • sum field of doubleStatistics
  • sum field of stringStatistics
  • decimalStatistics
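A minimal sketch of what the first three missing fields carry, assuming a nullable column chunk is modeled as a vector of `std::optional` values. The struct and function names are hypothetical, not libcudf's API. Field meanings follow the ORC protobuf definitions, where `StringStatistics.sum` is the total length of all strings rather than a numeric sum. NaN handling is deliberately ignored.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical writer-side statistics for a double column chunk.
// Names mirror ORC's ColumnStatistics.hasNull and DoubleStatistics.sum.
struct double_column_stats {
  bool has_null = false;          // ColumnStatistics.hasNull
  std::optional<double> minimum;  // stays empty when every value is null
  std::optional<double> maximum;
  double sum = 0.0;               // DoubleStatistics.sum
};

// Hypothetical statistics for a string column chunk.
struct string_column_stats {
  bool has_null = false;  // ColumnStatistics.hasNull
  std::int64_t sum = 0;   // StringStatistics.sum: total length of all strings
};

// Accumulates hasNull, min/max, and sum over the non-null values (NaN not handled).
double_column_stats compute_double_stats(const std::vector<std::optional<double>>& values)
{
  double_column_stats s;
  for (const auto& v : values) {
    if (!v) {
      s.has_null = true;
      continue;
    }
    s.sum += *v;
    if (!s.minimum || *v < *s.minimum) s.minimum = *v;
    if (!s.maximum || *v > *s.maximum) s.maximum = *v;
  }
  return s;
}

// Accumulates hasNull and the total string length over the non-null values.
string_column_stats compute_string_stats(const std::vector<std::optional<std::string>>& values)
{
  string_column_stats s;
  for (const auto& v : values) {
    if (!v) {
      s.has_null = true;
      continue;
    }
    s.sum += static_cast<std::int64_t>(v->size());
  }
  return s;
}
```

For example, `{1.5, null, 2.5}` yields `has_null == true` and `sum == 4.0`, and `{"abc", null, "de"}` yields a string sum of 5.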

Also, the ProtobufReader does not support bool fields (needed to read the hasNull field without a Python API).
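For reference, protobuf encodes a bool as a varint (wire type 0), so a reader only needs to decode the varint and treat any nonzero value as true. The sketch below is a standalone reader, not cudf's `ProtobufReader`; `pb_reader` and `read_has_null` are hypothetical names. It assumes `hasNull` is field 10 of `ColumnStatistics`, as declared in ORC's `orc_proto.proto`, and skips every other field by its wire type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <vector>

// Minimal protobuf wire-format reader (illustrative only).
class pb_reader {
 public:
  explicit pb_reader(const std::vector<std::uint8_t>& buf) : buf_(buf) {}

  bool done() const { return pos_ >= buf_.size(); }

  // Decodes a base-128 varint (7 payload bits per byte, low bits first).
  std::uint64_t read_varint()
  {
    std::uint64_t value = 0;
    for (int shift = 0; shift < 64; shift += 7) {
      if (pos_ >= buf_.size()) throw std::runtime_error("truncated varint");
      const std::uint8_t byte = buf_[pos_++];
      value |= static_cast<std::uint64_t>(byte & 0x7f) << shift;
      if ((byte & 0x80) == 0) return value;
    }
    throw std::runtime_error("varint too long");
  }

  // Bool fields use wire type 0; any nonzero varint decodes as true.
  bool read_bool() { return read_varint() != 0; }

  // Skips a field payload based on its wire type.
  void skip(std::uint32_t wire_type)
  {
    switch (wire_type) {
      case 0: read_varint(); break;           // varint
      case 1: advance(8); break;              // fixed 64-bit (e.g. double)
      case 2: advance(read_varint()); break;  // length-delimited (strings, nested messages)
      case 5: advance(4); break;              // fixed 32-bit
      default: throw std::runtime_error("unsupported wire type");
    }
  }

 private:
  void advance(std::uint64_t n)
  {
    if (n > buf_.size() - pos_) throw std::runtime_error("truncated field");
    pos_ += n;
  }

  const std::vector<std::uint8_t>& buf_;
  std::size_t pos_ = 0;
};

// Assumed field number, matching `optional bool hasNull = 10;` in orc_proto.proto.
constexpr std::uint32_t kHasNullField = 10;

// Returns the hasNull value if present in a serialized ColumnStatistics message.
std::optional<bool> read_has_null(const std::vector<std::uint8_t>& column_stats_msg)
{
  pb_reader reader(column_stats_msg);
  std::optional<bool> has_null;
  while (!reader.done()) {
    const std::uint64_t key = reader.read_varint();
    const auto field = static_cast<std::uint32_t>(key >> 3);
    const auto wire_type = static_cast<std::uint32_t>(key & 0x7);
    if (field == kHasNullField && wire_type == 0) {
      has_null = reader.read_bool();
    } else {
      reader.skip(wire_type);
    }
  }
  return has_null;
}
```

Returning `std::optional<bool>` keeps "field absent" (older files) distinct from an explicit `false`.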

@vuule vuule added the feature request, libcudf, and cuIO labels Jan 6, 2021
@vuule vuule changed the title [FEA] Support for hasNull ORC column statistics [FEA] Support for missing ORC column statistics Jan 12, 2021
@github-actions

This issue has been marked stale due to no recent activity in the past 30d. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be marked rotten if there is no activity in the next 60d.

@github-actions github-actions bot added the stale label Feb 16, 2021
@github-actions

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

rapids-bot bot pushed a commit that referenced this issue Sep 26, 2022
Adds the ability for the ORC statistics reader to read the `ColumnStatistics::hasNull` value.

Contributes to #7087. Does not close it because the issue also requires the ability to write the field in the ORC writer.

Authors:
  - Devavret Makkar (https://github.com/devavret)

Approvers:
  - Nghia Truong (https://github.com/ttnghia)
  - Vukasin Milovanovic (https://github.com/vuule)

URL: #11747
rapids-bot bot pushed a commit that referenced this issue Sep 18, 2023
Closes #7087, closes #13793, closes #13899

This PR adds support for several cases and statistics types:
- sum statistics are included even when all elements are null (no min/max);
- sum statistics are included in double stats;
- minimum/maximum and minimumNanos/maximumNanos are included in timestamp stats;
- the hasNull field is written for all columns;
- decimal statistics are included.
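A rough sketch of the writer side for a double column, showing `hasNull` written unconditionally and `sum` written even when min/max are omitted. The helper names are hypothetical and this is not the cudf writer. Field numbers follow ORC's `ColumnStatistics` (`numberOfValues` = 1, `doubleStatistics` = 3, `hasNull` = 10) and `DoubleStatistics` (`minimum` = 1, `maximum` = 2, `sum` = 3) messages. Writing a sum of 0 for an all-null chunk is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

// Appends a base-128 varint.
void put_varint(std::vector<std::uint8_t>& out, std::uint64_t v)
{
  while (v >= 0x80) {
    out.push_back(static_cast<std::uint8_t>(v | 0x80));
    v >>= 7;
  }
  out.push_back(static_cast<std::uint8_t>(v));
}

// Appends a field key: (field_number << 3) | wire_type.
void put_key(std::vector<std::uint8_t>& out, std::uint32_t field, std::uint32_t wire_type)
{
  put_varint(out, (static_cast<std::uint64_t>(field) << 3) | wire_type);
}

// Appends a double field (wire type 1: fixed 64-bit, little-endian).
void put_double(std::vector<std::uint8_t>& out, std::uint32_t field, double d)
{
  put_key(out, field, 1);
  std::uint64_t bits;
  std::memcpy(&bits, &d, sizeof bits);
  for (int i = 0; i < 8; ++i) out.push_back(static_cast<std::uint8_t>(bits >> (8 * i)));
}

// Hypothetical input: statistics already computed for one column chunk.
struct double_stats_input {
  std::uint64_t number_of_values;
  bool has_null;
  std::optional<double> minimum, maximum;  // both empty if every value is null
  double sum;
};

// Serializes a ColumnStatistics message holding a nested DoubleStatistics.
std::vector<std::uint8_t> encode_double_column_stats(const double_stats_input& s)
{
  std::vector<std::uint8_t> inner;  // DoubleStatistics
  if (s.minimum) put_double(inner, 1, *s.minimum);
  if (s.maximum) put_double(inner, 2, *s.maximum);
  put_double(inner, 3, s.sum);  // sum is written even when min/max are absent

  std::vector<std::uint8_t> out;  // ColumnStatistics
  put_key(out, 1, 0);
  put_varint(out, s.number_of_values);
  put_key(out, 3, 2);  // doubleStatistics: length-delimited nested message
  put_varint(out, inner.size());
  out.insert(out.end(), inner.begin(), inner.end());
  put_key(out, 10, 0);
  put_varint(out, s.has_null ? 1 : 0);  // hasNull is written for every column
  return out;
}
```

For an all-null chunk this produces a `DoubleStatistics` containing only `sum`, followed by `hasNull = true`.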

Added tests for all supported stats.

Authors:
  - Vukasin Milovanovic (https://github.com/vuule)
  - Karthikeyan (https://github.com/karthikeyann)

Approvers:
  - Lawrence Mitchell (https://github.com/wence-)
  - Robert (Bobby) Evans (https://github.com/revans2)
  - Vyas Ramasubramani (https://github.com/vyasr)
  - Karthikeyan (https://github.com/karthikeyann)

URL: #13848
Projects
Archived in project