Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[REVIEW] Fix index handling in parquet reader and writer #6771

Merged
merged 27 commits into from
Dec 2, 2020

Conversation

galipremsagar
Copy link
Contributor

@galipremsagar galipremsagar commented Nov 16, 2020

Fixes: #6337 , #6740

The actual issue was that cudf was failing to read/write a dataframe for any case other than a RangeIndex/named Multi-Index(all-levels), so this PR revamps index writing and retrieval logic to and from parquet metadata.

This PR:

  • Introduces code changes to write & read Index/MultiIndex objects correctly into & from parquet files.
  • Adds step attribute to RangeIndex.
  • Fixes issue where a RangeIndex with step value other than 1 was failing in cudf.from_pandas.
  • Add & Fix pytests for above changes.

@galipremsagar galipremsagar added 2 - In Progress Currently a work in progress Python Affects Python cuDF API. labels Nov 16, 2020
@galipremsagar galipremsagar self-assigned this Nov 16, 2020
@GPUtester
Copy link
Collaborator

Please update the changelog in order to start CI tests.

View the gpuCI docs here.

@codecov
Copy link

codecov bot commented Nov 16, 2020

Codecov Report

Merging #6771 (2a9b65c) into branch-0.17 (bd537b6) will increase coverage by 0.01%.
The diff coverage is 100.00%.

Impacted file tree graph

@@               Coverage Diff               @@
##           branch-0.17    #6771      +/-   ##
===============================================
+ Coverage        81.92%   81.93%   +0.01%     
===============================================
  Files               96       96              
  Lines            16167    16166       -1     
===============================================
+ Hits             13245    13246       +1     
+ Misses            2922     2920       -2     
Impacted Files Coverage Δ
python/cudf/cudf/core/dataframe.py 90.99% <100.00%> (-0.02%) ⬇️
python/cudf/cudf/core/index.py 93.13% <100.00%> (+0.24%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bd537b6...2a9b65c. Read the comment docs.

@galipremsagar galipremsagar changed the title [WIP] Fix index handling in parquet reader and writer [REVIEW] Fix index handling in parquet reader and writer Nov 16, 2020
@galipremsagar galipremsagar marked this pull request as ready for review November 16, 2020 21:01
@galipremsagar galipremsagar requested a review from a team as a code owner November 16, 2020 21:01
Comment on lines +1535 to +1541
@property
def step(self):
"""
The value of the step parameter.
"""
return self._step

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching the missing step property!

Copy link
Collaborator

@kkraus14 kkraus14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One minor change, otherwise LGTM

python/cudf/cudf/_lib/parquet.pyx Show resolved Hide resolved
@galipremsagar galipremsagar changed the title [WIP] Fix index handling in parquet reader and writer [REVIEW] Fix index handling in parquet reader and writer Dec 1, 2020
@galipremsagar galipremsagar changed the title [REVIEW] Fix index handling in parquet reader and writer [WIP] Fix index handling in parquet reader and writer Dec 1, 2020
@galipremsagar galipremsagar changed the title [WIP] Fix index handling in parquet reader and writer [REVIEW] Fix index handling in parquet reader and writer Dec 1, 2020
@galipremsagar galipremsagar added 3 - Ready for Review Ready for review by team and removed 2 - In Progress Currently a work in progress labels Dec 1, 2020
@galipremsagar galipremsagar added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Dec 1, 2020
python/cudf/cudf/_lib/parquet.pyx Outdated Show resolved Hide resolved
python/cudf/cudf/_lib/parquet.pyx Outdated Show resolved Hide resolved
python/cudf/cudf/tests/test_parquet.py Outdated Show resolved Hide resolved
@kkraus14 kkraus14 added 5 - Ready to Merge Testing and reviews complete, ready to merge 6 - Okay to Auto-Merge and removed 3 - Ready for Review Ready for review by team labels Dec 1, 2020
@galipremsagar galipremsagar requested a review from a team as a code owner December 1, 2020 22:08
@rapids-bot rapids-bot bot merged commit edd1af1 into rapidsai:branch-0.17 Dec 2, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
5 - Ready to Merge Testing and reviews complete, ready to merge improvement Improvement / enhancement to an existing function non-breaking Non-breaking change Python Affects Python cuDF API.
Projects
None yet
5 participants