Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add packed serialization option for dataframes #8661

Conversation

charlesbluca
Copy link
Member

Adds the option to enable pack/unpack for dataframe serialization using the environment variable CUDF_PACKED_SERIALIZATION. This is intended to be a temporary solution to aid in benchmarking, which will eventually be replaced by a config module discussed in #5311.

@charlesbluca charlesbluca added feature request New feature or request 3 - Ready for Review Ready for review by team Python Affects Python cuDF API. non-breaking Non-breaking change labels Jul 6, 2021
@charlesbluca charlesbluca requested a review from a team as a code owner July 6, 2021 16:16
Copy link
Contributor

@brandon-b-miller brandon-b-miller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@charlesbluca this looks fine overall but I am wondering how we plan on testing this? Seems we have some tests for serialization that might suffice from the perspective of coverage, but you'd have to inject the environment variable in both the true and false state during the test session somehow.

@charlesbluca
Copy link
Member Author

That's a good point I didn't consider - I imagine we could either

  • Make a function/context manager to specific enable/disable pack/unpack independent of the env variable
  • Call os.environ.get() on each call of serialize/deserialize, from which we should be able to use os.environ.set() to enable pack/unpack from the test script

@brandon-b-miller
Copy link
Contributor

Something like this may help https://docs.pytest.org/en/latest/how-to/monkeypatch.html

@codecov
Copy link

codecov bot commented Jul 6, 2021

Codecov Report

Merging #8661 (db02ffb) into branch-21.08 (3ee264c) will increase coverage by 0.02%.
The diff coverage is 52.94%.

❗ Current head db02ffb differs from pull request most recent head b85966d. Consider uploading reports for the commit b85966d to get more accurate results
Impacted file tree graph

@@               Coverage Diff                @@
##           branch-21.08    #8661      +/-   ##
================================================
+ Coverage         10.60%   10.62%   +0.02%     
================================================
  Files               109      109              
  Lines             18280    18292      +12     
================================================
+ Hits               1938     1943       +5     
- Misses            16342    16349       +7     
Impacted Files Coverage Δ
python/cudf/cudf/core/dataframe.py 0.00% <0.00%> (ø)
python/dask_cudf/dask_cudf/groupby.py 91.47% <90.00%> (+0.19%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3ee264c...b85966d. Read the comment docs.

@charlesbluca charlesbluca marked this pull request as draft July 13, 2021 17:34
@charlesbluca
Copy link
Member Author

Converting this to draft as there are currently a lot of issues with pack/unpack serialization that need to be worked out before this option is really useful for its intended purpose (benchmarking).

@charlesbluca charlesbluca changed the base branch from branch-21.08 to branch-21.10 July 21, 2021 16:30
@galipremsagar
Copy link
Contributor

Moving this to 21.12, please target this PR to 21.12.

@charlesbluca charlesbluca changed the base branch from branch-21.10 to branch-21.12 September 23, 2021 12:16
@charlesbluca
Copy link
Member Author

Closing this as we haven't really found a place for pack/unpack serialization yet - will reopen if this changes in the future

@charlesbluca charlesbluca deleted the add-packed-serialization-option branch July 19, 2022 14:26
@jakirkham
Copy link
Member

Now that we have a way to config cuDF ( #11193 ), is this worth revisiting?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team feature request New feature or request non-breaking Non-breaking change Python Affects Python cuDF API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants