[FEA] Python bindings to pack/unpack #7601

Closed
jakirkham opened this issue Mar 15, 2021 · 2 comments · Fixed by #8153
Labels: feature request (New feature or request), Python (Affects Python cuDF API.)

Comments

@jakirkham
Member

Is your feature request related to a problem? Please describe.

When shipping DataFrames over the wire or spilling them, it can be handy to pack them into a more compact single buffer first and then unpack them into multiple buffers at the other end.
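
To make the idea concrete, here is a minimal pure-Python sketch of packing several buffers into one contiguous buffer and splitting it again on the receiving side. It is only an illustration of the concept: `pack_buffers` and `unpack_buffers` are hypothetical names, host `bytes` objects stand in for device buffers, and none of this is the cuDF or libcudf API.

```python
from typing import List, Tuple


def pack_buffers(buffers: List[bytes]) -> Tuple[List[int], bytes]:
    """Concatenate buffers into one; return (sizes, packed)."""
    sizes = [len(b) for b in buffers]
    return sizes, b"".join(buffers)


def unpack_buffers(sizes: List[int], packed: bytes) -> List[memoryview]:
    """Split the packed buffer back into per-buffer views without copying."""
    view = memoryview(packed)
    out, offset = [], 0
    for size in sizes:
        out.append(view[offset:offset + size])
        offset += size
    return out


# Hypothetical example data: three buffers, one of them empty.
buffers = [b"\x01\x02\x03", b"", b"abcd"]
sizes, packed = pack_buffers(buffers)
restored = [bytes(v) for v in unpack_buffers(sizes, packed)]
```

Only `sizes` and the single `packed` buffer need to travel over the wire; the receiver rebuilds the individual buffers as slices of that one allocation.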

Describe the solution you'd like

This functionality was recently added at the C++ layer (#7096). It would be good to have Python bindings for it and to use them in the relevant serialize/deserialize methods.

Describe alternatives you've considered

We could do this packing elsewhere, for example in Distributed (dask/distributed#3732). However, that would not use the C++ implementation here, and it would not solve the problem for other Python use cases.

Additional context

We've discussed potentially adding this to a config (#5311). Not sure whether that is still needed given the newer C++ implementation.

@jakirkham added the feature request and Needs Triage labels on Mar 15, 2021
@kkraus14 added the Python label and removed the Needs Triage label on Mar 26, 2021
@github-actions

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

@shwina
Contributor

shwina commented Jun 2, 2021

Being worked on in #8153

rapids-bot bot pushed a commit that referenced this issue Jun 30, 2021
Closes #7601

Adds a Python API for `pack`/`unpack`, so that we might be able to pack/unpack DataFrames in serialization:

- `PackedColumns` is a Python representation of the `cudf::packed_columns` struct containing the struct itself along with some Python metadata for the DataFrame being packed; supports Dask/pickle serialization
- `pack()` takes in a `Table` and returns a `PackedColumns`
- `unpack()` takes in a `PackedColumns` and returns a `Table`
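
The shape of that API can be sketched with a toy, host-only analogue. This is an assumption-laden illustration rather than the code from #8153: `ToyPackedColumns` is a hypothetical stand-in for `PackedColumns`, a dict of `array.array` columns stands in for a `Table`, and the metadata layout is invented for the example.

```python
import pickle
from array import array
from typing import Dict, List, Tuple


class ToyPackedColumns:
    """Hypothetical stand-in: per-column metadata plus one contiguous buffer."""

    def __init__(self, metadata: List[Tuple[str, str, int]], data: bytes):
        self.metadata = metadata  # (column name, array typecode, element count)
        self.data = data          # all column bytes, back to back


def pack(table: Dict[str, array]) -> ToyPackedColumns:
    """Pack every column of the toy table into a single buffer."""
    metadata = [(name, col.typecode, len(col)) for name, col in table.items()]
    data = b"".join(col.tobytes() for col in table.values())
    return ToyPackedColumns(metadata, data)


def unpack(packed: ToyPackedColumns) -> Dict[str, array]:
    """Rebuild the toy table from the metadata and the single buffer."""
    table, offset = {}, 0
    for name, typecode, count in packed.metadata:
        col = array(typecode)
        nbytes = count * col.itemsize
        col.frombytes(packed.data[offset:offset + nbytes])
        table[name] = col
        offset += nbytes
    return table


table = {"a": array("q", [1, 2, 3]), "b": array("d", [0.5, 1.5, 2.5])}
packed = pack(table)

# Serialize only the two plain fields, then rebuild on the "other end".
payload = pickle.dumps((packed.metadata, packed.data))
roundtrip = unpack(ToyPackedColumns(*pickle.loads(payload)))
```

The point of the sketch is the split between a small metadata record and one data buffer, which is what lets a serializer ship a whole table as a single payload.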

cc @brandon-b-miller

Authors:
  - Charles Blackmon-Luca (https://github.com/charlesbluca)

Approvers:
  - Devavret Makkar (https://github.com/devavret)
  - https://github.com/brandon-b-miller
  - Karthikeyan (https://github.com/karthikeyann)
  - https://github.com/jakirkham

URL: #8153