[FEA] Function to "pack" a table into a single buffer #3793

Closed
jrhemstad opened this issue Jan 15, 2020 · 11 comments
Assignees
Labels
feature request New feature or request libcudf Affects libcudf (C++/CUDA) code.

Comments

@jrhemstad
Contributor

jrhemstad commented Jan 15, 2020

Is your feature request related to a problem? Please describe.

I would like to be able to create a copy of a table_view where the storage for the constituent column is "packed" into a single allocation.

Describe the solution you'd like

An API something like this:

std::pair<rmm::device_buffer, table_view> pack(table_view const& t);

Where the returned device_buffer contains a copy of the contents of t in a single allocation and the returned table_view points into that buffer.

Additional context

This functionality is effectively already developed for contiguous_split in the alloc_and_copy function. The ask is just to wrap and expose this as a public API.
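The packing being requested can be sketched on the host with standard containers. This is a hypothetical illustration only, using `std::vector<std::byte>` in place of `rmm::device_buffer` and offset/size pairs in place of the column pointers a real `table_view` would carry; the names `span_view` and `pack` here are invented for the sketch, not libcudf API:

```cpp
#include <cstddef>
#include <cstring>
#include <utility>
#include <vector>

// A lightweight "view" into the packed buffer: an offset and size
// rather than a raw pointer, so it stays valid if the buffer moves.
struct span_view {
  std::size_t offset;
  std::size_t size;
};

// Copy several separately-allocated byte buffers into one contiguous
// allocation, returning the buffer plus views that index into it.
std::pair<std::vector<std::byte>, std::vector<span_view>>
pack(std::vector<std::vector<std::byte>> const& columns) {
  std::size_t total = 0;
  for (auto const& c : columns) total += c.size();

  std::vector<std::byte> buffer(total);
  std::vector<span_view> views;
  std::size_t offset = 0;
  for (auto const& c : columns) {
    if (!c.empty()) std::memcpy(buffer.data() + offset, c.data(), c.size());
    views.push_back({offset, c.size()});
    offset += c.size();
  }
  return {std::move(buffer), std::move(views)};
}
```

On the device the copies would be `cudaMemcpyAsync` calls into a single `rmm::device_buffer`, but the bookkeeping (running offset, one view per column) is the same.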

@jrhemstad jrhemstad added the feature request New feature or request label Jan 15, 2020
@felipeblazing

We would love to have this. Shouldn't we also have something like

std::unique_ptr<table> unpack(const rmm::device_buffer & buffer, unsigned long long size); 

@jrhemstad
Contributor Author

jrhemstad commented Jan 15, 2020

We would love to have this. Shouldn't we also have something like

std::unique_ptr<table> unpack(const rmm::device_buffer & buffer, unsigned long long size); 

I think it'd be better to return a table_view that points into the returned device_buffer. I updated the issue.

Then, if you want to create a non-packed table from a packed one, you could do this:

```
table t;

auto [buffer, view] = pack(t);

table t2(view); // constructing a `table` from the `view` copies the packed data into a new `table`
```

@harrism harrism added the libcudf Affects libcudf (C++/CUDA) code. label Feb 10, 2020
@jakirkham
Copy link
Member

This would also be useful from the UCX side as we could serialize a larger contiguous piece of memory and then unpack it later so that individual columns could be returned to RMM when no longer needed.

cc @quasiben

@jrhemstad
Contributor Author

@jakirkham there was some offline conversation with @jlowe @kkraus14 @nvdbaranec about how we'd serialize a "packed" table for communication.

The gist of it is, when you "pack" a table, you get a device_buffer and a table_view that points into that device_buffer. If you were to simply communicate the device_buffer and table_view as-is, the pointers in the table_view on the receiving side wouldn't be correct because they'd refer to memory locations on the sending side.

It's pretty easy to update the pointers on the receiving side if you know what the base pointers were on the sending side. It's just a bunch of pointer offset arithmetic.

We're trying to figure out a common set of serialize/deserialize primitives for libcudf that both Dask and Spark can use. Could you weigh in and help define what the serialized layout should look like?

@quasiben
Member

quasiben commented Mar 4, 2020

@madsbk do you have opinions here as well on things ucx-py could be/should be doing ?

@madsbk
Member

madsbk commented Mar 4, 2020

@madsbk do you have opinions here as well on things ucx-py could be/should be doing ?

I don't think that ucx-py should support any serialization directly.
But it would be great if cuDF could serialize/deserialize multiple columns to/from a single device_buffer.

@kkraus14
Collaborator

kkraus14 commented Mar 4, 2020

@jrhemstad I don't think we have any special requirements for the serialized layout other than using typical standard containers like rmm::device_buffer and std::vector, so that we can easily send them over the wire without having to play games.

From our perspective we don't have a lot of opinions as our path would look something like:
Python Table --> cudf::table_view --> cudf::pack --> (rmm::device_buffer, std::vector) --> UCX Send(s) --> UCX Receive(s) --> (rmm::device_buffer, std::vector) --> cudf::unpack --> cudf::table_view --> Python Table
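The `(rmm::device_buffer, std::vector)` pair in the pipeline above implies a flat host-side metadata encoding. A minimal sketch of what such an encoding could look like, assuming a hypothetical `col_desc` record of per-column offset and size (the real libcudf metadata would also carry types and null masks):

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical per-column record: where the column's bytes live
// inside the packed device buffer.
struct col_desc {
  std::uint64_t offset;
  std::uint64_t size;
};

// Encode descriptors into a flat host byte vector:
// [count][offset0][size0][offset1][size1]...
std::vector<std::uint8_t> serialize(std::vector<col_desc> const& cols) {
  std::vector<std::uint8_t> out(sizeof(std::uint64_t) * (1 + 2 * cols.size()));
  std::uint64_t count = cols.size();
  std::memcpy(out.data(), &count, sizeof(count));
  std::memcpy(out.data() + sizeof(count), cols.data(),
              sizeof(col_desc) * cols.size());
  return out;
}

// Decode the flat byte vector back into descriptors.
std::vector<col_desc> deserialize(std::vector<std::uint8_t> const& bytes) {
  std::uint64_t count = 0;
  std::memcpy(&count, bytes.data(), sizeof(count));
  std::vector<col_desc> cols(count);
  std::memcpy(cols.data(), bytes.data() + sizeof(count),
              sizeof(col_desc) * count);
  return cols;
}
```

Because the metadata is plain bytes in a `std::vector`, it can ride over the wire next to the device buffer with no extra framing.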

@jlowe
Member

jlowe commented Mar 4, 2020

I don't think we have any special requirements for the serialized layout other than using the typical standard containers of things

+1 from the Spark plugin perspective. We should be able to work with just the device buffer representing the contiguous GPU data and an opaque blob of host bytes representing the serialized table_view.

@jakirkham
Member

@madsbk do you have opinions here as well on things ucx-py could be/should be doing ?

I don't think that ucx-py should support any serialization directly.
But it would be great if cuDF could serialize/deserialize multiple columns to/from a single device_buffer.

Yeah, I was thinking we might do this in Distributed before it got to UCX-Py, more or less the same as PR dask/distributed#3453.

@jakirkham
Member

FWIW I think from the Dask side w.r.t. cuDF we would be well served for most cases by having something we could plug into cuDF "cuda" serialization. We could then reuse this for transmission with TCP or UCX and spilling since they all go through that code path now.

There are some other objects for which we would like similar functionality (like CuPy sparse arrays), though I think other Numba and CuPy objects already work with contiguous memory chunks to the extent possible.

@jrhemstad jrhemstad self-assigned this Apr 16, 2020
rapids-bot bot pushed a commit that referenced this issue Feb 4, 2021
…format. (#7096)

Addresses #3793

Depends on #6864 (this affects contiguous_split.cu; for the purposes of this PR, the only relevant changes are those involving the generation of metadata).

- `pack()` performs a `contiguous_split()` on the incoming table to arrange the memory into a unified device buffer, and generates a host-side metadata buffer. These are returned in the `packed_columns` struct.

- `unpack()` takes the data stored in the `packed_columns` struct and returns a deserialized `table_view` that points into it.

The intent of this functionality is as follows (pseudocode):

```
// serialize-side
table_view t;
packed_columns p = pack(t);
send_over_network(p.gpu_data);
send_over_network(p.metadata);

// deserialize-side
packed_columns p = receive_from_network();
table_view t = unpack(p);
```

This PR also renames `contiguous_split_result` to `packed_table` (which is just a bundled `table_view` and `packed_column`)

Authors:
  - @nvdbaranec

Approvers:
  - Jake Hemstad (@jrhemstad)
  - Paul Taylor (@trxcllnt)
  - Mike Wilson (@hyperbolic2346)

URL: #7096
@jrhemstad
Contributor Author

This was closed by #7096
