[WIP] Pack/unpack cuDF frames during serialization #4803

jakirkham · 2020-04-03T23:51:49Z

Related to issue ( #3793 ), issue ( https://github.com/rapidsai/rmm/issues/318 ), issue ( rapidsai/dask-cuda#250 )

To cutdown on the number of frames that need to be transmitted over the wire or transferred to/from host, pack all Buffers into one DeviceBuffer using CuPy to facilitate. In deserialization, use CuPy to unpack them into a series of separate DeviceBuffer allocations.

Since cuDF already makes sure CuPy uses RMM for allocations, we need not worry about this when creating ndarrays. However it is worth noting we pay for a copy and a larger allocation both during serialization and deserialization. It would be nice to avoid that, but it probably requires some C++/CUDA code to do. So have held off on it for now.

cc @quasiben

GPUtester · 2020-04-03T23:52:09Z

Can one of the admins verify this patch?

Dask typically does this. So may overwrite the value we have here. However just make sure we have set this value correctly. It makes it a little easier to play with the serialization and deserialization functions here. Plus if Dask doesn't set these for some reason, we have handled it ourselves.

In the `"cuda"` case, we really want to make sure Dask is not trying to compress our data. This shouldn't be happening anyways. However this provides some protection against it. In the `"dask"` case, allow compression since this is data on host.

To cutdown on the number of frames that need to be transmitted over the wire or transferred to/from host, pack all `Buffer`s into one `DeviceBuffer` using CuPy to facilitate. In deserialization, use CuPy to unpack them into a series of separate `DeviceBuffer` allocations. Since cuDF already makes sure CuPy uses RMM for allocations, we need not worry about this when creating `ndarray`s. However it is worth noting we pay for a copy and a larger allocation both during serialization and deserialization. It would be nice to avoid that, but it probably requires some C++/CUDA code to do. So have held off on it for now.

jakirkham · 2020-04-04T01:32:56Z

Looks like Dask is doing something funky. The gist is it appears to be trying to call merge_frames on DeviceBuffers, which shouldn't happen (as that should only be used on host memory after decompressing the data). Still investigating why that happens.

codecov · 2020-04-04T03:47:35Z

Codecov Report

Merging #4803 into branch-0.14 will decrease coverage by 0.08%.
The diff coverage is 0.00%.

@@               Coverage Diff               @@
##           branch-0.14    #4803      +/-   ##
===============================================
- Coverage        88.42%   88.34%   -0.09%     
===============================================
  Files               51       51              
  Lines             9734     9743       +9     
===============================================
  Hits              8607     8607              
- Misses            1127     1136       +9

Impacted Files	Coverage Δ
python/cudf/cudf/comm/serialize.py	`0.00% <0.00%> (ø)`

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f5b7ed6...9370bc1. Read the comment docs.

jakirkham · 2020-05-09T03:23:57Z

Closing as this is more thoroughly addressed by PR ( #5025 ).

jakirkham added 2 - In Progress Currently a work in progress Python Affects Python cuDF API. dask Dask issue labels Apr 3, 2020

jakirkham force-pushed the agg_frames branch 4 times, most recently from f35dbc5 to 2b40691 Compare April 4, 2020 00:11

Space out other serialization work

5d9b752

jakirkham force-pushed the agg_frames branch from 2b40691 to 01bdd13 Compare April 4, 2020 00:14

jakirkham force-pushed the agg_frames branch from 01bdd13 to 75626ad Compare April 4, 2020 00:37

jakirkham added 3 commits April 3, 2020 17:52

Specify whether frames can be compressed

60be364

In the `"cuda"` case, we really want to make sure Dask is not trying to compress our data. This shouldn't be happening anyways. However this provides some protection against it. In the `"dask"` case, allow compression since this is data on host.

jakirkham force-pushed the agg_frames branch from 75626ad to 9370bc1 Compare April 4, 2020 01:13

jakirkham closed this May 9, 2020

jakirkham deleted the agg_frames branch May 9, 2020 03:24

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[WIP] Pack/unpack cuDF frames during serialization #4803

[WIP] Pack/unpack cuDF frames during serialization #4803

jakirkham commented Apr 3, 2020 •

edited

Loading

GPUtester commented Apr 3, 2020

jakirkham commented Apr 4, 2020

codecov bot commented Apr 4, 2020 •

edited

Loading

jakirkham commented May 9, 2020

[WIP] Pack/unpack cuDF frames during serialization #4803

[WIP] Pack/unpack cuDF frames during serialization #4803

Conversation

jakirkham commented Apr 3, 2020 • edited Loading

GPUtester commented Apr 3, 2020

jakirkham commented Apr 4, 2020

codecov bot commented Apr 4, 2020 • edited Loading

Codecov Report

jakirkham commented May 9, 2020

jakirkham commented Apr 3, 2020 •

edited

Loading

codecov bot commented Apr 4, 2020 •

edited

Loading