
Optionally compress on a frame-by-frame basis #3586

Merged: 3 commits merged into dask:master on Mar 19, 2020

Conversation

@mrocklin (Member)

Previously, we did compression on an all-or-nothing basis for every message. This presented challenges when we bundled multiple kinds of data into the same message, such as a dict that contained both a NumPy array and a CuPy array.

Now we explicitly walk through each frame and check whether we've been asked to compress it or not.

This required some changes to internal utility functions, like frame_split_size, to have them act in a more granular way.
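
As an illustration only (not the code in this PR; maybe_compress_frames is a hypothetical name and zlib stands in for whatever codec is actually configured), the per-frame approach looks roughly like this:

import zlib

def maybe_compress_frames(frames, compressions):
    # Walk the frames and their per-frame compression settings in lockstep;
    # a None/False entry passes the frame through untouched.
    out = []
    for frame, compression in zip(frames, compressions):
        if compression:  # e.g. "zlib"
            out.append(zlib.compress(bytes(frame)))
        else:
            out.append(frame)
    return out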

Supersedes #3584

Fixes #3580 (CuPy (De)serialization error)

cc @quasiben

Commit messages:

Previously this converted a list of bytes-like objects into a list; now we consume a single one and use map when dealing with lists.

We've changed the convention so that None now means "proceed as usual" rather than "don't do anything please".
@jakirkham (Member) left a comment:


Thanks for working on this, Matt! 😄

I'm a little confused about one of the points below. Maybe you can help me understand it better?

for frame, compression in zip(
    frames, head.get("compression") or [None] * len(frames)
):
    if compression is None:  # default behavior
        _frames = frame_split_size(frame)
@jakirkham (Member):

I'm a little worried that this doesn't actually do what we want, though I could be wrong.

In particular, it appears that when the serializer is "cuda", we set each frame's compression to None. As a result, frames will get split here, giving us the behavior we wanted to avoid.

Feel free to correct me if I'm just missing something 🙂

@jakirkham (Member):

Maybe the answer is just that we should be setting "compression" to False in that case?

@mrocklin (Member, Author):

Ah, indeed. I failed to push, but yes, I think we want to do exactly that.
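
For reference, a minimal sketch of the fix being agreed on here (normalize_compression is a hypothetical name; the actual change lives in the CUDA serialization code):

def normalize_compression(header, frames):
    # For the "cuda" serializer, mark each frame's compression as False
    # ("leave this frame alone") rather than None ("apply the default
    # split-and-maybe-compress behavior").
    if header.get("serializer") == "cuda":
        header["compression"] = [False] * len(frames)
    return header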

@jakirkham (Member):

That seems reasonable. Thanks, Matt! 😄

So I guess whenever we do decide to have compression for CUDA data, we should update the frame-splitting and merging behavior accordingly. Does that sound right?

@mrocklin (Member, Author):

Yeah, I think that in that situation we might register a maximum frame size per compression type?

max_frame_size = {
    "zstd": None,
    "blosc": 2**31,
    "cuda-zstd": None,
}

I don't know, though; I think that it'll be easier to decide once/if that happens.

@jakirkham (Member):

So None here would mean preserve existing frame size or something else?

Agree we don't need to worry about it now.

@mrocklin (Member, Author):

> So None here would mean preserve existing frame size or something else?

Yeah, we only need to split frames if the compression technology requires it.
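
A hypothetical sketch of that idea (split_for_codec is not a real function in distributed, and the limits shown are illustrative):

max_frame_size = {
    "zstd": None,       # no cap: preserve existing frame sizes
    "blosc": 2**31,     # cap: frames larger than this must be split
    "cuda-zstd": None,
}

def split_for_codec(frame, compression):
    limit = max_frame_size.get(compression)
    if limit is None:
        return [frame]  # this codec imposes no limit, so don't split
    # Split into chunks no larger than the codec's limit
    return [frame[i:i + limit] for i in range(0, len(frame), limit)]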

@quasiben (Member):

I tested this PR against the MRE and it passed. @cjnolet, can you test this PR as well?

@cjnolet commented Mar 18, 2020:

@quasiben, looking at this now.

@cjnolet commented Mar 18, 2020:

This worked for me. Thanks again, guys!

@quasiben (Member):

I ran the failing tests locally and they pass. I think we are OK to merge. @mrocklin?

@jakirkham (Member):

Looks like there is some CI testing happening in PR #3587.

@mrocklin mrocklin merged commit 2acffc3 into dask:master Mar 19, 2020
@jakirkham (Member):

Thanks, Matt! 😄

fjetter pushed a commit to fjetter/distributed that referenced this pull request Mar 19, 2020
Previously this converted a list of bytes-like objects into a list.
Now we consume a single one and use map when dealing with lists.

* Handle compression on a frame-by-frame basis

* Set cuda serialization to False rather than None

We've changed the convention so that None now means "proceed as usual"
rather than "don't do anything please"