
Optionally compress on a frame-by-frame basis #3586

Merged: 3 commits merged into dask:master on Mar 19, 2020

Conversation

@mrocklin (Member)

Previously, we did compression on an all-or-nothing basis for every message. This presented challenges when we bundled multiple kinds of data into the same message, such as a dict that contained both a NumPy array and a CuPy array.

Now we explicitly walk through each frame and check whether we've been asked to compress it or not.

This required some changes to internal utility functions, like frame_split_size, to have them act in a more granular way.
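
As an illustration only (not the code in this PR; maybe_compress_frames is a hypothetical name and zlib stands in for whatever codec is actually configured), the per-frame approach looks roughly like this:

import zlib

def maybe_compress_frames(frames, compressions):
    # Walk the frames and their per-frame compression settings in lockstep;
    # a None/False entry passes the frame through untouched.
    out = []
    for frame, compression in zip(frames, compressions):
        if compression:  # e.g. "zlib"
            out.append(zlib.compress(bytes(frame)))
        else:
            out.append(frame)
    return out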

Supersedes #3584

Fixes #3580 (CuPy (De)serialization error)

cc @quasiben

Commit messages:

Previously this converted a list of bytes-like objects into a list; now we consume a single one and use map when dealing with lists.

We've changed the convention so that None now means "proceed as usual" rather than "don't do anything please".
@jakirkham (Member) left a comment:


Thanks for working on this, Matt! 😄

I'm a little confused about one of the points below. Maybe you can help me understand it better?

for frame, compression in zip(
    frames, head.get("compression") or [None] * len(frames)
):
    if compression is None:  # default behavior
        _frames = frame_split_size(frame)
@jakirkham (Member):

I'm a little worried that this doesn't actually do what we want, though I could be wrong.

In particular, it appears that when the serializer is "cuda", we set each frame's compression to None. As a result, frames will get split here, giving us the behavior we wanted to avoid.

Feel free to correct me if I'm just missing something 🙂

@jakirkham (Member):

Maybe the answer is just that we should be setting "compression" to False in that case?

@mrocklin (Member, Author):

Ah, indeed. I failed to push, but yes, I think we want to do exactly that.
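
For reference, a minimal sketch of the fix being agreed on here (normalize_compression is a hypothetical name; the actual change lives in the CUDA serialization code):

def normalize_compression(header, frames):
    # For the "cuda" serializer, mark each frame's compression as False
    # ("leave this frame alone") rather than None ("apply the default
    # split-and-maybe-compress behavior").
    if header.get("serializer") == "cuda":
        header["compression"] = [False] * len(frames)
    return header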

@jakirkham (Member):

That seems reasonable. Thanks, Matt! 😄

So I guess whenever we do decide to have compression for CUDA data, we should update the frame-splitting and merging behavior accordingly. Does that sound right?

@mrocklin (Member, Author):

Yeah, I think that in that situation we might register a maximum frame size per compression type?

max_frame_size = {
    "zstd": None,
    "blosc": 2**31,
    "cuda-zstd": None,
}

I don't know, though; I think that it'll be easier to decide once/if that happens.

@jakirkham (Member):

So None here would mean preserve existing frame size or something else?

Agree we don't need to worry about it now.

@mrocklin (Member, Author):

> So None here would mean preserve existing frame size or something else?

Yeah, we only need to split frames if the compression technology requires it.
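
A hypothetical sketch of that idea (split_for_codec is not a real function in distributed, and the limits shown are illustrative):

max_frame_size = {
    "zstd": None,       # no cap: preserve existing frame sizes
    "blosc": 2**31,     # cap: frames larger than this must be split
    "cuda-zstd": None,
}

def split_for_codec(frame, compression):
    limit = max_frame_size.get(compression)
    if limit is None:
        return [frame]  # this codec imposes no limit, so don't split
    # Split into chunks no larger than the codec's limit
    return [frame[i:i + limit] for i in range(0, len(frame), limit)]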

@quasiben (Member):

I tested this PR against the MRE and it passed. @cjnolet, can you test this PR as well?

@cjnolet commented Mar 18, 2020:

@quasiben, looking at this now.

@cjnolet commented Mar 18, 2020:

This worked for me. Thanks again, guys!

@quasiben (Member):

I ran the failing tests locally and they pass. I think we are OK to merge. @mrocklin?

@jakirkham (Member):

Looks like there is some CI testing happening in PR #3587.

@mrocklin mrocklin merged commit 2acffc3 into dask:master Mar 19, 2020
@jakirkham (Member):

Thanks, Matt! 😄

fjetter pushed a commit to fjetter/distributed that referenced this pull request Mar 19, 2020
Previously this converted a list of bytes-like objects into a list.
Now we consume a single one and use map when dealing with lists.

* Handle compression on a frame-by-frame basis

* Set cuda serialization to False rather than None

We've changed the convention so that None now means "proceed as usual"
rather than "don't do anything please"