Optionally compress on a frame-by-frame basis #3586
Conversation
Previously this converted a list of bytes-like objects into a list. Now we consume a single one and use map when dealing with lists.
We've changed the convention so that None now means "proceed as usual" rather than "don't do anything please"
Thanks for working on this, Matt! 😄
I'm a little confused on one of the points below. Maybe you can help me understand that better?
for frame, compression in zip(
    frames, head.get("compression") or [None] * len(frames)
):
    if compression is None:  # default behavior
        _frames = frame_split_size(frame)
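To make the convention under discussion concrete, here is a minimal sketch of a per-frame compression pass (`compress_frames` is an illustrative helper, not distributed's actual API): `None` falls through to the default split path, while `False` opts a frame out entirely.

```python
def frame_split_size(frame, n=2**26):
    # Split a bytes-like frame into chunks of at most n bytes (64 MiB here).
    return [frame[i:i + n] for i in range(0, len(frame), n)]

def compress_frames(frames, compression):
    # compression is a per-frame list; a missing header means all-default.
    out = []
    for frame, comp in zip(frames, compression or [None] * len(frames)):
        if comp is None:        # default behavior: split large frames
            out.extend(frame_split_size(frame))
        else:                   # False (or a codec name): leave frame intact
            out.append(frame)
    return out
```

Under this sketch, a CUDA frame whose header entry is `None` would still hit `frame_split_size`, which is exactly the concern raised below.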
I'm a little worried that this doesn't actually do what we want. Though I could be wrong.
In particular, it appears that when serializer is "cuda", we set each frame's compression to None. As a result frames will get split here, giving us the behavior we wanted to avoid.
Feel free to correct me if I'm just missing something 🙂
Maybe the answer is just that we should be setting "compression" to False in that case?
Ah indeed. I failed to push, but yes, I think we want to do exactly that.
That seems reasonable. Thanks Matt! 😄
So I guess whenever we do decide to have compression for CUDA data, we should update the frame splitting and merging behavior accordingly. Does that sound right?
Yeah, I think that in that situation we might register a frame size per compression type?
max_frame_size = {
"zstd": None,
"blosc": 2**31,
"cuda-zstd": None,
}
I don't know though, I think that it'll be easier once/if that happens.
So None here would mean preserve existing frame size or something else?
Agree we don't need to worry about it now.
So None here would mean preserve existing frame size or something else?
Yeah, we only need to split frames if the compression technology requires it.
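Following that idea, a hypothetical lookup could gate splitting on each codec's limit (the dict entries and the `needs_split` helper are illustrative, not distributed's actual API):

```python
# Hypothetical per-codec frame-size limits; None means the codec imposes
# no limit, so frames are left at their existing size.
max_frame_size = {
    "zstd": None,
    "blosc": 2**31,      # blosc caps buffers at 2 GiB
    "cuda-zstd": None,   # hypothetical GPU codec with no limit
}

def needs_split(frame, compression):
    # Split only when the chosen codec actually caps frame size.
    limit = max_frame_size.get(compression)
    return limit is not None and len(frame) > limit
```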
@quasiben, looking at this now.
This worked for me. Thanks again, guys!
I tested the failing tests locally and they pass. I think we are ok to merge. @mrocklin ?
Looks like there is some CI testing happening in PR ( #3587 ).
Thanks Matt! 😄
Previously this converted a list of bytes-like objects into a list. Now we consume a single one and use map when dealing with lists.

* Handle compression on a frame-by-frame basis
* Set cuda serialization to False rather than None

We've changed the convention so that None now means "proceed as usual" rather than "don't do anything please"
Previously we used to do compression on an all-or-nothing basis for every message. This presented challenges when we bundled multiple kinds of data into the same message, such as when we had a dict that contained both a numpy array and a cupy array.
Now we explicitly walk through each frame and see if we've been asked to compress it or not.
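As an illustration of walking frames individually in a mixed message (the `maybe_compress` helper and header layout are simplified stand-ins using zlib, not distributed's actual codecs or wire format):

```python
import zlib

def maybe_compress(frame, compression):
    # Compress only frames whose header entry asks for it; False passes
    # the frame through untouched in this simplified sketch.
    if compression == "zlib":
        return zlib.compress(frame)
    return frame

header = {"compression": ["zlib", False]}  # per-frame, not per-message
frames = [b"numpy-host-bytes" * 100, b"cupy-device-bytes"]

# Per the change description: use map over the frame/compression pairs.
out = list(map(maybe_compress, frames, header["compression"]))
```

Here the host frame shrinks while the device frame is passed through by identity, which is the behavior the per-frame convention is after.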
This required some changes to internal utility functions, like frame_split_size, to have them act in a more granular way.

Supersedes #3584
Fixes #3580
cc @quasiben