-
Notifications
You must be signed in to change notification settings - Fork 232
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update BufferMeta to support multiple codec buffers per table #426
Conversation
Signed-off-by: Jason Lowe <jlowe@nvidia.com>
build |
BufferMeta.startBufferMeta(fbb) | ||
BufferMeta.addId(fbb, tableId) | ||
BufferMeta.addSize(fbb, bufferSize) | ||
BufferMeta.addUncompressedSize(fbb, 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious why the uncompressed size is 0. There are no docs anywhere that explain that uncompressed size is optional and in what ways it is optional.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I missed those. It should be fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want this to match the size when the buffer is uncompressed that's an easy change to make.
It seems knowing the uncompressed size would be great so we can allocate a buffer with the right size? And also to track things like max bytes in flight.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
size
is always the size of the buffer, compressed or not. The shuffle transport will only ever need to look at that value, since it is never going to uncompress the data directly rather just transport it as-is.
uncompressedSize
is only useful for those who have already detected the buffer is compressed and are interested in decoding it. If code doesn't want to know or care if the buffer is compressed, size
is all they should ever need.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the case of inflight limit, if compression is really great, the transport may want to use the uncompressed size to throttle, rather than the compressed size. Otherwise it makes the OOM cases harder to configure for (how do I tell the user to pick an in-flight size that fits all their data/jobs?)
BufferMeta.startBufferMeta(fbb) | ||
BufferMeta.addId(fbb, tableId) | ||
BufferMeta.addSize(fbb, bufferSize) | ||
BufferMeta.addUncompressedSize(fbb, 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want this to match the size when the buffer is uncompressed that's an easy change to make.
It seems knowing the uncompressed size would be great so we can allocate a buffer with the right size? And also to track things like max bytes in flight.
@jlowe lgtm, I really had more of a question than a change request (see above) |
Not sure that's really what we want since we'll be uncompressing not when the buffers arrive but when they are ultimately coalesced. But in case we need it, I updated this to have |
Ok, thanks for adding @jlowe, I didn't realize we were going to wield these buffers later compressed for a while, but yeah we can remove if not utilized as we go. |
build |
…#426) * Update BufferMeta to support multiple codec buffers per table Signed-off-by: Jason Lowe <jlowe@nvidia.com> * Update uncompressedSize to always have a size
…#426) * Update BufferMeta to support multiple codec buffers per table Signed-off-by: Jason Lowe <jlowe@nvidia.com> * Update uncompressedSize to always have a size
[auto-merge] bot-auto-merge-branch-22.08 to branch-22.10 [skip ci] [bot]
This updates
BufferMeta
messages so that contiguous table buffers could be compressed into more than one codec buffer. It also adds a codec ID for a trivialCOPY
codec that can be used for testing.