
Update BufferMeta to support multiple codec buffers per table #426

Merged: 2 commits merged into NVIDIA:branch-0.2 on Jul 28, 2020

Conversation

@jlowe (Member, Author) commented Jul 24, 2020

This updates the BufferMeta message so that a contiguous table buffer can be compressed into more than one codec buffer. It also adds a codec ID for a trivial COPY codec that can be used for testing.

Signed-off-by: Jason Lowe <jlowe@nvidia.com>
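
A minimal, self-contained sketch of the metadata shape this change implies, using plain Scala case classes for illustration; these are not the project's generated FlatBuffers classes, and all names below are illustrative:

```scala
// Illustrative model only: the information a BufferMeta now carries,
// not the FlatBuffers-generated API used by spark-rapids.
final case class CodecBufferDescr(
    codecId: Byte,            // e.g. a trivial COPY codec id used for testing
    compressedOffset: Long,   // where this codec buffer starts in the transported buffer
    compressedSize: Long,     // bytes occupied by this codec buffer as transported
    uncompressedOffset: Long, // where the decoded bytes land in the output buffer
    uncompressedSize: Long)   // decoded size of this codec buffer

final case class TableBufferMeta(
    id: Int,                                  // table id
    size: Long,                               // size of the buffer as transported
    uncompressedSize: Long,                   // total size after decoding
    codecBufferDescrs: Seq[CodecBufferDescr]) // may now hold more than one entry per table
```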
@jlowe jlowe added this to the Jul 20 - Jul 31 milestone Jul 24, 2020
@jlowe jlowe requested a review from abellina July 24, 2020 20:48
@jlowe jlowe self-assigned this Jul 24, 2020
@jlowe (Member, Author) commented Jul 24, 2020

build

BufferMeta.startBufferMeta(fbb)
BufferMeta.addId(fbb, tableId)
BufferMeta.addSize(fbb, bufferSize)
BufferMeta.addUncompressedSize(fbb, 0)
Collaborator commented:

Just curious why the uncompressed size is 0. There are no docs anywhere that explain that uncompressed size is optional and in what ways it is optional.

@jlowe (Member, Author) commented:

See here and here.

Originally I was planning on using an uncompressedSize of 0 to indicate the data was not compressed with a codec, but I changed it to checking codecBufferDescrsLength > 0 instead. If we want this to match the size when the buffer is uncompressed that's an easy change to make.
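
A one-line sketch of that check, assuming the FlatBuffers-generated codecBufferDescrsLength accessor referenced above (the helper name and meta variable are illustrative):

```scala
// Compression is signaled by the presence of codec buffer descriptors,
// not by a zero uncompressedSize. `meta` is a deserialized BufferMeta.
def isCompressed(meta: BufferMeta): Boolean =
  meta.codecBufferDescrsLength() > 0
```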

Collaborator commented:

Sorry I missed those. It should be fine.

Collaborator commented:

> If we want this to match the size when the buffer is uncompressed that's an easy change to make.

It seems knowing the uncompressed size would be great so we can allocate a buffer with the right size? And also to track things like max bytes in flight.

@jlowe (Member, Author) commented:

size is always the size of the buffer, compressed or not. The shuffle transport only ever needs to look at that value, since it never uncompresses the data itself; it just transports it as-is.

uncompressedSize is only useful for code that has already detected the buffer is compressed and is interested in decoding it. Code that doesn't want to know or care whether the buffer is compressed only ever needs size.
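
A small sketch of that split in responsibilities (illustrative types, not spark-rapids code): the transport only consults size, while a decoder uses uncompressedSize to size its output.

```scala
// Illustrative only: how the two fields would be consumed.
final case class MetaFields(size: Long, uncompressedSize: Long, compressed: Boolean)

// The shuffle transport ships the buffer as-is, so the byte count it
// cares about is always `size`, compressed or not.
def bytesToTransport(meta: MetaFields): Long = meta.size

// Only a consumer that intends to decode cares about uncompressedSize,
// e.g. to allocate its output buffer.
def outputAllocation(meta: MetaFields): Long =
  if (meta.compressed) meta.uncompressedSize else meta.size
```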

Collaborator commented:

In the case of inflight limit, if compression is really great, the transport may want to use the uncompressed size to throttle, rather than the compressed size. Otherwise it makes the OOM cases harder to configure for (how do I tell the user to pick an in-flight size that fits all their data/jobs?)

revans2 previously approved these changes Jul 27, 2020
@sameerz added the "feature request" (New feature or request) label Jul 27, 2020

@abellina (Collaborator) commented:

@jlowe lgtm, I really had more of a question than a change request (see above)

@jlowe (Member, Author) commented Jul 28, 2020

> if compression is really great, the transport may want to use the uncompressed size to throttle, rather than the compressed size

Not sure that's really what we want, since we won't be uncompressing when the buffers arrive but only when they are ultimately coalesced. But in case we need it, I updated this so uncompressedSize always contains a size, even when the buffer is already uncompressed.
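
A minimal sketch of the updated convention (a hypothetical helper, not the actual builder code): uncompressedSize is now always populated, and simply equals size when the buffer is not compressed.

```scala
// Illustrative helper: the value a writer would record for uncompressedSize
// under the updated convention.
def uncompressedSizeFor(size: Long, codecUncompressedSizes: Seq[Long]): Long =
  if (codecUncompressedSizes.isEmpty) size  // not compressed: same as size
  else codecUncompressedSizes.sum           // compressed: total decoded size
```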

@abellina (Collaborator) commented:

Ok, thanks for adding it @jlowe. I didn't realize we were going to keep these buffers compressed for a while before using them, but yeah, we can remove it later if it isn't used.

@jlowe (Member, Author) commented Jul 28, 2020

build

@jlowe jlowe merged commit 2608687 into NVIDIA:branch-0.2 Jul 28, 2020
nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
…#426)

* Update BufferMeta to support multiple codec buffers per table

Signed-off-by: Jason Lowe <jlowe@nvidia.com>

* Update uncompressedSize to always have a size
@jlowe jlowe deleted the update-buffermeta branch September 10, 2021 15:30
pxLi pushed a commit to pxLi/spark-rapids that referenced this pull request May 12, 2022
tgravescs pushed a commit to tgravescs/spark-rapids that referenced this pull request Nov 30, 2023
[auto-merge] bot-auto-merge-branch-22.08 to branch-22.10 [skip ci] [bot]
Labels: feature request (New feature or request)
4 participants