
Update BufferMeta to support multiple codec buffers per table #426

Merged: 2 commits merged into NVIDIA:branch-0.2 on Jul 28, 2020

Conversation

@jlowe (Member, Author) commented Jul 24, 2020

This updates the BufferMeta message so that a contiguous table buffer can be compressed into more than one codec buffer. It also adds a codec ID for a trivial COPY codec that can be used for testing.

Signed-off-by: Jason Lowe <jlowe@nvidia.com>
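
A minimal, self-contained sketch of the metadata shape this change implies, using plain Scala case classes for illustration; these are not the project's generated FlatBuffers classes, and all names below are illustrative:

```scala
// Illustrative model only: the information a BufferMeta now carries,
// not the FlatBuffers-generated API used by spark-rapids.
final case class CodecBufferDescr(
    codecId: Byte,            // e.g. a trivial COPY codec id used for testing
    compressedOffset: Long,   // where this codec buffer starts in the transported buffer
    compressedSize: Long,     // bytes occupied by this codec buffer as transported
    uncompressedOffset: Long, // where the decoded bytes land in the output buffer
    uncompressedSize: Long)   // decoded size of this codec buffer

final case class TableBufferMeta(
    id: Int,                                  // table id
    size: Long,                               // size of the buffer as transported
    uncompressedSize: Long,                   // total size after decoding
    codecBufferDescrs: Seq[CodecBufferDescr]) // may now hold more than one entry per table
```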
@jlowe jlowe added this to the Jul 20 - Jul 31 milestone Jul 24, 2020
@jlowe jlowe requested a review from abellina July 24, 2020 20:48
@jlowe jlowe self-assigned this Jul 24, 2020
@jlowe (Member, Author) commented Jul 24, 2020

build

BufferMeta.startBufferMeta(fbb)
BufferMeta.addId(fbb, tableId)
BufferMeta.addSize(fbb, bufferSize)
BufferMeta.addUncompressedSize(fbb, 0)
Collaborator commented:

Just curious why the uncompressed size is 0. There are no docs anywhere that explain that uncompressed size is optional and in what ways it is optional.

@jlowe (Member, Author) commented:

See here and here.

Originally I was planning on using an uncompressedSize of 0 to indicate the data was not compressed with a codec, but I changed it to checking codecBufferDescrsLength > 0 instead. If we want this to match the size when the buffer is uncompressed that's an easy change to make.
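
A one-line sketch of that check, assuming the FlatBuffers-generated codecBufferDescrsLength accessor referenced above (the helper name and meta variable are illustrative):

```scala
// Compression is signaled by the presence of codec buffer descriptors,
// not by a zero uncompressedSize. `meta` is a deserialized BufferMeta.
def isCompressed(meta: BufferMeta): Boolean =
  meta.codecBufferDescrsLength() > 0
```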

Collaborator commented:

Sorry I missed those. It should be fine.

Collaborator commented:

> If we want this to match the size when the buffer is uncompressed that's an easy change to make.

It seems knowing the uncompressed size would be great so we can allocate a buffer with the right size? And also to track things like max bytes in flight.

@jlowe (Member, Author) commented:

size is always the size of the buffer, compressed or not. The shuffle transport only ever needs to look at that value, since it never uncompresses the data itself; it just transports it as-is.

uncompressedSize is only useful for code that has already detected the buffer is compressed and is interested in decoding it. Code that doesn't want to know or care whether the buffer is compressed only ever needs size.
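
A small sketch of that split in responsibilities (illustrative types, not spark-rapids code): the transport only consults size, while a decoder uses uncompressedSize to size its output.

```scala
// Illustrative only: how the two fields would be consumed.
final case class MetaFields(size: Long, uncompressedSize: Long, compressed: Boolean)

// The shuffle transport ships the buffer as-is, so the byte count it
// cares about is always `size`, compressed or not.
def bytesToTransport(meta: MetaFields): Long = meta.size

// Only a consumer that intends to decode cares about uncompressedSize,
// e.g. to allocate its output buffer.
def outputAllocation(meta: MetaFields): Long =
  if (meta.compressed) meta.uncompressedSize else meta.size
```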

Collaborator commented:

In the case of inflight limit, if compression is really great, the transport may want to use the uncompressed size to throttle, rather than the compressed size. Otherwise it makes the OOM cases harder to configure for (how do I tell the user to pick an in-flight size that fits all their data/jobs?)

revans2 previously approved these changes Jul 27, 2020
@sameerz added the "feature request" (New feature or request) label Jul 27, 2020

@abellina (Collaborator) commented:

@jlowe lgtm, I really had more of a question than a change request (see above)

@jlowe (Member, Author) commented Jul 28, 2020

> if compression is really great, the transport may want to use the uncompressed size to throttle, rather than the compressed size

Not sure that's really what we want, since we won't be uncompressing when the buffers arrive but only when they are ultimately coalesced. But in case we need it, I updated this so uncompressedSize always contains a size, even when the buffer is already uncompressed.
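
A minimal sketch of the updated convention (a hypothetical helper, not the actual builder code): uncompressedSize is now always populated, and simply equals size when the buffer is not compressed.

```scala
// Illustrative helper: the value a writer would record for uncompressedSize
// under the updated convention.
def uncompressedSizeFor(size: Long, codecUncompressedSizes: Seq[Long]): Long =
  if (codecUncompressedSizes.isEmpty) size  // not compressed: same as size
  else codecUncompressedSizes.sum           // compressed: total decoded size
```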

@abellina (Collaborator) commented:

Ok, thanks for adding it @jlowe. I didn't realize we were going to keep these buffers compressed for a while before using them, but yeah, we can remove it later if it isn't used.

@jlowe (Member, Author) commented Jul 28, 2020

build

@jlowe jlowe merged commit 2608687 into NVIDIA:branch-0.2 Jul 28, 2020
nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
…#426)

* Update BufferMeta to support multiple codec buffers per table

Signed-off-by: Jason Lowe <jlowe@nvidia.com>

* Update uncompressedSize to always have a size
@jlowe jlowe deleted the update-buffermeta branch September 10, 2021 15:30
pxLi pushed a commit to pxLi/spark-rapids that referenced this pull request May 12, 2022
tgravescs pushed a commit to tgravescs/spark-rapids that referenced this pull request Nov 30, 2023
[auto-merge] bot-auto-merge-branch-22.08 to branch-22.10 [skip ci] [bot]
Labels: feature request (New feature or request)
4 participants