[FEA] Have GpuColumnarBatchSerializer return GpuColumnVectorFromBuffer instances #849

revans2 · 2020-09-24T19:29:46Z

Is your feature request related to a problem? Please describe.
To support spilling, GpuCoalesceBatch ends up doing a contiguousSplit on ColumnarBatches whenever it cannot tell whether they are contiguous. It decides this by looking at all of the columns: if they are all instances of GpuColumnVectorFromBuffer then the batch is contiguous, otherwise it is not.
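A minimal sketch of that check, for illustration only. `ColumnStandIn` and `FromBufferColumnStandIn` are hypothetical stand-ins for GpuColumnVector and GpuColumnVectorFromBuffer, not the plugin's real classes, and treating an empty column list as contiguous is an assumption of the sketch.

```java
import java.util.List;

class ContiguityCheckSketch {
    // Hypothetical stand-in for a plain GPU column wrapper.
    static class ColumnStandIn {}

    // Hypothetical stand-in for a column known to live in one shared buffer.
    static class FromBufferColumnStandIn extends ColumnStandIn {}

    // A batch counts as contiguous only if every column is the from-buffer kind.
    // Assumption: an empty list is treated as contiguous.
    static boolean isContiguous(List<ColumnStandIn> columns) {
        for (ColumnStandIn column : columns) {
            if (!(column instanceof FromBufferColumnStandIn)) {
                return false; // one plain column is enough to force a contiguousSplit
            }
        }
        return true;
    }
}
```

Under this model a single column of the plain type makes the whole batch look non-contiguous, which is why the wrapper type chosen at deserialization time matters.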

The UCX shuffle always outputs these types of buffers. The regular Spark shuffle does not, even though the data is contiguous. It looks like a change of a couple of lines inside GpuColumnarBatchSerializer would make it do the right thing, which should speed up the coalesce after a shuffle a lot.

JCudfSerialization.readTableFrom(dIn) returns a TableAndRowCountPair. Inside it is a ContiguousTable instance, but our code pulls the Table out of it instead of the ContiguousTable. If we switch to pulling out the ContiguousTable and then call GpuColumnVectorFromBuffer.from on it instead of GpuColumnVector.from, it should do what we want/need. We also need to do profiling to be sure that everything is working as expected.
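The difference between the two paths can be sketched with stand-in types. Everything below is hypothetical and only models the idea: `PairStandIn` plays the role of TableAndRowCountPair, `ContiguousTableStandIn` the role of ContiguousTable, and `FromBufferColumn` the role of GpuColumnVectorFromBuffer. It does not use the real cudf or plugin APIs.

```java
import java.util.ArrayList;
import java.util.List;

class DeserializeSketch {
    // Hypothetical stand-in for a plain column wrapper.
    static class Column {
        final String name;
        Column(String name) { this.name = name; }
    }

    // Hypothetical stand-in for a column that remembers its shared backing buffer.
    static class FromBufferColumn extends Column {
        final byte[] buffer;
        FromBufferColumn(String name, byte[] buffer) {
            super(name);
            this.buffer = buffer;
        }
    }

    // Hypothetical stand-in for a table whose columns all live in one buffer.
    static class ContiguousTableStandIn {
        final List<String> columnNames;
        final byte[] buffer;
        ContiguousTableStandIn(List<String> columnNames, byte[] buffer) {
            this.columnNames = columnNames;
            this.buffer = buffer;
        }
    }

    // Hypothetical stand-in for the pair returned by deserialization.
    static class PairStandIn {
        private final ContiguousTableStandIn contiguous;
        PairStandIn(ContiguousTableStandIn contiguous) { this.contiguous = contiguous; }
        List<String> getTable() { return contiguous.columnNames; }
        ContiguousTableStandIn getContiguousTable() { return contiguous; }
    }

    // Behavior described in the issue: only the table is used, so the
    // resulting columns carry no record of the shared buffer.
    static List<Column> fromTable(PairStandIn pair) {
        List<Column> out = new ArrayList<>();
        for (String name : pair.getTable()) {
            out.add(new Column(name));
        }
        return out;
    }

    // Proposed behavior: use the contiguous table, so every column is the
    // from-buffer kind and points at the same backing buffer.
    static List<Column> fromContiguousTable(PairStandIn pair) {
        ContiguousTableStandIn ct = pair.getContiguousTable();
        List<Column> out = new ArrayList<>();
        for (String name : ct.columnNames) {
            out.add(new FromBufferColumn(name, ct.buffer));
        }
        return out;
    }
}
```

In this model the data is identical on both paths; only the wrapper type changes, which is what lets a later instance check recognize the batch as contiguous and skip the extra split.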

abellina · 2020-09-24T19:58:07Z

I'll take a look at making this change; it is trivial and helps the non-UCX shuffle case. But this issue got filed because I was noticing some odd regressions in 0.3 ... this is one of them.

revans2 added the feature request, ? - Needs Triage, and performance labels Sep 24, 2020

abellina self-assigned this Sep 24, 2020

jlowe assigned jlowe and unassigned abellina Sep 24, 2020

jlowe removed the ? - Needs Triage label Sep 24, 2020

jlowe added the P1 label Sep 24, 2020

jlowe mentioned this issue Sep 24, 2020

Use contiguous table when deserializing columnar batch #851

Merged

revans2 closed this as completed in #851 Sep 25, 2020

sameerz removed the feature request label Dec 14, 2020
