Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEA] Have GpuColumnarBatchSerializer return GpuColumnVectorFromBuffer instances #849

Closed
revans2 opened this issue Sep 24, 2020 · 1 comment · Fixed by #851
Closed

[FEA] Have GpuColumnarBatchSerializer return GpuColumnVectorFromBuffer instances #849

revans2 opened this issue Sep 24, 2020 · 1 comment · Fixed by #851
Assignees
Labels
P1 Nice to have for release performance A performance related task/issue

Comments

@revans2
Copy link
Collaborator

revans2 commented Sep 24, 2020

Is your feature request related to a problem? Please describe.
In order to be able to support spilling GpuCoalesceBatch will end up doing a contiguousSplit on ColumnarBatches when it cannot figure out if they are contiguous or not. It does this by looking at all of the columns and if they are instances of GpuColumnVectorFromBuffer then it is contiguous, otherwise it is not.

For the UCX shuffle case it always outputs these types of buffers. For the regular spark shuffle case it does not, despite the fact that the data is contiguous. It looks like it would be a couple of lines of change inside GpuColumnarBatchSerializer to make it so it is doing the right thing, which should speed up the coalesce after a shuffle a lot.

JCudfSerialization.readTableFrom(dIn) returns a TableAndRowCountPair. Inside it is a ContiguousTable instance, but our code is pulling Table out from it instead of ContiguousTable. If we switch over to pulling out the ContiguousTable and then call GpuColumnVectorFromBuffer.from on it instead of GpuColumnVector.from it should do what we want/need. But we also need to do profiling to be sure that everything is working as expected.

@revans2 revans2 added feature request New feature or request ? - Needs Triage Need team to review and classify performance A performance related task/issue labels Sep 24, 2020
@abellina abellina self-assigned this Sep 24, 2020
@abellina
Copy link
Collaborator

I'll take a look at making this change, it is trivial and helps the non-UCX shuffle case. But this issue got filed because I was nothing some odd regressions in 0.3 ... this is one of them.

@jlowe jlowe assigned jlowe and unassigned abellina Sep 24, 2020
@jlowe jlowe removed the ? - Needs Triage Need team to review and classify label Sep 24, 2020
@jlowe jlowe added the P1 Nice to have for release label Sep 24, 2020
@sameerz sameerz removed the feature request New feature or request label Dec 14, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P1 Nice to have for release performance A performance related task/issue
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants