Is your feature request related to a problem? Please describe.
To support spilling, GpuCoalesceBatch ends up doing a contiguousSplit on any ColumnarBatch that it cannot confirm is contiguous. It decides this by looking at all of the columns: if they are all instances of GpuColumnVectorFromBuffer the batch is treated as contiguous, otherwise it is not.
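To make the check concrete, here is a minimal, self-contained Java sketch of it. The classes are hypothetical stand-ins, not the plugin's real types (the real ones wrap cudf device memory), and the method name is made up for illustration; only the shape of the instanceof test is meant to carry over.

```java
import java.util.List;

// Hypothetical stand-in for a generic GPU column; device memory is not modeled.
class ColumnStandIn {}

// Hypothetical stand-in for the role of GpuColumnVectorFromBuffer: a column
// known to be a view into one shared contiguous buffer.
class FromBufferColumnStandIn extends ColumnStandIn {}

class ContiguityCheck {
    // Treat a batch as contiguous only when every column is the
    // "from buffer" kind. A single plain column means the batch cannot be
    // confirmed contiguous, which is the case that leads to a contiguousSplit.
    // An empty column list trivially passes here; how the plugin handles
    // that case is not covered by this sketch.
    static boolean looksContiguous(List<? extends ColumnStandIn> columns) {
        for (ColumnStandIn c : columns) {
            if (!(c instanceof FromBufferColumnStandIn)) {
                return false;
            }
        }
        return true;
    }
}
```

With this shape, a batch whose columns are physically contiguous but wrapped in the plain column type still fails the check, which is the situation described for the regular Spark shuffle.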
The UCX shuffle always outputs columns of this type. The regular Spark shuffle does not, even though its data is contiguous. It looks like a change of a couple of lines inside GpuColumnarBatchSerializer would make it do the right thing, which should speed up the coalesce after a shuffle a lot.
JCudfSerialization.readTableFrom(dIn) returns a TableAndRowCountPair. Inside it is a ContiguousTable instance, but our code pulls the Table out of it instead of the ContiguousTable. If we switch to pulling out the ContiguousTable and then call GpuColumnVectorFromBuffer.from on it instead of GpuColumnVector.from, it should do what we need. We also need to profile the change to be sure that everything works as expected.
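The proposed switch can be sketched as follows. This is a toy Java model under stated assumptions, not the serializer code: every class and accessor name below is a hypothetical stand-in chosen for this example, and no real cudf or plugin API is called. It only shows that the same deserialized result can be wrapped two ways, and that only one of them yields the buffer-backed column type the coalesce check looks for.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins only; accessor names are assumptions for this sketch.
class PlainColumn {}                             // role of a GpuColumnVector
class BufferBackedColumn extends PlainColumn {}  // role of a GpuColumnVectorFromBuffer

class TableStandIn {
    final int numColumns;
    TableStandIn(int numColumns) { this.numColumns = numColumns; }
}

class ContiguousTableStandIn {
    final TableStandIn table;
    ContiguousTableStandIn(TableStandIn table) { this.table = table; }
}

// Role of TableAndRowCountPair: holds the contiguous table and can also
// hand out the plain table inside it.
class ReadResultStandIn {
    final ContiguousTableStandIn contiguous;
    ReadResultStandIn(ContiguousTableStandIn contiguous) { this.contiguous = contiguous; }
    TableStandIn table() { return contiguous.table; }
}

class DeserializeSketch {
    // Behavior as described today: wrap the plain table, so the columns
    // carry no record that they share one contiguous buffer.
    static List<PlainColumn> wrapPlain(ReadResultStandIn r) {
        List<PlainColumn> cols = new ArrayList<>();
        for (int i = 0; i < r.table().numColumns; i++) {
            cols.add(new PlainColumn());
        }
        return cols;
    }

    // Proposed behavior: wrap the contiguous table instead, so every column
    // is the buffer-backed kind.
    static List<PlainColumn> wrapFromBuffer(ReadResultStandIn r) {
        List<PlainColumn> cols = new ArrayList<>();
        for (int i = 0; i < r.contiguous.table.numColumns; i++) {
            cols.add(new BufferBackedColumn());
        }
        return cols;
    }
}
```

Whether the real change gives the expected speedup is what the profiling mentioned above would need to confirm; the sketch says nothing about performance.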
I'll take a look at making this change; it is trivial and helps the non-UCX shuffle case. This issue got filed because I was noticing some odd regressions in 0.3 ... this is one of them.