Is your feature request related to a problem? Please describe.
In practice, with spillable algorithms you do not want to release the semaphore before calling `hasNext` or `next` to get another batch. Releasing it lets other tasks onto the GPU, where they put more batches in memory and increase memory pressure, which in turn causes our batches to spill. Spilling and reading spilled data back in are typically blocking operations that happen with the semaphore held. The only time we really want to release the semaphore is when we know we can overlap slow I/O (reading from a remote disk, etc.) with processing on the GPU.
If all algorithms are set up properly to never hold onto a non-spillable batch while calling `next` or `hasNext`, which should be true once join is done, then we can release the semaphore whenever we want. The only time we would want to do so is when we are about to do I/O, specifically slow I/O like shuffle or reading input data.
I propose that we release the semaphore only for these operations, or when we are transitioning to the CPU for processing, because we don't know how long that is going to take.
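To make the proposal concrete, here is a minimal sketch of a reader-side iterator that releases the semaphore only around its own I/O and puts it back before handing the batch downstream. `TaskSemaphore` and `IoReleasingIterator` are hypothetical names invented for this illustration, not the plugin's actual classes, and a plain `java.util.concurrent.Semaphore` stands in for the GPU semaphore.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

public class SemaphoreReleaseSketch {

    /** Hypothetical per-task view of the GPU semaphore (assumed single-threaded use per task). */
    static final class TaskSemaphore {
        private final Semaphore permits;
        private boolean held = false;

        TaskSemaphore(int concurrentTasks) {
            this.permits = new Semaphore(concurrentTasks);
        }

        void acquireIfNecessary() {
            if (!held) {
                permits.acquireUninterruptibly();
                held = true;
            }
        }

        void releaseIfHeld() {
            if (held) {
                permits.release();
                held = false;
            }
        }

        boolean isHeld() {
            return held;
        }
    }

    /** Wraps a slow source (for example a file reader) and releases the semaphore only while it does I/O. */
    static final class IoReleasingIterator<T> implements Iterator<T> {
        private final Iterator<T> source;
        private final TaskSemaphore sem;

        IoReleasingIterator(Iterator<T> source, TaskSemaphore sem) {
            this.source = source;
            this.sem = sem;
        }

        // Release for the duration of the I/O call, then restore the previous state.
        private <R> R withSemaphoreReleased(Supplier<R> io) {
            boolean wasHeld = sem.isHeld();
            sem.releaseIfHeld();
            try {
                return io.get();
            } finally {
                if (wasHeld) {
                    sem.acquireIfNecessary();
                }
            }
        }

        @Override
        public boolean hasNext() {
            return withSemaphoreReleased(source::hasNext);
        }

        @Override
        public T next() {
            return withSemaphoreReleased(source::next);
        }
    }

    public static void main(String[] args) {
        TaskSemaphore sem = new TaskSemaphore(1);
        sem.acquireIfNecessary();

        // Stand-in for batches read from slow storage.
        Iterator<String> slowSource = List.of("batch-0", "batch-1").iterator();
        Iterator<String> reader = new IoReleasingIterator<>(slowSource, sem);

        List<String> processed = new ArrayList<>();
        while (reader.hasNext()) {
            // Downstream (GPU) work runs with the semaphore held again.
            processed.add(reader.next());
        }
        System.out.println(processed + " heldAfter=" + sem.isHeld());
    }
}
```

Under this arrangement, operators in the middle of the plan would simply call `hasNext` and `next` without touching the semaphore, and only the leaf that performs slow I/O decides when to give it up. A transition to CPU processing could use the same release step, though without an automatic reacquire.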
revans2 changed the title from "[FEA] Have shuffle/file readers automatically release the Semaphore before I/O" to "[FEA] Have file readers automatically release the Semaphore before I/O" on Aug 16, 2024