[FEA] Have file readers automatically release the Semaphore before I/O #1815

revans2 · 2021-02-25T13:04:30Z

Is your feature request related to a problem? Please describe.
In practice with spillable algorithms you do not want to release the semaphore before calling hasNext or next to get another batch. This is because it allows other tasks onto the GPU that will put more batches in memory and cause more memory pressure, which in turn will cause our batches to spill. Spilling and reading in spilled data are typically blocking operations that happen with the semaphore held. The only time we really want to release the semaphore is if we know that we can overlap slow I/O (reading from a remote disk/etc) with processing on the GPU.

If all algorithms are set up properly to never hold onto a non-spillable batch while calling next or hasNext, which once we have join done should be true, then we can release the semaphore when ever we want, and the only time that we would want to do this is when we are going to do I/O, specifically slow I/O like shuffle or reading input data.

I would propose that the only time we release the semaphore is for these operations, or when we are transitioning to the CPU for processing, because we don't know how long that is going to take.

The text was updated successfully, but these errors were encountered:

mattahrens · 2022-05-11T20:20:44Z

First step is prototyping identification of I/O condition during semaphore usage.

revans2 added feature request New feature or request ? - Needs Triage Need team to review and classify performance A performance related task/issue labels Feb 25, 2021

sameerz removed ? - Needs Triage Need team to review and classify feature request New feature or request labels Mar 2, 2021

mattahrens added the P0 Must have for release label May 11, 2022

mattahrens removed the P0 Must have for release label May 15, 2023

revans2 mentioned this issue Aug 16, 2024

[FEA] triple buffering/pipelineing for SQL #11343

Open

7 tasks

revans2 changed the title ~~[FEA] Have shuffle/file readers automatically release the Semaphore before I/O~~ [FEA] Have file readers automatically release the Semaphore before I/O Aug 16, 2024

revans2 mentioned this issue Aug 16, 2024

[FEA] file reads in a background thread with flow control. #11345

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[FEA] Have file readers automatically release the Semaphore before I/O #1815

[FEA] Have file readers automatically release the Semaphore before I/O #1815

revans2 commented Feb 25, 2021

mattahrens commented May 11, 2022

[FEA] Have file readers automatically release the Semaphore before I/O #1815

[FEA] Have file readers automatically release the Semaphore before I/O #1815

Comments

revans2 commented Feb 25, 2021

mattahrens commented May 11, 2022