Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEA] Have file readers automatically release the Semaphore before I/O #1815

Open
revans2 opened this issue Feb 25, 2021 · 1 comment
Open
Labels
performance A performance related task/issue

Comments

@revans2
Copy link
Collaborator

revans2 commented Feb 25, 2021

Is your feature request related to a problem? Please describe.
In practice with spillable algorithms you do not want to release the semaphore before calling hasNext or next to get another batch. This is because it allows other tasks onto the GPU that will put more batches in memory and cause more memory pressure, which in turn will cause our batches to spill. Spilling and reading in spilled data are typically blocking operations that happen with the semaphore held. The only time we really want to release the semaphore is if we know that we can overlap slow I/O (reading from a remote disk/etc) with processing on the GPU.

If all algorithms are set up properly to never hold onto a non-spillable batch while calling next or hasNext, which once we have join done should be true, then we can release the semaphore when ever we want, and the only time that we would want to do this is when we are going to do I/O, specifically slow I/O like shuffle or reading input data.

I would propose that the only time we release the semaphore is for these operations, or when we are transitioning to the CPU for processing, because we don't know how long that is going to take.

@revans2 revans2 added feature request New feature or request ? - Needs Triage Need team to review and classify performance A performance related task/issue labels Feb 25, 2021
@sameerz sameerz removed ? - Needs Triage Need team to review and classify feature request New feature or request labels Mar 2, 2021
@mattahrens
Copy link
Collaborator

First step is prototyping identification of I/O condition during semaphore usage.

@mattahrens mattahrens added the P0 Must have for release label May 11, 2022
@mattahrens mattahrens removed the P0 Must have for release label May 15, 2023
@revans2 revans2 changed the title [FEA] Have shuffle/file readers automatically release the Semaphore before I/O [FEA] Have file readers automatically release the Semaphore before I/O Aug 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance A performance related task/issue
Projects
None yet
Development

No branches or pull requests

3 participants