
[FEA] triple buffering/pipelining for SQL #11343

Open · 7 tasks
revans2 opened this issue Aug 16, 2024 · 1 comment

Labels: epic (Issue that encompasses a significant feature or body of work), feature request (New feature or request), performance (A performance related task/issue)

Comments

revans2 (Collaborator) commented Aug 16, 2024

Is your feature request related to a problem? Please describe.
The "happy path" for GPU SQL processing is to have one batch of input to a task and after the computation is done we get one batch of output. This way the GPU semaphore can let a task onto the GPU. It computes everything for that task in one go. Then copies the result back to the CPU releases the semaphore, with nothing left in GPU memory, and then writes out the result.

But the real world is messy and very few paths are the "happy path". To deal with these cases we require that when an operator calls next or hasNext, or returns a result from an iterator, all of the GPU memory it references is spillable, because at any point in time the GPU semaphore might be released. This lets us continue to run, but it can also result in a lot of spilling. Many operators hold onto memory between batches because they have no way to recompute it, or recomputing it is expensive. This is especially a problem when we release the semaphore on the chance that we might do I/O: other tasks are let onto the GPU, and those tasks increase the memory pressure on the GPU, resulting in more spilling.
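
A sketch of that contract under a simplified, hypothetical spill API (the real spark-rapids equivalent, SpillableColumnarBatch, differs in detail):

```scala
// Simplified, hypothetical spill API; illustrative only.
trait DeviceBatch { def sizeInBytes: Long }

// A spillable handle: while the task does not hold the semaphore, the
// framework is free to move the underlying device memory to host or disk.
final class SpillableHandle(private val batch: DeviceBatch) {
  def materialize(): DeviceBatch = batch // would re-fetch from the spill store if spilled
}

object SpillContractSketch {
  // The contract described above: any device memory the operator still
  // references must be made spillable *before* next() returns, because the
  // semaphore may be released at any point after control leaves the operator.
  def next(compute: () => DeviceBatch): SpillableHandle = {
    val out = compute()      // GPU work runs while the semaphore is held
    new SpillableHandle(out) // wrap before returning, so the result can be spilled
  }
}
```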

Currently, releasing the semaphore is left up to the operator when it calls next or hasNext. This causes a lot of problems and inconsistent behavior: we can end up doing I/O with the semaphore held, or releasing the semaphore while holding onto a lot of GPU memory, only to find out that there is nothing more for us to process.

In an ideal world we want:

  • The GPU semaphore is released when blocking I/O is required to complete an operation, or when the task is done using the GPU.
  • I/O is done in the background as much as possible to avoid releasing the semaphore.
  • I/O and GPU computation have flow control built in (see the sketch below) so that we can
    • keep I/O as busy as possible
    • keep the GPU as busy as possible
    • not use too much GPU memory
  • Consistent priority across all computation, spilling, and I/O to reduce context switching of processing on the GPU and memory pressure on the GPU.
  • No deadlocks or livelocks

This feels like a lot, but I think we can do it with a few changes at a time.
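
As one illustration of the flow-control point in the list above, here is a minimal sketch (not the planned implementation) of triple buffering with a bounded queue: a background thread reads ahead, and the queue depth caps how many prefetched batches can be held at once:

```scala
import java.util.concurrent.{ArrayBlockingQueue, Executors}

object PipelineSketch {
  def main(args: Array[String]): Unit = {
    // Depth 2 plus the batch being processed gives classic triple buffering:
    // one batch in flight from I/O, one staged, one on the GPU.
    val queue = new ArrayBlockingQueue[Option[String]](2)
    val io = Executors.newSingleThreadExecutor()

    // Background I/O: reads ahead, but put() blocks once the queue is full,
    // so prefetch can never hold more than the configured number of batches.
    io.submit(new Runnable {
      def run(): Unit = {
        for (i <- 1 to 5) queue.put(Some(s"batch-$i")) // stand-in for a real read
        queue.put(None)                                // end-of-stream marker
      }
    })

    // Consumer: take() blocks only when no input is staged; that is the point
    // where a real implementation would release the GPU semaphore.
    Iterator.continually(queue.take()).takeWhile(_.isDefined).flatten.foreach { b =>
      println(s"GPU processing $b")                    // stand-in for GPU compute
    }
    io.shutdown()
  }
}
```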

revans2 added the feature request, ? - Needs Triage, performance, and epic labels on Aug 16, 2024
mattahrens removed the ? - Needs Triage label on Sep 5, 2024
revans2 (Collaborator, Author) commented Oct 9, 2024

We probably also want to come up with a set of standardized benchmarks to cover this use case, as NDS does not cover it well.

#11376 (comment) is a comment I made about them, but I will file a formal issue to create them.
