
[FEA] triple buffering/pipelining for SQL #11343

Open · 7 tasks
revans2 opened this issue Aug 16, 2024 · 1 comment

Labels: epic (Issue that encompasses a significant feature or body of work), feature request (New feature or request), performance (A performance related task/issue)

Comments

revans2 (Collaborator) commented Aug 16, 2024

Is your feature request related to a problem? Please describe.
The "happy path" for GPU SQL processing is to have one batch of input to a task and after the computation is done we get one batch of output. This way the GPU semaphore can let a task onto the GPU. It computes everything for that task in one go. Then copies the result back to the CPU releases the semaphore, with nothing left in GPU memory, and then writes out the result.

But the real world is messy and very few paths are the "happy path". To deal with these cases we require that when an operator calls next or hasNext, or returns a result from an iterator, all of the GPU memory it references is spillable, because at any point in time the GPU semaphore might be released. This lets us continue to run, but it can also result in a lot of spilling. Many operators hold onto memory between batches because they have no way to recompute it, or recomputing it is expensive. This is especially a problem when we release the semaphore on the chance that we might do I/O: other tasks are let onto the GPU, and those tasks increase the memory pressure on the GPU, resulting in more spilling.
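
A sketch of that contract under a simplified, hypothetical spill API (the real spark-rapids equivalent, SpillableColumnarBatch, differs in detail):

```scala
// Simplified, hypothetical spill API; illustrative only.
trait DeviceBatch { def sizeInBytes: Long }

// A spillable handle: while the task does not hold the semaphore, the
// framework is free to move the underlying device memory to host or disk.
final class SpillableHandle(private val batch: DeviceBatch) {
  def materialize(): DeviceBatch = batch // would re-fetch from the spill store if spilled
}

object SpillContractSketch {
  // The contract described above: any device memory the operator still
  // references must be made spillable *before* next() returns, because the
  // semaphore may be released at any point after control leaves the operator.
  def next(compute: () => DeviceBatch): SpillableHandle = {
    val out = compute()      // GPU work runs while the semaphore is held
    new SpillableHandle(out) // wrap before returning, so the result can be spilled
  }
}
```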

Currently, releasing the semaphore is left up to the operator when it calls next or hasNext. This causes a lot of problems and inconsistent behavior: we can end up doing I/O with the semaphore held, or releasing the semaphore while holding onto a lot of GPU memory, only to find out that there is nothing more for us to process.

In an ideal world we want:

  • The GPU semaphore is released when blocking I/O is required to complete an operation, or when the task is done using the GPU.
  • I/O is done in the background as much as possible to avoid releasing the semaphore.
  • I/O and GPU computation have flow control built in (see the sketch below) so that we can
    • keep I/O as busy as possible
    • keep the GPU as busy as possible
    • not use too much GPU memory
  • Consistent priority across all computation, spilling, and I/O to reduce context switching of processing on the GPU and memory pressure on the GPU.
  • No deadlocks or livelocks

This feels like a lot, but I think we can do it with a few changes at a time.
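
As one illustration of the flow-control point in the list above, here is a minimal sketch (not the planned implementation) of triple buffering with a bounded queue: a background thread reads ahead, and the queue depth caps how many prefetched batches can be held at once:

```scala
import java.util.concurrent.{ArrayBlockingQueue, Executors}

object PipelineSketch {
  def main(args: Array[String]): Unit = {
    // Depth 2 plus the batch being processed gives classic triple buffering:
    // one batch in flight from I/O, one staged, one on the GPU.
    val queue = new ArrayBlockingQueue[Option[String]](2)
    val io = Executors.newSingleThreadExecutor()

    // Background I/O: reads ahead, but put() blocks once the queue is full,
    // so prefetch can never hold more than the configured number of batches.
    io.submit(new Runnable {
      def run(): Unit = {
        for (i <- 1 to 5) queue.put(Some(s"batch-$i")) // stand-in for a real read
        queue.put(None)                                // end-of-stream marker
      }
    })

    // Consumer: take() blocks only when no input is staged; that is the point
    // where a real implementation would release the GPU semaphore.
    Iterator.continually(queue.take()).takeWhile(_.isDefined).flatten.foreach { b =>
      println(s"GPU processing $b")                    // stand-in for GPU compute
    }
    io.shutdown()
  }
}
```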

revans2 added the feature request, ? - Needs Triage, performance, and epic labels on Aug 16, 2024
mattahrens removed the ? - Needs Triage label on Sep 5, 2024
revans2 (Collaborator, Author) commented Oct 9, 2024

We probably also want to come up with a set of standardized benchmarks to cover this use case, as NDS does not cover it well.

#11376 (comment) is a comment I made about them, but I will file a formal issue to create them.
