Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

batch small buffers when spilling via GDS #2295

Merged
merged 18 commits into from
May 10, 2021

Conversation

rongou
Copy link
Collaborator

@rongou rongou commented Apr 29, 2021

Signed-off-by: Rong Ou rong.ou@gmail.com

When spilling via GDS, if the user specifies a large number of shuffle partitions, each shuffle buffer can be fairly small and very inefficient to write to disk individually. This PR adds a batch write buffer that collects these smaller buffers and writes them out in a single cuFileWrite call. Preliminary results show this greatly increases the write throughput and reduces the overall query time.

Signed-off-by: Rong Ou <rong.ou@gmail.com>
@rongou rongou added feature request New feature or request performance A performance related task/issue shuffle things that impact the shuffle plugin labels Apr 29, 2021
@rongou rongou requested review from jlowe and abellina April 29, 2021 04:06
@rongou rongou self-assigned this Apr 29, 2021
@sameerz sameerz added this to the Apr 26 - May 7 milestone Apr 29, 2021
Signed-off-by: Rong Ou <rong.ou@gmail.com>
Copy link
Member

@jlowe jlowe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to have at least some testing of this new functionality, for example a test that verifies that buffers below the threshold are collected into a single file, large buffers are not, and that the batched file is removed only after the last buffer that was in the batch is freed.

Signed-off-by: Rong Ou <rong.ou@gmail.com>
Copy link
Member

@jlowe jlowe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates, @rongou! All I think is left are the tests.

@rongou
Copy link
Collaborator Author

rongou commented May 8, 2021

Added tests. PTAL

@rongou rongou requested review from jlowe and abellina May 8, 2021 00:42
@rongou
Copy link
Collaborator Author

rongou commented May 8, 2021

build

Copy link
Collaborator

@abellina abellina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @rongou

@rongou rongou merged commit 5d629ea into NVIDIA:branch-0.6 May 10, 2021
@rongou rongou deleted the gds-spill-small-buffers branch May 11, 2021 17:49
@sameerz sameerz mentioned this pull request May 11, 2021
11 tasks
nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
* batch small buffers when spilling via GDS

Signed-off-by: Rong Ou <rong.ou@gmail.com>

* address review feedback

Signed-off-by: Rong Ou <rong.ou@gmail.com>

* more review comments

Signed-off-by: Rong Ou <rong.ou@gmail.com>

* refactoring

Signed-off-by: Rong Ou <rong.ou@gmail.com>

* renaming

Signed-off-by: Rong Ou <rong.ou@gmail.com>

* reduce batch write buffer size

Signed-off-by: Rong Ou <rong.ou@gmail.com>

* remove SingleShotSpiller class

Signed-off-by: Rong Ou <rong.ou@gmail.com>

* add stream sync before reading from cufile

Signed-off-by: Rong Ou <rong.ou@gmail.com>

* add gds store tests

Signed-off-by: Rong Ou <rong.ou@gmail.com>
nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
* batch small buffers when spilling via GDS

Signed-off-by: Rong Ou <rong.ou@gmail.com>

* address review feedback

Signed-off-by: Rong Ou <rong.ou@gmail.com>

* more review comments

Signed-off-by: Rong Ou <rong.ou@gmail.com>

* refactoring

Signed-off-by: Rong Ou <rong.ou@gmail.com>

* renaming

Signed-off-by: Rong Ou <rong.ou@gmail.com>

* reduce batch write buffer size

Signed-off-by: Rong Ou <rong.ou@gmail.com>

* remove SingleShotSpiller class

Signed-off-by: Rong Ou <rong.ou@gmail.com>

* add stream sync before reading from cufile

Signed-off-by: Rong Ou <rong.ou@gmail.com>

* add gds store tests

Signed-off-by: Rong Ou <rong.ou@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request performance A performance related task/issue shuffle things that impact the shuffle plugin
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants