Add retry and SplitAndRetry support to AcceleratedColumnarToRowIterator #9088

Merged
merged 4 commits into NVIDIA:branch-23.10 from retry-acol2row on Aug 24, 2023

Conversation

firestarman

fixes #8348

This PR adds retry and SplitAndRetry support to AcceleratedColumnarToRowIterator.

It retries the cudf column-to-row conversion whenever an OOM exception is raised.
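As a rough illustration of the retry and split-and-retry idea described above, here is a minimal, self-contained sketch. It does not use the plugin's actual classes: `RetryOOM`, `SplitAndRetryOOM`, `RetrySketch`, and `withRetry` are hypothetical stand-ins, and a plain `Vector` plays the role of a batch of rows.

```scala
// Hypothetical stand-ins for the two kinds of OOM signal; not spark-rapids classes.
final class RetryOOM extends RuntimeException("retry (hypothetical)")
final class SplitAndRetryOOM extends RuntimeException("split and retry (hypothetical)")

object RetrySketch {
  // Runs `convert` on each pending chunk, in order.
  // On RetryOOM the same chunk is attempted again (up to maxAttempts times).
  // On SplitAndRetryOOM the chunk is halved by rows and both halves are re-queued.
  // Any other failure, or an OOM that cannot be handled, propagates to the caller.
  def withRetry[A, B](input: Vector[A], maxAttempts: Int = 8)
                     (convert: Vector[A] => B): List[B] = {
    var pending: List[Vector[A]] = List(input)
    val out = List.newBuilder[B]
    var attempts = 0
    while (pending.nonEmpty) {
      val chunk = pending.head
      try {
        out += convert(chunk)
        pending = pending.tail
        attempts = 0
      } catch {
        case _: RetryOOM if attempts < maxAttempts =>
          attempts += 1 // a real implementation would wait for memory to be freed
        case _: SplitAndRetryOOM if chunk.size > 1 =>
          val (first, second) = chunk.splitAt(chunk.size / 2)
          pending = first :: second :: pending.tail
      }
    }
    out.result()
  }
}
```

Under these assumptions, one input batch may yield several output batches, so the caller has to be prepared to consume a sequence of results rather than a single one.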

Signed-off-by: Firestarman <firestarmanllc@gmail.com>
@firestarman

build

- numOutputRows += cb.numRows()
- if (cb.numRows() > 0) {
+ numOutputRows += scb.numRows()
+ if (scb.numRows() > 0) {

Sorry, one more thing: if scb.numRows() <= 0, we should probably close it, just to be on the safe side.

@firestarman firestarman Aug 24, 2023


Nice catch. I will do it in a follow-up PR soon.
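The suggestion in this thread could look roughly like the sketch below. It is only an illustration: `FakeBatch`, `CloseEmptySketch`, and `keepIfNonEmpty` are hypothetical names, with `FakeBatch` standing in for the batch referred to as scb. It is not the code that landed in the follow-up PR.

```scala
// Hypothetical stand-in for a closeable batch; records whether close() was called.
final class FakeBatch(val numRows: Int) extends AutoCloseable {
  var closed = false
  override def close(): Unit = closed = true
}

object CloseEmptySketch {
  // Hands back the batch when it has rows to convert.
  // An empty batch is closed right away so that it cannot leak.
  def keepIfNonEmpty(scb: FakeBatch): Option[FakeBatch] =
    if (scb.numRows > 0) Some(scb)
    else {
      scb.close()
      None
    }
}
```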

@firestarman firestarman merged commit daedfe5 into NVIDIA:branch-23.10 Aug 24, 2023
26 of 27 checks passed
@firestarman firestarman deleted the retry-acol2row branch August 24, 2023 01:29
@sameerz sameerz added the reliability Features to improve reliability or bugs that severely impact the reliability of the plugin label Aug 28, 2023
firestarman added a commit that referenced this pull request Aug 31, 2023
…#9102)

This PR adds retry support for more operations in GpuOutOfCoreSortIterator, including computing the split offset and bringing the data back to the GPU to remove the projected columns.

In addition, so that the mergeSortAndClose function (introduced by #6931) can still close its input batches eagerly, we do not retry the whole function. Instead, we retry the individual operations inside it: bringing the data back to the GPU, concatenating the tables, sorting the concatenated table, and merging the input tables.
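The design choice above, retrying each step rather than the whole function, can be sketched as follows. This is a simplified illustration under stated assumptions: `TransientOOM`, `StepRetrySketch`, and `retry` are hypothetical, and this `mergeSortAndClose` is a toy stand-in that works on plain `Seq[Int]` values, not the real function in GpuOutOfCoreSortIterator.

```scala
// Hypothetical retryable error; a stand-in, not a spark-rapids class.
final class TransientOOM extends RuntimeException("retryable OOM (hypothetical)")

object StepRetrySketch {
  // Re-runs `step` until it succeeds or the attempts run out.
  // When attempts are exhausted the TransientOOM propagates to the caller.
  def retry[T](attemptsLeft: Int)(step: => T): T = {
    val result =
      try Some(step)
      catch { case _: TransientOOM if attemptsLeft > 1 => None }
    result match {
      case Some(value) => value
      case None => retry(attemptsLeft - 1)(step)
    }
  }

  // Each step is retried on its own, so the inputs can be released as soon as
  // the concatenation succeeds, even if the later sort has to be retried.
  def mergeSortAndClose(inputs: Seq[Seq[Int]], closeInputs: () => Unit)
                       (sortStep: Seq[Int] => Seq[Int]): Seq[Int] = {
    val concatenated = retry(3)(inputs.flatten)
    closeInputs() // inputs are no longer needed once they are concatenated
    retry(3)(sortStep(concatenated))
  }
}
```

If the whole function were retried instead, the inputs would have to stay open until the final step succeeded, which is the cost the commit message is trying to avoid.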

It also includes a small follow-up change in GpuColumnToRowExec for PR #9088.
---------

Signed-off-by: Firestarman <firestarmanllc@gmail.com>