-
Notifications
You must be signed in to change notification settings - Fork 232
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add host memory retries for GeneratedInternalRowToCudfRowIterator #9929
Conversation
…CudfRowIterator Signed-off-by: Jim Brennan <jimb@nvidia.com>
build |
1 similar comment
build |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
...src/test/scala/com/nvidia/spark/rapids/GeneratedInternalRowToCudfRowIteratorRetrySuite.scala
Outdated
Show resolved
Hide resolved
HostMemoryBuffer ob = | ||
RmmRapidsRetryIterator.withRetryNoSplit( () -> { | ||
return HostAlloc$.MODULE$.alloc( | ||
((long) numRowsEstimate + 1) * BYTES_PER_OFFSET, true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit,
((long) numRowsEstimate + 1) * BYTES_PER_OFFSET, true); | |
((long) numRowsEstimate + 1) * BYTES_PER_OFFSET, /*preferPinned*/ true); |
batchWithColumnToConvert = makeSpillableBatch(devColumn); | ||
HostMemoryBuffer db = | ||
RmmRapidsRetryIterator.withRetryNoSplit( () -> { | ||
return HostAlloc$.MODULE$.alloc(dataLength, true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit,
return HostAlloc$.MODULE$.alloc(dataLength, true); | |
return HostAlloc$.MODULE$.alloc(dataLength, /*preferPinned*/ true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
build |
I have been testing this locally, running nds query75 with the As a comparison, when I run without these changes, but I change the So it doesn't seem like the |
This indicates that we are still using a lot of non-spillable host memory. So we need to track down who is using the memory and not making it spillable. I'll see what I can come up with to be able to help debug this. In the short term I am fine with merging this in as it is not worse than what we have today. |
Thanks @revans2. I will put this in, but will continue to test it as I work on splitting the inputs. |
This PR adds host memory allocation retries for GeneratedInternalRowToCudfRowIterator.
This initial version only handles retries, it does not handle splitting the input data into batches.
It is the first of several PRs that will be used to address #8887.
I have run the existing unit and integration tests, and I verified no performance impact for nds on spark2a, although I do not think nds is exercising this code.
I am putting this up as a draft because it includes a change that is also in #9908, and to do additional testing.