-
Notifications
You must be signed in to change notification settings - Fork 232
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow skipping host spill for a direct device->disk spill #9211
Allow skipping host spill for a direct device->disk spill #9211
Conversation
Signed-off-by: Alessandro Bellina <abellina@nvidia.com>
b365369
to
49659ba
Compare
sql-plugin/src/main/scala/com/nvidia/spark/rapids/AbstractHostByteBufferIterator.scala
Outdated
Show resolved
Hide resolved
throw new IllegalStateException("copying from buffer without device memory") | ||
stream: Cuda.Stream): Option[RapidsBufferBase] = { | ||
val wouldFit = trySpillToMaximumSize(other, stream) | ||
// TODO: this is disabled for now since subsequent work will tie this into |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@revans2 this is I think an integration point with the host allocator, so I left this as a TODO disabled by default so it does what it used to do, if we decide to merge this. Let me know if I should look into the apis and try to integrate things, or if this is what you needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did a quick pass and it is looking good. I want to spend some more time on it later to dig in a little deeper.
I have an issue with the posted code where I will loose a RapidsBuffer if it spills while I am spilling. This is because of how I grouped them together and updated tiers all at once. I am testing a fix and will push today. |
…ing spilled at a higher tier
Signed-off-by: Alessandro Bellina <abellina@nvidia.com>
@revans2 this is ready for another pass |
@@ -659,38 +642,19 @@ class RapidsBufferCatalog( | |||
logDebug(s"Skipping spilling $buffer ${buffer.id} to ${spillStore.name} as it is " + | |||
s"already stored in multiple tiers") | |||
} | |||
>>>>>>> 740d1175060555ad373f30ed780a58215b6c3419 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the upmerge was done properly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you are right, this method needs to go away (it's now in the store). Fixed in 4916333
build |
// try one last time after locking the catalog (slow path) | ||
// if there is a lot of contention here, I would rather lock the world than | ||
// have tasks error out with "Unable to acquire" | ||
synchronized { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not strictly necessary, but while debugging some other issue I thought it wouldn't hurt. Before this we would perform the attempts and throw, without trying to lock the catalog and try it one last time. I don't have evidence of this saving a task that would otherwise fail however.
My intellij was running without assertions , so I didn't see test failures reported by CI. Fixed. |
build |
Contributes to #8881.
This is a work in progress to allow spill to skip host. I need to run this under a workload (by enabling the "skip host" mode locally) and check for leaks and make sure all the accounting is working correctly. It leaves that mode disabled by default because I think this is something that will be integrated later against the new
HostAlloc
api, but I can try integrating that in this PR if needed.I did some minor refactoring because I didn't want to preclude a certain order between calling
free
and getting the size of the spilledRapidsBuffer
in the source tier. I made that method a val, so we are guaranteed that it is set until GC.This also fixes #9223. This is an issue with 0-byte RapidsBuffers as they spill to disk.
mmap
on 0-byte length file segment is not valid, yet this can very well given aColumnarBatch
with 0 rows for example. The code in this PR will not allow these 0-byte buffers to be marked as spillable, so they will never reach disk.