Call GpuDeviceManager.shutdown when the executor plugin is shutting down #1713
Signed-off-by: Alessandro Bellina <abellina@nvidia.com>
LGTM
@@ -210,6 +210,7 @@ class RapidsExecutorPlugin extends ExecutorPlugin with Logging {
   override def shutdown(): Unit = {
     GpuSemaphore.shutdown()
     PythonWorkerSemaphore.shutdown()
+    GpuDeviceManager.shutdown()
I wonder if we should make these classes (Auto)Closeable? Then we could use Hadoop's IOUtils and call IOUtils.cleanup(null, GpuSemaphore, PythonWorkerSemaphore, GpuDeviceManager).
So we actually have safeClose already. safeClose will call .close() on every member of the collection, but then will throw at the end if there was a problem closing any member. I can certainly do this as part of this PR; the classes we care about are exactly these three.
Agree not necessary for this PR.
OK, let's skip it for now; we can follow up with a cleanup PR.
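For reference, the close-everything-then-throw behavior described for safeClose in this thread could look roughly like the sketch below. This is a minimal illustration, not the plugin's actual helper: the object name SafeCloseSketch, the Tracked class, and the choice to attach later failures as suppressed exceptions are all assumptions made for the example.

```scala
// Hypothetical sketch: NOT the plugin's real safeClose implementation.
object SafeCloseSketch {
  // Close every element even if some fail; rethrow the first failure at the end.
  // Later failures are attached to the first one as suppressed exceptions (an assumption).
  def safeClose(resources: Seq[AutoCloseable]): Unit = {
    var firstError: Throwable = null
    resources.foreach { r =>
      try {
        if (r != null) r.close()
      } catch {
        case t: Throwable =>
          if (firstError == null) firstError = t
          else firstError.addSuppressed(t)
      }
    }
    if (firstError != null) throw firstError
  }

  // Hypothetical stand-in for a closeable singleton such as GpuSemaphore.
  final class Tracked(val name: String, failOnClose: Boolean = false)
      extends AutoCloseable {
    var closed = false
    override def close(): Unit = {
      closed = true
      if (failOnClose) throw new IllegalStateException(s"failed to close $name")
    }
  }

  def main(args: Array[String]): Unit = {
    val a = new Tracked("a")
    val b = new Tracked("b", failOnClose = true)
    val c = new Tracked("c")
    try {
      safeClose(Seq(a, b, c))
    } catch {
      case e: IllegalStateException => println(s"caught: ${e.getMessage}")
    }
    // c is still closed even though b failed before it.
    println(Seq(a, b, c).forall(_.closed))
  }
}
```

With something like this, shutdown() would reduce to a single call over the three objects once they implement AutoCloseable, and a failure in one would not prevent the others from being released.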
build
…own (NVIDIA#1713) * Call GpuDeviceManager.shutdown when the executor plugin is shutting down Signed-off-by: Alessandro Bellina <abellina@nvidia.com> * Update copyright
Signed-off-by: Alessandro Bellina abellina@nvidia.com
@gerashegalov noticed the following host memory leak in unit tests:
This was because the RapidsHostMemoryStore (and its pool) was not being shut down. The change included here removes the leak from the tests.