lower GPU memory reserve to 256MB #4046
Conversation
Signed-off-by: Rong Ou <rong.ou@gmail.com>
The reason this was set to a large value originally is that a customer query was casting strings to timestamps, which is not a rare thing for a query to do. Back then, the string-to-timestamp casting code involved quite a few complicated regex replaces with back references, and that kernel had a large thread stack space requirement — so large that the query would not run at all unless the reserve was set to 1GB.

I totally get the desire to lower the amount of reserve memory (it's like getting a bigger GPU for free! 😄), but IMO there's a significant associated risk. The default reserve setting could end up being too low, to the point where it would crash on some relatively common query operations. That leads to a very poor first impression, as users would not be told when it crashes that they could try tuning this reserve config or, counterintuitively, lowering the amount of memory the plugin memory pool is using to fix the out-of-memory issue. (This is insufficient memory for the driver to complete a kernel launch rather than insufficient memory for a plugin allocation request.)

The timestamp casting code has changed quite a bit since the reserve was tuned, and there may have been libcudf improvements in the memory utilization of the regex backref kernel, so we might be able to get away with a significantly lower setting. However, I think we should test the proposed setting, given that casting strings to timestamps is probably not a rare occurrence in many ETL pipelines.
For reference, when I originally worked on the arena allocator, I had to reserve some memory too when the max is not specified, and some testing also led to 64MB: https://github.com/rapidsai/rmm/blob/branch-21.12/include/rmm/mr/device/detail/arena.hpp#L295

I think the reason it worked before was more or less a coincidence: setting the max pool size to total memory minus a 1GB reserve did not make much sense, but just happened to work. I'm hoping the changes I'm making to the arena allocator will make it more resistant to memory fragmentation, so maybe we can raise this reserve later.
build
Signed-off-by: Rong Ou <rong.ou@gmail.com>
build
The current reserve is set to 1GB, which may be too high, causing some queries to run out of memory. This lowers it to a more reasonable value. If a user runs out of memory launching large kernels, they can still tweak this parameter.
Fixes #4045
Signed-off-by: Rong Ou rong.ou@gmail.com
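For users hitting a kernel-launch out-of-memory error after this change, a minimal sketch of raising the reserve back up — assuming the config key is `spark.rapids.memory.gpu.reserve` as used by the RAPIDS Accelerator plugin (check the plugin's configuration docs for the exact name and current default):

```shell
# Hedged sketch: if a query fails with an out-of-memory error during a kernel
# launch (as opposed to a plugin pool allocation failure), try raising the
# reserve back toward the old 1GB default, or lower the pool allocation
# fraction to leave more memory outside the pool.
spark-submit \
  --conf spark.plugins=com.nvidia.spark.SQLPlugin \
  --conf spark.rapids.memory.gpu.reserve=1g \
  --conf spark.rapids.memory.gpu.allocFraction=0.8 \
  your-app.jar
```

Note the counterintuitive part discussed above: because the reserve is memory *outside* the plugin's pool, fixing this class of OOM can mean giving the pool *less* memory, not more.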