[BUG] test_hash_reduction_decimal_overflow_sum[30] failed OOM in integration tests #4315
Comments
Going to give it a more reasonable RMM pool size.
Reopened: still seeing intermittent OOM failures. Going to decrease concurrentGpuTasks and do some testing.
Ran dozens of tests and did not see this again after decreasing concurrentGpuTasks. I will keep monitoring this.
Saw this in one of the nightly tests again.
This feels like there is some other issue happening here, and that concerns me. I removed the test in #4570, so technically we can close this: the test is gone, so it is impossible for this exact failure to happen again. But... I hacked together a quick max-memory-allocated patch for CUDF so we could tell how much memory each test took, and this test took 453 MiB. That is by far the largest amount of memory consumed by any test, but roughly 500 MiB should not be too much if we are shooting for 2 GiB of GPU memory per test task (which does not work, but I'll address that in a separate issue), and especially when #4336 supposedly split this test off to run with a hard-coded parallelism of 2, which in theory should give each task at least 8 GiB of memory. So I think the next step is to start playing around with the nightly test script to really clean it up and verify that everything is working as expected. But I am going to close this because this particular test is no more.
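To make the arithmetic in the comment above concrete, here is a minimal sketch of the per-task memory budget. The 16 GiB total is an assumption inferred from "parallelism of 2" giving "at least 8 GiB"; the thread does not state the GPU size, and the helper name `per_task_budget` is hypothetical. It also assumes memory is split evenly across concurrent tasks, which may not match how the pool is actually shared.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def per_task_budget(total_gpu_bytes: int, parallelism: int) -> int:
    """Evenly split GPU memory across concurrently running test tasks."""
    if parallelism < 1:
        raise ValueError("parallelism must be at least 1")
    return total_gpu_bytes // parallelism


# Assumption: a 16 GiB GPU (not stated in the thread).
ASSUMED_GPU_BYTES = 16 * GIB

# Figures taken from the comment above.
measured_peak = 453 * MIB       # peak allocation reported for this test
target_per_task = 2 * GIB       # intended budget per test task

budget = per_task_budget(ASSUMED_GPU_BYTES, parallelism=2)
headroom = budget / measured_peak

print(f"budget per task: {budget / GIB:.1f} GiB")
print(f"measured peak:   {measured_peak / MIB:.0f} MiB")
print(f"headroom factor: {headroom:.1f}x")
print(f"fits 2 GiB target: {measured_peak <= target_per_task}")
```

Under these assumptions the measured peak sits far below both the 2 GiB target and the 8 GiB share, which is why the OOM points to something other than this test's own footprint.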
Describe the bug
related to #4272
FAILED ../../src/main/python/hash_aggregate_test.py::test_hash_reduction_decimal_overflow_sum[30]