[QST] NumbaPerformanceWarning triggered by DataFrame.iloc
#9247
Comments
I just remembered there's a config variable in Numba for this, which we could set, perhaps in https://github.com/rapidsai/cudf/blob/branch-21.10/python/cudf/cudf/__init__.py. If I run:

    import cudf
    from numba import config

    # Disable Numba's low-occupancy performance warnings
    config.CUDA_LOW_OCCUPANCY_WARNINGS = 0

    df = cudf.DataFrame({'a': list(range(20)),
                         'b': list(reversed(range(20))),
                         'c': list(range(20))})
    df.iloc[0]

it runs with no warnings produced.
I've also been made aware of #9236 which seems to be related in that it moves towards using CuPy arrays rather than Numba device arrays, similar / relevant to:
@gmarkall feel free to reach out and we can chat about this (and in the context of #9236) when you're free. In general, though, my inclination is that Numba should keep this kind of warning but cuDF should catch and silence it. cuDF disables certain operations that are possible in pandas because they would be prohibitively slow, for example implicit conversion to NumPy arrays via
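To make the "catch and silence" idea concrete, here is a minimal sketch using only the standard `warnings` module. `PerformanceWarningStandIn`, `low_occupancy_operation`, and `call_silently` are hypothetical names made up for illustration; in real code the category would presumably be `numba.core.errors.NumbaPerformanceWarning`, and this is not necessarily how cuDF would do it.

```python
import warnings


class PerformanceWarningStandIn(UserWarning):
    """Hypothetical stand-in for numba.core.errors.NumbaPerformanceWarning."""


def low_occupancy_operation():
    # Stand-in for a library call that launches a tiny kernel internally
    # and therefore triggers a performance warning.
    warnings.warn("small grid, low occupancy", PerformanceWarningStandIn)
    return 42


def call_silently(func, category):
    """Call func() with warnings of `category` ignored, only for this call."""
    with warnings.catch_warnings():
        # The filter is restored when the with-block exits, so the warning
        # remains visible everywhere else.
        warnings.simplefilter("ignore", category)
        return func()


result = call_silently(low_occupancy_operation, PerformanceWarningStandIn)
```

The point of scoping the filter this way is that the library keeps emitting the warning for user-written kernels, while the caller silences it only around operations that are known to launch small kernels.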
I believe this code is for generating a column of a size given a scalar and predates having the libcudf function for this. In general, the whole
Numba 0.54 introduces performance warnings whenever a kernel is launched with low occupancy. Certain operations in cuDF can indirectly cause this warning to be emitted, for example `DataFrame.iloc` (which one would not expect to be particularly efficient anyway). This message could be disconcerting for users and appear in a lot of workflows; therefore, this commit configures Numba at cuDF import time to suppress the low occupancy warning. Fixes rapidsai#9247.
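For anyone who wants the same effect without touching cuDF, a rough sketch of the environment-variable route follows. It assumes the code runs before Numba is first imported, since Numba reads its `NUMBA_*` environment variables when its configuration is loaded; the commit itself sets the `config` attribute instead.

```python
import os

# Assumption: nothing has imported Numba yet in this process.
# setdefault keeps any value the user has already chosen explicitly.
os.environ.setdefault("NUMBA_CUDA_LOW_OCCUPANCY_WARNINGS", "0")

low_occupancy_setting = os.environ["NUMBA_CUDA_LOW_OCCUPANCY_WARNINGS"]
```

Using `setdefault` rather than plain assignment means a user who deliberately exported `NUMBA_CUDA_LOW_OCCUPANCY_WARNINGS=1` still gets the warnings.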
(Background: This originally was brought to my attention by @rnyak, who showed that the warning popped up surprisingly in an NVTabular workflow.)
Numba 0.54 added the emission of warnings when a kernel is launched with a configuration that will result in low occupancy - for examples, see https://nbviewer.jupyter.org/github/gmarkall/numba-examples/blob/cuda-054-release-notebook/notebooks/Numba_054_CUDA_Release_Demo.ipynb#Small-grid-warnings
Using `df.iloc` results in one of these warnings, because `scalar_broadcast_to` uses `__setitem__` on a Numba device array, which in turn launches a kernel that has low occupancy. This is called from:

cudf/python/cudf/cudf/utils/utils.py, line 77 in 11781e8
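As a rough illustration of why a small `__setitem__` launch counts as low occupancy, the sketch below compares the number of blocks in a grid against the number of streaming multiprocessors (SMs). The function name, the factor of 2, and the SM count of 80 are assumptions chosen for illustration, not values confirmed in this thread.

```python
def is_low_occupancy(grid_blocks: int, sm_count: int, blocks_per_sm: int = 2) -> bool:
    """Return True when a launch has too few blocks to keep every SM busy.

    blocks_per_sm=2 is an assumed threshold for this sketch.
    """
    return grid_blocks < blocks_per_sm * sm_count


# A single-block launch, as a tiny element-wise fill might use,
# on a hypothetical GPU with 80 SMs.
single_block = is_low_occupancy(grid_blocks=1, sm_count=80)

# A grid with thousands of blocks comfortably covers the same device.
large_grid = is_low_occupancy(grid_blocks=4096, sm_count=80)
```

Under these assumptions the single-block launch is flagged and the large grid is not, which matches the intuition that the warning targets very small grids.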
To reproduce one of these warnings, running:
is sufficient, generating the output:
I think there are some different conclusions we could come to, but I don't have much feel for which ones are best:
- `__setitem__`)?
- `__setitem__` of device arrays should not raise warnings like this because it's up to Numba to implement it efficiently, and / or it shouldn't be expected that `__setitem__` on a device array is an SOL implementation.

My guess would be that this will pop up in a lot of different workflows, if they use `iloc`.

Can others add their thoughts / help guide this issue in the right direction please?