[REVIEW] compile_udf: Cache PTX for similar functions #7371
Conversation
Compiling a UDF generated in a loop results in a distinct compilation for each loop iteration, because each new definition of the UDF does not compare equal to any previous definition, so a new compilation occurs. Furthermore, each new compilation returns PTX that differs only in a trivial way (the generated code is the same but the function names differ), so JITify's cache also misses. For example:

```python
for data_size in range(3):
    data = Series([3] * (2 ** data_size), dtype=np.float64)
    for i in range(3):
        data.applymap(lambda x: x + 1)
```

results in nine compilations when one would have sufficed.

This commit adds an additional cache to `compile_udf` keyed on the signature, code, and closure variables of the function. This can hit for distinct definitions of the same function. The existing `lru_cache` wrapping `compile_udf` is left in place as it is expected to hash the function much more quickly, though I don't know if this has a noticeable impact on performance - perhaps it would be worth removing it for simplicity, so that there is only one level of caching.
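A minimal sketch of the idea (illustrative names such as `_udf_code_cache` and `make_cache_key`, not necessarily the PR's exact code): key the cache on properties that are identical across re-definitions of the same function, rather than on the function object itself.

```python
from pickle import dumps

import cachetools

# Illustrative sketch: two lambdas defined on different loop iterations
# compare unequal, but share the same bytecode, constants, and closure
# values, so a key built from those hits for equivalent re-definitions.
_udf_code_cache = cachetools.LRUCache(maxsize=32)


def make_cache_key(udf, sig):
    codebytes = udf.__code__.co_code
    constants = udf.__code__.co_consts
    if udf.__closure__ is not None:
        # Pickle closure values so unhashable contents can still form a key.
        closure = dumps(tuple(cell.cell_contents for cell in udf.__closure__))
    else:
        closure = b""
    return codebytes, constants, closure, sig
```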
```diff
@@ -1,9 +1,11 @@
 # Copyright (c) 2018-2021, NVIDIA CORPORATION.
 from functools import lru_cache

+import cachetools
```
Do we need a new dependency for this? Can we not use `functools.lru_cache`?
I didn't see a way to control the key with `functools.lru_cache` - is there a way I have missed?
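(For context: `functools.lru_cache` always keys on the call arguments themselves, whereas `cachetools.cached` accepts a `key=` callable. A minimal illustration with a hypothetical `compile_to_ptx` and a deliberately simplified key:)

```python
import cachetools

_cache = cachetools.LRUCache(maxsize=32)


# The key= callable decides what the cache is keyed on; here it is the
# function's bytecode, so equivalent re-definitions hit the same entry.
# (Simplified - a real key would also need constants, closure values,
# and the compilation signature.)
@cachetools.cached(cache=_cache, key=lambda f, sig: (f.__code__.co_code, sig))
def compile_to_ptx(f, sig):
    ...  # stand-in for the expensive compilation step
```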
Fair enough. Let's move forward as is for now, but we should generally try to avoid adding new dependencies unless they're needed.
I added the `cachetools` dependency as I noticed that something cuDF depends on seems to depend on it, so it was already installed in my environment... However, the relevant code is approximately 160 lines - would vendoring it be better?
Nah this is fine. Wasn't aware this was already in the dependency tree.
@gmarkall, @kkraus14, since this is already somewhere in our dependency tree, should it still be added as an explicit dependency in the integration repo?
@ajschmidt8 Would that be added to the `run` section in https://github.com/rapidsai/integration/blob/branch-0.18/conda/recipes/rapids-build-env/meta.yaml? If that's intended to install all packages that cuDF might use, then I'd guess it should be (my answer is a bit tentative because I'm not particularly familiar with the integration repo).
Cool, I will open a PR to add it there.
PR added here: rapidsai/integration#220
Codecov Report

```diff
@@            Coverage Diff             @@
##           branch-0.19    #7371      +/-   ##
===============================================
+ Coverage        81.80%   82.28%    +0.47%
===============================================
  Files              101      101
  Lines            16695    17084      +389
===============================================
+ Hits             13658    14057      +399
+ Misses            3037     3027       -10
```

Continue to review the full report at Codecov.
The dask-cudf test failure looks a little suspicious - there may be something I've missed in computing the key.
The failing dask-cudf test doesn't use UDFs at all, so it should be good to go.
rerun tests
Thanks - I was having a lot of difficulty replicating it locally!
@gpucibot merge
Was the failure just a CI hiccup? (Log from gpu/3.8/centos7/10.2 on https://gpuci.gpuopenanalytics.com/job/rapidsai/job/gpuci/job/cudf/job/prb/job/cudf-gpu-test/970/)
Yup, CI hiccup. I'll kick it back off. rerun tests