[REVIEW] compile_udf: Cache PTX for similar functions #7371
Conversation
Compiling a UDF generated in a loop results in a distinct compilation for each loop iteration, because each new definition of the UDF does not compare equal to any previous definition, so a new compilation occurs. Furthermore, each new compilation returns PTX that differs only in a trivial way (the generated code is the same but the function names differ), so JITify's cache also misses. For example:

```python
for data_size in range(3):
    data = Series([3] * (2 ** data_size), dtype=np.float64)
    for i in range(3):
        data.applymap(lambda x: x + 1)
```

results in nine compilations when one would have sufficed.

This commit adds an additional cache to `compile_udf` keyed on the signature, code, and closure variables of the function. This can hit for distinct definitions of the same function. The existing `lru_cache` wrapping `compile_udf` is left in place as it is expected to hash the function much more quickly, though I don't know if this has a noticeable impact on performance - perhaps it would be worth removing it for simplicity, so that there is only one level of caching.
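A minimal sketch of the idea (illustrative names such as `_udf_code_cache` and `make_cache_key`, not necessarily the PR's exact code): key the cache on properties that are identical across re-definitions of the same function, rather than on the function object itself.

```python
from pickle import dumps

import cachetools

# Illustrative sketch: two lambdas defined on different loop iterations
# compare unequal, but share the same bytecode, constants, and closure
# values, so a key built from those hits for equivalent re-definitions.
_udf_code_cache = cachetools.LRUCache(maxsize=32)


def make_cache_key(udf, sig):
    codebytes = udf.__code__.co_code
    constants = udf.__code__.co_consts
    if udf.__closure__ is not None:
        # Pickle closure values so unhashable contents can still form a key.
        closure = dumps(tuple(cell.cell_contents for cell in udf.__closure__))
    else:
        closure = b""
    return codebytes, constants, closure, sig
```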
```diff
@@ -1,9 +1,11 @@
 # Copyright (c) 2018-2021, NVIDIA CORPORATION.
 from functools import lru_cache

+import cachetools
```
Do we need a new dependency for this? Can we not use `functools.lru_cache`?
I didn't see a way to control the key with `functools.lru_cache` - is there a way I have missed?
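(For context: `functools.lru_cache` always keys on the call arguments themselves, whereas `cachetools.cached` accepts a `key=` callable. A minimal illustration with a hypothetical `compile_to_ptx` and a deliberately simplified key:)

```python
import cachetools

_cache = cachetools.LRUCache(maxsize=32)


# The key= callable decides what the cache is keyed on; here it is the
# function's bytecode, so equivalent re-definitions hit the same entry.
# (Simplified - a real key would also need constants, closure values,
# and the compilation signature.)
@cachetools.cached(cache=_cache, key=lambda f, sig: (f.__code__.co_code, sig))
def compile_to_ptx(f, sig):
    ...  # stand-in for the expensive compilation step
```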
Fair enough. Let's move forward as is for now, but we should generally try to avoid adding new dependencies unless they're needed.
I added the `cachetools` dependency as I noticed that something cuDF depends on seems to depend on it, so it was already installed in my environment... However, the relevant code is approximately 160 lines - would vendoring it be better?
Nah this is fine. Wasn't aware this was already in the dependency tree.
@gmarkall, @kkraus14, since this is already somewhere in our dependency tree, should it still be added as an explicit dependency in the integration repo?
@ajschmidt8 Would that be added to the `run` section in https://github.com/rapidsai/integration/blob/branch-0.18/conda/recipes/rapids-build-env/meta.yaml? If that's intended to install all packages that cuDF might use, then I'd guess it should be (my answer is a bit tentative because I'm not particularly familiar with the integration repo).
Cool, I will open a PR to add it there.
PR added here: rapidsai/integration#220
Codecov Report

```diff
@@            Coverage Diff             @@
##           branch-0.19    #7371      +/-   ##
===============================================
+ Coverage        81.80%   82.28%    +0.47%
===============================================
  Files              101      101
  Lines            16695    17084      +389
===============================================
+ Hits             13658    14057      +399
+ Misses            3037     3027       -10
```

Continue to review the full report at Codecov.
The dask-cudf test failure looks a little suspicious - there may be something I've missed in computing the key.
The failing dask-cudf test doesn't use UDFs at all, so it should be good to go.
rerun tests
Thanks - I was having a lot of difficulty replicating it locally!
@gpucibot merge
Was the failure just a CI hiccup? (Log from gpu/3.8/centos7/10.2 on https://gpuci.gpuopenanalytics.com/job/rapidsai/job/gpuci/job/cudf/job/prb/job/cudf-gpu-test/970/)
Yup, CI hiccup. I'll kick it back off. rerun tests