
[REVIEW] compile_udf: Cache PTX for similar functions #7371

Merged: 4 commits into rapidsai:branch-0.19 on Feb 26, 2021

Conversation

gmarkall (Contributor)

Compiling a UDF generated in a loop results in a distinct compilation for each loop iteration, because each new definition of the UDF does not compare equal to any previous definition. Furthermore, each new compilation returns PTX that differs only in a trivial way (the generated code is the same but the function names differ), so Jitify's cache also misses.

For example:

```python
for data_size in range(3):
    data = Series([3] * (2 ** data_size), dtype=np.float64)
    for i in range(3):
        data.applymap(lambda x: x + 1)
```

results in nine compilations when one would have sufficed.

This commit adds an additional cache to `compile_udf`, keyed on the signature, code, and closure variables of the function. This cache can hit for distinct definitions of the same function. The existing `lru_cache` wrapping `compile_udf` is left in place, as it is expected to hash the function much more quickly, though I don't know whether this has a noticeable impact on performance; perhaps it would be worth removing it for simplicity, so that there is only one level of caching.
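The sketch below illustrates the general idea, not the exact code in this PR: it assumes Numba's `cuda.compile_ptx_for_current_device` as the compilation entry point and a `cachetools.LRUCache` for storage, and the names `_udf_cache`, `_make_cache_key`, `compile_udf_cached`, and the cache size are illustrative.

```python
# Sketch only: cache compiled PTX keyed on the UDF's bytecode, constants and
# closure values rather than the function object, so re-defined but identical
# UDFs reuse a previous compilation.
import cachetools
from numba import cuda

_udf_cache = cachetools.LRUCache(maxsize=32)  # illustrative size


def _make_cache_key(udf, sig):
    """Build a hashable key from the UDF's code object, closure values and
    the requested type signature."""
    closure = (
        tuple(cell.cell_contents for cell in udf.__closure__)
        if udf.__closure__ is not None
        else ()
    )
    return (udf.__code__.co_code, udf.__code__.co_consts, closure, sig)


def compile_udf_cached(udf, sig):
    """Compile a UDF to PTX, reusing an earlier result when an identically
    defined function has already been compiled."""
    key = _make_cache_key(udf, sig)
    res = _udf_cache.get(key)
    if res is None:
        res = cuda.compile_ptx_for_current_device(udf, sig, device=True)
        _udf_cache[key] = res
    return res
```

Because the key is built from the code object and closure values rather than the function object itself, a lambda re-created on every loop iteration maps to the same cache entry.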

@gmarkall requested review from a team as code owners on February 11, 2021 17:41
@github-actions bot added the `conda` and `Python` (Affects Python cuDF API) labels on Feb 11, 2021
@kkraus14 added the `non-breaking` (Non-breaking change) label on Feb 11, 2021
```
@@ -1,9 +1,11 @@
# Copyright (c) 2018-2021, NVIDIA CORPORATION.
from functools import lru_cache

import cachetools
```
Collaborator

Do we need a new dependency for this? Can we not use functools.lru_cache?

Contributor Author

I didn't see a way to control the key with functools.lru_cache - is there a way I have missed?
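For context, a small illustration (not code from this PR) of the difference being discussed: `cachetools.cached` accepts a user-supplied `key` callable, whereas `functools.lru_cache` always keys on the call arguments themselves. The `custom_key` helper below is hypothetical.

```python
# Illustration only: cachetools lets the caller control the cache key.
import cachetools
from cachetools.keys import hashkey


def custom_key(udf, sig):
    # Hypothetical key: hash the UDF's bytecode and the signature rather than
    # the function object, so equivalent redefinitions collide.
    return hashkey(udf.__code__.co_code, sig)


@cachetools.cached(cache=cachetools.LRUCache(maxsize=32), key=custom_key)
def compile_udf(udf, sig):  # stand-in for cuDF's compile_udf; body elided
    ...
```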

Collaborator

Fair enough. Let's move forward as is for now but we should generally try to avoid adding new dependencies whenever possible unless they're needed.

Contributor Author

I added the cachetools dependency as I noticed that something that cuDF depends on seems to depend on it, so it was already installed in my environment... However, the relevant code is approximately 160 lines - would vendoring it be better?

Collaborator

Nah this is fine. Wasn't aware this was already in the dependency tree.

Member

@gmarkall, @kkraus14, since this is already somewhere in our dependency tree, should it still be added as an explicit dependency in the integration repo?

Contributor Author

@ajschmidt8 Would that be added to the run section in https://github.com/rapidsai/integration/blob/branch-0.18/conda/recipes/rapids-build-env/meta.yaml ? If that's intended to install all packages that cuDF might use, then I'd guess it should be (my answer is a bit tentative because I'm not particularly familiar with the integration repo).

Member

Cool, I will open a PR to add it there.

Member

PR added here: rapidsai/integration#220

@codecov bot commented Feb 11, 2021

Codecov Report

Merging #7371 (b5a926d) into branch-0.19 (43b44e1) will increase coverage by 0.47%.
The diff coverage is 97.77%.


```
@@               Coverage Diff               @@
##           branch-0.19    #7371      +/-   ##
===============================================
+ Coverage        81.80%   82.28%   +0.47%
===============================================
  Files              101      101
  Lines            16695    17084     +389
===============================================
+ Hits             13658    14057     +399
+ Misses            3037     3027      -10
```
| Impacted Files | Coverage Δ |
|---|---|
| python/cudf/cudf/core/frame.py | 89.35% <ø> (+0.09%) ⬆️ |
| python/cudf/cudf/core/column_accessor.py | 95.47% <95.65%> (+2.53%) ⬆️ |
| python/cudf/cudf/core/dataframe.py | 90.58% <100.00%> (+0.12%) ⬆️ |
| python/cudf/cudf/utils/cudautils.py | 52.94% <100.00%> (+8.11%) ⬆️ |
| python/cudf/cudf/io/feather.py | 100.00% <0.00%> (ø) |
| python/cudf/cudf/comm/serialize.py | 0.00% <0.00%> (ø) |
| python/cudf/cudf/_fuzz_testing/io.py | 0.00% <0.00%> (ø) |
| python/cudf/cudf/core/column/struct.py | 100.00% <0.00%> (ø) |
| python/dask_cudf/dask_cudf/_version.py | 0.00% <0.00%> (ø) |
| python/dask_cudf/dask_cudf/io/tests/test_csv.py | 100.00% <0.00%> (ø) |
| ... and 41 more | |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@kkraus14 added the `improvement` (Improvement / enhancement to an existing function) label on Feb 12, 2021
@gmarkall (Contributor Author)

The dask-cudf test failure looks a little suspicious - there may be something I've missed in computing the key.

@kkraus14 (Collaborator)

The dask-cudf failure doesn't use UDFs at all so it should be good to go.

@kkraus14 (Collaborator)

rerun tests

@gmarkall (Contributor Author)

> The dask-cudf failure doesn't use UDFs at all so it should be good to go.

Thanks - I was having a lot of difficulty replicating it locally!

@kkraus14 (Collaborator)

@gpucibot merge

@gmarkall (Contributor Author)

Was the failure just a CI hiccup? Details below (log from gpu/3.8/centos7/10.2 on https://gpuci.gpuopenanalytics.com/job/rapidsai/job/gpuci/job/cudf/job/prb/job/cudf-gpu-test/970/):

CI Log

```
00:29:52 FATAL: command execution failed
00:29:52 java.nio.channels.ClosedChannelException
00:29:52 	at org.jenkinsci.remoting.protocol.NetworkLayer.onRecvClosed(NetworkLayer.java:154)
00:29:52 	at org.jenkinsci.remoting.protocol.impl.NIONetworkLayer.ready(NIONetworkLayer.java:179)
00:29:52 	at org.jenkinsci.remoting.protocol.IOHub$OnReady.run(IOHub.java:795)
00:29:52 	at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
00:29:52 	at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:59)
00:29:52 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
00:29:52 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
00:29:52 	at java.lang.Thread.run(Thread.java:748)
00:29:52 Caused: java.io.IOException: Backing channel 'JNLP4-connect connection from thunderhill.nvidia.com/216.228.112.22:53546' is disconnected.
00:29:52 	at hudson.remoting.RemoteInvocationHandler.channelOrFail(RemoteInvocationHandler.java:216)
00:29:52 	at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:286)
00:29:52 	at com.sun.proxy.$Proxy107.isAlive(Unknown Source)
00:29:52 	at hudson.Launcher$RemoteLauncher$ProcImpl.isAlive(Launcher.java:1211)
00:29:52 	at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:1203)
00:29:52 	at hudson.tasks.CommandInterpreter.join(CommandInterpreter.java:194)
00:29:52 	at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:144)
00:29:52 	at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:91)
00:29:52 	at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
00:29:52 	at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:741)
00:29:52 	at hudson.model.Build$BuildExecution.build(Build.java:206)
00:29:52 	at hudson.model.Build$BuildExecution.doRun(Build.java:163)
00:29:52 	at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:504)
00:29:52 	at hudson.model.Run.execute(Run.java:1894)
00:29:52 	at hudson.matrix.MatrixRun.run(MatrixRun.java:153)
00:29:52 	at hudson.model.ResourceController.execute(ResourceController.java:97)
00:29:52 	at hudson.model.Executor.run(Executor.java:428)
00:29:52 FATAL: Unable to delete script file /tmp/jenkins6347770917269100853.sh
00:29:52 java.nio.channels.ClosedChannelException
00:29:52 	at org.jenkinsci.remoting.protocol.NetworkLayer.onRecvClosed(NetworkLayer.java:154)
00:29:52 	at org.jenkinsci.remoting.protocol.impl.NIONetworkLayer.ready(NIONetworkLayer.java:179)
00:29:52 	at org.jenkinsci.remoting.protocol.IOHub$OnReady.run(IOHub.java:795)
00:29:52 	at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
00:29:52 	at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:59)
00:29:52 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
00:29:52 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
00:29:52 	at java.lang.Thread.run(Thread.java:748)
00:29:52 Caused: hudson.remoting.ChannelClosedException: Channel "hudson.remoting.Channel@6cc8c65f:JNLP4-connect connection from thunderhill.nvidia.com/216.228.112.22:53546": Remote call on JNLP4-connect connection from thunderhill.nvidia.com/216.228.112.22:53546 failed. The channel is closing down or has closed down
00:29:52 	at hudson.remoting.Channel.call(Channel.java:994)
00:29:52 	at hudson.FilePath.act(FilePath.java:1070)
00:29:52 	at hudson.FilePath.act(FilePath.java:1059)
00:29:52 	at hudson.FilePath.delete(FilePath.java:1582)
00:29:52 	at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:162)
00:29:52 	at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:91)
00:29:52 	at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
00:29:52 	at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:741)
00:29:52 	at hudson.model.Build$BuildExecution.build(Build.java:206)
00:29:52 	at hudson.model.Build$BuildExecution.doRun(Build.java:163)
00:29:52 	at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:504)
00:29:52 	at hudson.model.Run.execute(Run.java:1894)
00:29:52 	at hudson.matrix.MatrixRun.run(MatrixRun.java:153)
00:29:52 	at hudson.model.ResourceController.execute(ResourceController.java:97)
00:29:52 	at hudson.model.Executor.run(Executor.java:428)
00:29:52 Build step 'Execute shell' marked build as failure
00:29:52 FATAL: Channel "hudson.remoting.Channel@6cc8c65f:JNLP4-connect connection from thunderhill.nvidia.com/216.228.112.22:53546": Remote call on JNLP4-connect connection from thunderhill.nvidia.com/216.228.112.22:53546 failed. The channel is closing down or has closed down
00:29:52 java.nio.channels.ClosedChannelException
00:29:52 	at org.jenkinsci.remoting.protocol.NetworkLayer.onRecvClosed(NetworkLayer.java:154)
00:29:52 	at org.jenkinsci.remoting.protocol.impl.NIONetworkLayer.ready(NIONetworkLayer.java:179)
00:29:52 	at org.jenkinsci.remoting.protocol.IOHub$OnReady.run(IOHub.java:795)
00:29:52 	at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
00:29:52 	at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:59)
00:29:52 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
00:29:52 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
00:29:52 	at java.lang.Thread.run(Thread.java:748)
00:29:52 Caused: hudson.remoting.ChannelClosedException: Channel "hudson.remoting.Channel@6cc8c65f:JNLP4-connect connection from thunderhill.nvidia.com/216.228.112.22:53546": Remote call on JNLP4-connect connection from thunderhill.nvidia.com/216.228.112.22:53546 failed. The channel is closing down or has closed down
00:29:52 	at hudson.remoting.Channel.call(Channel.java:994)
00:29:52 	at hudson.Launcher$RemoteLauncher.launch(Launcher.java:1121)
00:29:52 	at hudson.Launcher$ProcStarter.start(Launcher.java:508)
00:29:52 	at hudson.Launcher$ProcStarter.join(Launcher.java:519)
00:29:52 	at com.gpuopenanalytics.jenkins.remotedocker.DockerState.execute(DockerState.java:88)
00:29:52 	at com.gpuopenanalytics.jenkins.remotedocker.DockerState.tearDown(DockerState.java:98)
00:29:52 	at com.gpuopenanalytics.jenkins.remotedocker.RemoteDockerBuildWrapper$DockerEnvironment.tearDown(RemoteDockerBuildWrapper.java:198)
00:29:52 	at hudson.model.Build$BuildExecution.doRun(Build.java:174)
00:29:52 	at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:504)
00:29:52 	at hudson.model.Run.execute(Run.java:1894)
00:29:52 	at hudson.matrix.MatrixRun.run(MatrixRun.java:153)
00:29:52 	at hudson.model.ResourceController.execute(ResourceController.java:97)
00:29:52 	at hudson.model.Executor.run(Executor.java:428)
00:29:52 ERROR: Step ‘Publish JUnit test result report’ failed: no workspace for rapidsai/gpuci/cudf/prb/cudf-gpu-test/CUDA=10.2,GPU_LABEL=gpu,OS=centos7,PYTHON=3.8 #970
00:29:52 ERROR: Step ‘Set GitHub commit status (universal)’ failed: no workspace for rapidsai/gpuci/cudf/prb/cudf-gpu-test/CUDA=10.2,GPU_LABEL=gpu,OS=centos7,PYTHON=3.8 #970
00:29:52 Finished: FAILURE
```

@kkraus14 (Collaborator)

Yup, CI hiccup. I'll kick it back off.

rerun tests

@rapids-bot bot merged commit 7526be7 into rapidsai:branch-0.19 on Feb 26, 2021