[FEA] Improve performance of benchmark input generation #9857

Closed
robertmaynard opened this issue Dec 7, 2021 · 0 comments · Fixed by #10109
Labels
0 - Backlog (in queue waiting for assignment), feature request (new feature or request), libcudf (affects libcudf C++/CUDA code), Performance (performance related issue)

robertmaynard commented Dec 7, 2021

Is your feature request related to a problem? Please describe.
As identified in #5773 (comment), a significant portion of the runtime of some benchmarks is spent on data generation rather than on the code being micro-benchmarked.

Specifically, the benchmark fixtures spend significant time in Setup/TearDown initializing state.

Describe the solution you'd like

  • Cache input state when possible across Setup/TearDown
  • Perform as much of generate_benchmark_input.hpp on the benchmarked GPU as possible
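
The first bullet can be illustrated with a minimal sketch. It assumes the benchmark input is fully determined by a small set of parameters, so one generated input can be shared by every fixture run that asks for the same parameters. The names `input_params`, `column_data`, and `input_cache` are hypothetical and are not libcudf types; the host-side `std::mt19937` generator only stands in for the real generation code.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <random>
#include <tuple>
#include <vector>

// Hypothetical description of one benchmark input (not a libcudf type).
struct input_params {
  std::size_t num_rows;
  std::uint32_t seed;

  // Ordering so the parameters can be used as a std::map key.
  bool operator<(input_params const& other) const {
    return std::tie(num_rows, seed) < std::tie(other.num_rows, other.seed);
  }
};

// Stand-in for a generated column.
using column_data = std::vector<std::int32_t>;

// Hypothetical cache: each distinct input is generated once, and every
// later request with the same parameters receives the same read-only data.
class input_cache {
 public:
  std::shared_ptr<column_data const> get(input_params const& params) {
    auto it = cache_.find(params);
    if (it != cache_.end()) { return it->second; }  // hit: nothing is regenerated

    ++generation_count_;
    std::shared_ptr<column_data const> data =
      std::make_shared<column_data>(generate(params));
    cache_.emplace(params, data);
    return data;
  }

  // Number of times data was actually generated (cache misses).
  std::size_t generation_count() const { return generation_count_; }

 private:
  // Placeholder generator; the real code would build the benchmark input here.
  static column_data generate(input_params const& params) {
    std::mt19937 engine(params.seed);
    column_data out(params.num_rows);
    for (auto& value : out) { value = static_cast<std::int32_t>(engine()); }
    return out;
  }

  std::map<input_params, std::shared_ptr<column_data const>> cache_;
  std::size_t generation_count_ = 0;
};
```

A fixture would call `cache.get(params)` during setup instead of regenerating the data, and would treat the result as read-only. A benchmark that mutates its input would still need its own copy, so this kind of caching only applies "when possible", as the bullet says.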
robertmaynard added the feature request, 0 - Backlog, libcudf, and Performance labels Dec 7, 2021
rapids-bot bot pushed a commit that referenced this issue Mar 22, 2022
To speed up benchmark input generation, move all data generation to the device.
This addresses #5773 (comment).
This PR moves the random input generation to the device.
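
As a rough illustration of why random generation maps well onto a device, the sketch below computes each element from only a seed and the element's index, so the elements can be produced in any order or in parallel. It is host-side C++ written for this note, not the code from this PR; the mixing function is the SplitMix64 finalizer, and `random_element` and `fill_range` are hypothetical names. On a GPU, the body of the loop in `fill_range` would be the work done by one thread.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// SplitMix64 finalizer: scrambles a 64-bit counter into a pseudo-random value.
inline std::uint64_t mix64(std::uint64_t z) {
  z += 0x9E3779B97F4A7C15ULL;
  z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
  z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
  return z ^ (z >> 31);
}

// Element `index` depends only on (seed, index), never on other elements.
// The modulo introduces a slight bias, which is acceptable for a sketch.
inline std::int32_t random_element(std::uint64_t seed,
                                   std::size_t index,
                                   std::int32_t lower,
                                   std::int32_t upper) {
  std::uint64_t const range =
    static_cast<std::uint64_t>(static_cast<std::int64_t>(upper) - lower) + 1;
  std::uint64_t const r = mix64(seed ^ mix64(static_cast<std::uint64_t>(index)));
  return static_cast<std::int32_t>(lower + static_cast<std::int64_t>(r % range));
}

// Fills out[begin, end). Disjoint ranges can be filled independently.
inline void fill_range(std::vector<std::int32_t>& out,
                       std::size_t begin,
                       std::size_t end,
                       std::uint64_t seed,
                       std::int32_t lower,
                       std::int32_t upper) {
  for (std::size_t i = begin; i < end; ++i) {
    out[i] = random_element(seed, i, lower, upper);
  }
}
```

Because no generator state is carried from one element to the next, filling the second half of a column before the first half gives the same result as filling it front to back, which is the property a parallel implementation relies on.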

The rest of the original work in this PR was split into multiple PRs, which have been merged:
#10277
#10278
#10279
#10280
#10281
#10300

With all of these changes, a single iteration of all benchmarks runs in under 1000 seconds (down from 3067s to 964s, roughly 3.2x faster).
Running more iterations should show an even larger benefit, because the benchmark is restarted several times during a run, and each restart calls the benchmark input generation code again.

closes #9857

Authors:
  - Karthikeyan (https://github.com/karthikeyann)

Approvers:
  - Vyas Ramasubramani (https://github.com/vyasr)
  - Vukasin Milovanovic (https://github.com/vuule)
  - David Wendt (https://github.com/davidwendt)

URL: #10109