Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add gbenchmark for cudf::strings::to_lower #7316

Merged
merged 15 commits into from
Feb 10, 2021

Conversation

davidwendt
Copy link
Contributor

Reference #5698
This creates a gbenchmark for the cudf::strings::to_lower. The device logic is the same for cudf::strings::to_upper and cudf::strings::swapcase so this a good measure for the 3 APIs.

This PR is dependent on changes in PR #7292
These are mostly in the generate_benchmark_input.cpp

The initial results were as follows:

--------------------------------------------------------------------------------------------------
Benchmark                                        Time             CPU   Iterations UserCounters...
--------------------------------------------------------------------------------------------------
StringCase/strings/4096/manual_time          0.278 ms        0.296 ms         2514 bytes_per_second=248.756M/s
StringCase/strings/32768/manual_time         0.289 ms        0.307 ms         2421 bytes_per_second=1.86625G/s
StringCase/strings/262144/manual_time        0.419 ms        0.438 ms         1662 bytes_per_second=10.2869G/s
StringCase/strings/2097152/manual_time        2.59 ms         2.61 ms          269 bytes_per_second=13.3449G/s
StringCase/strings/16777216/manual_time       25.9 ms         25.9 ms           27 bytes_per_second=10.6531G/s

The convert_case code here is a bit old. I changed it to use the more efficient make_strings_children utility and found the performance improved by 2x

--------------------------------------------------------------------------------------------------
Benchmark                                        Time             CPU   Iterations UserCounters...
--------------------------------------------------------------------------------------------------
StringCase/strings/4096/manual_time          0.117 ms        0.135 ms         5877 bytes_per_second=592.795M/s
StringCase/strings/32768/manual_time         0.122 ms        0.140 ms         5641 bytes_per_second=4.42664G/s
StringCase/strings/262144/manual_time        0.274 ms        0.292 ms         2535 bytes_per_second=15.768G/s
StringCase/strings/2097152/manual_time        1.59 ms         1.61 ms          441 bytes_per_second=21.759G/s
StringCase/strings/16777216/manual_time       12.1 ms         12.1 ms           58 bytes_per_second=22.8626G/s

So these changes are also included in this PR.

@davidwendt davidwendt added 3 - Ready for Review Ready for review by team libcudf Affects libcudf (C++/CUDA) code. Performance Performance related issue strings strings issues (C++ and Python) improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Feb 4, 2021
@davidwendt davidwendt self-assigned this Feb 4, 2021
@davidwendt davidwendt requested review from a team as code owners February 4, 2021 19:35
Copy link
Contributor

@codereport codereport left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, couple small things

cpp/benchmarks/common/generate_benchmark_input.cpp Outdated Show resolved Hide resolved
cpp/benchmarks/common/generate_benchmark_input.cpp Outdated Show resolved Hide resolved
cpp/benchmarks/common/generate_benchmark_input.cpp Outdated Show resolved Hide resolved
cpp/src/strings/case.cu Outdated Show resolved Hide resolved
@codecov
Copy link

codecov bot commented Feb 5, 2021

Codecov Report

❗ No coverage uploaded for pull request base (branch-0.19@7e0437d). Click here to learn what that means.
The diff coverage is n/a.

Impacted file tree graph

@@              Coverage Diff               @@
##             branch-0.19    #7316   +/-   ##
==============================================
  Coverage               ?   82.22%           
==============================================
  Files                  ?      100           
  Lines                  ?    16969           
  Branches               ?        0           
==============================================
  Hits                   ?    13953           
  Misses                 ?     3016           
  Partials               ?        0           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7e0437d...4b45594. Read the comment docs.

@davidwendt
Copy link
Contributor Author

rerun tests

@davidwendt
Copy link
Contributor Author

rerun tests

Copy link
Member

@harrism harrism left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cmake approval

@davidwendt
Copy link
Contributor Author

@gpucibot merge

@rapids-bot rapids-bot bot merged commit f7b3f75 into rapidsai:branch-0.19 Feb 10, 2021
@davidwendt davidwendt deleted the benchmark-strings-case branch February 10, 2021 12:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team improvement Improvement / enhancement to an existing function libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change Performance Performance related issue strings strings issues (C++ and Python)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants