Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor strings column factories #7397

Merged

Conversation

harrism
Copy link
Member

@harrism harrism commented Feb 17, 2021

This PR refactors strings column factories to eliminate the use of device_vector and std::vector parameters, and to facility more use of device_uvector in calls to the factories. This is a small part of #7287 . Multiple versions of make_strings_columns take device_vector parameters. This PR expands the use of iterator and device_span versions to enable switching to device_uvector as described in #7287. It also adds new make_device_uvector_async/sync utility functions.

This will help facilitate safe CUDA stream usage.

@harrism harrism added feature request New feature or request 2 - In Progress Currently a work in progress libcudf Affects libcudf (C++/CUDA) code. breaking Breaking change labels Feb 17, 2021
@harrism harrism self-assigned this Feb 17, 2021
@harrism harrism changed the base branch from branch-0.18 to branch-0.19 February 17, 2021 05:59
@codecov
Copy link

codecov bot commented Feb 23, 2021

Codecov Report

Merging #7397 (aeff8d2) into branch-0.19 (43b44e1) will increase coverage by 0.45%.
The diff coverage is 87.45%.

Impacted file tree graph

@@               Coverage Diff               @@
##           branch-0.19    #7397      +/-   ##
===============================================
+ Coverage        81.80%   82.26%   +0.45%     
===============================================
  Files              101      101              
  Lines            16695    17244     +549     
===============================================
+ Hits             13658    14185     +527     
- Misses            3037     3059      +22     
Impacted Files Coverage Δ
python/cudf/cudf/utils/docutils.py 97.36% <50.00%> (-2.64%) ⬇️
python/cudf/cudf/testing/testing.py 80.00% <57.14%> (-1.04%) ⬇️
python/cudf/cudf/core/column/timedelta.py 88.57% <75.00%> (+0.33%) ⬆️
python/cudf/cudf/core/column/categorical.py 91.74% <77.14%> (-0.49%) ⬇️
python/cudf/cudf/core/column/datetime.py 89.63% <78.57%> (+0.58%) ⬆️
python/dask_cudf/dask_cudf/core.py 71.67% <79.41%> (-2.60%) ⬇️
python/cudf/cudf/core/multiindex.py 82.90% <90.90%> (+0.73%) ⬆️
python/cudf/cudf/core/tools/datetimes.py 84.53% <93.33%> (+0.89%) ⬆️
python/cudf/cudf/core/column_accessor.py 95.47% <95.65%> (+2.53%) ⬆️
python/cudf/cudf/core/column/column.py 87.86% <96.42%> (+0.43%) ⬆️
... and 64 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4d0c160...fb3c023. Read the comment docs.

@harrism harrism marked this pull request as ready for review February 24, 2021 10:10
@harrism harrism requested review from a team as code owners February 24, 2021 10:10
@github-actions github-actions bot added the conda label Feb 24, 2021
@harrism harrism requested a review from jrhemstad March 2, 2021 02:38
0,
stream);
auto repl = make_strings_column(cudf::detail::make_device_uvector_async(repl_chars, stream),
cudf::detail::make_device_uvector_async(repl_offsets, stream),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since _chars and _offsets are always the same, can we move them to a broader scope to avoid needing stream synchronization?

cpp/include/cudf/column/column_factories.hpp Outdated Show resolved Hide resolved
cpp/include/cudf/column/column_factories.hpp Outdated Show resolved Hide resolved
cpp/include/cudf/column/column_factories.hpp Outdated Show resolved Hide resolved
cpp/tests/strings/combine_tests.cpp Outdated Show resolved Hide resolved
cpp/tests/strings/combine_tests.cpp Outdated Show resolved Hide resolved
@harrism
Copy link
Member Author

harrism commented Mar 2, 2021

@cwharris what changes are you requesting?

Copy link
Contributor

@davidwendt davidwendt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of the files need 2021 added to their copyright line.

@harrism
Copy link
Member Author

harrism commented Mar 3, 2021

Some of the files need 2021 added to their copyright line.

Thanks. Done.

@harrism harrism requested a review from davidwendt March 3, 2021 01:20
@shwina
Copy link
Contributor

shwina commented Mar 3, 2021

rerun tests

@cwharris
Copy link
Contributor

cwharris commented Mar 3, 2021

@harrism you addressed my concerns. 👍

@harrism
Copy link
Member Author

harrism commented Mar 4, 2021

@gpucibot merge

@harrism harrism added 5 - Ready to Merge Testing and reviews complete, ready to merge and removed 2 - In Progress Currently a work in progress labels Mar 4, 2021
@rapids-bot rapids-bot bot merged commit e5d0ec9 into rapidsai:branch-0.19 Mar 4, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
5 - Ready to Merge Testing and reviews complete, ready to merge breaking Breaking change feature request New feature or request libcudf Affects libcudf (C++/CUDA) code.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants