Refactor multibyte_split output_builder #11945

Merged

@upsj upsj commented Oct 19, 2022

Description

This PR moves the output_builder and split_device_span classes out of multibyte_split and adds an iterator for the split_device_span, enabling it to be used directly in Thrust algorithms.

I also included a fix from #11875 to make the integration easier once that is merged.
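
To illustrate the idea in the description, here is a minimal host-only sketch of a span that is split across two buffers and exposed through a random-access iterator. It is not the code from this PR: `split_span_sketch` and `demo_split_span_sum` are hypothetical names, the standard library stands in for Thrust, and the real `split_device_span` operates on device memory.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <numeric>
#include <vector>

// Hypothetical host-side stand-in: two separate contiguous buffers
// presented as one logical range [0, size()).
template <typename T>
class split_span_sketch {
 public:
  split_span_sketch(T* head, std::size_t head_size, T* tail, std::size_t tail_size)
    : head_{head}, head_size_{head_size}, tail_{tail}, tail_size_{tail_size}
  {
  }

  std::size_t size() const { return head_size_ + tail_size_; }

  // Indices below head_size_ map into the head buffer, the rest into the tail.
  T& operator[](std::size_t i) const
  {
    assert(i < size());  // debug-only bounds check
    return i < head_size_ ? head_[i] : tail_[i - head_size_];
  }

  // Random-access iterator that stores only the span and a logical index.
  class iterator {
   public:
    using iterator_category = std::random_access_iterator_tag;
    using value_type        = T;
    using difference_type   = std::ptrdiff_t;
    using pointer           = T*;
    using reference         = T&;

    iterator() = default;
    iterator(split_span_sketch const* span, difference_type pos) : span_{span}, pos_{pos} {}

    reference operator*() const { return (*span_)[static_cast<std::size_t>(pos_)]; }
    reference operator[](difference_type n) const { return *(*this + n); }
    iterator& operator++() { ++pos_; return *this; }
    iterator operator++(int) { auto old = *this; ++pos_; return old; }
    iterator& operator--() { --pos_; return *this; }
    iterator operator--(int) { auto old = *this; --pos_; return old; }
    iterator& operator+=(difference_type n) { pos_ += n; return *this; }
    iterator& operator-=(difference_type n) { pos_ -= n; return *this; }
    friend iterator operator+(iterator it, difference_type n) { return it += n; }
    friend iterator operator+(difference_type n, iterator it) { return it += n; }
    friend iterator operator-(iterator it, difference_type n) { return it -= n; }
    friend difference_type operator-(iterator a, iterator b) { return a.pos_ - b.pos_; }
    friend bool operator==(iterator a, iterator b) { return a.pos_ == b.pos_; }
    friend bool operator!=(iterator a, iterator b) { return a.pos_ != b.pos_; }
    friend bool operator<(iterator a, iterator b) { return a.pos_ < b.pos_; }
    friend bool operator>(iterator a, iterator b) { return a.pos_ > b.pos_; }
    friend bool operator<=(iterator a, iterator b) { return a.pos_ <= b.pos_; }
    friend bool operator>=(iterator a, iterator b) { return a.pos_ >= b.pos_; }

   private:
    split_span_sketch const* span_{nullptr};
    difference_type pos_{0};
  };

  iterator begin() const { return {this, 0}; }
  iterator end() const { return {this, static_cast<std::ptrdiff_t>(size())}; }

 private:
  T* head_;
  std::size_t head_size_;
  T* tail_;
  std::size_t tail_size_;
};

// Usage: fill the logical range with 0..6, then sum it with standard algorithms.
inline int demo_split_span_sum()
{
  std::vector<int> head(3);
  std::vector<int> tail(4);
  split_span_sketch<int> span{head.data(), head.size(), tail.data(), tail.size()};
  std::iota(span.begin(), span.end(), 0);  // writes 0,1,2 into head and 3,4,5,6 into tail
  return std::accumulate(span.begin(), span.end(), 0);
}
```

Because the iterator carries only a pointer to the span and a logical position, algorithms never need to know where the head buffer ends and the tail begins.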

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@upsj upsj added the 3 - Ready for Review, libcudf, tech debt, and non-breaking labels Oct 19, 2022
@upsj upsj requested a review from a team as a code owner October 19, 2022 10:22
@upsj upsj self-assigned this Oct 19, 2022
@upsj upsj requested a review from elstehle October 19, 2022 10:22
@upsj upsj requested a review from mythrocks October 19, 2022 10:22
@upsj upsj added the improvement label Oct 19, 2022

codecov bot commented Oct 19, 2022

Codecov Report

Base: 87.40% // Head: 88.16% // Increases project coverage by +0.75% 🎉

Coverage data is based on head (f297789) compared to base (f72c4ce).
Patch has no changes to coverable lines.

Additional details and impacted files
@@               Coverage Diff                @@
##           branch-22.12   #11945      +/-   ##
================================================
+ Coverage         87.40%   88.16%   +0.75%     
================================================
  Files               133      133              
  Lines             21833    21977     +144     
================================================
+ Hits              19084    19375     +291     
+ Misses             2749     2602     -147     
| Impacted Files | Coverage Δ |
|---|---|
| python/strings_udf/strings_udf/__init__.py | 86.27% <0.00%> (-10.61%) ⬇️ |
| python/cudf/cudf/io/text.py | 91.66% <0.00%> (-8.34%) ⬇️ |
| python/cudf/cudf/core/_base_index.py | 82.20% <0.00%> (-3.35%) ⬇️ |
| python/strings_udf/strings_udf/_typing.py | 94.73% <0.00%> (-1.06%) ⬇️ |
| python/cudf/cudf/testing/dataset_generator.py | 72.83% <0.00%> (-0.42%) ⬇️ |
| python/dask_cudf/dask_cudf/core.py | 73.72% <0.00%> (-0.41%) ⬇️ |
| python/dask_cudf/dask_cudf/backends.py | 84.90% <0.00%> (-0.37%) ⬇️ |
| python/cudf/cudf/io/orc.py | 92.94% <0.00%> (-0.09%) ⬇️ |
| python/cudf/cudf/__init__.py | 90.69% <0.00%> (ø) |
| python/cudf/cudf/core/udf/_ops.py | 100.00% <0.00%> (ø) |
| ... and 23 more | |

@upsj upsj force-pushed the refactor/output_builder_generalize branch from f408fab to ff3f1b9 Compare October 26, 2022 11:22
@upsj upsj added the cuIO and 4 - Needs cuIO Reviewer labels and removed the 3 - Ready for Review label Oct 26, 2022

@bdice bdice left a comment

All looks good to me.

@hyperbolic2346 hyperbolic2346 left a comment

This is a weird thing. A few nit pick comments.

cpp/src/io/utilities/output_builder.cuh (outdated)
 */
void advance_output(size_type actual_size, rmm::cuda_stream_view stream)
{
  CUDF_EXPECTS(actual_size <= _max_write_size, "Internal error");
Contributor:

This is more of an external error, isn't it?

Contributor Author:

External to the data structure, yes, but internal to cuDF. Originally this was an assertion, but I changed it to also fail in release builds.

Contributor:

I saw this was copied. I'm somewhat torn here. I don't like a message like this with no real information about what went wrong, but I also don't see it as a big deal since CUDF_EXPECTS will output file and line number information so it is easy to find out where the error lives.

Contributor Author:

My thought here was that any kind of error message will probably end up with the user, and they will not really be able to make much sense of it either way (nondescript vs. descriptive), let alone do anything about it, since this would point to a bug, not user error.
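
The trade-off discussed in this thread can be sketched as follows. This is a host-only illustration, not cuDF's implementation: `SKETCH_EXPECTS`, `output_sketch`, and `oversized_write_message` are hypothetical names. It contrasts a plain `assert`, which is compiled out when `NDEBUG` is defined, with an always-on check that throws and attaches the file and line to an otherwise nondescript message.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical helpers (not cuDF's macros) that turn __LINE__ into a string literal.
#define SKETCH_STRINGIFY_IMPL(x) #x
#define SKETCH_STRINGIFY(x) SKETCH_STRINGIFY_IMPL(x)

// Always-on check: throws in debug and release builds alike and appends the
// source location of the failing check to the message.
#define SKETCH_EXPECTS(cond, reason)                                    \
  do {                                                                  \
    if (!(cond)) {                                                      \
      throw std::logic_error(std::string{reason} + " (" __FILE__ ":"    \
                             SKETCH_STRINGIFY(__LINE__) ")");           \
    }                                                                   \
  } while (false)

// Hypothetical stand-in for an output buffer with a bounded write size.
struct output_sketch {
  int max_write_size;
  int size{0};

  void advance_output(int actual_size)
  {
    // Debug-only precondition: this disappears when NDEBUG is defined.
    assert(actual_size >= 0);
    // Stays active in every build type, so an oversized write is always reported.
    SKETCH_EXPECTS(actual_size <= max_write_size, "Internal error");
    size += actual_size;
  }
};

// Returns the message of the exception raised by an oversized write,
// or an empty string if nothing was thrown.
inline std::string oversized_write_message()
{
  output_sketch out{8};
  try {
    out.advance_output(9);
  } catch (std::logic_error const& e) {
    return e.what();
  }
  return {};
}
```

Even with a generic reason string, the appended location tells a developer which check fired, which is the point made above about file and line information.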

Co-authored-by: Mike Wilson <hyperbolic2346@users.noreply.github.com>
@upsj upsj added the 5 - Ready to Merge label and removed the 4 - Needs cuIO Reviewer label Oct 27, 2022

upsj commented Oct 27, 2022

rerun tests

upsj commented Oct 27, 2022

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 43eb7a0 into rapidsai:branch-22.12 Oct 27, 2022
Labels: 5 - Ready to Merge, cuIO, improvement, libcudf, non-breaking