[FEA] Add an internal utility API to return an offsets column of a sliced column starting with zero #9256

Open
ttnghia opened this issue Sep 20, 2021 · 5 comments
Labels
feature request New feature or request improvement Improvement / enhancement to an existing function libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change


@ttnghia
Contributor

ttnghia commented Sep 20, 2021

For sliced nested columns that have an offsets child column, the offsets values may not start from zero. For example:

offsets = [5, 7, 20, ...]

Many operations on these sliced columns need to generate an output offsets column that starts with zero. For example, given the input offsets above, the output offsets column should be:

offsets = [0, 2, 15, ...]

Such an output offsets column is generated simply by subtracting the first value from every value. Yes, very simple.
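A minimal host-side sketch of that subtraction, assuming the offsets are available as a `std::vector<int32_t>` (`int32_t` standing in for `cudf::size_type`). The name `rebase_offsets` is hypothetical and not an existing libcudf API; a device version would more likely run the same lambda through `thrust::transform`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not a libcudf API): rebase the offsets of a sliced
// column so they start at zero by subtracting the first offset from every entry.
std::vector<int32_t> rebase_offsets(std::vector<int32_t> const& offsets)
{
  assert(!offsets.empty());  // this sketch assumes at least one offset
  int32_t const first = offsets.front();
  std::vector<int32_t> result(offsets.size());
  std::transform(offsets.begin(), offsets.end(), result.begin(),
                 [first](int32_t v) { return v - first; });
  return result;
}
```

With the example offsets, `rebase_offsets({5, 7, 20})` returns `{0, 2, 15}`.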

I would like to have an internal API implementing this feature. Currently, several other APIs do this with private code in their own .cu files. For example:

@ttnghia ttnghia added feature request New feature or request libcudf Affects libcudf (C++/CUDA) code. improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Sep 20, 2021
@jrhemstad
Contributor

Are there situations where we could instead use a transform iterator to produce the "clean" offsets on the fly, rather than materializing new offsets?

@ttnghia
Contributor Author

ttnghia commented Sep 20, 2021

I believe the answer is yes. However, in the two files listed above, a new offsets column is needed to generate an output lists column (using make_lists_column).

@jrhemstad
Contributor

Maybe the solution here, then, is to add that iterator and also provide a convenience function that uses it to fill a new column.
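A rough host-side sketch of that proposal, under stated assumptions: `rebased_offset_iterator` and `make_rebased_offsets` are hypothetical names, plain pointers stand in for device memory, and in libcudf the iterator would more likely be built with `thrust::make_transform_iterator`. Read-only consumers could use the iterator directly, while code that needs an owning offsets column (such as the `make_lists_column` case above) would call the convenience function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <vector>

// Hypothetical host-side analogue of a transform iterator (not libcudf code).
// Dereferencing yields offsets[i] - offsets[0]; no new array is allocated.
class rebased_offset_iterator {
 public:
  using iterator_category = std::input_iterator_tag;
  using value_type        = int32_t;
  using difference_type   = std::ptrdiff_t;
  using pointer           = void;
  using reference         = int32_t;  // computed on the fly, returned by value

  rebased_offset_iterator(int32_t const* pos, int32_t base) : pos_{pos}, base_{base} {}

  int32_t operator*() const { return *pos_ - base_; }
  rebased_offset_iterator& operator++() { ++pos_; return *this; }
  rebased_offset_iterator operator++(int) { auto tmp = *this; ++pos_; return tmp; }
  bool operator==(rebased_offset_iterator const& o) const { return pos_ == o.pos_; }
  bool operator!=(rebased_offset_iterator const& o) const { return pos_ != o.pos_; }

 private:
  int32_t const* pos_;  // current position in the sliced offsets
  int32_t base_;        // first offset of the slice
};

// Hypothetical convenience function: materialize the rebased offsets into a
// new owning array by draining the iterator.
std::vector<int32_t> make_rebased_offsets(int32_t const* offsets, std::size_t size)
{
  assert(size > 0);  // this sketch assumes at least one offset
  int32_t const base = offsets[0];
  rebased_offset_iterator first{offsets, base};
  rebased_offset_iterator last{offsets + size, base};
  return std::vector<int32_t>(first, last);
}
```

Because the iterator computes each value when dereferenced, the source offsets must stay alive while it is in use; the convenience function is the only place that pays for an allocation.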

@github-actions

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

@github-actions

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.
