Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEA] Create column_view of one row out of scalar #10893

Closed
ttnghia opened this issue May 19, 2022 · 6 comments
Closed

[FEA] Create column_view of one row out of scalar #10893

ttnghia opened this issue May 19, 2022 · 6 comments
Labels
feature request New feature or request libcudf Affects libcudf (C++/CUDA) code.

Comments

@ttnghia
Copy link
Contributor

ttnghia commented May 19, 2022

In many operations such as searching/sorting etc, we may want to compare rows of a column with a scalar. Most of the time, we have to convert the input scalar into a column of one row. This is not trivial. For some scalar types such as string and list, we have to create an offsets buffer in memory for such conversion.

We should have internal APIs to support creating a column_view of one row that views the content of the input scalar as its (unique) row.

@davidwendt
Copy link
Contributor

This would only work for non-compound types. You still have to create the offsets buffer for a strings or list column. The column_view cannot own the offsets buffer. You could have the API return a column_view and buffer to free but that would not be much different than just returning a column. And we already have a function that converts a scalar into a column.

@ttnghia
Copy link
Contributor Author

ttnghia commented May 19, 2022

This would only work for non-compound types. You still have to create the offsets buffer for a strings or list column. The column_view cannot own the offsets buffer.

That is implementation details, I did that in my draft PR: returning pair<column_view, rmm::device_buffer>. This is similar to many other utility functions that return the auxiliary allocated buffer to keep while doing computation.

@davidwendt
Copy link
Contributor

I disagree with this. What does this save over building a column?
Allocating the data memory is not an issue with RMM. Simple device copy of a single value?
Will this handle an invalid scalar (one that represents null)? If so, you need another buffer for the bitmask.

@ttnghia
Copy link
Contributor Author

ttnghia commented May 19, 2022

But what if the scalar is nested type with a lot of data? Creating a new column will copy all data over. On the other hand, just creating a column_view will only allocate an offsets buffer.

@davidwendt
Copy link
Contributor

I believe each nested layer in a lists column has a unique set of offsets, right?
Really, I believe most of the work is building the offsets which this does not save us.

@jrhemstad
Copy link
Contributor

I disagree with this. What does this save over building a column? Allocating the data memory is not an issue with RMM. Simple device copy of a single value? Will this handle an invalid scalar (one that represents null)? If so, you need another buffer for the bitmask.

This is a good point. I didn't think about the fact that a column_view of a nested scalar requires a new allocation. If we're already doing an allocation, then I agree there's no need for a new API here. The column(scalar const&) ctor is sufficient.

@ttnghia ttnghia closed this as completed May 23, 2022
@bdice bdice removed the Needs Triage Need team to review and classify label Mar 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request libcudf Affects libcudf (C++/CUDA) code.
Projects
None yet
Development

No branches or pull requests

4 participants