[FEA] concatenate array of strings #7727
Comments
@revans2 any chance you could write out a basic example of what this operation does? I'm not quite following if this is an elementwise concatenation of the list elements that returns the same number of rows as the input, or similar to a Table concatenation.
Also, does this relate to #4728?
OK so here is a spark example for concat_ws
The first parameter to concat_ws is the separator string. All of the other parameters are concatenated together into an output string. If one of the parameters is an array/list of strings, then its strings are each pulled out (similar to a flat map) and treated like parameters to a regular concat. If you want to update
and possibly
These would act exactly like the existing
#4728 is different, but also kind of related. It is a requirement we have to cast a struct to a string. The oddness there is that Spark has two different modes for this: in one it inserts a "null" for null values, and in the other it inserts an empty string. @rwlee knows the details of what is needed better than I do. It is related because at some point we will have to be able to cast an array to a string, and we will start to run into similar situations that overlap between this request and #4728.
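To make the flat-map behavior of `concat_ws` described above concrete, here is a minimal pure-Python sketch. It is a reference model only, not cudf or Spark code. The name `concat_ws_model` is hypothetical, and the null handling (skip null inputs, return null for a null separator) is an assumption based on Spark's documented `concat_ws` behavior rather than anything stated in this thread.

```python
from typing import List, Optional, Union

# One argument is either a string, a list of strings, or null (None).
Arg = Union[None, str, List[Optional[str]]]


def concat_ws_model(sep: Optional[str], *args: Arg) -> Optional[str]:
    """Hypothetical reference model of Spark's concat_ws for a single row."""
    # Assumption: a null separator yields a null result.
    if sep is None:
        return None

    parts: List[str] = []
    for arg in args:
        if arg is None:
            # Assumption: null arguments are skipped.
            continue
        if isinstance(arg, str):
            parts.append(arg)
        else:
            # Array argument: pull each string out, like a flat map,
            # skipping null elements.
            parts.extend(s for s in arg if s is not None)
    return sep.join(parts)


print(concat_ws_model("-", "a", ["b", "c"], "d"))  # a-b-c-d
```

The array argument contributes its elements exactly as if they had been passed as separate string parameters.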
I'm having trouble mapping the example to the proposed API.
Where is the
Hi David. The first parameter is just a
Given a lists column of strings (each row is a list of strings), this PR facilitates the concatenation of strings within each list. For example:
```
s = [ {'aa', 'bb', 'cc'}, null, {null, 'dd'}, {'ee', 'ff'} ]
r = strings::concatenate_list_elements(s, '+++')
r is ['aa+++bb+++cc', null, null, 'ee+++ff']
```
This PR is similar to Spark's `concat_ws`, and closes #7727.
Authors:
- Nghia Truong (https://github.com/ttnghia)
Approvers:
- David Wendt (https://github.com/davidwendt)
- Mike Wilson (https://github.com/hyperbolic2346)
- Jake Hemstad (https://github.com/jrhemstad)
URL: #7929
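The row-wise semantics in the PR example above can be modeled in a few lines of plain Python. This is a sketch of the behavior as the example shows it (a null row, or a row containing any null element, produces a null result), not the actual libcudf implementation. The function name `concatenate_list_elements_model` is hypothetical, and `None` stands in for null.

```python
from typing import List, Optional


def concatenate_list_elements_model(
    rows: List[Optional[List[Optional[str]]]], separator: str
) -> List[Optional[str]]:
    """Hypothetical model: join the strings of each list row with a separator."""
    out: List[Optional[str]] = []
    for row in rows:
        # As in the PR example: a null row, or any null element in the
        # row, makes the whole output row null.
        if row is None or any(s is None for s in row):
            out.append(None)
        else:
            out.append(separator.join(row))
    return out


s = [["aa", "bb", "cc"], None, [None, "dd"], ["ee", "ff"]]
r = concatenate_list_elements_model(s, "+++")
print(r)  # ['aa+++bb+++cc', None, None, 'ee+++ff']
```

The output has the same number of rows as the input, which answers the earlier question about whether this is an elementwise operation.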
Is your feature request related to a problem? Please describe.
We would like to support Spark's `concat_ws` function that can take any combination of strings or arrays of strings.
Describe the solution you'd like
cudf already offers a number of string concat APIs that can take a table of strings and concat them. What I would like is the ability to take a single string column that is an array of strings and concatenate them just like the table APIs do. With that and the existing table APIs we should be able to build up `concat_ws`.
Describe alternatives you've considered
There is no good alternative. The arrays could be variable length, so we cannot use any of the existing APIs, which all assume a fixed number of inputs.
Additional context
Ideally we would want APIs that can either take a scalar string as the separator or a column_view of strings as the separator. If we could only get one of them, then the column_view version would be better.
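The difference between the two separator forms requested above can be sketched in plain Python. This is an illustrative model only: the name `join_rows_model` is hypothetical, a Python list stands in for a strings `column_view`, and the null handling (a null separator, null row, or null element yields a null output row) is an assumption, not something specified in this issue.

```python
from typing import List, Optional, Union

Row = Optional[List[Optional[str]]]


def join_rows_model(
    rows: List[Row], separators: Union[str, List[Optional[str]]]
) -> List[Optional[str]]:
    """Hypothetical model: join each list row with a scalar or per-row separator."""
    # A scalar separator is the per-row form with the same value repeated,
    # which is why the per-row (column) form is the more general one.
    if isinstance(separators, str):
        separators = [separators] * len(rows)
    if len(separators) != len(rows):
        raise ValueError("need exactly one separator per row")

    out: List[Optional[str]] = []
    for row, sep in zip(rows, separators):
        # Assumption: any null involved makes the output row null.
        if row is None or sep is None or any(s is None for s in row):
            out.append(None)
        else:
            out.append(sep.join(row))
    return out


rows = [["a", "b"], ["c", "d", "e"], ["f"]]
print(join_rows_model(rows, "-"))              # ['a-b', 'c-d-e', 'f']
print(join_rows_model(rows, ["-", "+", "_"]))  # ['a-b', 'c+d+e', 'f']
```

Since the scalar case reduces to a column holding one repeated value, this is consistent with the stated preference for the `column_view` version if only one can be provided.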