[FEA] Add list len support #7157

Closed
albert17 opened this issue Jan 15, 2021 · 4 comments
Labels: feature request, Python

Comments

@albert17

Is your feature request related to a problem? Please describe.
I need to get the length of each entry in a column whose data type is list. @kkraus14 told me this is not implemented yet.

It works for strings: `ddf[col] = ddf[col].map_partitions(lambda x: x.str.len())`

But not for lists: `ddf[col] = ddf[col].map_partitions(lambda x: x.list.len(), meta=(col, ddf_dtypes[col].dtype))` fails with `Exception: AttributeError("'ListMethods' object has no attribute 'len'")`.

@albert17 added the Needs Triage and feature request labels Jan 15, 2021
@kkraus14 added the Python and libcudf labels and removed the Needs Triage label Jan 15, 2021
@kkraus14
Collaborator

I believe we'd need a new libcudf function, something like `cudf::list::count_elements`, whose output would be a column of integers holding the number of elements in each list row.
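The proposed operation is cheap because of how list columns are laid out. A cuDF lists column stores a flat child column plus an offsets column, so row `i` spans `child[offsets[i]:offsets[i+1]]` and its element count is just the difference of consecutive offsets. The following is a minimal CPU-side sketch of that idea, not libcudf's implementation; the helper name `sizes_from_offsets` is hypothetical and null rows are ignored here.

```python
from typing import List, Sequence


def sizes_from_offsets(offsets: Sequence[int]) -> List[int]:
    """Return the number of elements in each list row (illustrative sketch).

    Assumes an Arrow-style list layout: len(offsets) == num_rows + 1 and
    row i covers child[offsets[i]:offsets[i + 1]]. Nulls are not modeled.
    """
    if len(offsets) == 0:
        raise ValueError("offsets must contain at least one entry")
    # Each row's size is the gap between its start and end offsets.
    return [end - start for start, end in zip(offsets[:-1], offsets[1:])]


# [[1, 2], [], [3, 4, 5]] is stored as child=[1, 2, 3, 4, 5], offsets=[0, 2, 2, 5]
print(sizes_from_offsets([0, 2, 2, 5]))  # [2, 0, 3]
```

On a GPU every row's subtraction is independent, which is why this maps naturally onto a single parallel transform.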

@revans2
Contributor

revans2 commented Jan 15, 2021

+1, we could definitely use this for Spark too.

@kkraus14 added the Spark label Jan 15, 2021
@davidwendt self-assigned this Jan 19, 2021
rapids-bot bot pushed a commit that referenced this issue Jan 25, 2021
This adds the libcudf part of #7157 

```cpp
std::unique_ptr<column> cudf::lists::count_elements(
  lists_column_view const& input,
  rmm::mr::device_memory_resource* mr);
```

Returns the size of each element in the input lists column.
The PR also includes gtests for this new API.
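As a CPU-side illustration of the expected output (not the libcudf code), the sketch below extends the offsets idea to null rows. It assumes the Arrow-compatible convention for the validity bitmask, where bit `row % 8` of byte `row // 8` (least-significant bit first) is set for a valid row, and it assumes a null list yields a null count, which matches the `<NA>` in the Python example later in this issue. The names `is_valid` and `count_elements_sketch` are hypothetical.

```python
from typing import List, Optional, Sequence


def is_valid(bitmask: bytes, row: int) -> bool:
    # Assumed convention: LSB-first bits, 1 = valid (non-null) row.
    return ((bitmask[row // 8] >> (row % 8)) & 1) == 1


def count_elements_sketch(
    offsets: Sequence[int], bitmask: Optional[bytes]
) -> List[Optional[int]]:
    """Per-row list sizes with nulls propagated (illustrative only).

    `bitmask` is None when the column has no nulls; otherwise it must hold
    at least ceil(num_rows / 8) bytes.
    """
    num_rows = len(offsets) - 1
    sizes: List[Optional[int]] = []
    for row in range(num_rows):
        if bitmask is not None and not is_valid(bitmask, row):
            sizes.append(None)  # a null list produces a null count
        else:
            sizes.append(offsets[row + 1] - offsets[row])
    return sizes


# [[1, 2], None, [3]]: offsets=[0, 2, 2, 3], validity bits 0b101 (row 1 is null)
print(count_elements_sketch([0, 2, 2, 3], bytes([0b101])))  # [2, None, 1]
```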

Authors:
  - David (@davidwendt)

Approvers:
  - @nvdbaranec
  - AJ Schmidt (@ajschmidt8)
  - Karthikeyan (@karthikeyann)
  - Mark Harris (@harrism)

URL: #7173
@isVoid
Contributor

isVoid commented Jan 29, 2021

@shwina and I will pair on introducing the Python API.

@kkraus14 removed the Spark and libcudf labels Jan 29, 2021
rapids-bot bot pushed a commit that referenced this issue Feb 4, 2021
Closes #7157 

This PR adds a `ListMethods.len()` API that returns an integer column containing the length of each element in a `ListColumn`.
Example:
```python
>>> s = cudf.Series([[1,2], None, [3]])
>>> s
0    [1, 2]
1      None
2       [3]
dtype: list
>>> s.list.len()
0       2
1    <NA>
2       1
dtype: int32
```
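When checking `s.list.len()` against small inputs, a plain-Python reference of the same semantics can serve as a test oracle. The helper `list_len_reference` is hypothetical and written only for illustration: it uses `None` where cuDF displays `<NA>` and does not model the `int32` result dtype. Note the distinction it encodes: an empty list has length 0, while a null row stays null.

```python
from typing import List, Optional, Sequence


def list_len_reference(
    rows: Sequence[Optional[Sequence[object]]],
) -> List[Optional[int]]:
    """Row-by-row reference for list-length semantics (illustrative only)."""
    # Null rows stay null; every non-null row (including []) gets its length.
    return [None if row is None else len(row) for row in rows]


print(list_len_reference([[1, 2], None, [3]]))  # [2, None, 1]
print(list_len_reference([[], None]))  # [0, None]
```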

Authors:
  - Michael Wang (@isVoid)
  - Ashwin Srinath (@shwina)

Approvers:
  - Keith Kraus (@kkraus14)
  - @brandon-b-miller

URL: #7283
@kkraus14
Collaborator

kkraus14 commented Feb 4, 2021

Fixed by #7283

@kkraus14 closed this as completed Feb 4, 2021