[BUG] cudf.concat resets field for struct columns #8802

Closed
shaneding opened this issue Jul 20, 2021 · 1 comment · Fixed by #8811
Labels: bug (Something isn't working), Python (Affects Python cuDF API)

Comments

@shaneding
Contributor

Describe the bug
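When two struct columns are concatenated, the field names of the resulting struct column are reset to positional indices ('0', '1', '2') instead of keeping the original names.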

Steps/Code to reproduce bug

>>> import cudf
>>> sr1 = cudf.Series([{"a": 5}, {"c": "hello"}, {"b": 7}])
>>> sr2 = cudf.Series([{"a": 5, "c": "hello", "b": 7}])
>>> cudf.concat([sr1, sr2])
0        {'0': 5.0, '1': None, '2': None}
1    {'0': None, '1': None, '2': 'hello'}
2        {'0': None, '1': 7.0, '2': None}
0      {'0': 5.0, '1': 7.0, '2': 'hello'}
dtype: struct
>>> 

Expected behavior
The concatenated result should keep the original field names:

0        {'a': 5.0, 'b': None, 'c': None}
1    {'a': None, 'b': None, 'c': 'hello'}
2        {'a': None, 'b': 7.0, 'c': None}
0      {'a': 5.0, 'b': 7.0, 'c': 'hello'}
shaneding added the bug (Something isn't working) and Needs Triage (Need team to review and classify) labels Jul 20, 2021
shaneding added the Python (Affects Python cuDF API) label Jul 20, 2021
@shaneding
Contributor Author

I have narrowed this down to concat in reshape.py not re-populating the struct's field names after concatenation. One case still needs to be addressed: when the series being concatenated have different fields in their StructColumn, there is no reliable way to determine the field names for the final series. I am therefore thinking of raising an error when there is a field mismatch.
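
A minimal sketch of the field-mismatch check proposed above, in plain Python so it runs without a GPU. The helper name check_matching_fields and the idea of passing each series' field names as a tuple are assumptions made for illustration; this is not the code in reshape.py.

```python
from typing import Sequence, Tuple


def check_matching_fields(field_sets: Sequence[Tuple[str, ...]]) -> Tuple[str, ...]:
    """Return the shared field names, or raise if the inputs disagree.

    Hypothetical helper: each element of ``field_sets`` stands in for the
    field names of one struct series that is about to be concatenated.
    """
    if not field_sets:
        raise ValueError("need at least one set of field names")

    first = tuple(field_sets[0])
    for i, fields in enumerate(field_sets[1:], start=1):
        # Any difference in names or order counts as a mismatch.
        if tuple(fields) != first:
            raise ValueError(
                f"struct field mismatch at input {i}: "
                f"expected {first}, got {tuple(fields)}"
            )
    return first
```

With the series from the report, both inputs would carry the fields a, b and c, so the check passes and returns those names. Inputs with differing fields would raise ValueError before any concatenation happens.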

shwina removed the Needs Triage (Need team to review and classify) label Jul 21, 2021
shaneding self-assigned this Jul 21, 2021
rapids-bot bot pushed a commit that referenced this issue Jul 21, 2021
Closes #8802. Currently, when struct series are concatenated, the field names of the result are reset. This PR fixes that by reconstructing the fields once the concatenated series is returned.

Authors:
  - https://github.com/shaneding

Approvers:
  - GALI PREM SAGAR (https://github.com/galipremsagar)
  - Charles Blackmon-Luca (https://github.com/charlesbluca)

URL: #8811
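
The commit message above describes reconstructing the fields after concatenation. The following is only a rough plain-Python illustration of that idea, using lists of dicts in place of struct series; restore_field_names is a hypothetical helper, and the actual change in #8811 may work differently.

```python
from typing import Any, Dict, List, Optional, Sequence


def restore_field_names(
    rows: List[Optional[Dict[str, Any]]], names: Sequence[str]
) -> List[Optional[Dict[str, Any]]]:
    """Rename positional keys ('0', '1', ...) back to the given field names.

    Hypothetical helper: ``rows`` stands in for a concatenated struct series
    whose field names were reset, and ``names`` for the original field names.
    """
    restored: List[Optional[Dict[str, Any]]] = []
    for row in rows:
        if row is None:
            # Null struct rows carry no fields to rename.
            restored.append(None)
            continue
        if len(row) != len(names):
            raise ValueError("number of fields does not match number of names")
        # Key '0' maps to names[0], key '1' to names[1], and so on.
        restored.append({names[int(key)]: value for key, value in row.items()})
    return restored


# Rows as printed in the bug report, with the field names reset.
reset_rows = [
    {"0": 5.0, "1": None, "2": None},
    {"0": None, "1": None, "2": "hello"},
    {"0": None, "1": 7.0, "2": None},
    {"0": 5.0, "1": 7.0, "2": "hello"},
]
fixed_rows = restore_field_names(reset_rows, ["a", "b", "c"])
```

Here fixed_rows holds the same values keyed by a, b and c, matching the expected behavior in the report.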