[BUG] Bad results for Groupby.get_group when the index has repeated values #14727

Closed
rjzamora opened this issue Jan 9, 2024 · 0 comments · Fixed by #14728
Labels: bug (Something isn't working), Python (Affects Python cuDF API.)

@rjzamora (Member) commented Jan 9, 2024

Describe the bug
`GroupBy.get_group` returns incorrect results when the selected group has repeated index values: rows whose index label is repeated are returned more than once.

Steps/Code to reproduce bug

import cudf

df = cudf.DataFrame(
    {"a": range(10), "b": [0] * 10},
    index=[0] + list(range(9)),
)
pdf = df.to_pandas()
pdf.groupby("b").get_group(0)

pandas returns the expected result:

   a  b
0  0  0
0  1  0
1  2  0
2  3  0
3  4  0
4  5  0
5  6  0
6  7  0
7  8  0
8  9  0
df.groupby("b").get_group(0)

cuDF returns both rows with index 0 twice (12 rows instead of 10):

   a  b
0  0  0
0  1  0
0  0  0
0  1  0
1  2  0
2  3  0
3  4  0
4  5  0
5  6  0
6  7  0
7  8  0
8  9  0
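The duplication matches what label-based selection does on a non-unique index. The sketch below uses plain pandas (so it runs without a GPU) and assumes, as the suggested fix further down implies, that `get_group` selects the group's rows with `obj.loc[self.groups[name]]`.

```python
import pandas as pd

# Same frame as the reproducer: index label 0 appears twice.
pdf = pd.DataFrame(
    {"a": range(10), "b": [0] * 10},
    index=[0] + list(range(9)),
)

# `groups` maps each key to the index *labels* of its rows,
# so label 0 is listed twice for group 0.
labels = pdf.groupby("b").groups[0]
print(list(labels))  # [0, 0, 1, 2, 3, 4, 5, 6, 7, 8]

# On a non-unique index, each requested label matches *every* row that
# carries it, so both rows labelled 0 come back once per request (twice).
selected = pdf.loc[labels]
print(len(selected))        # 12
print(list(selected["a"]))  # [0, 1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# pandas' own get_group returns the 10 rows of the group.
print(len(pdf.groupby("b").get_group(0)))  # 10
```

The row order produced by `pdf.loc[labels]` is the same as the cuDF output above.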

Additional context
The bug seems to be in the line of `GroupBy.get_group` that selects the group's rows with `obj.loc`. I'm pretty sure that line needs to change to something like:

return obj.loc[self.groups[name].drop_duplicates()]
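The suggested change can be sanity-checked in plain pandas on the same frame: requesting each label only once brings the result back to the 10 rows pandas returns. One caveat about label-based selection in general (observed here with pandas `.loc`, not stated in the issue): if a repeated label is shared by rows in *different* groups, selecting by label still picks up the other group's row, whereas selecting by position with `GroupBy.indices` does not. This is a sketch of the semantics only, not the change merged in #14728.

```python
import pandas as pd

pdf = pd.DataFrame(
    {"a": range(10), "b": [0] * 10},
    index=[0] + list(range(9)),
)
labels = pdf.groupby("b").groups[0]

# Suggested fix: request each label only once.
fixed = pdf.loc[labels.drop_duplicates()]
print(len(fixed))                                   # 10
print(fixed.equals(pdf.groupby("b").get_group(0)))  # True

# Caveat: here the two rows labelled 0 belong to different groups.
mixed = pd.DataFrame({"a": [0, 1], "b": [0, 1]}, index=[0, 0])
group0_labels = mixed.groupby("b").groups[0]  # a single label: 0
by_label = mixed.loc[group0_labels.drop_duplicates()]
print(len(by_label))  # 2: the row from group 1 is included as well

# `indices` maps each key to row *positions*, which are always unique.
by_position = mixed.iloc[mixed.groupby("b").indices[0]]
print(len(by_position))  # 1
```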
rjzamora added the bug, Needs Triage, and Python labels, then removed Needs Triage (Jan 9, 2024)
rjzamora self-assigned this (Jan 9, 2024)
rjzamora mentioned this issue (Jan 9, 2024)
rapids-bot pushed a commit that referenced this issue (Jan 12, 2024):
Closes #14727

Authors:
  - Richard (Rick) Zamora (https://github.com/rjzamora)
  - Charles Blackmon-Luca (https://github.com/charlesbluca)

Approvers:
  - Vyas Ramasubramani (https://github.com/vyasr)
  - Charles Blackmon-Luca (https://github.com/charlesbluca)

URL: #14728
Projects
Archived in project