Groupby.get_group
Describe the bug
Using Groupby.get_group produces incorrect results (duplicated rows) when the selected group has repeated index values.
Steps/Code to reproduce bug
import cudf

df = cudf.DataFrame(
    {"a": range(10), "b": [0] * 10},
    index=[0] + list(range(9)),
)
pdf = df.to_pandas()
pdf.groupby("b").get_group(0)
pandas returns the expected result:
   a  b
0  0  0
0  1  0
1  2  0
2  3  0
3  4  0
4  5  0
5  6  0
6  7  0
7  8  0
8  9  0
df.groupby("b").get_group(0)
cuDF duplicates all rows with a 0 index:
   a  b
0  0  0
0  1  0
0  0  0
0  1  0
1  2  0
2  3  0
3  4  0
4  5  0
5  6  0
6  7  0
7  8  0
8  9  0
Additional context
The bug seems to be in this line. That is, I'm pretty sure this line needs to change to something like:
return obj.loc[self.groups[name].drop_duplicates()]
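To illustrate why de-duplicating the labels helps, here is a minimal plain-Python model of label-based selection. It is only a sketch and not cuDF's implementation: `select_by_labels` is a hypothetical helper that assumes each requested label returns every row carrying that label, which is consistent with the duplicated output shown above.

```python
# Index labels and column "a" from the reproducer; every row is in group b == 0.
index = [0] + list(range(9))
a = list(range(10))


def select_by_labels(labels):
    """Hypothetical stand-in for label-based selection:
    each requested label yields every row that carries it."""
    return [
        (label, a[pos])
        for label in labels
        for pos, row_label in enumerate(index)
        if row_label == label
    ]


# Labels of the selected group, including the repeated 0.
group_labels = index

# Looking up the raw labels asks for 0 twice, and each request matches two rows.
buggy = select_by_labels(group_labels)

# Dropping duplicate labels first (order preserved) requests each label once.
fixed = select_by_labels(list(dict.fromkeys(group_labels)))

print(len(buggy), len(fixed))
```

In this model the raw lookup yields 12 rows and the de-duplicated lookup yields the expected 10. One caveat under the same assumption: if a label were shared by rows in different groups, a label-based lookup would still pull in rows from the other groups, so a position-based selection may be the more robust fix.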
Fix Groupby.get_group (#14728)
7ca988f
Closes #14727
Authors:
- Richard (Rick) Zamora (https://github.com/rjzamora)
- Charles Blackmon-Luca (https://github.com/charlesbluca)
Approvers:
- Vyas Ramasubramani (https://github.com/vyasr)
- Charles Blackmon-Luca (https://github.com/charlesbluca)
URL: #14728
rjzamora