[BUG] Unstack function changes column names to numeric format #7365
Comments
It appears that the dtype of the column we're trying to unstack is "category." It may be worth investigating why the "unstack" function changes the variable names for "category" dtypes. To resolve this, I set the column we wanted to unstack astype "object" and it worked seamlessly. Adjusting the code shown in the initial comment, it now works as follows.
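The snippet from this comment is not preserved in the thread. As a rough sketch of the workaround it describes, the example below casts a categorical column to "object" before it becomes an index level and is unstacked. The column names (`id`, `variable`, `value`) are hypothetical, and pandas is used as a stand-in so the sketch runs without a GPU; the assumption is that the same `astype("object")` call is what was applied to the cuDF DataFrame.

```python
import pandas as pd

# Hypothetical data: "variable" is the categorical column whose labels
# should become the column names after unstacking.
df = pd.DataFrame(
    {
        "id": [1, 1, 2, 2],
        "variable": pd.Categorical(["a", "b", "a", "b"]),
        "value": [10, 20, 30, 40],
    }
)

# Workaround from the comment above: drop the category dtype before unstacking.
df["variable"] = df["variable"].astype("object")

# Move the labels in "variable" into the columns.
wide = df.set_index(["id", "variable"])["value"].unstack("variable")

print(list(wide.columns))  # ['a', 'b']
```

With the labels stored as plain objects, the unstacked columns keep their original names instead of integer positions.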
Thanks for reporting. I hope you don't mind that I reopened this issue, because this certainly sounds like there's still a bug somewhere. One shouldn't have to convert their categorical data to |
I was able to track this down to a bug in the cudf/python/cudf/cudf/core/frame.py Lines 3493 to 3496 in aa72df7
Before returning |
That's very true. Thank you for reopening @shwina. I'll follow up on this as well. |
@BenikaHall - if you (or anyone else) would like to contribute a fix here, it should be relatively easy 🙂 In addition to the fix above, we'll need a couple of tests for this case in |
I'll see if I have some time next week to get a few tests done and a PR submitted. Thanks again @shwina. |
Slightly related - calling unstack on a Series:
working around by calling
Fixes #7365

Applies column metadata to the output columns of `keys` in `Frame._encode`; skipping this step meant that the output of `DataFrame.unstack` would not have the expected metadata for index columns:

```python
import pandas as pd
import cudf

pdf = pd.DataFrame(
    {
        "foo": ["one", "one", "one", "two", "two", "two"],
        "bar": pd.Categorical(["A", "B", "C", "A", "B", "C"]),
        "baz": [1, 2, 3, 4, 5, 6],
        "zoo": ["x", "y", "z", "q", "w", "t"],
    }
).set_index(["foo", "bar", "baz"])
gdf = cudf.from_pandas(pdf)

pdf.unstack("baz")
#          zoo
# baz        1    2    3    4    5    6
# foo bar
# one A      x  NaN  NaN  NaN  NaN  NaN
#     B    NaN    y  NaN  NaN  NaN  NaN
#     C    NaN  NaN    z  NaN  NaN  NaN
# two A    NaN  NaN  NaN    q  NaN  NaN
#     B    NaN  NaN  NaN  NaN    w  NaN
#     C    NaN  NaN  NaN  NaN  NaN    t

gdf.unstack("baz")
#           zoo
# baz         1     2     3     4     5     6
# foo bar
# one 0       x  <NA>  <NA>  <NA>  <NA>  <NA>
#     1    <NA>     y  <NA>  <NA>  <NA>  <NA>
#     2    <NA>  <NA>     z  <NA>  <NA>  <NA>
# two 0    <NA>  <NA>  <NA>     q  <NA>  <NA>
#     1    <NA>  <NA>  <NA>  <NA>     w  <NA>
#     2    <NA>  <NA>  <NA>  <NA>  <NA>     t
```

Authors:
- Charles Blackmon-Luca (https://github.com/charlesbluca)

Approvers:
- GALI PREM SAGAR (https://github.com/galipremsagar)

URL: #8560
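To illustrate why the `bar` level shows 0, 1, 2 rather than A, B, C, here is a small sketch using pandas only. It assumes the numbers in the cuDF output are the integer codes behind the categorical column, shown without the category metadata that maps them back to labels. This is an illustration of the idea, not the code in `Frame._encode`.

```python
import pandas as pd

bar = pd.Categorical(["A", "B", "C", "A", "B", "C"])

# A categorical stores integer codes plus a separate list of categories.
codes = bar.codes.tolist()
print(codes)  # [0, 1, 2, 0, 1, 2]

# Re-attaching the categories (the "metadata") recovers the original labels.
restored = pd.Categorical.from_codes(bar.codes, categories=bar.categories)
print(list(restored))  # ['A', 'B', 'C', 'A', 'B', 'C']
```

If only the codes survive an intermediate step, the result looks like the `gdf.unstack("baz")` output above; restoring the categories gives the labels pandas shows.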
Describe the bug
After applying the unstack function, the variable names change to numeric format.
Steps/Code to reproduce bug
Expected behavior
After applying the unstack function, the column names should be preserved and not changed to 0, 1, 2, 3, 4, ...
Environment overview (please complete the following information)
`docker pull` & `docker run` commands used
Environment details
Please run and paste the output of the `cudf/print_env.sh` script here, to gather any other relevant environment details
Additional context
Add any other context about the problem here.