[BUG] MultiIndex.to_frame not throwing an error for duplicate names #14085

Closed
galipremsagar opened this issue Sep 12, 2023 · 1 comment · Fixed by #14105
Labels: bug (Something isn't working), Python (Affects Python cuDF API.)

Comments

@galipremsagar
Contributor

Describe the bug
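A MultiIndex may have duplicate level names, but calling to_frame on such an index should not be allowed by default: pandas raises a ValueError unless allow_duplicates is set. cuDF currently ignores the duplicate names and creates a DataFrame anyway.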

Steps/Code to reproduce bug

In [1]: import pandas as pd

In [2]: data = [(1, 2), (3, 4)]
   ...: names = ["a", "a"]
   ...: index = pd.MultiIndex.from_tuples(data, names=names)

In [3]: index
Out[3]: 
MultiIndex([(1, 2),
            (3, 4)],
           names=['a', 'a'])

In [4]: index.to_frame()
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[4], line 1
----> 1 index.to_frame()

File /nvme/0/pgali/envs/cudfdev/lib/python3.10/site-packages/pandas/core/indexes/multi.py:1822, in MultiIndex.to_frame(self, index, name, allow_duplicates)
   1819     idx_names = self._get_level_names()
   1821 if not allow_duplicates and len(set(idx_names)) != len(idx_names):
-> 1822     raise ValueError(
   1823         "Cannot create duplicate column labels if allow_duplicates is False"
   1824     )
   1826 # Guarantee resulting column order - PY36+ dict maintains insertion order
   1827 result = DataFrame(
   1828     {level: self._get_level_values(level) for level in range(len(self.levels))},
   1829     copy=False,
   1830 )

ValueError: Cannot create duplicate column labels if allow_duplicates is False

In [5]: import cudf

In [6]: midx = cudf.from_pandas(index)

In [7]: midx
Out[7]: 
MultiIndex([(1, 2),
            (3, 4)],
           names=['a', 'a'])

In [8]: midx.to_frame()
Out[8]: 
     0  1
a a      
1 2  1  2
3 4  3  4

Expected behavior
midx.to_frame() should raise a ValueError for duplicate names, matching pandas, instead of returning a DataFrame.
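The condition that triggers the pandas error is visible in the traceback above. As a rough, library-independent sketch of that check (the helper name `check_unique_level_names` is hypothetical and is not part of pandas or cuDF), it could look like this:

```python
from typing import Hashable, Sequence


def check_unique_level_names(
    names: Sequence[Hashable], allow_duplicates: bool = False
) -> None:
    """Raise ValueError if `names` contains duplicates and they are not allowed.

    Hypothetical helper: it mirrors the condition shown in the pandas
    traceback, not the actual cuDF implementation.
    """
    # A set drops repeated names, so a length mismatch means duplicates exist.
    if not allow_duplicates and len(set(names)) != len(names):
        raise ValueError(
            "Cannot create duplicate column labels if allow_duplicates is False"
        )


# Distinct names pass silently.
check_unique_level_names(["a", "b"])

# Duplicate names pass only when explicitly allowed.
check_unique_level_names(["a", "a"], allow_duplicates=True)

# Duplicate names with the default setting raise.
try:
    check_unique_level_names(["a", "a"])
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
```

The error message is copied from the pandas traceback so that both libraries would report the same text; whether cuDF should reuse it verbatim is an assumption.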

Environment overview

  • Environment location: Bare-metal
  • Method of cuDF install: from source
galipremsagar added the bug and Python labels on Sep 12, 2023
galipremsagar self-assigned this on Sep 12, 2023
@galipremsagar
Contributor Author

Note to self: here is an example that must still work in cuDF too:

In [1]: import pandas as pd

In [2]: data = [(1, 2), (3, 4)]
   ...: names = ["a", "a"]
   ...: index = pd.MultiIndex.from_tuples(data, names=names)

In [3]: pd.Series(index=index)
<ipython-input-3-993c0f1bab81>:1: FutureWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
  pd.Series(index=index)
Out[3]: 
a  a
1  2   NaN
3  4   NaN
dtype: float64

In [4]: pd.Series([1, 2], index=index)
Out[4]: 
a  a
1  2    1
3  4    2
dtype: int64
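The two requirements in this issue (Series construction keeps working, while to_frame rejects duplicate names by default) can be written down as a small pandas-only check. This is a sketch under assumptions: it needs a pandas version whose to_frame accepts allow_duplicates, as the traceback above shows, and it only exercises pandas. Whether the same calls behave identically in cuDF after the fix is not verified here.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples([(1, 2), (3, 4)], names=["a", "a"])

# Building a Series on an index with duplicate level names must keep working.
ser = pd.Series([1, 2], index=index)

# to_frame() with default arguments is expected to raise.
try:
    index.to_frame()
    raised = False
except ValueError:
    raised = True

# Opting in to duplicate column labels avoids the error.
dup_frame = index.to_frame(index=False, allow_duplicates=True)

# Supplying distinct column names also avoids it.
renamed = index.to_frame(index=False, name=["a", "b"])
```

Swapping `pd` for `cudf` in a test like this would be one way to compare the two libraries, assuming cuDF exposes the same parameters.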
