forked from rapidsai/cudf
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Rewrite `DataFrame.stack` to support multi-level column names (rapidsai#13927)

This PR rewrites `DataFrame.stack()`, adding support for stacking multiple levels of the column names. Users can now specify one or more levels from the column names to stack. Example:

```python
>>> multicol1 = pd.MultiIndex.from_tuples([('weight', 'kg'),
...                                        ('weight', 'pounds')])
>>> df_multi_level_cols1 = cudf.DataFrame([[1, 2], [2, 4]],
...                                       index=['cat', 'dog'],
...                                       columns=multicol1)
>>> df_multi_level_cols1.stack(0)
            kg  pounds
cat weight   1       2
dog weight   2       4
>>> df_multi_level_cols1.stack([0, 1])
cat  weight  kg        1
             pounds    2
dog  weight  kg        2
             pounds    4
dtype: int64
```

The implementation relies heavily on pandas index methods applied to the column axis. This assumes that the width of the cudf column axis is limited.

The combination of `len(level) > 1 and dropna=False` is currently unsupported. The corresponding behavior in pandas is due to be deprecated in 3.0. See pandas-dev/pandas#53515.

closes rapidsai#13739

Authors:
  - Michael Wang (https://github.com/isVoid)

Approvers:
  - Lawrence Mitchell (https://github.com/wence-)
  - Matthew Roeschke (https://github.com/mroeschke)

URL: rapidsai#13927
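For readers without a GPU at hand, the stacking behavior shown in the commit message can be tried with plain pandas. This is only a sketch under the assumption that cudf mirrors pandas here, as the commit's own example suggests; it does not exercise the cudf implementation. The helper name `build_frame` is hypothetical and exists only for this illustration. Recent pandas 2.x releases may emit a `FutureWarning` about the stack implementation.

```python
import pandas as pd


def build_frame() -> pd.DataFrame:
    """Build the two-level-column frame from the commit message (hypothetical helper)."""
    multicol1 = pd.MultiIndex.from_tuples(
        [("weight", "kg"), ("weight", "pounds")]
    )
    return pd.DataFrame(
        [[1, 2], [2, 4]],
        index=["cat", "dog"],
        columns=multicol1,
    )


df = build_frame()

# Stack only the outermost column level: "weight" moves into the row index
# and the remaining level ("kg", "pounds") stays on the column axis.
stacked_one = df.stack(0)

# Stack both column levels: no column levels remain, so the result is a
# Series indexed by (animal, measure, unit).
stacked_all = df.stack([0, 1])

print(stacked_one)
print(stacked_all)
```

Passing a single level keeps a DataFrame, while passing every column level collapses the column axis entirely, which is the multi-level case this commit adds to cudf.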
Showing 2 changed files with 363 additions and 40 deletions.