Commit
Fix cumulative count index behavior (#11188)
During #10889 I found that the result of `cumcount` was wrong when there is more than one partition. Digging in, I found that this was because cuDF Python always resets the index of `cumcount` operations, meaning the index of the reassembled result would be wrong. The temporary object it groups on also needs to carry the original object's index so that the post-processing functions can set the index correctly. This PR fixes that and adds a test.

Example of the old behavior:

```python
>>> import pandas as pd
>>> import cudf
>>> df = pd.DataFrame({
...     'a':[1,2,3,4,5,6]
... }, index=[1,2,3,4,5,6]
... )
>>> df
   a
1  1
2  2
3  3
4  4
5  5
6  6
>>> df.groupby('a').cumcount()
1    0
2    0
3    0
4    0
5    0
6    0
dtype: int64
>>> cudf.from_pandas(df).groupby('a').cumcount()
0    0
1    0
2    0
3    0
4    0
5    0
dtype: int32
```

Authors:
  - https://github.com/brandon-b-miller

Approvers:
  - GALI PREM SAGAR (https://github.com/galipremsagar)
  - Vyas Ramasubramani (https://github.com/vyasr)

URL: #11188
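To illustrate why a reset index breaks the reassembled result, here is a minimal pandas-only sketch. It is not the code from this PR: the two-way split of the frame is a hypothetical stand-in for partitions, and `cumcount_keep_index` and `cumcount_reset_index` are illustrative helpers, with the latter only imitating the old index-resetting behavior described above.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4, 5, 6]}, index=[1, 2, 3, 4, 5, 6])

# Hypothetical partitions: two contiguous slices of the frame.
# Every key in 'a' is unique, so no group spans two partitions.
partitions = [df.iloc[:3], df.iloc[3:]]


def cumcount_keep_index(part):
    # Per-partition cumcount that keeps the partition's original index.
    return part.groupby("a").cumcount()


def cumcount_reset_index(part):
    # Imitates the old behavior: the result index is reset to 0..n-1.
    return part.groupby("a").cumcount().reset_index(drop=True)


# Reassemble the per-partition results.
kept = pd.concat([cumcount_keep_index(p) for p in partitions])
reset = pd.concat([cumcount_reset_index(p) for p in partitions])

# Reference result computed on the whole frame at once.
expected = df.groupby("a").cumcount()

print(list(kept.index))
print(list(reset.index))
print(kept.equals(expected))
```

With the index preserved, the concatenated result lines up with the single-frame result. With the index reset in each partition, the labels restart at 0 for every partition, so the reassembled Series no longer matches the original rows.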