
[BUG] JIT GroupBy.apply aggregations with int32 dtype may return different results from pandas when large values are present #13873

Closed
brandon-b-miller opened this issue Aug 14, 2023 · 0 comments · Fixed by #13943
Labels: bug (Something isn't working), numba (Numba issue), Python (Affects Python cuDF API)

Comments

@brandon-b-miller
Contributor

Describe the bug
When a sum aggregation runs inside a groupby apply UDF with `engine='jit'`, the result can diverge from pandas when an int32 column contains large values. This is likely because an intermediate result is not promoted to int64 where it ought to be.
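To illustrate the suspected mechanism without a GPU, here is a minimal NumPy-only sketch. It is not the cuDF JIT code path; it only assumes that the divergence comes from accumulating in int32 rather than int64, and it uses the same values as the reproducer.

```python
import numpy as np

int32_max = np.iinfo(np.int32).max
values = np.array([int32_max, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=np.int32)

# Accumulate in int64: the sum fits and is exact.
promoted = values.sum(dtype=np.int64)

# Force an int32 accumulator: the sum exceeds int32_max and wraps around.
wrapped = values.sum(dtype=np.int32)

print(promoted)  # 2147483701
print(wrapped)   # -2147483595
```

If the JIT path keeps an int32 accumulator, a wrapped value like the second one is the kind of result one would expect to see, though the actual cuDF output is not shown here.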

Steps/Code to reproduce bug

import pandas as pd
import cudf
import numpy as np

int32_max = np.iinfo(np.int32).max

df = cudf.DataFrame(
    {
        'a':[0,0,0,0,0,0,0,0,0,0],
        'b':[int32_max,2,3,4,5,6,7,8,9,10]
    }
)

df['b'] = df['b'].astype('int32')
pdf = df.to_pandas()

def func(group):
    return group['b'].sum()

pandas_result = pdf.groupby('a').apply(func)
cudf_result = df.groupby('a').apply(func, engine='jit')

print(pandas_result)
print(cudf_result)

Expected behavior
Results should be the same as pandas.

Environment overview (please complete the following information)
Bare-metal, 23.10

Additional context
This came up during #13813. The same test also fails for mean, var, and std. Since all three reductions involve a summation, I suspect they share the same root cause.
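As a rough sketch of why a bad sum would carry over to the other reductions, the NumPy snippet below derives a mean from an int32-accumulated sum and from an int64-accumulated sum. It assumes the mean is computed as sum divided by count, which may not match how the JIT kernels actually implement it.

```python
import numpy as np

values = np.array(
    [np.iinfo(np.int32).max, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=np.int32
)
n = values.size

# Sum with an int32 accumulator (wraps) versus an int64 accumulator (exact).
wrapped_sum = values.sum(dtype=np.int32)
correct_sum = values.sum(dtype=np.int64)

# A mean built on the wrapped sum inherits the error.
mean_from_wrapped = float(wrapped_sum) / n
mean_from_correct = float(correct_sum) / n

# NumPy's own mean accumulates integer input in float64.
reference_mean = values.mean()

print(mean_from_wrapped, mean_from_correct, reference_mean)
```

The mean built on the wrapped sum comes out negative even though every input is positive. var and std build on the mean, so they would be affected in the same way under this assumption.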

@brandon-b-miller brandon-b-miller added bug Something isn't working numba Numba issue Python Affects Python cuDF API. labels Aug 14, 2023
@brandon-b-miller brandon-b-miller self-assigned this Aug 14, 2023
rapids-bot bot pushed a commit that referenced this issue Aug 28, 2023