[BUG] Allow where() to work with a Series and other=cudf.NA #8969

Closed
sarahyurick opened this issue Aug 5, 2021 · 2 comments · Fixed by #9019
Labels: bug (Something isn't working), Python (Affects Python cuDF API.)

Comments

@sarahyurick
Contributor

In pandas, I can call where() on a DataFrame with a Series condition and other=pd.NA:

import pandas as pd

df = pd.DataFrame()
df['id'] = [0, 1, 1]
df['val'] = [0, 4, 2]

df.where(df['id']==1, pd.NA)
	id	val
0	<NA>	<NA>
1	1	4
2	1	2

However, when I try the same thing in cuDF, I get:

import cudf

df = cudf.DataFrame()
df['id'] = [0, 1, 1]
df['val'] = [0, 4, 2]

df.where(df['id']==1, cudf.NA)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_50448/3560034258.py in <module>
      5 df['val'] = [0, 4, 2]
      6 
----> 7 df.where(df['id']==1, cudf.NA)

~/miniconda3/envs/cudf_dev/lib/python3.8/site-packages/cudf/core/frame.py in where(self, cond, other, inplace)
    878         """
    879 
--> 880         return cudf.core._internals.where(
    881             frame=self, cond=cond, other=other, inplace=inplace
    882         )

~/miniconda3/envs/cudf_dev/lib/python3.8/site-packages/cudf/core/_internals/where.py in where(frame, cond, other, inplace)
    263             cond.columns = frame.columns
    264 
--> 265         (source_df, others,) = _normalize_columns_and_scalars_type(
    266             frame, other
    267         )

~/miniconda3/envs/cudf_dev/lib/python3.8/site-packages/cudf/core/_internals/where.py in _normalize_columns_and_scalars_type(frame, other, inplace)
    158                     source_col,
    159                     other_scalar,
--> 160                 ) = _check_and_cast_columns_with_other(
    161                     source_col=source_df._data[col_name],
    162                     other=other

~/miniconda3/envs/cudf_dev/lib/python3.8/site-packages/cudf/core/_internals/where.py in _check_and_cast_columns_with_other(source_col, other, inplace)
     44 
     45     if cudf.utils.dtypes.is_scalar(other):
---> 46         device_obj = _normalize_scalars(source_col, other)
     47     else:
     48         device_obj = other

~/miniconda3/envs/cudf_dev/lib/python3.8/site-packages/cudf/core/_internals/where.py in _normalize_scalars(col, other)
     28         )
     29 
---> 30     return cudf.Scalar(other, dtype=col.dtype if other in None else None)
     31 
     32 

TypeError: argument of type 'NoneType' is not iterable
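
A note on the last frame of the traceback: the expression `other in None` is a membership test against `None`, which raises a TypeError for any value of `other`. An identity check (`other is None`) looks like what was intended, though that is an inference from the traceback alone, not a confirmed description of the fix in #9019. The sketch below reproduces the failure without cuDF; the function names and the string standing in for a dtype are hypothetical.

```python
def pick_dtype_buggy(other, col_dtype):
    # Mirrors the conditional shown in the traceback: `in` asks whether
    # `other` is a member of None, and None is not a container.
    return col_dtype if other in None else None


def pick_dtype_fixed(other, col_dtype):
    # Assumed intent: use the column dtype only when no value was supplied.
    return col_dtype if other is None else None


raised_type_error = False
try:
    pick_dtype_buggy(28, "int64")
except TypeError:
    # The membership test fails before either branch is chosen.
    raised_type_error = True

print(raised_type_error)
print(pick_dtype_fixed(None, "int64"))
print(pick_dtype_fixed(28, "int64"))
```

With the identity check, the conditional evaluates normally for both a missing and a supplied `other`.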
sarahyurick added the bug and Python labels on Aug 5, 2021
sarahyurick self-assigned this on Aug 5, 2021
@beckernick
Member

Are we unable to handle the series itself or just the other=cudf.NA? I believe we recently did some work on this in #7383

@sarahyurick
Contributor Author

sarahyurick commented Aug 5, 2021

@beckernick Good point - I just tested it and I get:

import pandas as pd

df = pd.DataFrame()
df['id'] = [0, 1, 1]
df['val'] = [0, 4, 2]

df.where(df['id']==1, 28)
	id	val
0	28	28
1	1	4
2	1	2

versus

import cudf

df = cudf.DataFrame()
df['id'] = [0, 1, 1]
df['val'] = [0, 4, 2]

df.where(df['id']==1, 28)
	id	val
0	28	<NA>
1	1	<NA>
2	1	<NA>

so I'll have to make sure that case is handled, too.

EDIT: Actually, it looks like the cuDF result is getting mangled by some of the code I've already written. I'll make sure to keep this case and PR #8747 in mind, though.
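
For reference, the behavior pandas shows in this thread (a 1-D boolean mask applied row-wise to every column) can be sketched in plain Python. This is only an illustration of the expected semantics, not how pandas or cuDF implement where(); the helper name `where_rows` and the dict-of-lists layout are hypothetical, and `None` stands in for a missing value.

```python
def where_rows(columns, row_mask, other):
    """Keep each row where row_mask is True; otherwise put `other` in every column."""
    return {
        name: [value if keep else other for value, keep in zip(values, row_mask)]
        for name, values in columns.items()
    }


columns = {"id": [0, 1, 1], "val": [0, 4, 2]}
row_mask = [value == 1 for value in columns["id"]]  # analogous to df['id'] == 1

filled = where_rows(columns, row_mask, 28)    # scalar replacement
masked = where_rows(columns, row_mask, None)  # missing-value replacement

print(filled)
print(masked)
```

Both calls replace the whole first row, which matches the pandas outputs quoted in this issue for other=28 and other=pd.NA.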

sarahyurick changed the title from "[BUG] Allow where() to work with a Series" to "[BUG] Allow where() to work with a Series and other=cudf.NA" on Aug 5, 2021
rapids-bot (bot) pushed a commit that referenced this issue on Aug 11, 2021
Fixes #8969.

Duplicate of #8977 - some of the checks are erroring and I'm seeing strange messages about the git commits, so I'm re-opening the PR here to see if that fixes it.

Authors:
  - Sarah Yurick (https://github.com/sarahyurick)

Approvers:
  - Ashwin Srinath (https://github.com/shwina)

URL: #9019