Commit

Fix null handling
The result returned from libcudf has the correct values hidden behind
the null mask of the searched-for needles. We can therefore just drop
the mask, if one exists, to obtain the result that matches pandas.
wence- committed Nov 23, 2023
1 parent 7848147 commit cef017e
Showing 1 changed file with 8 additions and 2 deletions.
10 changes: 8 additions & 2 deletions python/cudf/cudf/core/column/column.py
@@ -918,8 +918,14 @@ def _obtain_isin_result(self, rhs: ColumnBase) -> ColumnBase:
         """
         # We've already matched dtypes by now
         result = libcudf.search.contains(rhs, self)
-        if result.null_count:
-            return result.fillna(False)
+        # libcudf contains runs the search with nulls comparing equal
+        # and then copying the bitmask from the needles to the result.
+        # In cudf, we want nulls in both needle and haystack to
+        # produce True and nulls in only one to produce False. If we
+        # drop the bitmask of the result on the floor, it already
+        # contains the values we want. We can do this unilaterally
+        # without checking the null count of the result.
+        result.set_base_mask(None)
         return result
 
     def as_mask(self) -> Buffer:
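The behaviour described in the new comment can be illustrated with a small pure-Python model. This is only a sketch: it does not call cudf or libcudf, and the helper names (`contains_like_libcudf`, `old_isin`, `new_isin`) are hypothetical. It assumes, as the comment states, that the search treats nulls as equal and that the needles' validity mask is copied onto the result.

```python
from typing import List, Optional, Tuple

Value = Optional[int]  # None stands in for a null entry


def contains_like_libcudf(
    haystack: List[Value], needles: List[Value]
) -> Tuple[List[bool], List[bool]]:
    """Model of the search as described in the commit (an assumption,
    not the real libcudf API): nulls compare equal, and the result
    carries the validity mask of the needles."""
    # None == None is True in Python, so nulls compare equal here.
    values = [n in haystack for n in needles]
    validity = [n is not None for n in needles]
    return values, validity


def old_isin(haystack: List[Value], needles: List[Value]) -> List[bool]:
    """Previous behaviour: masked-out entries are filled with False."""
    values, validity = contains_like_libcudf(haystack, needles)
    return [v if ok else False for v, ok in zip(values, validity)]


def new_isin(haystack: List[Value], needles: List[Value]) -> List[bool]:
    """New behaviour: drop the mask and keep the underlying values."""
    values, _validity = contains_like_libcudf(haystack, needles)
    return values


needles = [1, 2, None]

# Null in both needle and haystack: the new code reports True.
print(old_isin([1, None], needles))  # [True, False, False]
print(new_isin([1, None], needles))  # [True, False, True]

# Null only in the needles: both versions report False.
print(new_isin([1, 3], needles))     # [True, False, False]
```

In this model the two versions differ only when a null needle meets a haystack that also contains a null, which is the case the commit fixes.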
