[BUG] cudf.testing.assert_*_equal raises AssertionError for equivalent DecimalDtyped objects #16635

Open
mroeschke opened this issue Aug 21, 2024 · 3 comments
Labels
bug Something isn't working Python Affects Python cuDF API.

Comments

@mroeschke
Contributor

Describe the bug

In [1]: import cudf

In [2]: ser = cudf.Series([1], dtype=cudf.Decimal128Dtype(1))

In [3]: cudf.testing.assert_series_equal(ser, ser)

AssertionError: ColumnBase are different

values are different (100.0 %)
[left]:  {"[Decimal('1')]"}
[right]: {"[Decimal('1')]"}

Expected behavior
I would expect no AssertionError.

It appears there's a testing function, dtype_can_compare_equal_to_other, used in column comparisons that over-zealously assumes two objects with DecimalDtypes shouldn't be compared to each other.

Environment overview (please complete the following information)

  • Environment location: Bare-metal
  • Method of cuDF install: from source
@mroeschke mroeschke added bug Something isn't working Python Affects Python cuDF API. labels Aug 21, 2024
@AntiKnot

AntiKnot commented Aug 23, 2024

hi @mroeschke

Based on the change history, the change introduced type checks on DecimalDtype that were not necessary to fix the bug. I agree that the check is overzealous.

@AntiKnot

AntiKnot commented Aug 26, 2024

Hypothesis

CuPy does not fully implement NumPy's asarray behavior; at the very least, it does not accept Decimal128Dtype as a dtype.

Reproduce

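I tried removing cudf.core.dtypes.DecimalDtype from the check in the function dtype_can_compare_equal_to_other, so that Decimal128Dtype is treated as a numeric dtype that can compare equal to other types. The comparison then reaches this call in assert_column_equal: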

def assert_column_equal(
...
                    left.apply_boolean_mask(
                        left.isnull().unary_operator("not")
                    ).values,
...

cudf/cudf/core/column/column.py

    @property
    def values(self) -> cupy.ndarray:
        """
        Return a CuPy representation of the Column.
        """
        if len(self) == 0:
            return cupy.array([], dtype=self.dtype)

        if self.has_nulls():
            raise ValueError("Column must have no nulls.")

        return cupy.asarray(self.data_array_view(mode="write"))

This raises:

TypeError: Cannot interpret 'Decimal128Dtype(precision=1, scale=0)' as a data type

Code to reproduce:

import cudf
ser = cudf.Series([1], dtype=cudf.Decimal128Dtype(1))
left = ser._column
left.apply_boolean_mask(left.isnull().unary_operator("not")).values

With NumPy:

import numpy
obj = left.apply_boolean_mask(left.isnull().unary_operator("not"))
numpy.asarray(obj)
Out[11]:
array(<cudf.core.column.decimal.Decimal128Column object at 0x726ea7de4f70>
[
  1
]
dtype: decimal128, dtype=object)
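
The NumPy call above succeeds only because NumPy falls back to wrapping an object it cannot interpret in a zero-dimensional array of dtype object; no element data is converted. A minimal sketch of that fallback, using a hypothetical stand-in class `Opaque` rather than a real cuDF column:

```python
import numpy as np


class Opaque:
    """Hypothetical stand-in for an object NumPy cannot read as array data."""


# NumPy wraps the object itself instead of converting its contents.
wrapped = np.asarray(Opaque())
print(wrapped.dtype)  # object
print(wrapped.shape)  # ()
```

CuPy has no object dtype to fall back on, which is consistent with the `Unsupported dtype object` error in the CuPy traceback.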

With CuPy:

import cupy
cupy.asarray(obj)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[14], line 1
----> 1 cupy.asarray(obj)

File ~/Code/cudf/.venv/lib/python3.10/site-packages/cupy/_creation/from_data.py:88, in asarray(a, dtype, order, blocking)
     56 def asarray(a, dtype=None, order=None, *, blocking=False):
     57     """Converts an object to array.
     58
     59     This is equivalent to ``array(a, dtype, copy=False, order=order)``.
   (...)
     86
     87     """
---> 88     return _core.array(a, dtype, False, order, blocking=blocking)

File cupy/_core/core.pyx:2408, in cupy._core.core.array()

File cupy/_core/core.pyx:2435, in cupy._core.core.array()

File cupy/_core/core.pyx:2574, in cupy._core.core._array_default()

ValueError: Unsupported dtype object

@mroeschke
Contributor Author

We'll first need to assert that the dtypes are equivalent, then probably use pandas assertion functions instead of CuPy/NumPy for comparing decimal values.
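
A rough sketch of that two-step idea, using only pandas on the host. The helper name `assert_decimal_series_equal` and the `(precision, scale)` tuples standing in for the decimal dtypes are hypothetical and not part of cuDF; in cuDF the host data would presumably come from something like `Series.to_pandas()`, which is an assumption here.

```python
from decimal import Decimal

import pandas as pd


def assert_decimal_series_equal(left, right, left_dtype, right_dtype):
    """Hypothetical helper: check dtype metadata first, then values on the host."""
    # Step 1: the decimal dtypes (here modelled as (precision, scale) tuples)
    # must be equivalent before any values are compared.
    assert left_dtype == right_dtype, (
        f"dtypes differ: {left_dtype} != {right_dtype}"
    )
    # Step 2: let pandas compare the Decimal objects element by element,
    # avoiding any conversion to a CuPy or NumPy numeric array.
    pd.testing.assert_series_equal(left, right)


# Host-side stand-ins for two equivalent decimal Series.
left = pd.Series([Decimal("1")], dtype=object)
right = pd.Series([Decimal("1")], dtype=object)

# Passes silently: same (precision, scale) and same values.
assert_decimal_series_equal(left, right, (1, 0), (1, 0))
```

Checking the dtype separately matters because the host representation is a plain object Series, so the precision and scale are no longer visible to pandas.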
