BUG: to_timedelta raises unexpected OutOfBoundsTimedelta error with development version of NumPy #56996

Open
3 tasks done
spencerkclark opened this issue Jan 21, 2024 · 1 comment
Labels: Compat (pandas objects compatibility with NumPy or Python functions), Dependencies (required and optional dependencies), Timedelta (Timedelta data type)

spencerkclark commented Jan 21, 2024

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

>>> pd.to_timedelta(np.int32(0), "D")
Traceback (most recent call last):
  File "conversion.pyx", line 228, in pandas._libs.tslibs.conversion.cast_from_unit
OverflowError: Python integer 86400000000000 out of bounds for int32

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "timedeltas.pyx", line 377, in pandas._libs.tslibs.timedeltas._maybe_cast_from_unit
  File "conversion.pyx", line 230, in pandas._libs.tslibs.conversion.cast_from_unit
pandas._libs.tslibs.np_datetime.OutOfBoundsDatetime: cannot convert input 0 with the unit 'D'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/spencer/mambaforge/envs/2024-01-21-upstream-minimal/lib/python3.11/site-packages/pandas/core/tools/timedeltas.py", line 225, in to_timedelta
    return _coerce_scalar_to_timedelta_type(arg, unit=unit, errors=errors)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/spencer/mambaforge/envs/2024-01-21-upstream-minimal/lib/python3.11/site-packages/pandas/core/tools/timedeltas.py", line 235, in _coerce_scalar_to_timedelta_type
    result = Timedelta(r, unit)
             ^^^^^^^^^^^^^^^^^^
  File "timedeltas.pyx", line 1896, in pandas._libs.tslibs.timedeltas.Timedelta.__new__
  File "timedeltas.pyx", line 354, in pandas._libs.tslibs.timedeltas.convert_to_timedelta64
  File "timedeltas.pyx", line 379, in pandas._libs.tslibs.timedeltas._maybe_cast_from_unit
pandas._libs.tslibs.np_datetime.OutOfBoundsTimedelta: Cannot cast 0 from D to 'ns' without overflow.

Issue Description

A test failure in xarray's build against the development versions of upstream packages (pydata/xarray#8623) can be boiled down to the reproducible example above. It seems like something goes wrong in casting an int32 value to a Timedelta with the development version of NumPy.

Expected Behavior

I would expect the example to run without an error:

>>> pd.to_timedelta(np.int32(0), "D")
Timedelta('0 days 00:00:00')

Installed Versions

/Users/spencer/mambaforge/envs/2024-01-21-upstream-minimal/lib/python3.11/site-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils.
  warnings.warn("Setuptools is replacing distutils.")

INSTALLED VERSIONS

commit : 21b3906
python : 3.11.7.final.0
python-bits : 64
OS : Darwin
OS-release : 23.2.0
Version : Darwin Kernel Version 23.2.0: Wed Nov 15 21:54:10 PST 2023; root:xnu-10002.61.3~2/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 3.0.0.dev0+149.g21b3906a39
numpy : 2.0.0.dev0+git20240120.6bd3abf
pytz : 2023.3.post1
dateutil : 2.8.2
setuptools : 69.0.3
pip : 23.3.2
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : None
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2023.4
qtpy : None
pyqt5 : None


spencerkclark commented Mar 23, 2024

The issue appears to be in this section of the code:

    # cast the unit, multiply base/frac separately
    # to avoid precision issues from float -> int
    try:
        base = <int64_t>ts
    except OverflowError as err:
        raise OutOfBoundsDatetime(
            f"cannot convert input {ts} with the unit '{unit}'"
        ) from err
    frac = ts - base
    if p:
        frac = round(frac, p)
    try:
        return <int64_t>(base * m) + <int64_t>(frac * m)
    except OverflowError as err:
        raise OutOfBoundsDatetime(
            f"cannot convert input {ts} with the unit '{unit}'"
        ) from err

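In line 223 (frac = ts - base), ts is of type np.int32 and base is a Python integer. With NumPy < 2, legacy promotion widens frac to NumPy's default integer type (int64 on this platform), which prints as a plain 0: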

>>> np.int32(0) - 0
0

but with NumPy >= 2, frac is of type np.int32:

>>> np.int32(0) - 0
np.int32(0)

Further, NumPy >= 2 then raises when the np.int32 frac is multiplied by m, which is a Python integer:

>>> np.int32(0) * 86400000000000
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: Python integer 86400000000000 out of bounds for int32
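
For anyone who wants to check this on their own installation, here is a small probe, written as a sketch rather than a definitive test. It prints the result type of the subtraction and reports whether the multiplication raises, so it should run on either side of the NumPy 2 boundary. The variable names (`m`, `ts`, `widened`, `as_python_int`) are mine and are not taken from pandas.

```python
import numpy as np

m = 86_400_000_000_000  # nanoseconds per day, the multiplier for unit "D"
ts = np.int32(0)

print("NumPy", np.__version__)

# Under NEP 50 the Python int 0 is "weak", so the result stays int32;
# under legacy promotion it is widened to the default integer type.
print("type(ts - 0):", type(ts - 0).__name__)

# The multiplication only fails when m has to fit into int32.
try:
    product = ts * m
except OverflowError as exc:
    print("ts * m raised:", exc)
else:
    print("ts * m ->", type(product).__name__)

# Explicit conversions behave the same on both sides of the change:
widened = np.int64(ts) * m   # m fits in int64, so no overflow
as_python_int = int(ts) * m  # Python ints have arbitrary precision
```

The last two lines are the kind of manual promotion that avoids the error regardless of the NumPy version.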

I believe this change in behavior is described more fully in NEP 50. What do we think the cleanest way to address this might be? It seems some sort of manual type promotion will be necessary (or perhaps just not using frac if ts is an integer, since it should always be zero).
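
To make the second option concrete, here is a minimal pure-Python sketch of skipping frac for integer inputs. It is not the pandas implementation: `cast_from_unit_sketch`, `NS_PER_DAY`, and the explicit int64 bounds check are hypothetical stand-ins for the Cython code above, and the default `p=0` is an assumption made only to keep the example short.

```python
import operator
from numbers import Integral

NS_PER_DAY = 86_400_000_000_000  # multiplier m for unit "D" (assumed here)
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


def cast_from_unit_sketch(ts, m=NS_PER_DAY, p=0):
    """Hypothetical stand-in for cast_from_unit; not pandas code."""
    if isinstance(ts, Integral):
        # operator.index turns any integer-like scalar (np.int32 included)
        # into a plain Python int, so the product cannot overflow a narrow
        # NumPy dtype. frac is always zero here, so it is skipped entirely.
        result = operator.index(ts) * m
    else:
        # Non-integer input: split into base and fractional parts as before.
        base = int(ts)
        frac = float(ts) - base
        if p:
            frac = round(frac, p)
        result = base * m + int(frac * m)
    # Emulate the <int64_t> cast failing on out-of-range values.
    if not INT64_MIN <= result <= INT64_MAX:
        raise OverflowError(f"cannot convert input {ts} without overflow")
    return result


print(cast_from_unit_sketch(0))    # integer path
print(cast_from_unit_sketch(1.5))  # float path
```

Because the integer branch never does arithmetic on the original NumPy scalar, it should not depend on which promotion rules are in effect.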

@lithomas1 added the Timedelta, Compat, and Dependencies labels and removed the Bug and Needs Triage labels on Mar 24, 2024