Panic in calling to_numpy on a concatted DataFrame #16375

Closed
2 tasks done
rhshadrach-8451 opened this issue May 21, 2024 · 3 comments · Fixed by #16393
Labels: A-interop-numpy (Area: interoperability with NumPy), accepted (Ready for implementation), bug (Something isn't working), P-high (Priority: high), python (Related to Python Polars), regression (Issue introduced by a new release)

Comments

rhshadrach-8451 commented May 21, 2024

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl

pl.concat(
    [
        pl.DataFrame({"a": [1, 1, 2], "b": [2, 3, 4]}),
        pl.DataFrame({"a": [1, 1, 2], "b": [2, 3, 4]}),
    ]
).to_numpy()

Log output

thread 'polars-0' panicked at /home/runner/work/polars/polars/crates/polars-core/src/chunked_array/ndarray.rs:156:33:
source slice length (3) does not match destination slice length (6)
thread 'polars-0' panicked at /home/runner/work/polars/polars/crates/polars-core/src/chunked_array/ndarray.rs:156:33:
source slice length (3) does not match destination slice length (6)
---------------------------------------------------------------------------
PanicException                            Traceback (most recent call last)
/tmp/ipykernel_26173/271991162.py in ?()
      5     [
      6         pl.DataFrame({"a": [1, 1, 2], "b": [2, 3, 4]}),
      7         pl.DataFrame({"a": [1, 1, 2], "b": [2, 3, 4]}),
      8     ]
----> 9 ).to_numpy()

[snip]/python3.10/site-packages/polars/_utils/deprecation.py in ?(*args, **kwargs)
    223         @wraps(function)
    224         def wrapper(*args: P.args, **kwargs: P.kwargs) -> T:
    225             if len(args) > num_allowed_args:
    226                 issue_deprecation_warning(msg, version=version)
--> 227             return function(*args, **kwargs)

[snip]/python3.10/site-packages/polars/dataframe/frame.py in ?(self, structured, order, allow_copy, writable, use_pyarrow)
   1583             for idx, c in enumerate(self.columns):
   1584                 out[c] = arrays[idx]
   1585             return out
   1586 
-> 1587         return self._df.to_numpy(order, writable=writable, allow_copy=allow_copy)

PanicException: source slice length (3) does not match destination slice length (6)

Issue description

The code sample above works in 0.20.26 but panics in 0.20.27. This may be due to #16288.

Expected behavior

The call should return a NumPy array without raising.

Installed versions

--------Version info---------
Polars:               0.20.27
Index type:           UInt32
Platform:             Linux-5.10.102.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
Python:               3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0]

----Optional dependencies----
adbc_driver_manager:  <not installed>
cloudpickle:          <not installed>
connectorx:           <not installed>
deltalake:            <not installed>
fastexcel:            0.10.4
fsspec:               2024.5.0
gevent:               <not installed>
hvplot:               0.10.0
matplotlib:           3.9.0
nest_asyncio:         1.6.0
numpy:                1.26.4
openpyxl:             <not installed>
pandas:               2.2.2
pyarrow:              16.1.0
pydantic:             2.7.1
pyiceberg:            <not installed>
pyxlsb:               <not installed>
sqlalchemy:           <not installed>
torch:                <not installed>
xlsx2csv:             0.8.2
xlsxwriter:           3.2.0

rhshadrach-8451 added the labels bug, needs triage, and python on May 21, 2024.

cmdlineluser (Contributor) commented

I think you are right about the cause, as it does work with rechunk enabled.

pl.concat(
    [
        pl.DataFrame({"a": [1, 1, 2], "b": [2, 3, 4]}),
        pl.DataFrame({"a": [1, 1, 2], "b": [2, 3, 4]}),
    ],
    rechunk=True,
).to_numpy()

# array([[1, 2],
#        [1, 3],
#        [2, 4],
#        [1, 2],
#        [1, 3],
#        [2, 4]])
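
For readers wondering why rechunking makes a difference, here is a minimal plain-Python sketch of one way such a length mismatch can arise. It is not the Polars implementation: the function names are hypothetical, and it assumes the panic comes from copying a single 3-row chunk into a buffer sized for all 6 rows, which matches the numbers in the log but is not confirmed in this thread.

```python
def copy_assuming_single_chunk(chunks, dest):
    """Hypothetical buggy model: treats the first chunk as the whole column."""
    src = chunks[0]
    if len(src) != len(dest):
        raise ValueError(
            f"source slice length ({len(src)}) does not match "
            f"destination slice length ({len(dest)})"
        )
    dest[:] = src


def copy_chunk_by_chunk(chunks, dest):
    """Hypothetical chunk-aware model: writes each chunk at its running offset."""
    offset = 0
    for chunk in chunks:
        dest[offset : offset + len(chunk)] = chunk
        offset += len(chunk)


# Column "a" modelled as two 3-row chunks, as after a concat without rechunking.
column_a = [[1, 1, 2], [1, 1, 2]]
total_rows = sum(len(chunk) for chunk in column_a)

# Copying every chunk fills the full-length destination.
dest = [0] * total_rows
copy_chunk_by_chunk(column_a, dest)
print(dest)

# Copying only the first chunk into the full-length destination fails.
try:
    copy_assuming_single_chunk(column_a, [0] * total_rows)
except ValueError as exc:
    print(exc)
```

In this model, rechunking corresponds to merging the two chunks into one 6-row chunk first, so the single-chunk copy happens to see matching lengths.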

stinodego added the labels A-interop, regression, P-high, and A-interop-numpy, and removed the labels needs triage and A-interop, on May 21, 2024.

yburke94 commented

This error has also been hitting us since we upgraded to 0.20.27.

ritchie46 (Member) commented

Fix coming up. I think I will do a patch.
