Regression from 0.20.21 -> 0.20.22-rc.1 pl.Expr.list.to_array(n) is throwing polars.exceptions.ComputeError: not all elements have the specified width n #16693

Closed
2 tasks done
kszlim opened this issue Jun 3, 2024 · 1 comment
Labels
bug Something isn't working needs triage Awaiting prioritization by a maintainer python Related to Python Polars

Comments

@kszlim
Contributor

kszlim commented Jun 3, 2024

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

I already have a LazyFrame (`ldf`) in memory. This isn't a pure MRE, because the failure seems to depend on memory layout in some way.

The following cases seem to fail:

ldf.with_columns(pl.col("some_list_col").list.to_array(3).alias("some_arr_col")).collect()
ldf.collect().with_columns(pl.col("some_list_col").list.to_array(3).alias("some_arr_col"))

But these succeed. The first one is interesting: rechunking at the DataFrame level seems to resolve it, which suggests a memory layout issue.

ldf.collect().rechunk().with_columns(pl.col("some_list_col").list.to_array(3).alias("some_arr_col"))
ldf.select(pl.col("some_list_col").list.to_array(3).alias("some_arr_col")).collect()
ldf.collect().write_parquet("output.parquet")
df = pl.read_parquet("output.parquet")
df.with_columns(pl.col("some_list_col").list.to_array(3).alias("some_arr_col"))

Log output

`run StackExec`

This is logged in the `ldf.collect().with_columns(pl.col("some_list_col").list.to_array(3).alias("some_arr_col"))` case.

Issue description

Converting a list column to an array fails with what looks like a shape error:
ComputeError: not all elements have the specified width 3

This happens even though I have checked that the column has no nulls and that every list has length 3. The same expression works in a select context (as opposed to with_columns).

I bisected across versions and traced the regression to pull request #15686, which introduced it.

Might relate to #16540?

Expected behavior

Should work (and create a new column of array[f64, 3]).

Installed versions

--------Version info---------
Polars:               0.20.31
Index type:           UInt32
Platform:             Linux-5.10.217-183.860.x86_64-x86_64-with-glibc2.26
Python:               3.11.7 (main, Dec  5 2023, 22:00:36) [GCC 7.3.1 20180712 (Red Hat 7.3.1-17)]

----Optional dependencies----
adbc_driver_manager:  <not installed>
cloudpickle:          3.0.0
connectorx:           <not installed>
deltalake:            <not installed>
fastexcel:            <not installed>
fsspec:               2024.5.0
gevent:               <not installed>
hvplot:               <not installed>
matplotlib:           3.8.4
nest_asyncio:         1.6.0
numpy:                1.26.4
openpyxl:             <not installed>
pandas:               2.2.2
pyarrow:              16.1.0
pydantic:             2.7.2
pyiceberg:            <not installed>
pyxlsb:               <not installed>
sqlalchemy:           <not installed>
torch:                <not installed>
xlsx2csv:             <not installed>
xlsxwriter:           <not installed>
@kszlim
Contributor Author

kszlim commented Jun 4, 2024

@ritchie46 would it be acceptable to revert that PR? It seems to have spawned a couple of bugs.

Seems like #16733 fixes the issue
