Cannot chain over() and list.set_intersection() in polars-1.0.0beta1 (works in 0.20.31) #17129

Closed
2 tasks done
ruomad opened this issue Jun 22, 2024 · 1 comment · Fixed by #17154
Labels
accepted Ready for implementation bug Something isn't working P-high Priority: high python Related to Python Polars regression Issue introduced by a new release

ruomad commented Jun 22, 2024

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl

df = pl.DataFrame({
    "a": [1, 2, 2, 3, 3],
    "b": [2, 2, 4, 7, 8],
})

df.with_columns(
    pl.col("b")
    .over("a", mapping_strategy="join")  # one list of b per row, holding its group's values
    .list.set_intersection([4, 8])
    .alias("intersect")
)

yields the expected result in Polars 0.20.31:

shape: (5, 3)
┌─────┬─────┬───────────┐
│ a   ┆ b   ┆ intersect │
│ --- ┆ --- ┆ ---       │
│ i64 ┆ i64 ┆ list[i64] │
╞═════╪═════╪═══════════╡
│ 1   ┆ 2   ┆ []        │
│ 2   ┆ 2   ┆ [4]       │
│ 2   ┆ 4   ┆ [4]       │
│ 3   ┆ 7   ┆ [8]       │
│ 3   ┆ 8   ┆ [8]       │
└─────┴─────┴───────────┘
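For readers unfamiliar with `mapping_strategy="join"`: it gathers each group's values into a list and gives every row of the group that list, so the column is already `list[i64]` by the time `list.set_intersection()` runs. The plain-Python sketch below mimics that reading to reproduce the 0.20.31 table. It does not use Polars; `over_join` and `set_intersection` are hypothetical stand-ins, not Polars APIs, and the assumption that the intersection keeps the left list's order and drops duplicates matches this example but is not something the issue states.

```python
from collections import defaultdict


def over_join(keys, values):
    """Hypothetical stand-in for pl.col(...).over(keys, mapping_strategy="join"):
    collect each group's values into a list and give every row its group's list."""
    groups = defaultdict(list)
    for k, v in zip(keys, values):
        groups[k].append(v)
    return [list(groups[k]) for k in keys]


def set_intersection(lists, other):
    """Hypothetical stand-in for .list.set_intersection(other): per row, keep the
    elements also in `other`, in first-seen order, without duplicates (assumed)."""
    other_set = set(other)
    result = []
    for row in lists:
        seen = set()
        kept = []
        for x in row:
            if x in other_set and x not in seen:
                seen.add(x)
                kept.append(x)
        result.append(kept)
    return result


a = [1, 2, 2, 3, 3]
b = [2, 2, 4, 7, 8]

grouped = over_join(a, b)                       # [[2], [2, 4], [2, 4], [7, 8], [7, 8]]
intersect = set_intersection(grouped, [4, 8])
print(intersect)                                # [[], [4], [4], [8], [8]]
```

The `intersect` values line up row by row with the `intersect` column in the table above.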

but fails in 1.0.0-beta.1 with:

Traceback (most recent call last):
  File "...\Lib\site-packages\polars\dataframe\frame.py", line 8592, in with_columns
    return self.lazy().with_columns(*exprs, **named_exprs).collect(_eager=True)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "...\Lib\site-packages\polars\lazyframe\frame.py", line 1896, in collect
    return wrap_df(ldf.collect(callback))
                   ^^^^^^^^^^^^^^^^^^^^^
polars.exceptions.InvalidOperationError: could not determine supertype of: [i64, list[i64]]

Log output

No response

Issue description

Chaining over() and list.set_intersection() in a single expression works in 0.20.31 but fails in 1.0.0-beta.1.

It does work, however, if the two operations are split across separate with_columns() calls:

(
    df.with_columns(
        pl.col("b").over("a", mapping_strategy="join")
    )
    .with_columns(
        pl.col("b").list.set_intersection([4, 8]).alias("intersect")
    )
)
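One difference worth keeping in mind: this workaround is not a drop-in replacement, because the first with_columns() keeps the output name "b" and therefore overwrites `b` with the per-group lists. The final frame's `b` column is then `list[i64]` rather than the original `i64` values. The plain-Python sketch below models a frame as a dict of columns to make that visible; `over_join` is a hypothetical stand-in, not a Polars API. In Polars one would presumably `.alias()` the intermediate column under a new name and `.drop()` it afterwards if the original `b` is needed.

```python
from collections import defaultdict


def over_join(keys, values):
    """Hypothetical stand-in for over(keys, mapping_strategy="join")."""
    groups = defaultdict(list)
    for k, v in zip(keys, values):
        groups[k].append(v)
    return [list(groups[k]) for k in keys]


# A "frame" modelled as a dict of equal-length columns.
frame = {"a": [1, 2, 2, 3, 3], "b": [2, 2, 4, 7, 8]}

# Step 1 (first with_columns): the result keeps the name "b", replacing the column.
step1 = {**frame, "b": over_join(frame["a"], frame["b"])}

# Step 2 (second with_columns): keep the elements of each row's list that are in
# [4, 8]; no row contains duplicates in this data, so a plain filter suffices.
wanted = {4, 8}
step2 = {**step1, "intersect": [[x for x in row if x in wanted] for row in step1["b"]]}

print(step2["b"])          # [[2], [2, 4], [2, 4], [7, 8], [7, 8]]
print(step2["intersect"])  # [[], [4], [4], [8], [8]]
```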

Is this expected?

Expected behavior

No error, and the same result as in 0.20.31.

Installed versions

--------Version info---------
Polars:               1.0.0-beta.1
Index type:           UInt32
Platform:             Windows-10-10.0.19045-SP0
Python:               3.12.4 (tags/v3.12.4:8e8a4ba, Jun  6 2024, 19:30:16) [MSC v.1940 64 bit (AMD64)]

----Optional dependencies----
adbc_driver_manager:  <not installed>
cloudpickle:          <not installed>
connectorx:           <not installed>
deltalake:            <not installed>
fastexcel:            <not installed>
fsspec:               2024.6.0
gevent:               <not installed>
great_tables:         <not installed>
hvplot:               <not installed>
matplotlib:           <not installed>
nest_asyncio:         <not installed>
numpy:                1.26.4
openpyxl:             <not installed>
pandas:               <not installed>
pyarrow:              16.1.0
pydantic:             <not installed>
pyiceberg:            <not installed>
sqlalchemy:           <not installed>
torch:                <not installed>
xlsx2csv:             <not installed>
xlsxwriter:           3.2.0
ruomad added the bug, needs triage, and python labels on Jun 22, 2024.
stinodego added the regression label on Jun 22, 2024.

stinodego (Member) commented:

Thanks for the report. This must be a byproduct of #16918.

stinodego added the P-high label and removed the needs triage label on Jun 22, 2024.
stinodego added this to the 1.0.0 milestone on Jun 22, 2024.
ritchie46 self-assigned this on Jun 24, 2024.
c-peters added the accepted label on Jul 1, 2024.