Can't use the condition "if var is in a list of values" in pl.filter #6458

Closed
2 tasks done
oliviermeslin opened this issue Jan 26, 2023 · 4 comments
Labels
bug Something isn't working python Related to Python Polars

Comments

@oliviermeslin

oliviermeslin commented Jan 26, 2023

Polars version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of Polars.

Issue description

First, thanks for developing this amazing library! Coming from R, I feel like polars is as good as data.table (and this really means something to me!).

TL;DR: I would like to use filter like this: df.filter(pl.col("myvar") in liste_values), and I can't find out how to do it. I read the documentation carefully but did not find anything on this specific point.

Details:

  • I frequently select rows in dataframes by a condition where the value of a column must be in a list of values. In pandas, I use this syntax: df.query('cars in ["beetle", "SUV"]').
  • For some reason, I couldn't figure out how to do the same thing in polars. I think I might not completely understand how Expressions work in polars, given that I get this error: ValueError: Since Expr are lazy, the truthiness of an Expr is ambiguous. Hint: use '&' or '|' to chain Expr together, not and/or.

Question:

Is there a simple syntax that would make it possible to use this filtering method? Of course, I can propose a PR on the documentation to add some details on this use case if there is a solution.

Reproducible example

###### Pandas case
import pandas as pd
df = pd.DataFrame(
    {
        "A": [1, 2, 3, 4, 5],
        "fruits": ["banana", "banana", "apple", "apple", "banana"],
        "B": [5, 4, 3, 2, 1],
        "cars": ["beetle", "audi", "beetle", "beetle", "SUV"],
    }
)

# This works
df.query('cars in ["beetle", "SUV"]')

###### Polars case
import polars as pl
df = pl.DataFrame(
    {
        "A": [1, 2, 3, 4, 5],
        "fruits": ["banana", "banana", "apple", "apple", "banana"],
        "B": [5, 4, 3, 2, 1],
        "cars": ["beetle", "audi", "beetle", "beetle", "SUV"],
    }
)

# This does not work
df.filter(pl.col("cars") in ["beetle", "SUV"])

Expected behavior

###### Polars case
import polars as pl
df = pl.DataFrame(
    {
        "A": [1, 2, 3, 4, 5],
        "fruits": ["banana", "banana", "apple", "apple", "banana"],
        "B": [5, 4, 3, 2, 1],
        "cars": ["beetle", "audi", "beetle", "beetle", "SUV"],
    }
)

###### An expression starting with df.filter()
###### Output

┌─────┬────────┬─────┬────────┐
│ A   ┆ fruits ┆ B   ┆ cars   │
│ --- ┆ ---    ┆ --- ┆ ---    │
│ i64 ┆ str    ┆ i64 ┆ str    │
╞═════╪════════╪═════╪════════╡
│ 1   ┆ banana ┆ 5   ┆ beetle │
│ 3   ┆ apple  ┆ 3   ┆ beetle │
│ 4   ┆ apple  ┆ 2   ┆ beetle │
│ 5   ┆ banana ┆ 1   ┆ SUV    │
└─────┴────────┴─────┴────────┘

Installed versions

---Version info---
Polars: 0.15.17
Index type: UInt32
Platform: Windows-10-10.0.17763-SP0
Python: 3.10.8 | packaged by conda-forge | (main, Nov 24 2022, 14:07:00) [MSC v.1916 64 bit (AMD64)]
---Optional dependencies---
pyarrow: 10.0.1
pandas: 1.5.2
numpy: 1.23.5
fsspec: 2022.11.0
connectorx: <not installed>
xlsx2csv: <not installed>
deltalake: <not installed>
matplotlib: 3.6.2

@oliviermeslin added the bug and python labels on Jan 26, 2023
@gfmartins

The syntax is wrong.
Try this:
df.filter(pl.col("cars").is_in(["beetle", "SUV"]))

https://pola-rs.github.io/polars/py-polars/html/reference/expressions/api/polars.Expr.is_in.html#polars.Expr.is_in

@oliviermeslin
Author

Thanks a lot @gfmartins! Although I thought I had read it carefully, the solution was indeed in the documentation... Sorry for opening a pointless issue!

@alexander-beedie
Collaborator

@oliviermeslin: FYI, I'm adding a little extra help to the error that gets raised here, to point anyone else doing the same thing in the right direction 👍

# ValueError: Since Expr are lazy, the truthiness of an Expr is ambiguous. 
#  Hint: use '&' or '|' to logically combine Expr, not 'and'/'or', and
#  use 'x.is_in([y,z])' instead of 'x in [y,z]' to check membership.
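
A toy model may help show why `x in [y, z]` raises in the first place. This is not Polars code: `ToyExpr` is a hypothetical stand-in that mimics only two behaviours, `==` returning another expression and `bool()` refusing to evaluate. Python's `in` on a list compares each element with `==` and then takes the truthiness of the result, which is where the error surfaces.

```python
class ToyExpr:
    """Hypothetical stand-in for a lazy expression (not the Polars class)."""

    def __init__(self, description):
        self.description = description

    def __eq__(self, other):
        # Comparison does not compute anything; it builds a new expression.
        return ToyExpr(f"({self.description} == {other!r})")

    def __bool__(self):
        # A lazy expression has no single truth value yet.
        raise ValueError("the truthiness of an expression is ambiguous")

    def is_in(self, values):
        # Membership is expressed as another lazy expression instead.
        return ToyExpr(f"{self.description}.is_in({list(values)!r})")


col = ToyExpr("col('cars')")

try:
    # The list's membership test calls bool(element == col) for each element.
    col in ["beetle", "SUV"]
    raised = False
except ValueError:
    raised = True

membership = col.is_in(["beetle", "SUV"])
print(raised, membership.description)
```

In this sketch the `in` test fails on the very first comparison, while `is_in` simply returns a description of the check to run later.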

@oliviermeslin
Author

@alexander-beedie: That is a good idea! Thanks!
