feat(python): Support shortcut eval of common boolean filters in SQL interface "WHERE" clause #18571

Merged: 2 commits into pola-rs:main from the sql-shortcut-filter-eval branch, Sep 7, 2024


@alexander-beedie (Collaborator) commented Sep 5, 2024

Closes #18373.

Update

  • Shortcut-eval for common SQL "WHERE" clause filters such as:
    • WHERE TRUE
    • WHERE FALSE
    • WHERE 1 = 1
    • WHERE 1 != 0
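All of the patterns listed above can be decided without reading any column data. The sketch below shows that check, assuming predicates arrive as plain strings and covering only TRUE/FALSE and comparisons between two integer literals. `constant_truth` is a hypothetical helper. The actual Polars implementation lives in the Rust SQL code and is not reproduced here.

```python
import operator
import re
from typing import Optional

# Hypothetical helper (not the Polars implementation): recognise WHERE
# predicates whose truth value is known without looking at any data.
_COMPARISONS = {
    "=": operator.eq,
    "==": operator.eq,
    "!=": operator.ne,
    "<>": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}

# Two integer literals around a comparison operator, e.g. "1 = 1" or "1!=0".
# Two-character operators are listed first so "<=" is not read as "<".
_LITERAL_COMPARISON = re.compile(
    r"^\s*(-?\d+)\s*(==|!=|<>|<=|>=|=|<|>)\s*(-?\d+)\s*$"
)


def constant_truth(predicate: str) -> Optional[bool]:
    """Return True/False for a constant predicate, or None if it depends on data."""
    text = predicate.strip().upper()
    if text == "TRUE":
        return True
    if text == "FALSE":
        return False
    match = _LITERAL_COMPARISON.match(text)
    if match:
        left, op, right = match.groups()
        return _COMPARISONS[op](int(left), int(right))
    # Anything else (e.g. "x > 1") needs real row-wise evaluation.
    return None
```

Returning `None` for anything unrecognised is the safe default, because the caller then falls back to normal filtering. A real implementation would inspect the parsed expression tree rather than match on strings.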

This pattern looks a bit odd, but it often arises from programmatic query builders that want to append to a WHERE clause without checking whether a new filter is the first one (so that they can always "AND" it onto the existing clause). As a result, a generated query may always contain "WHERE 1=1" (or an equivalent), with the expectation that it will be optimised out.
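To illustrate why "WHERE 1=1" appears in generated SQL, here is a hypothetical builder that seeds the clause with a tautology so every subsequent filter can be attached with a plain "AND". `build_query` is an illustrative name and is not part of Polars.

```python
from typing import List


def build_query(table: str, filters: List[str]) -> str:
    """Hypothetical query builder that always seeds WHERE with 1=1.

    Because the clause is never empty, each filter is appended with
    "AND" without checking whether it is the first one.
    """
    sql = f"SELECT * FROM {table} WHERE 1=1"
    for condition in filters:
        sql += f" AND {condition}"
    return sql
```

With no filters the result is `SELECT * FROM self WHERE 1=1`, which is exactly the clause this PR short-circuits. String interpolation keeps the sketch short; a real builder should bind values as parameters rather than splicing them into SQL text.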

When the SQL parser sees that a WHERE clause matches one of these patterns, it now skips frame filtering entirely, immediately returning either the same frame (if TRUE) or an empty frame with an identical schema (if FALSE).
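The shortcut described above can be sketched as a three-way branch. This example assumes a toy `Frame` type (a schema dict plus column lists, standing in for a DataFrame) and a precomputed `truth` value from some constant-predicate check. Both `Frame` and `apply_where` are hypothetical names, and the general branch uses naive row-by-row evaluation purely for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]


@dataclass
class Frame:
    """Toy stand-in for a DataFrame (hypothetical): schema plus column data."""

    schema: Dict[str, str]  # column name -> dtype label, e.g. "i64"
    columns: Dict[str, List[object]]  # column name -> values


def apply_where(
    frame: Frame,
    truth: Optional[bool],
    row_predicate: Optional[Callable[[Row], bool]] = None,
) -> Frame:
    """Filter `frame`, short-circuiting when the predicate is a known constant."""
    if truth is True:
        # WHERE TRUE / WHERE 1 = 1: nothing to filter, return the same frame.
        return frame
    if truth is False:
        # WHERE FALSE / WHERE 1 != 1: empty frame with an identical schema.
        return Frame(dict(frame.schema), {name: [] for name in frame.columns})
    if row_predicate is None:
        raise ValueError("a row predicate is required for non-constant filters")
    # General case: evaluate the predicate for every row and keep the matches.
    height = len(next(iter(frame.columns.values()), []))
    keep = [
        i
        for i in range(height)
        if row_predicate({name: col[i] for name, col in frame.columns.items()})
    ]
    return Frame(
        dict(frame.schema),
        {name: [col[i] for i in keep] for name, col in frame.columns.items()},
    )
```

The important detail in the FALSE branch is that the column names and dtypes survive even though every column is empty, which corresponds to the `shape: (0, 2)` output in the examples.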

(Note: you usually see the "TRUE" version of this, but once in a blue moon you'll see something that evaluates to "FALSE", because some query-building optimisers generate it when they can determine the truth value of a constraint clause before it reaches the backend.)
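As a hypothetical illustration of how a "FALSE" clause can arise, consider a builder asked to filter on membership in an empty list. It already knows no row can match, so it may fold the constraint to a constant instead of emitting `IN ()`, which not every SQL dialect accepts. `in_filter` is an illustrative name, and the sketch assumes numeric values (strings would need quoting).

```python
from typing import Sequence


def in_filter(column: str, values: Sequence[int]) -> str:
    """Hypothetical builder step that folds an impossible constraint early.

    Membership in an empty list can never be true, so the builder emits
    the constant predicate FALSE instead of "column IN ()".
    """
    if not values:
        return "FALSE"
    return f"{column} IN ({', '.join(str(v) for v in values)})"
```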

Also

Minor internal module cleanup/refactor:

  • Datatype-specific code moved from sql_expr.rs to a new types.rs module.

Examples

import polars as pl

df = pl.DataFrame({
  "x": ["aa", "bb", "cc"],
  "y": [1, 2, 3],
})

df.sql("SELECT * FROM self WHERE TRUE")
# shape: (3, 2)
# ┌─────┬─────┐
# │ x   ┆ y   │
# │ --- ┆ --- │
# │ str ┆ i64 │
# ╞═════╪═════╡
# │ aa  ┆ 1   │
# │ bb  ┆ 2   │
# │ cc  ┆ 3   │
# └─────┴─────┘

df.sql("SELECT * FROM self WHERE 1 != 1")
# shape: (0, 2)
# ┌─────┬─────┐
# │ x   ┆ y   │
# │ --- ┆ --- │
# │ str ┆ i64 │
# ╞═════╪═════╡
# └─────┴─────┘

github-actions bot added the enhancement and python labels Sep 5, 2024
@alexander-beedie force-pushed the sql-shortcut-filter-eval branch 5 times, most recently from 80219c7 to f211265, September 6, 2024 07:30
@alexander-beedie added the A-sql label Sep 6, 2024
@ritchie46 merged commit ac4b114 into pola-rs:main Sep 7, 2024 (28 of 30 checks passed)
@alexander-beedie deleted the sql-shortcut-filter-eval branch September 7, 2024 13:41
Labels
A-sql (Area: Polars SQL functionality)
enhancement (New feature or an improvement of an existing feature)
python (Related to Python Polars)
Successfully merging this pull request may close these issues.

"WHERE true/false" breaks in SQLContext on DataFrame with null columns