feat(python): Add top-level pl.sql function #16528

Merged
merged 1 commit into from
May 29, 2024


alexander-beedie
Collaborator

@alexander-beedie alexander-beedie commented May 27, 2024

Closes #16503.

In a bit of synchronicity with the universe, @lucazanna raised the above request hours before I was going to add this feature anyway 😎

It's a natural extension of the earlier #15783: it offers a top-level pl.sql function that, as you'd expect, similarly takes advantage of SQLContext under the hood, automatically looking into the globals to identify the frames referenced in the query.

Example

from datetime import date
import polars as pl

df1 = pl.DataFrame({
    "a": [1, 2, 3],
    "b": ["zz", "yy", "xx"],
    "c": [date(1999,12,31), date(2010,10,10), date(2077,8,8)],
})
df2 = pl.DataFrame({
    "a": [3, 2, 1],
    "d": [125, -654, 888],
})

pl.sql("""
    SELECT df1.*, d
    FROM df1
    INNER JOIN df2 USING (a)
    WHERE a > 1 AND EXTRACT(year FROM c) < 2050
""").collect()

# shape: (1, 4)
# ┌─────┬─────┬────────────┬──────┐
# │ a   ┆ b   ┆ c          ┆ d    │
# │ --- ┆ --- ┆ ---        ┆ ---  │
# │ i64 ┆ str ┆ date       ┆ i64  │
# ╞═════╪═════╪════════════╪══════╡
# │ 2   ┆ yy  ┆ 2010-10-10 ┆ -654 │
# └─────┴─────┴────────────┴──────┘

@github-actions github-actions bot added the enhancement and python labels May 27, 2024
@alexander-beedie alexander-beedie added the A-sql label May 27, 2024

codecov bot commented May 27, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 81.53%. Comparing base (cc2c905) to head (32de331).
Report is 3 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #16528      +/-   ##
==========================================
+ Coverage   81.51%   81.53%   +0.01%     
==========================================
  Files        1410     1411       +1     
  Lines      185063   185116      +53     
  Branches     2983     2983              
==========================================
+ Hits       150860   150928      +68     
+ Misses      33687    33672      -15     
  Partials      516      516              


@djouallah

djouallah commented May 27, 2024

ok that's interesting !!!!
would you mind please adding show method for compatibility reason with all other Python SQL Engines

pl.sql("""
    SELECT df1.*, d
    FROM df1
    INNER JOIN df2 USING (a)
    WHERE a > 1 AND EXTRACT(year FROM c) < 2050
""").show()

@alexander-beedie
Collaborator Author

alexander-beedie commented May 27, 2024

> ok that's interesting !!!!

Thanks ;)

> ok that's interesting !!!! would you mind please adding show method for compatibility reason with all other Python SQL Engines

I don't really see why we would need/want that? The default return type from this function is a standard polars LazyFrame, which you can call .collect() on if you want to see the resulting DataFrame. That will show you a table representation of the data with its standard repr output 👍

(Alternatively you can also pass "eager=True" to get a DataFrame back directly).

@djouallah

just for syntax compatibility with datafusion, duckdb, pyspark and glaredb.

https://colab.research.google.com/drive/1_iD6S6MR88B1Ym91pXksmtqaQzF4seQa

@alexander-beedie
Collaborator Author

alexander-beedie commented May 27, 2024

> just for syntax compatibility with datafusion, duckdb, pyspark and glaredb.

Ahhh; ok, a show() method for us would have to be a feature request on DataFrame rather than anything special to the SQL interface, as pl.sql will just return a LazyFrame or DataFrame as usual - it isn't a different type of polars object because the SQL interface was used. If you can make a feature request for something like that we can take a look at it separately 👍

@djouallah

thanks done
#16534

@ritchie46 ritchie46 merged commit 243b61e into pola-rs:main May 29, 2024
18 checks passed
@alexander-beedie alexander-beedie deleted the top-level-sql-function branch May 29, 2024 08:45
Wouittone pushed a commit to Wouittone/polars that referenced this pull request Jun 22, 2024
Labels
A-sql (Area: Polars SQL functionality), enhancement (New feature or an improvement of an existing feature), python (Related to Python Polars)
Development

Successfully merging this pull request may close these issues.

Add a pl.sql method for running SQL commands
3 participants