feat(python): Improve Schema and DataType interop with Python types #18308

Merged (3 commits) on Aug 23, 2024


alexander-beedie
Collaborator

@alexander-beedie alexander-beedie commented Aug 22, 2024

A few updates that largely complete the work of making interop between Polars and Python types smoother and more consistent for the DataType and Schema classes.

  • Adds missing conversion/mapping for Struct and Object dtypes.
  • Adds a to_python method to both DataType and Schema classes.
  • Parses any Python types given to Schema __init__ (or __setitem__).
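As a rough picture of the last bullet, here is a toy model of "parse on the way in". It is not the Polars implementation: `LooseSchema`, `parse_dtype` and `PY_TO_DTYPE_NAME` are hypothetical names, and dtypes are represented as plain strings so the sketch runs without Polars. The four mappings are the ones shown in the examples in this PR description.

```python
from collections import OrderedDict
from datetime import date

# Hypothetical lookup table: Python type -> dtype name.
# Strings stand in for real Polars dtypes in this toy model.
PY_TO_DTYPE_NAME = {int: "Int64", str: "String", object: "Object", date: "Date"}


def parse_dtype(value):
    """Return a dtype name, translating Python types via the lookup table."""
    if isinstance(value, str):
        return value  # already a dtype name, pass it through unchanged
    try:
        return PY_TO_DTYPE_NAME[value]
    except KeyError:
        raise TypeError(f"cannot parse {value!r} into a dtype") from None


class LooseSchema(OrderedDict):
    """Toy ordered mapping that normalizes every value as it is stored."""

    def __init__(self, schema=None):
        super().__init__()
        for name, dtype in (schema or {}).items():
            self[name] = dtype  # routes through __setitem__ below

    def __setitem__(self, name, dtype):
        super().__setitem__(name, parse_dtype(dtype))


schema = LooseSchema({"foo": int, "bar": str, "baz": object})
schema["ham"] = date
print(list(schema.items()))
# [('foo', 'Int64'), ('bar', 'String'), ('baz', 'Object'), ('ham', 'Date')]
```

Funnelling `__init__` through `__setitem__` keeps a single parsing path, so values set at construction and values assigned later are normalized the same way.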

In addition to making Schema init consistent with the dtype init already allowed by DataFrame/Series, this looser schema definition can also be helpful during EDA, for example before assigning more exact/specific dtypes in a final production pipeline.

Examples

from datetime import date
import polars as pl

schema = pl.Schema({
    "foo": pl.Int8(),
    "bar": pl.String(),
    "baz": pl.Object(),
    "ham": pl.Categorical("lexical"),
    "spam": pl.Struct({"time": pl.List(pl.Duration), "dist": pl.Float64}),
})

schema.to_python()
# {"foo": int, "bar": str, "baz": object, "ham": str, "spam": dict}

schema = pl.Schema({"foo": int, "bar": str, "baz": object})
schema["ham"] = date
# Schema([('foo', Int64), ('bar', String), ('baz', Object), ('ham', Date)])

pl.DataType.from_python(int)
# pl.Int64

pl.Object().to_python()
# object

@github-actions github-actions bot added the enhancement (New feature or an improvement of an existing feature) and python (Related to Python Polars) labels Aug 22, 2024

codecov bot commented Aug 22, 2024

Codecov Report

Attention: Patch coverage is 85.18519% with 4 lines in your changes missing coverage. Please review.

Project coverage is 80.34%. Comparing base (70459e4) to head (f246ae5).
Report is 15 commits behind head on main.

Files | Patch % | Lines
py-polars/polars/datatypes/classes.py | 71.42% | 4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #18308      +/-   ##
==========================================
+ Coverage   80.23%   80.34%   +0.10%     
==========================================
  Files        1501     1501              
  Lines      199180   199236      +56     
  Branches     2837     2841       +4     
==========================================
+ Hits       159822   160078     +256     
+ Misses      38830    38631     -199     
+ Partials      528      527       -1     


@ritchie46 ritchie46 merged commit 764ee49 into pola-rs:main Aug 23, 2024
15 checks passed
@alexander-beedie alexander-beedie deleted the python-dtypes-schema-interop branch August 23, 2024 13:03