-
-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(python): Expose string expression nodes to python #16221
feat(python): Expose string expression nodes to python #16221
Conversation
Yeap, seems good. Left one comment to shrink the match. |
StringFunction::ConcatHorizontal { | ||
delimiter, | ||
ignore_nulls, | ||
} => ("concathorizontal", delimiter, ignore_nulls).to_object(py), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One question I have here: we're effectively using a tuple with a string as a poor approximation of a tagged union type (AKA, the incoming enum type on the rust side).
This means that in Python it's not easily to introspect the set of values that we can receive here.
One alternative option (which is more typing) is to have an enum class here that is exposed. This could either be a unit variant enum that replaces the use of the string "tag" with an enum value:
#[pyclass]
enum PyStringFunction {
ConcatHorizontal,
ConcatVertical,
...
}
...
match strfun {
StringFunction::ConcatHorizontal { delimiter, ignore_nulls } => (PyStringFunction, delimiter, ignore_nulls).to_object(py),
...
}
Or, I think, using the new features of pyo3 for struct variant enums:
#[pyclass]
enum PyStringFunction {
ConcatHorizontal { delimiter: Whatever, ignore_nulls: bool },
...
}
...
match strfn {
StringFunction::ConcatHorizontal { d, i } => PyStringFunction::ConcatHorizontal {d, i},
...
}
AFAICT, from some (very minimal) microbenchmarking, this all have pretty much the same performance characteristics.
One niggle for the last option is that (for now) you can't mix unit an struct enum variants, and tuple enum variants aren't supported at all, so you can't write:
#[pyclass]
enum SomeEnum {
Unit,
Tuple(i32),
Struct { a: i32, b: i64 },
}
Instead one needs to map every to (possibly empty) structs and come up with a name for the unnamed tuple variant field.
Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do like that more as we can more easily change our internals whilst maintaining the same exposed enums. It is some extra typing atm, but it gives us an extra indirection that will likely make it more maintainable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added something in da70f23 that hopefully moves us in this direction 👍
infer_schema_len, | ||
} => ( | ||
"jsondecode", | ||
Wrap(dtype.as_ref().unwrap().clone()).to_object(py), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question (for @ritchie46): are these dtypes
that live inside the variants different from the ones that AExpr.to_field(...)
would return?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should always trust AExpr::to_field
. That this variant holds a dtype
is an implementation detail. Those details are now leaked and might be changed. So be aware of that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could therefore just not leak these out: it makes the mapping not one-to-one, but if to_field
is the correct source of truth then it makes sense to not lead someone down a bad path by providing these slots in the first place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, agree.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #16221 +/- ##
==========================================
- Coverage 80.99% 80.82% -0.18%
==========================================
Files 1392 1394 +2
Lines 178930 179569 +639
Branches 2904 2913 +9
==========================================
+ Hits 144920 145130 +210
- Misses 33507 33936 +429
Partials 503 503 ☔ View full report in Codecov by Sentry. |
Hello! First of all, thanks for polars!
This PR follows up recent work to expose the logical plan to python by adding string expressions. This should allow a consumer of the plan on the python side to implement accelerated string computations.
cc @wence-