Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(rust, python): allow arithmetic operations between numeric Series and list Series #18901

Closed
wants to merge 14 commits into from

Conversation

itamarst
Copy link
Contributor

Fixes #8006
Fixes #14711

Followup to #9188, allow arithmetic between:

  • Left-hand side list Series (including nesting!) with numeric leaf, and right-hand side numeric Series. Add/sub/mul/div/rem.
  • Right-hand numeric Series, and left-hand side list Series. This is limited to add and multiply since the other operation seemed kinda strange semantically, as they are not commutative. One could perhaps argue for subtraction, but lack of negation operator suggests maybe not.

@@ -53,23 +53,112 @@ fn lists_same_shapes(left: &ArrayRef, right: &ArrayRef) -> bool {
}
}

/// Arithmetic operations that can be applied to a Series
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there some existing code that does this? There's a whole bunch of similar things but I may've missed an actual implementation.


impl Op {
/// Apply the operation to a pair of Series.
fn apply_with_series(&self, lhs: &Series, rhs: &Series) -> PolarsResult<Series> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to do these two functions as a single generic function and failed...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why exactly? Seems this is possible with generics.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't figure out the constraints that work. I guess I can try another round and report back when I get stuck.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the issue is that when you add two &Series, you get a PolarsResult<Series>, and when you add a &Series and u8 you get Series. Since the return types don't match, the constraints conflict (the former wants T: std:ops::Add<Output=PolarsResult<Series>>, the latter wants T: std:ops::Add<Output=Series>).

@itamarst itamarst changed the title feat: allow arithmetic operations between numeric Series and list Series feat(rust, python): allow arithmetic operations between numeric Series and list Series Sep 24, 2024
Copy link

codecov bot commented Sep 24, 2024

Codecov Report

Attention: Patch coverage is 99.09091% with 1 line in your changes missing coverage. Please review.

Project coverage is 79.77%. Comparing base (e186c03) to head (e916b56).
Report is 39 commits behind head on main.

Files with missing lines Patch % Lines
crates/polars-plan/src/plans/aexpr/schema.rs 94.44% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main   #18901   +/-   ##
=======================================
  Coverage   79.77%   79.77%           
=======================================
  Files        1531     1531           
  Lines      208487   208536   +49     
  Branches     2913     2913           
=======================================
+ Hits       166314   166370   +56     
+ Misses      41622    41615    -7     
  Partials      551      551           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link

codspeed-hq bot commented Sep 24, 2024

CodSpeed Performance Report

Merging #18901 will not alter performance

Comparing itamarst:8006-list-arithmetic-part-2 (e916b56) with main (da8f358)

Summary

✅ 40 untouched benchmarks

@ritchie46
Copy link
Member

Hey @itamarst Thanks for the PR. Before I review, can you rebase? There seem to be many commits from previous iteration.

@itamarst
Copy link
Contributor Author

Will try.

@itamarst
Copy link
Contributor Author

Looks like maybe one or two changes might be unnecessary, as reflected in the coverage report, so will make this draft while I fix that.

@itamarst itamarst marked this pull request as draft September 24, 2024 18:08
@itamarst itamarst marked this pull request as ready for review September 24, 2024 18:37
@itamarst
Copy link
Contributor Author

OK, it's ready for review again. Not sure what's up with Windows failures.

@itamarst
Copy link
Contributor Author

And it's both ready for review and green.

let mut new_right_dtype = right_dtype.cast_leaf(leaf_super_dtype);

// If we have e.g. Array and List, we want to convert those too.
if (left_dtype.is_list() && right_dtype.is_array())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should do that. I want those casts to be explicit for now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By explicit do you mean "explicitly cast list to array" and vice versa as needed? This section is definitely necessary in some form, if I remove it:

[gw4] linux -- Python 3.12.3 /home/itamarst/devel/polars/.venv/bin/python3
tests/unit/datatypes/test_array.py:106: in test_array_list_supertype
    result = s1 == s2
polars/series/series.py:773: in __eq__
    return self._comp(other, "eq")
polars/series/series.py:753: in _comp
    return self._from_pyseries(getattr(self._s, op)(other._s))
E   pyo3_runtime.PanicException: called `Result::unwrap()` on an `Err` value: SchemaMismatch(ErrString("invalid series dtype: expected `FixedSizeList`, got `list[f64]`"))
-------------------------------- Captured stderr call --------------------------------thread '<unnamed>' panicked at crates/polars-core/src/series/comparison.rs:132:9:
called `Result::unwrap()` on an `Err` value: SchemaMismatch(ErrString("invalid series dtype: expected `FixedSizeList`, got `list[f64]`"))
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
_______________________________ test_eq_array_cmp_list _______________________________[gw10] linux -- Python 3.12.3 /home/itamarst/devel/polars/.venv/bin/python3
tests/unit/series/test_equals.py:57: in test_eq_array_cmp_list
    result = s == [1, 2]
polars/series/series.py:773: in __eq__
    return self._comp(other, "eq")
polars/series/series.py:753: in _comp
    return self._from_pyseries(getattr(self._s, op)(other._s))
E   pyo3_runtime.PanicException: called `Result::unwrap()` on an `Err` value: SchemaMismatch(ErrString("invalid series dtype: expected `FixedSizeList`, got `list[i64]`"))

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the user should cast explicitly. And we shuold return an error instead of panicking if they pass wrong types.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can do that, but given this will involving changing the behavior of those two tests (from e.g. "these two things are equal" to "this now errors asking for a cast"), this is a backwards incompatible change. So as long as that's OK.


impl Op {
/// Apply the operation to a pair of Series.
fn apply_with_series(&self, lhs: &Series, rhs: &Series) -> PolarsResult<Series> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why exactly? Seems this is possible with generics.

@nameexhaustion
Copy link
Collaborator

Thanks for the PR - we want to try doing this with a more optimized kernel that operates directly on the primitive values - I will take it from here

@nameexhaustion nameexhaustion marked this pull request as draft October 2, 2024 06:23
@@ -47,55 +47,6 @@ fn is_cat_str_binary(type_left: &DataType, type_right: &DataType) -> bool {
}
}

fn process_list_arithmetic(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove logic from the type_coercion optimizer - this should already be handled by get_arithmetic_field during conversion

Multiply => wrap!(lhs * rhs),
Divide => wrap!(lhs / rhs),
Remainder => wrap!(lhs % rhs),
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can make Op generic but it costs a little bit of sanity I think 😄

@nameexhaustion
Copy link
Collaborator

Closed in favor of #19162

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Divide each element in a list by an int Simple arithmetic operations on the "list" type columns
4 participants