Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add ewm_mean_by #15638

Merged
merged 4 commits into from
Apr 15, 2024
Merged

feat: add ewm_mean_by #15638

merged 4 commits into from
Apr 15, 2024

Conversation

MarcoGorelli
Copy link
Collaborator

@MarcoGorelli MarcoGorelli commented Apr 13, 2024

closes #10794

Demo of why this matters: for non-evenly spaced points, there's a noticeable difference compared with plain ewm_mean:

image


question - should it sort on behalf of users, or raise?

What I've tried introducing here is a check_sorted argument:

  • if it's set to check_sorted=False, then Polars assumes your data to be
  • else, instead of erroring, Polars takes care of this for you

Trying to follow @orlp 's guidance here

Other options could be:

  • if the data isn't sorted, then sort for the user, and raise a warning. What's annoying about this one is that sortedness flags aren't preserved in .over, and so pl.col('a').ewm_mean_by('times', half_life='4d').over('groups') would always warn. I'd find that quite annoying as a user
  • raise if the data isn't sorted. As a user, I find such errors a bit annoying, but not terribly so

@github-actions github-actions bot added enhancement New feature or an improvement of an existing feature python Related to Python Polars rust Related to Rust Polars labels Apr 13, 2024
Copy link

codspeed-hq bot commented Apr 13, 2024

CodSpeed Performance Report

Merging #15638 will not alter performance

Comparing MarcoGorelli:ewm-mean-time (2d0b52c) with main (8ef2e21)

Summary

✅ 22 untouched benchmarks

Copy link

codecov bot commented Apr 14, 2024

Codecov Report

Attention: Patch coverage is 98.46154% with 3 lines in your changes are missing coverage. Please review.

❗ No coverage uploaded for pull request base (main@19f0939). Click here to learn what that means.
Report is 1 commits behind head on main.

❗ Current head ad8bb68 differs from pull request most recent head 2d0b52c. Consider uploading reports for the commit 2d0b52c to get more accurate results

Files Patch % Lines
crates/polars-ops/src/series/ops/ewm_by.rs 98.52% 2 Missing ⚠️
crates/polars-plan/src/dsl/function_expr/mod.rs 83.33% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main   #15638   +/-   ##
=======================================
  Coverage        ?   81.34%           
=======================================
  Files           ?     1374           
  Lines           ?   176473           
  Branches        ?     2544           
=======================================
  Hits            ?   143549           
  Misses          ?    32443           
  Partials        ?      481           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@MarcoGorelli MarcoGorelli force-pushed the ewm-mean-time branch 2 times, most recently from 160c01d to 0ac478c Compare April 14, 2024 11:44
@MarcoGorelli MarcoGorelli force-pushed the ewm-mean-time branch 2 times, most recently from 4f6f59a to 8a56c16 Compare April 15, 2024 08:58
@MarcoGorelli MarcoGorelli marked this pull request as ready for review April 15, 2024 15:53
crates/polars-ops/src/series/ops/ewm_by.rs Outdated Show resolved Hide resolved
_ => None,
};
if half_life.negative() {
polars_bail!(InvalidOperation: "half_life cannot be negative");
Copy link
Member

@ritchie46 ritchie46 Apr 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: We could do the assert immediately in polars_ensure

@ritchie46 ritchie46 merged commit 1e7fa8e into pola-rs:main Apr 15, 2024
24 checks passed
@hixan
Copy link

hixan commented Apr 18, 2024

I know I am a little late to the party, but would it be a better approach to give the ability to provide the half_life as an expression instead? This would allow the above behaviour with something like: (taking liberties with the half life dtype)

def ewm_mean_by(input_column, times_column, alpha):
    return input_column.ewm_mean(alpha = alpha * times_column.diff())

(haven't checked the math here), It could be used to perform the operation in update<T> in this PR while giving users the ability to also define variable half lives

@MarcoGorelli
Copy link
Collaborator Author

could you open a new issue for this please?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or an improvement of an existing feature python Related to Python Polars rust Related to Rust Polars
Projects
None yet
Development

Successfully merging this pull request may close these issues.

EWM Mean/Std should work with a "times" column
3 participants