-
-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Don't require data to be sorted by by
column in rolling_*_by
operations
#16249
Conversation
225232d
to
aa28909
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #16249 +/- ##
==========================================
- Coverage 80.82% 80.78% -0.04%
==========================================
Files 1393 1393
Lines 179341 179517 +176
Branches 2918 2929 +11
==========================================
+ Hits 144955 145031 +76
- Misses 33884 33976 +92
- Partials 502 510 +8 ☔ View full report in Codecov by Sentry. |
13522e6
to
ce172cc
Compare
crates/polars-time/src/chunkedarray/rolling_window/rolling_kernels/no_nulls.rs
Outdated
Show resolved
Hide resolved
f0ae832
to
2021ca2
Compare
// start with a dummy index, will be overwritten on first iteration. | ||
let mut agg_window = Agg::new(values, 0, 0, params); | ||
|
||
let mut out = zeroed_vec(values.len()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @orlp - I've tried using zeroed_vec
here, and creating a validity mask separately. Any feedback on how I've done this would be appreciated if possible, thanks 🙏
e38bc8c
to
23beb74
Compare
23beb74
to
ef90e64
Compare
let mut agg_window = Agg::new(values, 0, 0, params); | ||
|
||
let mut out = zeroed_vec(values.len()); | ||
let mut null_positions = Vec::with_capacity(values.len()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this allocate this much?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nope, just pushing an update
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just chatted with Orson about this
The difficulty is that it's not known in advance (before starting the loop) whether there will be nulls
He's suggested making an Option<MutableBitmap>
, which gets set on the first null
There is an extra if validity.is_none()
check, but nulls produced by the rolling operation should be fairly rare anyway
I've pushed an update - thanks @orlp, much appreciated 🙏
by
column in rolling_*_by
operationsby
column in rolling_*_by
operations
by
column in rolling_*_by
operationsby
column in rolling_*_by
operations
@@ -99,25 +99,29 @@ where | |||
// `by`, which has already been checked to be the same length as the values. | |||
unsafe { *out.get_unchecked_mut(*out_idx as usize) = res }; | |||
} else { | |||
null_positions.push(*out_idx as usize) | |||
if validity.is_none() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe put this in a closure as that ensure we have a single code block and call that closure twice.
I think the overhead of the function call is not too much here as you said, you don't expect many nulls.
Thanks for this one @MarcoGorelli. I do agree it is an improvement to answer the query. :) |
yay, thank you so much, and thanks @orlp for helping me - been wanting to see this happen for a while 🥳 I think users will definitely appreciate it |
Needs rebasing onto #16247In summary this splits
rolling_apply_agg_window
into:rolling_apply_agg_window_sorted
(no difference to the currentrolling_apply_agg_window
, this is the fastpath when data is already known to be sorted byby
)rolling_apply_agg_window
: doesn't require the data to be sorted byby
, which is why it needs an extra argument (sorting_indices
) which is used to place the results in the right place of theout
Vec
warn_if_unsorted
on the Python side, removed it from the Rust sideThe gist of the change is
rolling_apply_agg_window
in crates/polars-time/src/chunkedarray/rolling_window/rolling_kernels/no_nulls.rsPerf impact:
arg_sort
is performed in order to createsorting_indices
. Once this has been created, it's just used to place the elements in theout
Vec in the right place (rolling_apply_agg_window
)