test(python): Parametric test coverage for EWM functions #5011
Conversation
Wow! Those are a lot of tests! I think we should really utilize this for the operations that are sensitive to breaking (because I optimize them a lot, and create new fast paths). Here I think of combinations of …
The joy of parametric testing; you get a lot of bang for your buck ;)
Definitely; in conjunction with tests like this (where there is an existing well-tested implementation that we can reference) it hugely expands the coverage surface, actively probing all kinds of different corner-cases. There is an interesting option for future consideration, which would map very well onto what you describe above: stateful testing with rule-based state machines. The combination of parametrically-generated operator chains (…)
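For readers unfamiliar with the idea, here is a minimal sketch of a hypothesis rule-based state machine. Everything here is illustrative (a plain list stands in for both the "implementation" and the "model"); it is not polars test code, just the shape of the technique:

```python
from hypothesis import strategies as st
from hypothesis.stateful import RuleBasedStateMachine, invariant, rule

class StackMachine(RuleBasedStateMachine):
    """Hypothesis generates random sequences of push/pop operations
    and checks the invariant after every step, shrinking any failing
    sequence to a minimal reproduction."""

    def __init__(self):
        super().__init__()
        self.impl = []   # "implementation under test" (illustrative)
        self.model = []  # trusted reference model

    @rule(x=st.integers())
    def push(self, x):
        self.impl.append(x)
        self.model.append(x)

    @rule()
    def pop(self):
        if self.model:
            assert self.impl.pop() == self.model.pop()

    @invariant()
    def states_agree(self):
        assert self.impl == self.model

# Expose to pytest/unittest collection:
TestStack = StackMachine.TestCase
```

The same pattern extends naturally to chains of dataframe operations: each `@rule` applies one operator, and the invariants compare against a reference implementation.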
@matteosantama - hope you don't mind looking over these
Misc: Some of the other tests that don't tie out may be connected to subnormal number handling (or something along those lines) as they are generating extreme values; I'm going to see about constraining the tolerance for these before considering them a problem.
Will have to look closer at the … As for the …
Where is the chunking happening? Is that taken care of in the generation of the input …?
Ok, so I'm inclined to say we should just special-case the … Since this problem only seems to affect …
Yup; unless you explicitly set True (always chunked) or False (never chunked), it is applied randomly to 50% of the output Series. This way everything subject to parametric testing is automatically validated over chunked/non-chunked data.
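A rough stdlib-only sketch of that behaviour (the helper name here is hypothetical, not polars' actual strategy internals): roughly half of all generated series get split into randomly sized chunks, and the rest stay contiguous.

```python
import random

def maybe_chunk(values, chunked=None):
    """Hypothetical sketch: return `values` as a list of chunks.
    If `chunked` is None (the default), chunking is applied to ~50%
    of inputs at random, mirroring the behaviour described above."""
    if chunked is None:
        chunked = random.random() < 0.5
    if not chunked:
        return [values]          # single contiguous chunk
    chunks, i = [], 0
    while i < len(values):
        step = random.randint(1, len(values) - i)
        chunks.append(values[i:i + step])
        i += step
    return chunks
```

Concatenating the chunks always recovers the original values, so every parametric test transparently exercises both the chunked and contiguous code paths.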
I knew I'd forgotten something... will take care of it!
Doesn't sound crazy - and with this new coverage we'll find out pretty fast if it has any unforeseen side-effects... ;)
@alexander-beedie Great! I just opened a pull request with the change. Feel free to merge it into this PR, or we can keep them separate. |
Fantastic; let's keep them separate for now - there was still some other noise that I want to track down/explain. (It looks more like noise than something problematic; will take a look against your patch above to see what remains ;) Just added …
Don't forget the …
Was about to ask - what is …?
Ah, it's a parameter you pass directly to the pandas EWM aggregations, e.g. https://pandas.pydata.org/docs/reference/api/pandas.core.window.ewm.ExponentialMovingWindow.var.html: `pd.Series(xs).ewm().var(bias=True)`
Got it. FYI: with all the parameter/data variations, I think we're now looking at a little over 4,000 validating tests per EWM type (each of |
@matteosantama: have compiled-in your update to validate against the parametric tests; found a difference in behavior with the alpha optimization, and one question about acceptable tolerance (vs pandas/reference):
…
Wow, your tests are catching everything! This is definitely an unforeseen change, but I do think it is correct. Consider the standard equation for ewm mean: … I'm OK with returning that null -- it's defensible because it's exactly what the math says. If someone wants the …
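To make the math concrete, here is a simplified sketch of an adjusted EWM mean with `ignore_na=False`-style semantics (an illustration, not the actual polars implementation). A null entry contributes no value but still decays all prior weights, so with `alpha=1` the total weight collapses to zero and the output is null by 0/0:

```python
def ewm_mean(xs, alpha):
    """Adjusted EWM mean over `xs`, where None marks a null.
    Nulls contribute no value but still decay earlier weights.
    When the remaining total weight is zero -- e.g. alpha=1 and a
    null input -- the output is null: exactly what the math says."""
    out, num, den = [], 0.0, 0.0
    decay = 1.0 - alpha
    for x in xs:
        num *= decay
        den *= decay
        if x is not None:
            num += x
            den += 1.0
        out.append(num / den if den > 0 else None)
    return out
```

With `alpha=1` the prior weight `(1 - alpha)` is zero, so `ewm_mean([1.0, None, 3.0], alpha=1.0)` yields `[1.0, None, 3.0]`; forward-filling that null then matches a reference that carries the last mean forward over missing values, which is how the tests reconcile the two behaviours.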
I think that's probably acceptable. If someone opens an issue and provides us a reason to tighten it we can revisit.
Parametric testing... "tell your friends" 🤣
Great; have made a note to that effect in the tests, and applied fill_null/forward when alpha=1 (just in the tests), which takes care of that entire class of discrepancy.
With a little more experimentation it seems that by excluding … @ritchie46: think we're good to go; I've rebased/squashed to capture @matteosantama's alpha optimization and have uncommented …. Any other areas you'd like subjected to more intense testing?
I just had to tempt fate... ;p
Yes, the relational primitives combined.
Permuting the optimisations is very doable. Need to create some sort of mechanism for combining chained operations with reasonable parameters (which is also something I want at work for our own data API, so it'll happen one way or another ;) I suspect the rules-based state machine approach is going to be useful here. Will investigate properly when I have a good block of time. In the meantime I'll probably look at hardening other areas as well (for example: IO round-trips via different formats could be a fruitful area, rolling functions, etc).
Interesting stuff! One thing I noticed though: the test pipelines are a lot slower now. What about the trade-off here? Are the tests slow or is calculating the coverage slowing everything down? |
@ritchie46 / @matteosantama: For final validation I ran 20,000 iterations locally (which translates to ~1.1 million successful tests for each of the three decay params); I'm happy to report that every one of those tests passed with flying colours ;) @stinodego: …
Thanks a lot for the extensive tests. Given that this adds a huge number of tests for only one of our many operations, I wonder how we should go about scaling this. Would it be possible to keep the parameters around that lead to bugs? Maybe we could re-run those, as these were the found edge cases. Any suggestions/ideas?
Encoding too many assumptions tends to be a bad idea, as it's hard to say in advance what combination of parameters/values may lead to bugs. Arguably you're more likely to find bugs where you weren't expecting them, for exactly that reason, heh. (Which isn't to say you shouldn't encode any assumptions, just that you should do so judiciously).
Plenty :) There are lots of ways to tune things; given how few parameterised tests there are at the moment I have spent essentially no time assessing them in terms of performance tradeoff. One immediately available option is the ability to control the number of iterations on a per-test basis. When we have a heavier test (like this new EWM one) we can dial it back a bit. For instance, the Series slice test runs 500 iterations in ~450ms, but I think 200 iterations of EWM takes around 16 secs (I can optimize this significantly; every single combination of every param on every iteration is overkill). Setting per-test iterations is trivial; just a decorator:
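The decorator snippet was elided above; presumably it is hypothesis's `settings`, which caps the number of generated examples per test. A minimal sketch (the test body here is purely illustrative):

```python
from hypothesis import given, settings, strategies as st

@given(xs=st.lists(st.integers(min_value=-10**6, max_value=10**6), min_size=1))
@settings(max_examples=50)  # dial back iterations for an expensive test
def test_mean_within_bounds(xs):
    # The arithmetic mean always lies between the min and max.
    mean = sum(xs) / len(xs)
    assert min(xs) <= mean <= max(xs)
```

A cheap test can keep the default (or a higher) example count, while a heavy one like the EWM suite gets a lower per-test budget.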
Could also keep the default number of iterations during CI the same as a local run (100), rather than have CI do more (200). The author of the test can always run more before committing (I ran 20,000 iterations on this one). I'm also very curious about @stinodego's suggestion that coverage may interact badly here; I want to look into that and see if it's a factor. How about I optimize this test a bit more, then we merge it, and I'll start looking at things from a higher-level performance perspective next?
Also: this test is a bit of an outlier in another way - you don't necessarily want to be limited to testing against a reference implementation (though here it makes sense to do so). One of the ways this test becomes more expensive is that we aren't just testing our own functionality, we're testing pandas as well. One of the best ways to employ this type of testing more generically is to validate invariants; essentially you test the properties of your results to check for violations of the invariants, rather than limit yourself to testing the values of your results against some reference implementation (hence the term "property-based testing"). By chance I happened to find "Considerations for control of data invariants" by @jorgecarleitao on YouTube recently, so that ties in nicely ;) There's a very clean description of how to think about property-based testing here.
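As a toy illustration of invariant-style testing, with no reference implementation involved: rather than comparing `sorted` against some other sort, we check the two properties that define a correct sort.

```python
from collections import Counter
from hypothesis import given, strategies as st

@given(xs=st.lists(st.integers()))
def test_sort_invariants(xs):
    ys = sorted(xs)
    # Property 1: the output is ordered.
    assert all(a <= b for a, b in zip(ys, ys[1:]))
    # Property 2: the output is a permutation of the input.
    assert Counter(ys) == Counter(xs)
```

The same style transfers to EWM-like functions: e.g. the mean of a constant series is that constant, and outputs lie within the observed value range, regardless of any reference implementation.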
@stinodego / @ritchie46: hah, ok, there was actually a lot of low-hanging fruit; I just committed a patch that generically speeds up a lot of the polars-specific parametric testing functionality. That plus some test-specific optimisations made these EWM tests run about ~10x faster. Probably more can be done (haven't looked at coverage yet), but already it's a big improvement; much-reduced CI impact.

Generic speedups: …
EWM speedups: …
Misc speedups: …
I think the test pipeline is now faster than it was before this PR 😄 nice work! |
Heh. It is actually... ;)
Should we wait then, or should we follow up with that? |
No, let's not wait; this is now ready to go and speeds up the existing pipeline (by making all the non-EWM parametric tests faster). If there's a good follow-up (eg: can tweak coverage behaviour, or make other worthwhile adjustments to the polars-specific hypothesis primitives) it will be separate from this. |
Rebased/squashed (merge conflict with updated …).
Alright! Weekend! 🎉😉 |
Massive expansion of EWM test coverage, using pandas (with `ignore_na=True`) as a reference implementation for comparison purposes. Parametric tests cover...

- `com`, `span`, `half_life`.
- `null` values present / not present.
- `min_period` values.
- `int` and `float` series.