depr(python): Deprecate `allow_infinities` and `null_probability` args to parametric test strategies #16183
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #16183 +/- ##
==========================================
- Coverage 80.99% 80.98% -0.01%
==========================================
Files 1392 1392
Lines 178884 178937 +53
Branches 2893 2897 +4
==========================================
+ Hits 144884 144911 +27
- Misses 33504 33522 +18
- Partials 496 504 +8
☔ View full report in Codecov by Sentry.
@stinodego: I'd be slightly tempted to allow True/False and a float probability here, as another use-case for these strategies is to feed synthetic dataframes for testing into third-party code that relies on Polars (e.g. use of these strategies isn't just about testing Polars' own code, it's also about allowing libraries that use Polars to generate frames to test their code). This is probably also an argument for integrating some sort of dedicated synthetic data generation facility as a separate consideration, of course; while there is overlap between the two use-cases, I certainly wouldn't argue that they are the same thing! (Might have to go take a look at the synthetic data landscape now, hmmm 🤔)
One of the key 'insights' that motivated this rewrite is that generating data for parametric tests and generating generic synthetic data are two very different things. Hypothesis intentionally 'skews' the data in all kinds of ways to try to produce failures. You could have 1 billion rows with a 0.99 null probability, and Hypothesis would still produce a column without any nulls in it. So that number doesn't really mean anything in the context of parametric testing. You can see the same in Hypothesis' own API: for example, their

Synthetic data generation can be really useful, but Hypothesis should not be used for that. It's really not suitable for that purpose, as far as I've seen.
Indeed; also my eventual conclusion above, I just need to rush to find a replacement for doing it now, heh 😂
Changes
- Add `**kwargs` to `series` and `dataframes`, which passes the kwargs to the underlying data generation strategies.
- Deprecate `allow_infinities`. Users should use `allow_infinity`, which is passed to the float strategy.
- Deprecate `null_probability`. Users should not be micromanaging probabilities; Hypothesis will find appropriate examples. Replaced by the `allow_null` boolean arg.
- Change the `categories` strategy to sample from a number of set categories, instead of being a free text field.
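
For readers who want a feel for how deprecations like these are usually bridged, here is a minimal, standard-library-only sketch. It is not the code from this PR: `resolve_strategy_args` is a hypothetical helper, and the rule that any non-zero `null_probability` maps to `allow_null=True` is an assumption made for illustration. It only shows the general pattern of warning on the old argument names and forwarding everything else as `**kwargs`.

```python
import warnings
from typing import Any, Dict, Optional


def resolve_strategy_args(
    *,
    allow_null: bool = True,
    null_probability: Optional[float] = None,  # deprecated name
    allow_infinities: Optional[bool] = None,  # deprecated name
    **kwargs: Any,
) -> Dict[str, Any]:
    """Hypothetical shim: translate deprecated args into their replacements."""
    if null_probability is not None:
        warnings.warn(
            "`null_probability` is deprecated; use `allow_null` instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        # Assumption: any non-zero probability simply means "nulls allowed".
        allow_null = null_probability > 0

    if allow_infinities is not None:
        warnings.warn(
            "`allow_infinities` is deprecated; use `allow_infinity` instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        # The replacement travels with the other kwargs, which would be
        # handed on to the underlying data generation strategy.
        kwargs["allow_infinity"] = allow_infinities

    return {"allow_null": allow_null, **kwargs}


# Old-style call: both deprecated names are accepted, with warnings.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    legacy = resolve_strategy_args(null_probability=0.0, allow_infinities=False)

# New-style call: no warnings, kwargs pass straight through.
modern = resolve_strategy_args(allow_null=False, allow_infinity=False)
```

In this sketch both calls resolve to the same settings, so callers can migrate from the old names to the new ones without a change in behaviour.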