
depr(python): Deprecate allow_infinities and null_probability args to parametric test strategies #16183

Merged · 3 commits into main · May 13, 2024
Conversation

@stinodego (Member) commented May 13, 2024

Changes

  • Add **kwargs to series and dataframes, which forward the kwargs to the underlying data generation strategies.
  • Deprecate allow_infinities. Users should use allow_infinity instead, which is passed through to the float strategy.
    • This is not very discoverable yet, but adding every possible configuration for every data type to the top-level strategies is not sustainable. I am considering making the underlying data generation strategies public.
  • Deprecate null_probability. Users should not be micromanaging probabilities - hypothesis will find appropriate examples. Replaced by an allow_null boolean arg (see the sketch after this list).
  • Update the categories strategy to sample from a fixed set of categories, instead of generating free text.
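A minimal sketch of the post-deprecation usage (argument names follow this PR's description; the exact signatures are an assumption and may differ):

```python
from hypothesis import given
import polars as pl
from polars.testing.parametric import series


# allow_null replaces null_probability; dtype-specific options such as
# allow_infinity are forwarded to the underlying float strategy via **kwargs.
@given(s=series(dtype=pl.Float64, allow_null=False, allow_infinity=False))
def test_no_nulls_or_infinities(s: pl.Series) -> None:
    assert s.null_count() == 0
    assert not s.is_infinite().any()
```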

@github-actions github-actions bot added deprecation Add a deprecation warning to outdated functionality python Related to Python Polars labels May 13, 2024
@stinodego stinodego marked this pull request as ready for review May 13, 2024 08:25
codecov bot commented May 13, 2024

Codecov Report

Attention: Patch coverage is 82.00000%, with 9 lines in your changes missing coverage. Please review.

Project coverage is 80.98%. Comparing base (dbfc6b2) to head (b639211).
Report is 2 commits behind head on main.

Files Patch % Lines
...olars/polars/testing/parametric/strategies/core.py 70.96% 5 Missing and 4 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #16183      +/-   ##
==========================================
- Coverage   80.99%   80.98%   -0.01%     
==========================================
  Files        1392     1392              
  Lines      178884   178937      +53     
  Branches     2893     2897       +4     
==========================================
+ Hits       144884   144911      +27     
- Misses      33504    33522      +18     
- Partials      496      504       +8     


@stinodego stinodego merged commit 77aaec3 into main May 13, 2024
17 checks passed
@stinodego stinodego deleted the hyp-strats-2 branch May 13, 2024 08:46
@alexander-beedie (Collaborator) commented May 13, 2024

  • Deprecate null_probability. Users should not be micromanaging probabilities - hypothesis will find appropriate examples. Replaced by allow_null boolean arg.

@stinodego: I'd be slightly tempted to allow both True/False and a float probability here, since another use case for these strategies is feeding synthetic dataframes into third-party code that relies on Polars (i.e. these strategies aren't just for testing Polars' own code; they also let libraries that use Polars generate frames to test their code).

This is probably also an argument to integrate some sort of dedicated synthetic data generation facility as a separate consideration, of course; while there is overlap between the two use-cases I certainly wouldn't argue that they are the same thing! (Might have to go take a look at the synthetic data landscape now, hmmm 🤔)

@stinodego (Member, Author) commented:
One of the key 'insights' that motivated this rewrite is that generating data for parametric tests and generating generic synthetic data are two very different things.

Hypothesis intentionally 'skews' the data in all kinds of ways to try to produce failures. You could have 1 billion rows with a 0.99 null probability, and hypothesis would still produce a column without any nulls in it. So that number doesn't really mean anything in the context of parametric testing.

You can see the same in hypothesis' own API: for example, their floats strategy has allow_nan as a bool, not a nan_probability.
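For illustration, a boolean toggle in hypothesis's floats strategy looks like this (allow_nan and allow_infinity are the parameters being referred to; there is no probability knob):

```python
from hypothesis import strategies as st

# hypothesis exposes boolean switches rather than probabilities:
finite_floats = st.floats(allow_nan=False, allow_infinity=False)
```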

Synthetic data generation can be really useful, but hypothesis should not be used for that. It's really not suitable for that purpose, as far as I've seen.

@alexander-beedie (Collaborator) commented:
Synthetic data generation can be really useful, but hypothesis should not be used for that. It's really not suitable for that purpose, as far as I've seen.

Indeed; that was also my eventual conclusion above. I just need to rush to find a replacement for doing it now, heh 😂

@c-peters c-peters added the accepted Ready for implementation label May 21, 2024
Wouittone pushed a commit to Wouittone/polars that referenced this pull request Jun 22, 2024