-
Notifications
You must be signed in to change notification settings - Fork 129
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Set probabilistic sampling as default subsampling behavior #659
Conversation
Codecov Report
@@ Coverage Diff @@
## master #659 +/- ##
==========================================
+ Coverage 29.31% 29.34% +0.03%
==========================================
Files 39 39
Lines 5452 5455 +3
Branches 1335 1335
==========================================
+ Hits 1598 1601 +3
Misses 3796 3796
Partials 58 58
Continue to review full report at Codecov.
|
84fe289
to
d557157
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks John. This looks good to me. I only have one optional suggestion for the argument handling.
Makes the optional flag `--probabilistic-sampling` enabled by default and adds a `--no-probabilistic-sampling` flag to disable this default behavior. These changes make the `--probabilistic-sampling` flag unnecessary but maintain backwards-compatibility for workflows that reference this flag. We toggle probabilistic sampling through one variable. This simplifies the downstream logic that checks for whether to use probabilistic sampling or not. Note that with this approach we cannot define a separate `help` string for the `--no-probabilistic-sampling` flag because it will appear on the command line with a default value of its destination variable (`probabilistic_sampling=True`). This commit also disables probabilistic sampling for test builds. Test builds sample so few sequences that probabilistic subsampling can end up picking too few sequences for augur refine to work properly. This commit disables probabilistic sampling for those builds. Finally, this commit removes the redundant Zika test build implemented with Snakemake since we already have a Cram test (`zika.t`) that does the same thing.
d557157
to
e1abcbb
Compare
Thank you, @rneher! I've updated the parser to use the |
This PR makes probabilistic sampling the default behavior for
augur filter
when the user requests a maximum number of subsampling sequences.Major changes include:
--no-probabilistic-sampling
flag that mirrors the existing--probabilistic-sampling
flag. This flag forces augur to try to accomodate no more than the request number of sequences and throw an error if there are more subsampling groups than available sequences.--probabilistic-sampling
the default behavior when users request a maximum number of sequences to subsample. This change to the filter API allows users to define subsampling groups whose number exceeds the maximum sequences requested without throwing an error. As described in the conversation of issue Filter: Errors out when there are more groups than requested sequences #654, this new behavior is more likely the expected behavior by most users.Minor changes include:
zika.t
). Since this PR revealed the annoying CI errors that can result from maintaining redundant functional tests, it is time to remove these older tests.Fixes #654