Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set probabilistic sampling as default subsampling behavior #659

Merged
merged 1 commit into from
Jan 20, 2021

Conversation

huddlej
Copy link
Contributor

@huddlej huddlej commented Jan 5, 2021

This PR makes probabilistic sampling the default behavior for augur filter when the user requests a maximum number of subsampling sequences.

Major changes include:

  • Adds a --no-probabilistic-sampling flag that mirrors the existing --probabilistic-sampling flag. This flag forces augur to try to accomodate no more than the request number of sequences and throw an error if there are more subsampling groups than available sequences.
  • Makes --probabilistic-sampling the default behavior when users request a maximum number of sequences to subsample. This change to the filter API allows users to define subsampling groups whose number exceeds the maximum sequences requested without throwing an error. As described in the conversation of issue Filter: Errors out when there are more groups than requested sequences #654, this new behavior is more likely the expected behavior by most users.

Minor changes include:

  • Removes the Zika Snakemake functional tests. These tests could have been removed back when we originally migrated them to the Cram functional tests (zika.t). Since this PR revealed the annoying CI errors that can result from maintaining redundant functional tests, it is time to remove these older tests.
  • Explicitly disables probabilistic sampling in functional tests random output impedes testing for correctness.

Fixes #654

@huddlej huddlej added this to the Major release 11.0.0 milestone Jan 5, 2021
@codecov
Copy link

codecov bot commented Jan 5, 2021

Codecov Report

Merging #659 (e1abcbb) into master (a1c588a) will increase coverage by 0.03%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #659      +/-   ##
==========================================
+ Coverage   29.31%   29.34%   +0.03%     
==========================================
  Files          39       39              
  Lines        5452     5455       +3     
  Branches     1335     1335              
==========================================
+ Hits         1598     1601       +3     
  Misses       3796     3796              
  Partials       58       58              
Impacted Files Coverage Δ
augur/filter.py 45.05% <100.00%> (+0.48%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a1c588a...e1abcbb. Read the comment docs.

@huddlej huddlej changed the title Disable probabilistic sampling in Zika test Set probabilistic sampling as default subsampling behavior Jan 14, 2021
@huddlej huddlej self-assigned this Jan 16, 2021
@huddlej huddlej requested a review from rneher January 20, 2021 00:39
Copy link
Member

@rneher rneher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks John. This looks good to me. I only have one optional suggestion for the argument handling.

augur/filter.py Outdated Show resolved Hide resolved
Makes the optional flag `--probabilistic-sampling` enabled by default
and adds a `--no-probabilistic-sampling` flag to disable this default
behavior. These changes make the `--probabilistic-sampling` flag
unnecessary but maintain backwards-compatibility for workflows that
reference this flag.

We toggle probabilistic sampling through one variable. This simplifies
the downstream logic that checks for whether to use probabilistic
sampling or not. Note that with this approach we cannot define a
separate `help` string for the `--no-probabilistic-sampling` flag
because it will appear on the command line with a default value of its
destination variable (`probabilistic_sampling=True`).

This commit also disables probabilistic sampling for test builds. Test
builds sample so few sequences that probabilistic subsampling can end up
picking too few sequences for augur refine to work properly. This commit
disables probabilistic sampling for those builds.

Finally, this commit removes the redundant Zika test build implemented
with Snakemake since we already have a Cram test (`zika.t`) that does
the same thing.
@huddlej
Copy link
Contributor Author

huddlej commented Jan 20, 2021

Thank you, @rneher! I've updated the parser to use the store_false approach you recommended and squashed all of the commits into a single cleaner one. When CI tests finish, I'll merge this and tomorrow I will prepare a release (since there's a little too much going on today).

@huddlej huddlej merged commit 3365c27 into master Jan 20, 2021
@huddlej huddlej deleted the fix-probabilistic-sampling branch January 20, 2021 22:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Filter: Errors out when there are more groups than requested sequences
2 participants