Set probabilistic sampling as default subsampling behavior #659

huddlej · 2021-01-05T23:00:08Z

This PR makes probabilistic sampling the default behavior for augur filter when the user requests a maximum number of subsampling sequences.

Major changes include:

Adds a --no-probabilistic-sampling flag that mirrors the existing --probabilistic-sampling flag. This flag forces augur to try to accomodate no more than the request number of sequences and throw an error if there are more subsampling groups than available sequences.
Makes --probabilistic-sampling the default behavior when users request a maximum number of sequences to subsample. This change to the filter API allows users to define subsampling groups whose number exceeds the maximum sequences requested without throwing an error. As described in the conversation of issue Filter: Errors out when there are more groups than requested sequences #654, this new behavior is more likely the expected behavior by most users.

Minor changes include:

Removes the Zika Snakemake functional tests. These tests could have been removed back when we originally migrated them to the Cram functional tests (zika.t). Since this PR revealed the annoying CI errors that can result from maintaining redundant functional tests, it is time to remove these older tests.
Explicitly disables probabilistic sampling in functional tests random output impedes testing for correctness.

Fixes #654

codecov · 2021-01-05T23:30:27Z

Codecov Report

Merging #659 (e1abcbb) into master (a1c588a) will increase coverage by 0.03%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #659      +/-   ##
==========================================
+ Coverage   29.31%   29.34%   +0.03%     
==========================================
  Files          39       39              
  Lines        5452     5455       +3     
  Branches     1335     1335              
==========================================
+ Hits         1598     1601       +3     
  Misses       3796     3796              
  Partials       58       58

Impacted Files	Coverage Δ
augur/filter.py	`45.05% <100.00%> (+0.48%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a1c588a...e1abcbb. Read the comment docs.

rneher

thanks John. This looks good to me. I only have one optional suggestion for the argument handling.

augur/filter.py

Makes the optional flag `--probabilistic-sampling` enabled by default and adds a `--no-probabilistic-sampling` flag to disable this default behavior. These changes make the `--probabilistic-sampling` flag unnecessary but maintain backwards-compatibility for workflows that reference this flag. We toggle probabilistic sampling through one variable. This simplifies the downstream logic that checks for whether to use probabilistic sampling or not. Note that with this approach we cannot define a separate `help` string for the `--no-probabilistic-sampling` flag because it will appear on the command line with a default value of its destination variable (`probabilistic_sampling=True`). This commit also disables probabilistic sampling for test builds. Test builds sample so few sequences that probabilistic subsampling can end up picking too few sequences for augur refine to work properly. This commit disables probabilistic sampling for those builds. Finally, this commit removes the redundant Zika test build implemented with Snakemake since we already have a Cram test (`zika.t`) that does the same thing.

huddlej · 2021-01-20T18:11:19Z

Thank you, @rneher! I've updated the parser to use the store_false approach you recommended and squashed all of the commits into a single cleaner one. When CI tests finish, I'll merge this and tomorrow I will prepare a release (since there's a little too much going on today).

huddlej added this to the Major release 11.0.0 milestone Jan 5, 2021

huddlej mentioned this pull request Jan 6, 2021

Filter: Errors out when there are more groups than requested sequences #654

Closed

huddlej changed the title ~~Disable probabilistic sampling in Zika test~~ Set probabilistic sampling as default subsampling behavior Jan 14, 2021

huddlej self-assigned this Jan 16, 2021

huddlej force-pushed the fix-probabilistic-sampling branch from 84fe289 to d557157 Compare January 16, 2021 01:01

huddlej requested a review from rneher January 20, 2021 00:39

rneher reviewed Jan 20, 2021

View reviewed changes

augur/filter.py Outdated Show resolved Hide resolved

huddlej force-pushed the fix-probabilistic-sampling branch from d557157 to e1abcbb Compare January 20, 2021 18:08

huddlej merged commit 3365c27 into master Jan 20, 2021

huddlej deleted the fix-probabilistic-sampling branch January 20, 2021 22:11

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Set probabilistic sampling as default subsampling behavior #659

Set probabilistic sampling as default subsampling behavior #659

huddlej commented Jan 5, 2021 •

edited

Loading

codecov bot commented Jan 5, 2021 •

edited

Loading

rneher left a comment

huddlej commented Jan 20, 2021

Set probabilistic sampling as default subsampling behavior #659

Set probabilistic sampling as default subsampling behavior #659

Conversation

huddlej commented Jan 5, 2021 • edited Loading

codecov bot commented Jan 5, 2021 • edited Loading

Codecov Report

rneher left a comment

Choose a reason for hiding this comment

huddlej commented Jan 20, 2021

huddlej commented Jan 5, 2021 •

edited

Loading

codecov bot commented Jan 5, 2021 •

edited

Loading