Fix probabilistic subsampling for small values #759

Merged 1 commit on Aug 13, 2021

Commits on Aug 13, 2021

  1. Fix probabilistic subsampling for small values

    Fixes a regression from the rewrite of augur filter where probabilistic
    subsampling for small values of `--subsample-max-sequences` could
    randomly select zero strains, causing intermittent failures in our CI tests.
    Prior to the rewrite of augur filter and introduction of priority
    queues, we fixed this issue by repeatedly attempting to calculate
    sequences per group that summed to an integer value greater than zero.
    However, the way I implemented random queue sizes inside the `PriorityQueue`
    class in the rewrite prevented me from using a similar "multiple
    attempts" approach.
    
    This commit redesigns the way we create priority queues. In the case
    where we already know the number of sequences per group in the first
    pass, we create an appropriately-sized priority queue for each group as
    we encounter it. There is no possibility that the sum of these queue
    sizes could be zero.
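
To illustrate the known-size case described above, here is a minimal sketch, not the actual augur code. The class `FixedSizePriorityQueue`, the function `subsample_known_size`, and the `(strain, group, priority)` record layout are all hypothetical names chosen for this example. Each queue is created the first time its group is seen, with a fixed positive size, so the total capacity cannot be zero.

```python
import heapq
import itertools


class FixedSizePriorityQueue:
    """Hypothetical sketch: keep only the `max_size` highest-priority items."""

    def __init__(self, max_size):
        self.max_size = max_size
        self._heap = []                    # min-heap of (priority, tiebreak, item)
        self._counter = itertools.count()  # tiebreaker so items are never compared

    def add(self, item, priority):
        if self.max_size == 0:
            return
        entry = (priority, next(self._counter), item)
        if len(self._heap) < self.max_size:
            heapq.heappush(self._heap, entry)
        else:
            # Push the new entry, then drop whichever entry has the lowest priority.
            heapq.heappushpop(self._heap, entry)

    def items(self):
        # Highest priority first.
        return [item for _, _, item in sorted(self._heap, reverse=True)]


def subsample_known_size(records, sequences_per_group):
    """Assumed input: an iterable of (strain, group, priority) tuples."""
    queues = {}
    for strain, group, priority in records:
        if group not in queues:
            # The size is already known, so the queue is created on first sight.
            queues[group] = FixedSizePriorityQueue(sequences_per_group)
        queues[group].add(strain, priority)
    return queues


records = [("a", "x", 1.0), ("b", "x", 3.0), ("c", "x", 2.0), ("d", "y", 0.5)]
queues = subsample_known_size(records, 2)
```

With these records, group `x` keeps its two highest-priority strains and group `y` keeps its single strain.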
    
    In the case where we need to calculate the number of sequences per group
    from the first pass, we already know all possible groups and can create
    their priority queues in bulk. The new `create_queues_by_group` function
    allows us to create fixed-size or randomly sized queues and also to make
    multiple attempts when queue sizes sum to zero. As a result, the
    `PriorityQueue` class is much simpler (it requires no logic about random
    max sizes) and we can actually test the fixed and random behaviors more
    carefully with doctests for `create_queues_by_group`.
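
The bulk-creation idea with retries might look roughly like the sketch below. It is an illustration under stated assumptions, not the signature or body of augur's `create_queues_by_group`: the names `create_queues_sketch` and `SketchQueue`, the `max_attempts` and `seed` parameters, and the rule that a fractional `max_size` triggers random sizing are all assumptions. The commit message does not say which distribution the random sizes come from, so a single Bernoulli draw per group is used here as a stand-in.

```python
import random


class SketchQueue:
    """Hypothetical stand-in for a priority queue; only its size matters here."""

    def __init__(self, max_size):
        self.max_size = max_size


def create_queues_sketch(groups, max_size, max_attempts=100, seed=None):
    """Create one queue per group, retrying while all random sizes are zero."""
    rng = random.Random(seed)
    queues = {}
    for _ in range(max_attempts):
        queues = {}
        for group in sorted(groups):
            if max_size < 1.0:
                # Assumption: a fractional size means "pick this group with
                # probability max_size", modeled as one Bernoulli draw.
                size = 1 if rng.random() < max_size else 0
            else:
                size = int(max_size)
            queues[group] = SketchQueue(size)
        # Stop as soon as at least one queue can hold a sequence.
        if sum(queue.max_size for queue in queues.values()) > 0:
            break
    return queues


fixed = create_queues_sketch(["a", "b"], 2)
sampled = create_queues_sketch(["a", "b", "c"], 0.1, seed=0)
```

Keeping the retry loop outside the queue class is what lets the queue itself stay free of any logic about random sizes, which matches the simplification the commit message describes.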
    huddlej committed Aug 13, 2021
    5db04fc