-
Notifications
You must be signed in to change notification settings - Fork 129
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Fix probabilistic subsampling for small values
Fixes a regression from the rewrite of augur filter where probabilistic subsampling for small values of `--subsample-max-sequences` could randomly select zero strains and randomly cause our CI tests to fail. Prior to the rewrite of augur filter and introduction of priority queues, we fixed this issue by repeatedly attempting to calculate sequences per group that summed to an integer value greater than zero. However, the way I implemented random queue sizes inside the `PriorityQueue` class in the rewrite prevented me from using a similar "multiple attempts" approach. This commit redesigns the way we create priority queues. In the case where we already know the number of sequences per group in the first pass, we create an appropriately-sized priority queue for each group as we encounter it. There is no possibility that the sum of these queue sizes could be zero. In the case where we need to calculate the number of sequences per group from the first pass, we already know all possible groups and can create their priority queues in bulk. The new `create_queues_by_group` function allows us to create fixed-sized or randomly-sized queues and also make multiple attempts when queue sizes sum to zero. As a result, the `PriorityQueue` class is much simpler (it requires no logic about random max sizes) and we can actually test the fixed and random behaviors more carefully with doctests for `create_queues_by_group`.
- Loading branch information
Showing
1 changed file
with
63 additions
and
33 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters