Refactor filters into separate functions
Refactors filter logic into separate functions that share the signature `func(metadata, **kwargs)` and return a `set` of strain names that pass the filter. Although this work does not reduce the complexity of the code by itself, it sets up a pattern that will let us apply all user-requested filters in a single loop. This change should simplify the main logic and also let us short-circuit evaluation when filters remove all possible strains (e.g., `--exclude-all`), avoiding unnecessary checks.

This refactoring also adds new functions for sequence-based filters. As part of these, we update the sequence index data frame to be indexed by strain name, consistent with the metadata data frame.

One side effect of this refactoring is the addition of a functional test for both the `--include-where` and `--exclude-where` filters, to make sure they are properly implemented and that no regressions occur during refactoring. The lack of this test initially allowed the refactoring of the `--exclude-where` logic to introduce a bug.

Finally, we also define a new function to include strains by a query. Note that this implementation relies on the same query parser used by the `--exclude-where` argument, which allows the negation operator, and on the same code that lowercases strings before comparison. This change is nonetheless backward compatible and only adds functionality that is consistent with `--exclude-where`.
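A minimal sketch of the pattern described above, written in Python with pandas. Every function name here (`filter_by_exclude_all`, `filter_by_exclude`, `filter_by_sequence_length`, `include_by_query`, `run_filters`) is a hypothetical illustration, not the actual code in this commit. The `length` column in the sequence index, the `column=value` / `column!=value` query syntax, and the choice to apply inclusions after exclusions are also assumptions. Exclusion filters are intersected, the loop stops early once no strains remain, and inclusion functions add matching strains back afterward.

```python
import pandas as pd

# Hypothetical filters following the pattern func(metadata, **kwargs) -> set.
# `metadata` is assumed to be a DataFrame indexed by strain name.

def filter_by_exclude_all(metadata, **kwargs):
    """Remove every strain, mirroring --exclude-all."""
    return set()

def filter_by_exclude(metadata, exclude_names=(), **kwargs):
    """Keep strains whose names are not in `exclude_names`."""
    return set(metadata.index) - set(exclude_names)

def filter_by_sequence_length(metadata, sequence_index, min_length, **kwargs):
    """Sequence-based filter; `sequence_index` is indexed by strain, like metadata.

    The "length" column is an assumed name for illustration.
    """
    shared = sequence_index.index.intersection(metadata.index)
    lengths = sequence_index.loc[shared, "length"]
    return set(lengths[lengths >= min_length].index)

def parse_where(query):
    """Split an assumed "column=value" or "column!=value" query."""
    if "!=" in query:
        column, value = query.split("!=", 1)
        return column, "!=", value
    column, value = query.split("=", 1)
    return column, "=", value

def include_by_query(metadata, query, **kwargs):
    """Return strains matching the query, comparing lowercased strings."""
    column, op, value = parse_where(query)
    matches = metadata[column].astype(str).str.lower() == value.lower()
    if op == "!=":
        matches = ~matches  # negation operator
    return set(metadata.loc[matches].index)

def run_filters(metadata, filters, inclusions=()):
    """Intersect all filter results, then add back force-included strains."""
    strains = set(metadata.index)
    for filter_function, kwargs in filters:
        strains &= filter_function(metadata, **kwargs)
        if not strains:
            # Short-circuit: nothing is left, so skip the remaining filters.
            break
    for include_function, kwargs in inclusions:
        strains |= include_function(metadata, **kwargs)
    return strains

if __name__ == "__main__":
    strain_index = pd.Index(["A", "B", "C"], name="strain")
    metadata = pd.DataFrame({"country": ["USA", "Canada", "usa"]}, index=strain_index)
    sequence_index = pd.DataFrame({"length": [100, 50, 200]}, index=strain_index)

    kept = run_filters(metadata, [
        (filter_by_exclude, {"exclude_names": ["A"]}),
        (filter_by_sequence_length, {"sequence_index": sequence_index, "min_length": 100}),
    ])
    print(sorted(kept))  # ['C']

    kept = run_filters(metadata, [(filter_by_exclude_all, {})],
                       [(include_by_query, {"query": "country=usa"})])
    print(sorted(kept))  # ['A', 'C']
```

Because every filter returns a plain `set`, combining them reduces to set intersection and union, and the early `break` is what lets an all-removing filter like `--exclude-all` skip the remaining checks.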