-
Notifications
You must be signed in to change notification settings - Fork 129
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor filters into separate functions #745
Conversation
Codecov Report
@@ Coverage Diff @@
## master #745 +/- ##
==========================================
+ Coverage 31.51% 31.58% +0.07%
==========================================
Files 41 41
Lines 5674 5755 +81
Branches 1373 1391 +18
==========================================
+ Hits 1788 1818 +30
- Misses 3812 3860 +48
- Partials 74 77 +3
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refactors filter logic into separate function with the same signature of `func(metadata, **kwargs)` that returns a `set` of strain names that pass the filter. Although this work does not reduce the complexity of the code by itself, it sets up a pattern that will allow us to move all filters into a single loop through all user-requested filters. This change should simplify the main logic and also allow us to short-cut evaluation when filters remove all possible strains (e.g., `--exclude-all`), avoiding unnecessary checks. This refactoring also includes new functions for sequence-based filters. As part of these sequence-based functions, we update the sequence index data frame to be indexed by strain name to be consistent with the metadata data frame. One side-effect of this refactoring is the additional of a functional test for both `--include-where` and `--exclude-where` filters to make sure these are properly implemented and no regressions occur during refactoring. The lack of this test initially allowed the refactoring of `--exclude-where` logic to introduce a bug. Finally, we also define a new function to include strains by a query. Note that this implementation relies on the same query parser used by the `--exclude-where` argument which allows the negation operator and also the code that lowercases the strings before comparison. This change is backward compatible, however, and only adds functionality that is consistent with the `--exclude-where` functionality.
dbf000f
to
0710b50
Compare
Description of proposed changes
Refactors filter logic into separate function with the same signature of
func(metadata, **kwargs)
that returns aset
of strain names that pass the filter. Although this work does not reduce the complexity of the code by itself, it sets up a pattern that will allow us to move all filters into a single loop through all user-requested filters. This change should simplify the main logic and also allow us to short-cut evaluation when filters remove all possible strains (e.g.,--exclude-all
), avoiding unnecessary checks.This refactoring also includes new functions for sequence-based filters. As part of these sequence-based functions, we update the sequence index data frame to be indexed by strain name to be consistent with the metadata data frame.
One side-effect of this refactoring is the addition of a functional test for both
--include-where
and--exclude-where
filters to make sure these are properly implemented and no regressions occur during refactoring. The lack of this test initially allowed the refactoring of--exclude-where
logic to introduce a bug.Finally, we also define a new function to include strains by a query. Note that this implementation relies on the same query parser used by the
--exclude-where
argument which allows the negation operator and also the code that lowercases the strings before comparison. This change is backward compatible, however, and only adds functionality that is consistent with the--exclude-where
functionality.Related issue(s)
Builds on work in the
pandas-metadata
branch and PR #743Testing
Adds doctests for all new filter functions. Test these functions locally by installing augur with dev dependencies (
python3 -m pip install .[dev]
) with./run_tests -k filter_by
.