-
Notifications
You must be signed in to change notification settings - Fork 129
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
filter: Split filter.py into smaller files #997
Conversation
07d41f8
to
3a120e7
Compare
3a120e7
to
d0ce4f1
Compare
118a499
to
f596267
Compare
Codecov ReportBase: 67.98% // Head: 68.10% // Increases project coverage by
Additional details and impacted files@@ Coverage Diff @@
## master #997 +/- ##
==========================================
+ Coverage 67.98% 68.10% +0.12%
==========================================
Files 57 62 +5
Lines 6656 6682 +26
Branches 1637 1637
==========================================
+ Hits 4525 4551 +26
Misses 1825 1825
Partials 306 306
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report at Codecov. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
non-blocking
Organization is up for debate, but keeping the top level
filter.py
around is currently required for the CLI to function properly:
My general tendency is to avoid non-specific names like "support" when organizing code bases. Instead, I'd suggest nesting these new files directly under the augur.filter
namespace instead, e.g. augur.filter.io
, augur.filter.errors
, etc. This in turn means extending the command_name()
function referenced above to handle the case when command.__name__ == command.__package__
, but I expect that to be a small straightforward change.
Untested, but here are two ways to make that change. Smallest diff: diff --git a/augur/__init__.py b/augur/__init__.py
index 31ee6826..a69dcf44 100644
--- a/augur/__init__.py
+++ b/augur/__init__.py
@@ -142,4 +142,9 @@ def command_name(command):
package = command.__package__
module_name = command.__name__
+ # If the command module is also a package, e.g. augur/x/__init__.py instead
+ # of augur/x.py, then use the parent package.
+ if package == module_name:
+ package = package.rsplit(".", 1)[0]
+
return remove_prefix(package, module_name).lstrip(".").replace("_", "-") Reconsidering some unnecessary machinery: diff --git a/augur/__init__.py b/augur/__init__.py
index 31ee6826..0a7186fa 100644
--- a/augur/__init__.py
+++ b/augur/__init__.py
@@ -19,7 +19,7 @@ recursion_limit = os.environ.get("AUGUR_RECURSION_LIMIT")
if recursion_limit:
sys.setrecursionlimit(int(recursion_limit))
-command_strings = [
+COMMAND_NAMES = [
"parse",
"index",
"filter",
@@ -29,10 +29,10 @@ command_strings = [
"refine",
"ancestral",
"translate",
- "reconstruct_sequences",
+ "reconstruct-sequences",
"clades",
"traits",
- "sequence_traits",
+ "sequence-traits",
"lbi",
"distance",
"titers",
@@ -44,7 +44,7 @@ command_strings = [
"measurements",
]
-COMMANDS = [importlib.import_module('augur.' + c) for c in command_strings]
+COMMANDS = [(c, importlib.import_module('augur.' + c.replace("-", "_"))) for c in COMMAND_NAMES]
def make_parser():
parser = argparse.ArgumentParser(
@@ -56,10 +56,10 @@ def make_parser():
add_default_command(parser)
add_version_alias(parser)
- for command in COMMANDS:
+ for command_name, command in COMMANDS:
# Add a subparser for each command.
subparser = subparsers.add_parser(
- command_name(command),
+ command_name,
help = first_line(command.__doc__),
description = command.__doc__)
@@ -129,17 +129,3 @@ def add_version_alias(parser):
nargs = 0,
help = argparse.SUPPRESS,
action = run_version_command)
-
-
-def command_name(command):
- """
- Returns a short name for a command module.
- """
-
- def remove_prefix(prefix, string):
- return re.sub('^' + re.escape(prefix), '', string)
-
- package = command.__package__
- module_name = command.__name__
-
- return remove_prefix(package, module_name).lstrip(".").replace("_", "-") |
Removes the `command_name` function to simplify the command name parsing by directly replacing underscores with dashes. This allows us to create the correct CLI command name for a command regardless of whether the command is a module or a package and without the "unnecessary machinery." Note this change was built on a suggestion by @tsibley in another PR¹, but I'm incorporating here to support the reorganization of commands with subcommands into packages. I chose to do the string replacement for the command name rather than the command module so that we can directly list module names and make the computer do the snake_case to kebab-case conversion for us. ¹ #997 (comment)
Will get back to this once #1002 is done due to overlapping desires. |
13e160d
to
76566bc
Compare
Rebased on latest changes with the notable difference of using |
735fa4e
to
f01c450
Compare
@tsibley I'm still not convinced... or at least the following lines should change under your thinking.
augur/tests/filter/test_subsample.py Line 5 in f01c450
Would you suggest to change them to these? from augur import AugurError from augur.filter import FilterException In my opinion, these are to be used internally, and we should just reference them from the original location so that we as developers can clearly see that |
Ah, I have no issue with internal usage referencing the original location they're defined, and indeed agree it's preferable. My point with #997 (comment) was that if we consider at least some of the functions in If you still don't agree, that's fine, and feel free to drop the re-export! |
@tsibley Ok I understand now! Indeed, So before continuing here, I think we first need to clarify what part of |
b45703b
to
1ff27fc
Compare
1ff27fc
to
b98580c
Compare
b98580c
to
574355b
Compare
See dc4f58b for rationale.
This is an intermediate step before splitting the file even more. This module needs to be prefixed with an underscore otherwise there is a name conflict with the run() function in filter/__init__.py.
Also, validate arguments in the top-level `filter.run`. This allows the imported run() function to be responsible for one less thing.
574355b
to
c00db66
Compare
Rebased onto latest # force-push
git diff 574355b16f630df83baef59edc4a3b8e60c59e4d c00db66c5747faf7bb5a3563368dcbdbfe2211a3 augur/filter tests/filter Outputdiff --git a/augur/filter/_run.py b/augur/filter/_run.py
index 7c599221..eadb03d1 100644
--- a/augur/filter/_run.py
+++ b/augur/filter/_run.py
@@ -10,7 +10,10 @@ from tempfile import NamedTemporaryFile
from augur.errors import AugurError
from augur.index import index_sequences, index_vcf
-from augur.io import open_file, read_metadata, read_sequences, write_sequences, is_vcf as filename_is_vcf, write_vcf
+from augur.io.file import open_file
+from augur.io.metadata import read_metadata
+from augur.io.sequences import read_sequences, write_sequences
+from augur.io.vcf import is_vcf as filename_is_vcf, write_vcf
from .io import cleanup_outputs, read_priority_scores
from .include_exclude_rules import apply_filters, construct_filters
from .subsample import PriorityQueue, TooManyGroupsError, calculate_sequences_per_group, create_queues_by_group, get_groups_for_subsampling
diff --git a/augur/filter/errors.py b/augur/filter/errors.py
deleted file mode 100644
index 54a131c4..00000000
--- a/augur/filter/errors.py
+++ /dev/null
@@ -1,7 +0,0 @@
-from augur.errors import AugurError
-
-
-class FilterException(AugurError):
- """Representation of an error that occurred during filtering.
- """
- pass
diff --git a/augur/filter/include_exclude_rules.py b/augur/filter/include_exclude_rules.py
index d4881a34..0edf407c 100644
--- a/augur/filter/include_exclude_rules.py
+++ b/augur/filter/include_exclude_rules.py
@@ -6,7 +6,7 @@ import pandas as pd
from augur.dates import numeric_date, is_date_ambiguous, get_numerical_dates
from augur.errors import AugurError
-from augur.io import is_vcf as filename_is_vcf
+from augur.io.vcf import is_vcf as filename_is_vcf
from augur.utils import read_strains
from .io import filter_kwargs_to_str
diff --git a/augur/filter/subsample.py b/augur/filter/subsample.py
index f2bb6f31..d0729286 100644
--- a/augur/filter/subsample.py
+++ b/augur/filter/subsample.py
@@ -9,7 +9,6 @@ from typing import Collection
from augur.dates import get_iso_year_week
from augur.errors import AugurError
from . import GROUP_BY_GENERATED_COLUMNS
-from .errors import FilterException
def get_groups_for_subsampling(strains, metadata, group_by=None):
@@ -62,7 +61,7 @@ def get_groups_for_subsampling(strains, metadata, group_by=None):
>>> get_groups_for_subsampling(strains, metadata, group_by)
Traceback (most recent call last):
...
- augur.filter.errors.FilterException: The specified group-by categories (['missing_column']) were not found.
+ augur.errors.AugurError: The specified group-by categories (['missing_column']) were not found.
If we try to group by some columns that exist and some that don't, we allow
grouping to continue and print a warning message to stderr.
@@ -110,9 +109,9 @@ def get_groups_for_subsampling(strains, metadata, group_by=None):
# If we could not find any requested categories, we cannot complete subsampling.
if 'date' not in metadata and group_by_set <= GROUP_BY_GENERATED_COLUMNS:
- raise FilterException(f"The specified group-by categories ({group_by}) were not found. Note that using any of {sorted(GROUP_BY_GENERATED_COLUMNS)} requires a column called 'date'.")
+ raise AugurError(f"The specified group-by categories ({group_by}) were not found. Note that using any of {sorted(GROUP_BY_GENERATED_COLUMNS)} requires a column called 'date'.")
if not group_by_set & (set(metadata.columns) | GROUP_BY_GENERATED_COLUMNS):
- raise FilterException(f"The specified group-by categories ({group_by}) were not found.")
+ raise AugurError(f"The specified group-by categories ({group_by}) were not found.")
# Warn/error based on other columns grouped with 'week'.
if 'week' in group_by_set:
diff --git a/augur/filter/validate_arguments.py b/augur/filter/validate_arguments.py
index 322417e9..b0c81d59 100644
--- a/augur/filter/validate_arguments.py
+++ b/augur/filter/validate_arguments.py
@@ -1,5 +1,5 @@
import sys
-from augur.io import is_vcf as filename_is_vcf
+from augur.io.vcf import is_vcf as filename_is_vcf
SEQUENCE_ONLY_FILTERS = (
diff --git a/tests/filter/test_run.py b/tests/filter/test_run.py
index 71338d64..253160d5 100644
--- a/tests/filter/test_run.py
+++ b/tests/filter/test_run.py
@@ -16,7 +16,7 @@ import augur.filter
import augur.filter._run
import augur.filter.io
import augur.filter.include_exclude_rules
-from augur.io import read_metadata
+from augur.io.metadata import read_metadata
@pytest.fixture
def argparser():
diff --git a/tests/filter/test_subsample.py b/tests/filter/test_subsample.py
index ffe58855..8d1b401f 100644
--- a/tests/filter/test_subsample.py
+++ b/tests/filter/test_subsample.py
@@ -2,7 +2,7 @@ import pytest
import pandas as pd
import augur.filter.subsample
-from augur.filter.errors import FilterException
+from augur.errors import AugurError
@pytest.fixture
@@ -47,7 +47,7 @@ class TestFilterGroupBy:
groups = ['invalid']
metadata = valid_metadata.copy()
strains = metadata.index.tolist()
- with pytest.raises(FilterException) as e_info:
+ with pytest.raises(AugurError) as e_info:
augur.filter.subsample.get_groups_for_subsampling(strains, metadata, group_by=groups)
assert str(e_info.value) == "The specified group-by categories (['invalid']) were not found."
@@ -127,7 +127,7 @@ class TestFilterGroupBy:
metadata = valid_metadata.copy()
metadata = metadata.drop('date', axis='columns')
strains = metadata.index.tolist()
- with pytest.raises(FilterException) as e_info:
+ with pytest.raises(AugurError) as e_info:
augur.filter.subsample.get_groups_for_subsampling(strains, metadata, group_by=groups)
assert str(e_info.value) == "The specified group-by categories (['year']) were not found. Note that using any of ['month', 'week', 'year'] requires a column called 'date'."
@@ -136,7 +136,7 @@ class TestFilterGroupBy:
metadata = valid_metadata.copy()
metadata = metadata.drop('date', axis='columns')
strains = metadata.index.tolist()
- with pytest.raises(FilterException) as e_info:
+ with pytest.raises(AugurError) as e_info:
augur.filter.subsample.get_groups_for_subsampling(strains, metadata, group_by=groups)
assert str(e_info.value) == "The specified group-by categories (['month']) were not found. Note that using any of ['month', 'week', 'year'] requires a column called 'date'."
@@ -145,7 +145,7 @@ class TestFilterGroupBy:
metadata = valid_metadata.copy()
metadata = metadata.drop('date', axis='columns')
strains = metadata.index.tolist()
- with pytest.raises(FilterException) as e_info:
+ with pytest.raises(AugurError) as e_info:
augur.filter.subsample.get_groups_for_subsampling(strains, metadata, group_by=groups)
assert str(e_info.value) == "The specified group-by categories (['year', 'month']) were not found. Note that using any of ['month', 'week', 'year'] requires a column called 'date'." # changes to master
git diff f0f9200cbd7c2a2dcaa209292e161aefebf1dc93 c8900c5cdba6a83312dd350b4a9ad487f63aa7a5 augur/filter.py tests/test_filter.py Outputdiff --git a/augur/filter.py b/augur/filter.py
index 482f6520..65c965c4 100644
--- a/augur/filter.py
+++ b/augur/filter.py
@@ -21,7 +21,10 @@ from typing import Collection
from .dates import numeric_date, numeric_date_type, SUPPORTED_DATE_HELP_TEXT, is_date_ambiguous, get_numerical_dates, get_iso_year_week
from .errors import AugurError
from .index import index_sequences, index_vcf
-from .io import open_file, read_metadata, read_sequences, write_sequences, is_vcf as filename_is_vcf, write_vcf
+from .io.file import open_file
+from .io.metadata import read_metadata
+from .io.sequences import read_sequences, write_sequences
+from .io.vcf import is_vcf as filename_is_vcf, write_vcf
from .utils import read_strains
comment_char = '#'
@@ -105,12 +108,6 @@ def register_parser(parent_subparsers):
return parser
-class FilterException(AugurError):
- """Representation of an error that occurred during filtering.
- """
- pass
-
-
def read_priority_scores(fname):
def constant_factory(value):
return lambda: value
@@ -957,7 +954,7 @@ def get_groups_for_subsampling(strains, metadata, group_by=None):
>>> get_groups_for_subsampling(strains, metadata, group_by)
Traceback (most recent call last):
...
- augur.filter.FilterException: The specified group-by categories (['missing_column']) were not found.
+ augur.errors.AugurError: The specified group-by categories (['missing_column']) were not found.
If we try to group by some columns that exist and some that don't, we allow
grouping to continue and print a warning message to stderr.
@@ -1005,9 +1002,9 @@ def get_groups_for_subsampling(strains, metadata, group_by=None):
# If we could not find any requested categories, we cannot complete subsampling.
if 'date' not in metadata and group_by_set <= GROUP_BY_GENERATED_COLUMNS:
- raise FilterException(f"The specified group-by categories ({group_by}) were not found. Note that using any of {sorted(GROUP_BY_GENERATED_COLUMNS)} requires a column called 'date'.")
+ raise AugurError(f"The specified group-by categories ({group_by}) were not found. Note that using any of {sorted(GROUP_BY_GENERATED_COLUMNS)} requires a column called 'date'.")
if not group_by_set & (set(metadata.columns) | GROUP_BY_GENERATED_COLUMNS):
- raise FilterException(f"The specified group-by categories ({group_by}) were not found.")
+ raise AugurError(f"The specified group-by categories ({group_by}) were not found.")
# Warn/error based on other columns grouped with 'week'.
if 'week' in group_by_set:
diff --git a/tests/test_filter.py b/tests/test_filter.py
index d8b23a3e..580203c5 100644
--- a/tests/test_filter.py
+++ b/tests/test_filter.py
@@ -14,8 +14,8 @@ from Bio.SeqRecord import SeqRecord
from freezegun import freeze_time
import augur.filter
-from augur.io import read_metadata
-from augur.filter import FilterException
+from augur.io.metadata import read_metadata
+from augur.errors import AugurError
@pytest.fixture
def argparser():
@@ -391,7 +391,7 @@ class TestFilterGroupBy:
groups = ['invalid']
metadata = valid_metadata.copy()
strains = metadata.index.tolist()
- with pytest.raises(FilterException) as e_info:
+ with pytest.raises(AugurError) as e_info:
augur.filter.get_groups_for_subsampling(strains, metadata, group_by=groups)
assert str(e_info.value) == "The specified group-by categories (['invalid']) were not found."
@@ -471,7 +471,7 @@ class TestFilterGroupBy:
metadata = valid_metadata.copy()
metadata = metadata.drop('date', axis='columns')
strains = metadata.index.tolist()
- with pytest.raises(FilterException) as e_info:
+ with pytest.raises(AugurError) as e_info:
augur.filter.get_groups_for_subsampling(strains, metadata, group_by=groups)
assert str(e_info.value) == "The specified group-by categories (['year']) were not found. Note that using any of ['month', 'week', 'year'] requires a column called 'date'."
@@ -480,7 +480,7 @@ class TestFilterGroupBy:
metadata = valid_metadata.copy()
metadata = metadata.drop('date', axis='columns')
strains = metadata.index.tolist()
- with pytest.raises(FilterException) as e_info:
+ with pytest.raises(AugurError) as e_info:
augur.filter.get_groups_for_subsampling(strains, metadata, group_by=groups)
assert str(e_info.value) == "The specified group-by categories (['month']) were not found. Note that using any of ['month', 'week', 'year'] requires a column called 'date'."
@@ -489,7 +489,7 @@ class TestFilterGroupBy:
metadata = valid_metadata.copy()
metadata = metadata.drop('date', axis='columns')
strains = metadata.index.tolist()
- with pytest.raises(FilterException) as e_info:
+ with pytest.raises(AugurError) as e_info:
augur.filter.get_groups_for_subsampling(strains, metadata, group_by=groups)
assert str(e_info.value) == "The specified group-by categories (['year', 'month']) were not found. Note that using any of ['month', 'week', 'year'] requires a column called 'date'." Given previous approvals, I'll plan to merge this once checks pass. |
Description of proposed changes
No functional changes here, just splitting a big file up into smaller pieces.
Organization is up for debate.
Note: if #1087 gets merged before this PR, I'll add docs changes here.
Related issue(s)
filter_support/
#937Testing
Tests updated to reflect new organization of files (see #996).