Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Measurements command #879

Merged
merged 21 commits into from
Jun 13, 2022
Merged

Measurements command #879

merged 21 commits into from
Jun 13, 2022

Conversation

joverlee521
Copy link
Contributor

Description of proposed changes

Adds the measurements subcommand with two sub-subcommands, export and concat.
See commits for details.

Related issue(s)

Based on proposal in #869

Testing

Added unit tests for the changes in the validate subcommand.
Added functional tests for measurements export and measurements concat.

@joverlee521 joverlee521 requested a review from a team March 31, 2022 03:02
@codecov
Copy link

codecov bot commented Mar 31, 2022

Codecov Report

Merging #879 (fc70c8b) into master (f68c448) will decrease coverage by 22.84%.
The diff coverage is 29.18%.

❗ Current head fc70c8b differs from pull request most recent head 8fbdc82. Consider uploading reports for the commit 8fbdc82 to get more accurate results

@@             Coverage Diff             @@
##           master     #879       +/-   ##
===========================================
- Coverage   59.36%   36.51%   -22.85%     
===========================================
  Files          43       43               
  Lines        6014     7292     +1278     
  Branches     1539     2138      +599     
===========================================
- Hits         3570     2663      -907     
- Misses       2185     4474     +2289     
+ Partials      259      155      -104     
Impacted Files Coverage Δ
augur/__init__.py 31.25% <ø> (-48.12%) ⬇️
augur/export_v2.py 15.39% <0.00%> (-53.50%) ⬇️
augur/measurements.py 7.81% <7.81%> (ø)
augur/utils.py 50.25% <66.66%> (-14.10%) ⬇️
augur/validate.py 53.89% <79.24%> (-13.44%) ⬇️
augur/__main__.py 0.00% <0.00%> (-100.00%) ⬇️
augur/traits.py 7.25% <0.00%> (-73.58%) ⬇️
augur/export.py 33.33% <0.00%> (-66.67%) ⬇️
augur/export_v1.py 9.91% <0.00%> (-64.34%) ⬇️
augur/ancestral.py 10.34% <0.00%> (-61.21%) ⬇️
... and 34 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f68c448...8fbdc82. Read the comment docs.

@huddlej huddlej self-requested a review March 31, 2022 12:49
huddlej added a commit to nextstrain/seasonal-flu that referenced this pull request Apr 19, 2022
Adds a new Snakemake file that mirrors `Snakefile_WHO` and adds logic to
build measurements JSONs for HA-based builds with the new `augur
measurements` command [1]. The current prototype only works for H3N2.

[1] nextstrain/augur#879
@joverlee521 joverlee521 mentioned this pull request Apr 20, 2022
Copy link
Contributor

@huddlej huddlej left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really nice work, @joverlee521! This is probably the most thoroughly documented and tested a new Augur subcommand has ever been. The implementation is generally quite clear and easy to follow. The API docs and tests go a long way toward demonstrating how to use this new command. We might eventually want a how-to guide or tutorial for using this, but that's better for a later PR.

Although I've requested changes, my comments below are primarily stylistic. The exception are the comments about how functions are defined in the new measurements.py module (which are more about function than style). We can chat about these comments more synchronously with anyone else who has feedback on the command.

augur/measurements.py Outdated Show resolved Hide resolved
augur/measurements.py Outdated Show resolved Hide resolved
augur/measurements.py Outdated Show resolved Hide resolved
augur/measurements.py Outdated Show resolved Hide resolved
augur/measurements.py Outdated Show resolved Hide resolved
tests/functional/measurements_concat.t Outdated Show resolved Hide resolved
docs/api/augur.measurements.rst Show resolved Hide resolved
augur/validate.py Outdated Show resolved Hide resolved
augur/measurements.py Outdated Show resolved Hide resolved
tests/functional/measurements_export.t Outdated Show resolved Hide resolved
huddlej and others added 21 commits June 10, 2022 13:36
Updates lists of valid panels in Auspice config JSON schema, Auspice
JSON v2 schema, and the export v2 command's argparse choices. These
changes allow users to add "measurements" to the list of panels in an
Auspice config JSON or the command line.
A schema for the measurements collection config JSON to be supplied to
`augur measurements export`.

Basically a copy of the config properties for a collection within the
measurements schema. In the future, we could look into using
`jsonschema.RefResolver` to use refs that refer to a separate schema
file.
Validates the provided measurements collection config JSON against the
JSON schema `augur/data/schema-measurements-collection-config.json`
Adds validations for constraints on the values within measurements and
collection config JSONs that cannot be checked via JSON schema
validation. These include:

1. A collection's fields, groupings, and filters are valid fields in the
collection's measurements.
2. A collection's display default group-by is included in the groupings
3. All collections within measurements JSON have unique keys
4. The default collection value is a valid key that matches one of the
the collections
The `augur measurements export` subcommand creates the measurements JSON
for a single collection of measurements provided in a TSV.

The most basic measurements export command takes the command-line
options for the required fields to create the minimal measurements JSON.
Add ability to provide config options via a collection config JSON or
via command line args. The config JSON includes all available configs
but there are only command line args for configs that have 1:1 options.
Nested options such as fields' titles and groupings' orders have been
excluded to reduce complexity of command line args.

Similar to the auspice export command, the command line args will
override the values of the config JSON.
The `augur measurements concat` subcommand concatentates multiple
measurements JSONs into a single measurements JSON. Depends on the
measurements validation to verify each measurements JSON is valid and
the final produced measurments JSON is valid.
1. Pass long strings as multiple strings to `print()` and let it handle
the formatting of the output with a default space separator so we don't
have think about it.

2. Standardizes error messages by removing '«»' in favor of Python's
built-in `repr()` formatting with `!r` in f-strings.
1. consistently capitalize "Auspice"
2. fully spell out "Concatenate"
3. add generic description for `--threshold` option
A subset of optional args (currently just `title` and `x-axis-label`)
need a default value for Auspice but we also do not want to overwrite
the config file with the argument default values every time. Use a
global dict `DEFAULT_ARGS` to hold these default values instead of the
argparse `default` argument.
Seems like a handy argparse action that can be used by other modules.
Suggested by @huddlej in review
Only write stdout to `/dev/null` so that we can track the expected
error messages in the Cram tests.
Suggested by @huddlej in review.
The error printed within the function seems like a side effect. The
code would be clearer if the error printing is handled directly with
the boolean check.
If any of the groupings provided by the user does not exist as a column,
then exit command with an error message to ensure that these are not
ignored by the user. If any defined grouping does not exist, then it
might indicate a greater error such as passing the wrong collection TSV.
Makes it clearer that these functions will read and return the file
contents. Similar to other `io` functions in Augur, the validation
of the file structure is an expected side effect.
Renamed argument group variables for export as well to differentiate
the argument groups for each subcommand.
Follow the pattern used by most Augur subcommands, which uses the
argparse.Namespace object in their "main" logic
Allows users to specify a list of columns from their collection TSV to
include in the output measurements JSON. All columns will be included by
default if option is not provided.

The 'strain' and 'value' columns do not have to be included in the list
since these are required columns. However, other configuration columns
(e.g. groupings) will need to be explicitly included in the list if
users are using the option. This ensures that we do not give unexpected
outputs by auto-including grouping columns.

Includes new functional test for the new option.
The measurements sub-subcommands are very straightforward that is seems
to add unnecessary complexity by breaking out into functions that are
mainly used to raise errors that then have to be caught by the "main"
function.
@joverlee521
Copy link
Contributor Author

Rebased on master and fixed up some small fix commits. I'll merge after all of the CI tests have passed.

@huddlej
Copy link
Contributor

huddlej commented Jun 13, 2022

@joverlee521 Looks great! Let's merge this, release it, and start using it in our refactored flu builds. 🚀

@joverlee521 joverlee521 merged commit 95d9d11 into master Jun 13, 2022
@joverlee521 joverlee521 deleted the measurements-command branch June 13, 2022 21:52
@victorlin victorlin added this to the Major release 16.0.0 milestone Jun 15, 2022
@victorlin victorlin mentioned this pull request Apr 29, 2024
2 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

4 participants