
Add beginnings of unit test coverage of align #463

Merged: 2 commits into nextstrain:master from add-test-coverage-align on Mar 25, 2020

Conversation

@elebow (Contributor) commented Mar 24, 2020

Description of proposed changes

This PR is intended to serve as an RFC on future unit testing strategy. I'm a software developer with substantial experience in Python development but minimal (essentially layperson) awareness of computational genomics. I would love to contribute to the project, and unit testing seems to be an area where I can help.

A robust unit test suite enables faster development of application code and more confidence in the results it produces. Augur currently has 13% test coverage.

This PR adds coverage of two methods in align.py, and a convenience script to ease running of the test suite. If this pattern is acceptable, I would be pleased for it to serve as a model for future unit test coverage. This PR alone increases the test coverage of align.py from 12% to 22%, though it was low-hanging fruit.
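
For reference, running the suite with coverage looks roughly like this (a sketch; the exact invocation may differ from the script's final contents):

# via the convenience script added in this PR
./run_tests.sh
# or invoking pytest directly (assumes pytest-cov is installed)
python3 -m pytest --cov=augur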

Related issue(s)

none

Testing

😃

@huddlej self-requested a review on March 24, 2020 02:47
@huddlej added the "enhancement" label on Mar 24, 2020
@tolot27 (Contributor) commented Mar 24, 2020

Unit testing would be really helpful; issue #462 would have been caught by unit tests. I'd love to see this PR merged soon, so I can extend it with tests for a PR I'll create shortly.

@danielsoneg commented

+1 to adding more tests!

One quick suggestion - I've found it useful in the past to make the unit tests as granular as possible - in other words, instead of just test_check_duplicates, we'd do test_check_duplicates_no_dupes, test_check_duplicates_string, test_check_duplicates_seq. That makes the expected behavior under each condition explicit in the test names, and it makes it easier to spot exactly which case a failing test covers.
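
For illustration, a minimal sketch of that split, assuming a check_duplicates helper that raises when a duplicate sequence name appears (the import path, argument shapes, and exception type are all assumptions for the example):

import pytest

from augur.align import check_duplicates  # import path assumed for this sketch

def test_check_duplicates_no_dupes():
    # distinct names across all arguments: should not raise
    check_duplicates({"seq1": "ATCG"}, "seq2")  # argument shapes assumed

def test_check_duplicates_string():
    # duplicate name supplied as a bare string: should raise
    with pytest.raises(Exception):  # exact exception type assumed
        check_duplicates({"seq1": "ATCG"}, "seq1")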

Also, pytest has a bunch of really nice capabilities, especially fixtures and tmp_path, that I think could be really useful for this project.
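
For example, tmp_path is a built-in pytest fixture that hands each test its own temporary directory and manages cleanup; a small sketch (the code under test is a stand-in):

def test_output_file_is_written(tmp_path):
    # tmp_path is a pathlib.Path to a fresh, per-test temporary directory
    out_file = tmp_path / "aligned.fasta"
    out_file.write_text(">seq1\nATCG\n")  # stand-in for the code under test
    assert out_file.read_text().startswith(">seq1")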

@huddlej (Contributor) left a comment

I love this direction and we absolutely need more tests throughout augur. Thank you for taking the first step toward improving our coverage!

All of the unit tests you’ve added work for me. I have a couple of specific questions/comments that I’ve included inline with the code. The major takeaways are:

  • pytest-cov needs to be added to the dev dependencies for the augur python package with the appropriate version pinning
    • Related to this, we should update our pinned versions of pytest to support 5.*. This change could happen as part of this PR to confirm the coverage plugin is working as expected.
  • pytest configuration options can be moved into the existing pytest.python3.ini file and run_tests.sh can be simplified to a single line (it is nice having this shell script instead of having to remember how to run pytest)

We’ve just merged some new tests for augur align (courtesy of @danielsoneg) that are written as doctests in the docstrings. For simple tests like this, doctests are slightly preferable to unit tests because they get exported to our API docs and improve that documentation. It looks like the pytest coverage plugin accounts for doctests, so we can improve our coverage with either approach. For tests that require more complex input data, unit tests with fixtures will be more helpful. We currently test our frequencies module in this way.
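
To illustrate the doctest style (the function and values here are invented for the sketch; pytest's --doctest-modules option collects such examples as tests):

def first_token(text):
    """Return the first whitespace-delimited token of a string.

    >>> first_token("MAFFT alignment")
    'MAFFT'
    """
    return text.split()[0]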

run_tests.sh (outdated excerpt)
coverage_arg='--cov=augur'
fi

python3 -m pytest -s $coverage_arg "$@"
@huddlej (Contributor):

I would prefer to keep all pytest configuration settings in one place. The pytest config options (-s and --cov=augur) can be moved into the existing pytest config file and always executed. Then this run_tests script could be a single line that runs python -m pytest -c pytest.python3.ini.
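
That is, something like this sketch (the "$@" pass-through for extra arguments is illustrative, not part of the request):

#!/bin/sh
# all pytest options now live in pytest.python3.ini
python3 -m pytest -c pytest.python3.ini "$@"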

@elebow (Contributor, Author) replied:

👍 on moving -s.

See comment below about omitting --cov when -k is specified.

@elebow (Contributor, Author) commented Mar 24, 2020

On doctest vs standalone tests:

I feel that doctests are most useful as documentation, rather than verification. They're excellent as usage examples, but they can quickly become large and detract from readability of the code. For example, the test code in this PR for check_duplicates is 51 lines, and that's not even a complex function.

Doctests also make it hard to use pytest facilities like fixtures, and many editors don't syntax-highlight code inside docstrings.

There's little downside to having both doctests and standalone tests, so my preference is usually:

  1. Write doctests primarily as usage examples. Include the happy paths, but don't allow them to grow overly large in an effort to achieve complete verification.
  2. Write standalone tests for complete coverage of all possible (reasonable) cases; see the sketch below.
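
A minimal sketch of that division of labor (the function and values are invented for illustration):

def reverse_complement(seq):
    """Return the reverse complement of a DNA string.

    Short happy-path example, doubling as documentation:

    >>> reverse_complement("ATCG")
    'CGAT'
    """
    complement = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(complement[base] for base in reversed(seq))

# edge cases go in the standalone test suite instead, e.g.:
def test_reverse_complement_empty_string():
    assert reverse_complement("") == ""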

@elebow (Contributor, Author) commented Mar 24, 2020

> One quick suggestion - I've found it useful in the past to make the unit tests as granular as possible - in other words, instead of just test_check_duplicates, we'd do test_check_duplicates_no_dupes, test_check_duplicates_string, test_check_duplicates_seq.

Sounds good to me. I've done it both ways in the past.

@danielsoneg commented

I'm a rando off the street as well, so I'll defer to the maintainers, but given a bit of experience with large/long-lasting projects -

I'd agree with @elebow on unit tests vs doctests. I think the doctests are great, and they've been useful to me as examples, but they're definitely limited, especially around setup and teardown. The strip_non_reference tests are leaving a file hanging around because I couldn't find a comfortable way to clean up afterward.

Re --cov, my inclination would be to put the "complete" set of test-running flags into the pytest.python3.ini file. I already run a limited set ad hoc while developing, but I'd consider that an "advanced user" workflow; for anyone dropping in out of the blue, the standard "here's how you run tests" docs should run the full suite, which is enough to convince us all that new code is good.

@elebow (Contributor, Author) commented Mar 24, 2020

Regarding fixtures:

I left them out of this PR to make for a more incremental changeset to introduce some testing patterns. I'm fully 👍 on using pytest fixtures, and possibly pytest-factoryboy in a later PR. I'm also happy to include them in this PR if that is preferred.
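
For example, a pytest fixture supplying reusable input data might look like this sketch (names and data invented):

import pytest

@pytest.fixture
def sequences():
    # reusable test data; pytest injects it into any test that names it
    return {"seq1": "ATCG", "seq2": "GCTA"}

def test_sequences_are_distinct(sequences):
    assert sequences["seq1"] != sequences["seq2"]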

@huddlej (Contributor) commented Mar 25, 2020

Ok, this all sounds excellent and I'm super excited we're having this conversation! @elebow and @danielsoneg, I hope you don't mind if I collect your comments into an improved "Testing" section of our augur dev docs. :)

I see how the -k flag and the optional test coverage provide the advanced-user functionality that simply running pytest with defaults wouldn’t.

@elebow, can you make the following small changes, then, and I’ll update the testing docs and Travis CI config accordingly?

  • Move -s into pytest.python3.ini
  • Add pytest-cov to setup.py in the dev dependencies with an appropriate version pin
  • Update pytest’s pinned version in setup.py to the latest ("pytest >=3.2.1, ==3.*", -> "pytest >=5.4.1, ==5.*",); a sketch of the resulting setup.py fragment follows this list
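
A sketch of that setup.py fragment, assuming dev dependencies are declared via an extras_require section (the surrounding metadata is elided, and the pytest-cov pin is left to be decided):

from setuptools import setup

setup(
    name="augur",  # remaining metadata elided in this sketch
    extras_require={
        "dev": [
            "pytest >=5.4.1, ==5.*",
            "pytest-cov",  # appropriate version pin to be added
        ],
    },
)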

Then in a separate PR I will:

  • Update the “testing” section of the dev docs to reflect this discussion
  • Update Travis CI config to run run_tests.sh
  • Rename pytest.python3.ini to the default pytest.ini, to save some typing

Then I think we can start accepting more unit tests built along the same approach you've used here.

@elebow (Contributor, Author) commented Mar 25, 2020

I was already working on some changes; I hope this is okay for this PR:

  • moved -s into pytest.python3.ini
  • added line breaks and comments to pytest.python3.ini (sketched below)
  • run_tests.sh now includes -c pytest.python3.ini, so the complete test suite can be run with just ./run_tests.sh
  • added pytest-cov to dev dependencies
  • bumped version of pytest to 5.4.*
  • split the check_duplicates monolithic test function into a set of smaller ones

Push coming in a few minutes.
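
For reference, a sketch of what pytest.python3.ini might now contain (the exact layout and comments in the real file may differ):

[pytest]
# show test output (-s), measure coverage of the augur package,
# and collect doctests from .py modules
addopts = -s --cov=augur --doctest-modules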


Regarding having Travis CI run run_tests.sh:

That's probably fine, though it might be better to have Travis continue to invoke pytest -c pytest.python3.ini directly. run_tests.sh is a convenience for humans, and we might later add things there that are not appropriate for CI (for example, --color=always).

@elebow force-pushed the add-test-coverage-align branch 2 times, most recently from c956425 to 4cdd718 on March 25, 2020 01:48
run_tests.sh
@@ -0,0 +1,16 @@
#!/bin/sh

Review comment:

I would suggest removing this file and instead passing --no-cov to pytest when needed.


pytest.python3.ini
# run any doctests found in .py modules
--doctest-modules

Review comment:

Add --cov here; --no-cov can override it from the command line.
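
For example (a usage sketch, assuming the options land in the ini file as discussed):

# full suite with coverage, using the options from the ini file
./run_tests.sh
# quick, focused run without coverage
./run_tests.sh --no-cov -k check_duplicates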

@willingc (Contributor) commented

@huddlej @elebow @danielsoneg The suggestions by @hameerabbasi about Python testing are good practice in the Python world. While doctests are nice and can enhance docs, I would keep them separate from the test files that use pytest. Running pytest with coverage in CI is good practice. If the tests take a long time to run, Azure Pipelines has been very good for test execution speed.

One way to increase coverage would be to determine which source files need tests and add them to an issue with a checklist; folks can comment on the issue to claim a file and check off the box when their PR is merged.

Commit message (excerpt) for the pushed changes:

Also introduce the convenience script `run_tests.sh`.

This also adds two items to .gitignore:
  • `.coverage` is generated by pytest-cov
  • `.mypy_cache` is generated by the static code analyzer mypy
@elebow (Contributor, Author) commented Mar 25, 2020

👍 thanks for the feedback.

Changes pushed.

@huddlej (Contributor) commented Mar 25, 2020

This works for me! Thank you everyone for all of your help!

@willingc Would you be interested in taking lead on the issue creation for specific modules to test that you mentioned? I am also happy to do this, but I'll definitely be a bottleneck as I'm working from home with a small child. :)

@huddlej merged commit e80bff5 into nextstrain:master on Mar 25, 2020
@willingc (Contributor) commented

Sure thing @huddlej. No problem as I am on vacation (or staycation) this week. I'll add the issue.

@jameshadfield mentioned this pull request on Mar 26, 2020