
Add beginnings of unit test coverage of align #463

Merged: 2 commits into nextstrain:master from add-test-coverage-align on Mar 25, 2020

Conversation

@elebow (Contributor) commented Mar 24, 2020

Description of proposed changes

This PR is intended to serve as an RFC on future unit testing strategy. I'm a software developer with substantial experience in Python development but minimal (essentially layperson) awareness of computational genomics. I would love to contribute to the project, and unit testing seems to be an area where I can help.

A robust unit test suite enables faster development of application code and more confidence in the results it produces. Augur currently has 13% test coverage.

This PR adds coverage of two methods in align.py, and a convenience script to ease running of the test suite. If this pattern is acceptable, I would be pleased for it to serve as a model for future unit test coverage. This PR alone increases the test coverage of align.py from 12% to 22%, though it was low-hanging fruit.
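
For reference, running the suite with coverage looks roughly like this (a sketch; the exact invocation may differ from the script's final contents):

# via the convenience script added in this PR
./run_tests.sh
# or invoking pytest directly (assumes pytest-cov is installed)
python3 -m pytest --cov=augur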

Related issue(s)

none

Testing

😃

@huddlej self-requested a review on March 24, 2020 02:47
@huddlej added the "enhancement" label on Mar 24, 2020
@tolot27 (Contributor) commented Mar 24, 2020

Unit testing would be really helpful; issue #462 would have been caught by unit tests. I'd love to see this PR merged soon, so I can extend it with tests for a PR I'll create shortly.

@danielsoneg commented

+1 to adding more tests!

One quick suggestion - I've found it useful in the past to make the unit tests as granular as possible - in other words, instead of just test_check_duplicates, we'd do test_check_duplicates_no_dupes, test_check_duplicates_string, test_check_duplicates_seq. That makes the expected behavior under each condition explicit in the test names, and it makes it easier to spot exactly which case a failing test covers.
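
For illustration, a minimal sketch of that split, assuming a check_duplicates helper that raises when a duplicate sequence name appears (the import path, argument shapes, and exception type are all assumptions for the example):

import pytest

from augur.align import check_duplicates  # import path assumed for this sketch

def test_check_duplicates_no_dupes():
    # distinct names across all arguments: should not raise
    check_duplicates({"seq1": "ATCG"}, "seq2")  # argument shapes assumed

def test_check_duplicates_string():
    # duplicate name supplied as a bare string: should raise
    with pytest.raises(Exception):  # exact exception type assumed
        check_duplicates({"seq1": "ATCG"}, "seq1")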

Also, pytest has a bunch of really nice capabilities, especially fixtures and tmp_path, that I think could be really useful for this project.
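
For example, tmp_path is a built-in pytest fixture that hands each test its own temporary directory and manages cleanup; a small sketch (the code under test is a stand-in):

def test_output_file_is_written(tmp_path):
    # tmp_path is a pathlib.Path to a fresh, per-test temporary directory
    out_file = tmp_path / "aligned.fasta"
    out_file.write_text(">seq1\nATCG\n")  # stand-in for the code under test
    assert out_file.read_text().startswith(">seq1")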

@huddlej (Contributor) left a comment

I love this direction and we absolutely need more tests throughout augur. Thank you for taking the first step toward improving our coverage!

All of the unit tests you’ve added work for me. I have a couple of specific questions/comments that I’ve included inline with the code. The major takeaways are:

  • pytest-cov needs to be added to the dev dependencies for the augur python package with the appropriate version pinning
    • Related to this, we should update our pinned versions of pytest to support 5.*. This change could happen as part of this PR to confirm the coverage plugin is working as expected.
  • pytest configuration options can be moved into the existing pytest.python3.ini file and run_tests.sh can be simplified to a single line (it is nice having this shell script instead of having to remember how to run pytest)

We’ve just merged some new tests for augur align (courtesy of @danielsoneg) that are written as doctests in the docstrings. For simple tests like this, doctests are slightly preferable to unit tests because they get exported to our API docs and improve that documentation. It looks like the pytest coverage plugin accounts for doctests, so we can improve our coverage with either approach. For tests that require more complex input data, unit tests with fixtures will be more helpful. We currently test our frequencies module in this way.
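
To illustrate the doctest style (the function and values here are invented for the sketch; pytest's --doctest-modules option collects such examples as tests):

def first_token(text):
    """Return the first whitespace-delimited token of a string.

    >>> first_token("MAFFT alignment")
    'MAFFT'
    """
    return text.split()[0]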

run_tests.sh (outdated excerpt)
coverage_arg='--cov=augur'
fi

python3 -m pytest -s $coverage_arg "$@"
@huddlej (Contributor):

I would prefer to keep all pytest configuration settings in one place. The pytest config options (-s and --cov=augur) can be moved into the existing pytest config file and always executed. Then this run_tests script could be a single line that runs python -m pytest -c pytest.python3.ini.
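
That is, something like this sketch (the "$@" pass-through for extra arguments is illustrative, not part of the request):

#!/bin/sh
# all pytest options now live in pytest.python3.ini
python3 -m pytest -c pytest.python3.ini "$@"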

@elebow (Contributor, Author) replied:

👍 on moving -s.

See comment below about omitting --cov when -k is specified.

@elebow (Contributor, Author) commented Mar 24, 2020

On doctest vs standalone tests:

I feel that doctests are most useful as documentation, rather than verification. They're excellent as usage examples, but they can quickly become large and detract from readability of the code. For example, the test code in this PR for check_duplicates is 51 lines, and that's not even a complex function.

Doctests also make it hard to use pytest facilities like fixtures, and many editors don't syntax-highlight code inside docstrings.

There's little downside to having both doctests and standalone tests, so my preference is usually:

  1. Write doctests primarily as usage examples. Include the happy paths, but don't allow them to grow overly large in an effort to achieve complete verification.
  2. Write standalone tests for complete coverage of all possible (reasonable) cases; see the sketch below.
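
A minimal sketch of that division of labor (the function and values are invented for illustration):

def reverse_complement(seq):
    """Return the reverse complement of a DNA string.

    Short happy-path example, doubling as documentation:

    >>> reverse_complement("ATCG")
    'CGAT'
    """
    complement = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(complement[base] for base in reversed(seq))

# edge cases go in the standalone test suite instead, e.g.:
def test_reverse_complement_empty_string():
    assert reverse_complement("") == ""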

@elebow (Contributor, Author) commented Mar 24, 2020

> One quick suggestion - I've found it useful in the past to make the unit tests as granular as possible - in other words, instead of just test_check_duplicates, we'd do test_check_duplicates_no_dupes, test_check_duplicates_string, test_check_duplicates_seq.

Sounds good to me. I've done it both ways in the past.

@danielsoneg commented

I'm a rando off the street as well, so I'll defer to the maintainers, but given a bit of experience with large/long-lasting projects -

I'd agree with @elebow on unit tests vs doctests. I think the doctests are great, and they've been useful to me as examples, but they're definitely limited, especially around setup and teardown. The strip_non_reference tests are leaving a file hanging around because I couldn't find a comfortable way to clean up afterward.

Re --cov, my inclination would be to put the "complete" set of test-running flags into the pytest.python3.ini file. I already run a limited set ad hoc while developing, but I'd consider that an "advanced user" workflow; for anyone dropping in out of the blue, the standard "here's how you run tests" docs should run the full suite, which is enough to convince us all that new code is good.

@elebow (Contributor, Author) commented Mar 24, 2020

Regarding fixtures:

I left them out of this PR to make for a more incremental changeset to introduce some testing patterns. I'm fully 👍 on using pytest fixtures, and possibly pytest-factoryboy in a later PR. I'm also happy to include them in this PR if that is preferred.
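
For example, a pytest fixture supplying reusable input data might look like this sketch (names and data invented):

import pytest

@pytest.fixture
def sequences():
    # reusable test data; pytest injects it into any test that names it
    return {"seq1": "ATCG", "seq2": "GCTA"}

def test_sequences_are_distinct(sequences):
    assert sequences["seq1"] != sequences["seq2"]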

@huddlej (Contributor) commented Mar 25, 2020

Ok, this all sounds excellent and I'm super excited we're having this conversation! @elebow and @danielsoneg, I hope you don't mind if I collect your comments into an improved "Testing" section of our augur dev docs. :)

I see how the -k flag and the optional test coverage provide the advanced-user functionality that simply running pytest with defaults wouldn’t.

@elebow, can you make the following small changes, then, and I’ll update the testing docs and Travis CI config accordingly?

  • Move -s into pytest.python3.ini
  • Add pytest-cov to setup.py in the dev dependencies with an appropriate version pin
  • Update pytest’s pinned version in setup.py to the latest ("pytest >=3.2.1, ==3.*", -> "pytest >=5.4.1, ==5.*",); a sketch of the resulting setup.py fragment follows this list
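
A sketch of that setup.py fragment, assuming dev dependencies are declared via an extras_require section (the surrounding metadata is elided, and the pytest-cov pin is left to be decided):

from setuptools import setup

setup(
    name="augur",  # remaining metadata elided in this sketch
    extras_require={
        "dev": [
            "pytest >=5.4.1, ==5.*",
            "pytest-cov",  # appropriate version pin to be added
        ],
    },
)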

Then in a separate PR I will:

  • Update the “testing” section of the dev docs to reflect this discussion
  • Update Travis CI config to run run_tests.sh
  • Rename pytest.python3.ini to the default pytest.ini, to save some typing

Then I think we can start accepting more unit tests built along the same approach you've used here.

@elebow (Contributor, Author) commented Mar 25, 2020

I was already working on some changes; I hope this is okay for this PR:

  • moved -s into pytest.python3.ini
  • added line breaks and comments to pytest.python3.ini (sketched below)
  • run_tests.sh now includes -c pytest.python3.ini, so the complete test suite can be run with just ./run_tests.sh
  • added pytest-cov to dev dependencies
  • bumped version of pytest to 5.4.*
  • split the check_duplicates monolithic test function into a set of smaller ones

Push coming in a few minutes.
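
For reference, a sketch of what pytest.python3.ini might now contain (the exact layout and comments in the real file may differ):

[pytest]
# show test output (-s), measure coverage of the augur package,
# and collect doctests from .py modules
addopts = -s --cov=augur --doctest-modules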


Regarding having Travis CI run run_tests.sh:

That's probably fine, though it might be better to have Travis continue to invoke pytest -c pytest.python3.ini directly. run_tests.sh is a convenience for humans, and we might later add things there that are not appropriate for CI (for example, --color=always).

@elebow force-pushed the add-test-coverage-align branch 2 times, most recently from c956425 to 4cdd718 on March 25, 2020 01:48
run_tests.sh
@@ -0,0 +1,16 @@
#!/bin/sh

Review comment:

I would suggest removing this file and instead passing --no-cov to pytest when needed.


pytest.python3.ini
# run any doctests found in .py modules
--doctest-modules

Review comment:

Add --cov here; --no-cov can override it from the command line.
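
For example (a usage sketch, assuming the options land in the ini file as discussed):

# full suite with coverage, using the options from the ini file
./run_tests.sh
# quick, focused run without coverage
./run_tests.sh --no-cov -k check_duplicates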

@willingc (Contributor) commented

@huddlej @elebow @danielsoneg The suggestions by @hameerabbasi about Python testing are good practice in the Python world. While doctests are nice and can enhance docs, I would keep them separate from the test files that use pytest. Running pytest with coverage in CI is good practice. If the tests take a long time to run, Azure Pipelines has been very good for test execution speed.

One way to increase coverage would be to determine which source files need tests and add them to an issue with a checklist; folks can comment on the issue to claim a file and check off the box when their PR is merged.

Commit message (excerpt) for the pushed changes:

Also introduce the convenience script `run_tests.sh`.

This also adds two items to .gitignore:
  • `.coverage` is generated by pytest-cov
  • `.mypy_cache` is generated by the static code analyzer mypy
@elebow (Contributor, Author) commented Mar 25, 2020

👍 thanks for the feedback.

Changes pushed.

@huddlej (Contributor) commented Mar 25, 2020

This works for me! Thank you everyone for all of your help!

@willingc Would you be interested in taking lead on the issue creation for specific modules to test that you mentioned? I am also happy to do this, but I'll definitely be a bottleneck as I'm working from home with a small child. :)

@huddlej merged commit e80bff5 into nextstrain:master on Mar 25, 2020
@willingc (Contributor) commented

Sure thing @huddlej. No problem as I am on vacation (or staycation) this week. I'll add the issue.

@jameshadfield mentioned this pull request on Mar 26, 2020