Refactor regression package and add unit tests and Travis CI configuration #3
This pull request refactors the regression package and adds unit tests. For now, the tests just verify that the code doesn't crash. To allow the regression code to be tested automatically, I refactored the actual job into a function that accepts a SparkContext and arguments, then added a `__main__` method that calls it.
I used argparse to process command-line arguments; it's included in the standard library in Python 2.7+ and available via `pip/easy_install argparse` for earlier Python versions. Down the line, it might be cool to implement some common optional arguments across all of the scripts; for example, there could be a `--serializer` option for using a custom PySpark serializer. Argparse has a lot of other cool features; its sub-commands feature could be used to implement a single `thunder` script with multiple sub-commands (like `thunder regress`, `thunder tuning`, etc.).
To run the tests locally:
This uses the nosetests test runner.
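For reviewers who want a picture of the refactoring pattern (a job function that takes a SparkContext and parsed arguments, plus a thin entry point), here is a minimal sketch. The names `build_parser`, `run_job`, and `main`, and the three positional arguments, are hypothetical and chosen for illustration; they are not necessarily what the code in this pull request uses.

```python
import argparse


def build_parser():
    # Argument names are illustrative assumptions, not the PR's actual options.
    parser = argparse.ArgumentParser(description="Run a regression job (sketch)")
    parser.add_argument("master", help="Spark master URL, e.g. local")
    parser.add_argument("datafile", help="path to the input data")
    parser.add_argument("outputdir", help="directory for results")
    return parser


def run_job(sc, args):
    """Hypothetical job body: all Spark access goes through the `sc` argument,
    so a test can pass in its own context instead of creating one here."""
    lines = sc.textFile(args.datafile)
    return lines.count()


def main(argv=None):
    args = build_parser().parse_args(argv)
    # Imported lazily so run_job() can be imported without PySpark installed.
    from pyspark import SparkContext
    sc = SparkContext(args.master, "regress")
    try:
        print(run_job(sc, args))
    finally:
        sc.stop()

# A script would typically end with:
#     if __name__ == "__main__":
#         main()
```

Because `run_job` never creates its own context, a test can construct one SparkContext (or a stand-in object) and pass it directly, which is what makes the "does it crash" style of test cheap to write.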
To convince myself that my refactorings didn't alter the code's behavior, I added a `matdiff.py` script for comparing the .mat files in output directories.
I also added a configuration for automatically running the unit tests on Travis CI. Here's the Travis page for my fork, showing test results from my latest commits: https://travis-ci.org/JoshRosen/thunder
If you want to run the Travis tests on your repository after merging this, log into https://travis-ci.org/ with your GitHub account and follow the instructions to enable the GitHub service integration hook for Travis.
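As a follow-up to the sub-commands idea mentioned earlier, here is a rough sketch of how a single `thunder` script with shared options might be laid out using argparse's sub-parsers. Nothing here is implemented in this pull request: the handler functions, the `datafile` argument, and the way `--serializer` is interpreted are all assumptions for illustration.

```python
import argparse


def do_regress(args):
    # Placeholder handler; a real one would build a SparkContext and run the job.
    return "regress on %s" % args.datafile


def do_tuning(args):
    # Placeholder handler for a second, hypothetical sub-command.
    return "tuning on %s" % args.datafile


def build_parser():
    parser = argparse.ArgumentParser(prog="thunder")

    # Options shared by every sub-command live on a parent parser.
    common = argparse.ArgumentParser(add_help=False)
    common.add_argument("--serializer", default=None,
                        help="custom PySpark serializer name (hypothetical option)")

    subparsers = parser.add_subparsers(dest="command")

    regress = subparsers.add_parser("regress", parents=[common])
    regress.add_argument("datafile")
    regress.set_defaults(func=do_regress)

    tuning = subparsers.add_parser("tuning", parents=[common])
    tuning.add_argument("datafile")
    tuning.set_defaults(func=do_tuning)

    return parser
```

With this layout, `thunder regress data.txt --serializer pickle` would parse into a namespace whose `func` attribute points at the matching handler, so the script's entry point only needs to call `args.func(args)`.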