Improvements to parse.py #496
This is far more efficient than repeated calls to str.replace.
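For illustration, one common way to avoid chained str.replace calls is a single str.translate pass over a precomputed table. The sketch below is not taken from the PR: the character mapping and both function names are hypothetical, and the characters actually handled in parse.py may differ.

```python
# Hypothetical mapping: delete spaces, turn a few punctuation marks into "_".
# The real set of characters in parse.py may be different.
REPLACEMENTS = {" ": None, "(": "_", ")": "_", ";": "_", ":": "_", ",": "_", "'": "_"}

# Build the translation table once, at import time.
FORBIDDEN = str.maketrans(REPLACEMENTS)


def sanitize_with_replace(name):
    """Baseline: one full pass over the string per character to replace."""
    for old, new in REPLACEMENTS.items():
        name = name.replace(old, new or "")
    return name


def sanitize_with_translate(name):
    """Single pass over the string using the precomputed table."""
    return name.translate(FORBIDDEN)


example = "A/New York/1(H3N2)"
print(sanitize_with_translate(example))
```

Both functions return the same result here because every key is a single character and no replacement produces another key; translate just does the work in one pass instead of seven.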
Thanks @groutr! Sorry for the delay. This looks pretty good overall, but your implementation has one problem: it doesn't write the sequences out to file. By the time you arrive here: Line 119 in 693372a
the iterator over the sequences is exhausted, so this writes an empty file. You either need to write the sequences to file as you loop over the input, or load them all into memory.
Opening/Closing the output file here allows us to avoid loading all of the input file into memory.
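A minimal sketch of the "write as you loop" approach, assuming a pipe-delimited FASTA header whose first field is the strain name. The helper names read_fasta and parse_streaming are hypothetical and are not the functions in parse.py, which may rely on a FASTA library instead of hand-rolled parsing.

```python
import io


def read_fasta(handle):
    """Lazily yield (header, sequence) pairs from a FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def parse_streaming(in_handle, out_handle):
    """Write each record as soon as it is read; keep only metadata in memory."""
    meta = {}
    for header, seq in read_fasta(in_handle):
        name, *fields = header.split("|")  # assumed header layout
        meta[name] = fields
        out_handle.write(f">{name}\n{seq}\n")
    return meta


# In-memory handles stand in for the real input and output files.
src = io.StringIO(">s1|2020|USA\nACGT\nAC\n>s2|2021|UK\nGG\n")
out = io.StringIO()
meta = parse_streaming(src, out)
print(out.getvalue(), meta)
```

Because the records come from a generator, a second loop over the same input after parse_streaming returns would see nothing, which is the empty-file problem described in the review. Writing inside the loop avoids it without holding every sequence in memory.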
Codecov Report
@@ Coverage Diff @@
## master #496 +/- ##
=========================================
Coverage ? 19.16%
=========================================
Files ? 31
Lines ? 5072
Branches ? 1286
=========================================
Hits ? 972
Misses ? 4077
Partials ? 23
No longer need to check if a field is in tmp_meta.
Thanks, this looks good. I'll test a few more use cases tomorrow.
Description of proposed changes
This PR implements some enhancements for parse.py. Several of them improve performance by using common Python idioms. Overall memory use should also drop, since parse.py no longer reads the entire FASTA file into memory.
Testing
The test suite passes. This PR replaces several suboptimal implementations with equivalent ones; I verified the equivalence with manual testing.
The one place where I changed behavior is prettify: it now converts "Et Al." to "et al."
Thank you for contributing to Nextstrain!