[filter] Index input sequences and stream output sequences #627

Reduces memory needed by augur filter to a smaller constant value instead of loading all input sequences into memory once and then again when writing sequences out to disk. The primary improvements here are:

1. Use a BioPython index data structure [1] that tracks where each sequence is on disk but does not load sequences into memory. This structure acts the same as the dictionary structure we originally used except sequences are loaded lazily when they are requested.
2. Use an iterator to write sequences back to disk. BioPython's SeqIO write method accepts an iterator that allows us to stream sequences back to disk without first loading them into memory as a list of sequence objects.

[1] http://biopython.org/DIST/docs/tutorial/Tutorial.html#sec66
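The two ideas above can be sketched with the standard library alone. This is an illustration of the technique, not BioPython's implementation and not the code in this PR: `FastaIndex`, `stream_records`, and `write_fasta` are hypothetical names, and the sketch assumes a plain FASTA file whose record ID is the first whitespace-delimited token of each header line.

```python
class FastaIndex:
    """Dictionary-like view of a FASTA file that stores byte offsets, not sequences."""

    def __init__(self, path):
        self.path = path
        self.offsets = {}  # record ID -> byte offset of its ">" header line
        with open(path, "rb") as handle:
            while True:
                offset = handle.tell()
                line = handle.readline()
                if not line:
                    break
                if line.startswith(b">"):
                    fields = line[1:].split()
                    if fields:
                        self.offsets[fields[0].decode()] = offset

    def __contains__(self, name):
        return name in self.offsets

    def __iter__(self):
        return iter(self.offsets)

    def __len__(self):
        return len(self.offsets)

    def __getitem__(self, name):
        # Lazy load: seek to the record and read only its lines.
        with open(self.path, "rb") as handle:
            handle.seek(self.offsets[name])
            handle.readline()  # skip the header line
            parts = []
            for line in handle:
                if line.startswith(b">"):
                    break  # next record starts here
                parts.append(line.strip().decode())
        return "".join(parts)


def stream_records(index, names):
    """Yield (name, sequence) pairs one at a time instead of building a list."""
    for name in names:
        yield name, index[name]


def write_fasta(records, path):
    """Write records as they are produced; return how many were written."""
    count = 0
    with open(path, "w") as out:
        for name, sequence in records:
            out.write(f">{name}\n{sequence}\n")
            count += 1
    return count
```

With this shape, a call such as `write_fasta(stream_records(index, kept_names), "filtered.fasta")` holds only one sequence in memory at a time, plus the offset table. The table still grows with the number of records, but each entry is far smaller than the sequence it points to.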



Commits on Nov 5, 2020