Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for (1) interleaved reads & (2) reading from pipes #213

Merged
merged 4 commits into from
Jan 31, 2023

Conversation

luispedro
Copy link
Contributor

I was exploring adding support for strobealign in NGLess and it would enable some optimizations if (1) it supports the interleaved reads format, including mixing paired and single-end reads and (2) reads can be piped in.

Adding support for the former is actually quite easy and I feel the result is slightly cleaner (less repeated code)

Adding support for piping in reads is a bit more complex because of the read length estimation. But the complexity is all hidden inside the InputBuffer/RewindableFile classes.

@marcelm
Copy link
Collaborator

marcelm commented Jan 30, 2023

Cool, this would be nice to have! I’ll review the code as soon as I can. I’m happy you were able to get rid of some code duplication as well.

Just a note: I usually prefer smaller, self-contained commits (max. one refactoring, bug fix, added feature per commit), but it’s ok now. (No need to change anything now.)

@marcelm marcelm mentioned this pull request Jan 30, 2023
13 tasks
@luispedro
Copy link
Contributor Author

I also generally prefer small contained commits, but these were hard to do such that each individual commit is still correct and passing all the tests, which I also try to do (the PR does contain 3 separate commits, which makes it seem bigger than a single one).

I do have another improvement on my side, which I was waiting for this to be merged, but that is a purely micro-optimization commit to make things just a tiny bit faster.

src/fastq.hpp Outdated Show resolved Hide resolved
src/fastq.hpp Outdated Show resolved Hide resolved
src/main.cpp Show resolved Hide resolved
src/main.cpp Show resolved Hide resolved
src/pc.cpp Outdated Show resolved Hide resolved
src/pc.cpp Show resolved Hide resolved
src/pc.hpp Outdated Show resolved Hide resolved
tests/run.sh Outdated Show resolved Hide resolved
src/fastq.cpp Outdated Show resolved Hide resolved
@luispedro luispedro force-pushed the add_interleaved_support branch 2 times, most recently from 7df973e to ca1e303 Compare January 30, 2023 16:17
Otherwise, the close function gets called multiple times
This also adds support for mixing paired and single end reads in the
input mix

It requires more complexity in reading in records, but it results in
overall simpler code
This enables inputs to be piped into strobealign

It requires keeping in memory some of the data that has been read in so
that it can be processed twice (once for estimating the read length and
then the actual read alignment)
@luispedro
Copy link
Contributor Author

Okay, I resolved the conflicts and cleaned up the history now. The design is cleaner too: there is a single RewindableFile and the KStream is embedded within it so that the two cannot get out of sync.

Copy link
Collaborator

@marcelm marcelm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I like this version!
@ksahlin I would like to merge this if it’s ok with you.

CHANGES.md Outdated
@@ -35,6 +35,8 @@
* Suppress some logging output by default.
* Added option `-v` for showing full logging output.
* Issue #206: Added option `-U` for suppressing output of unaligned reads. Thanks @sjaenick.
* Add support for interleaved reads
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to add a "Thanks @luispedro" on these two bullet points.

@ksahlin
Copy link
Owner

ksahlin commented Jan 30, 2023

Thank you for this PR @luispedro ! Yes @marcelm, I am happy to merge this.

Feel free to add a "Thanks @luispedro" on these two bullet points.

Yes, I think its time to properly add credits to contributors in this repo (and elsewhere).

CHANGES.md Outdated Show resolved Hide resolved
@marcelm
Copy link
Collaborator

marcelm commented Jan 31, 2023

Thanks again!

@marcelm marcelm merged commit e986856 into ksahlin:main Jan 31, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants