Very short soft-clipped ends #54
Comments
I agree with this. It is probably an artefact of using SSW in local alignment mode. I used ksw2 before, but found that its extension mode occasionally yielded strange alignments over indels, hence the switch (anecdata; unfortunately I didn't log the event anywhere). Finding the right third-party extension alignment tool is room for future work.
marcelm added a commit that referenced this issue Feb 23, 2023
marcelm added a commit that referenced this issue Feb 24, 2023
marcelm added a commit that referenced this issue Feb 24, 2023
This was referenced Feb 24, 2023
Closed
All the reads in the phiX test dataset happen to start with a single `N` base (an artifact of picking the first 100 reads from the run and not random ones). For a read that otherwise matches without errors, StrobeAlign reports the alignment as `1S300=`.
I found this to be unexpected because BWA-MEM reports this as `301M` (with the `N` considered to be a mismatch, as one can see from the `MD` tag). BWA-MEM penalizes soft-clipping (option `-L`) with a default penalty of 5, so it’ll prefer (at most) one mismatch (penalty 4) over soft clipping. On the other hand, minimap2 also soft clips and reports `1S300M`.
I think that penalizing soft clipping is beneficial when aligning short reads. It is not important for shotgun sequencing, but for targeted sequencing (amplicons), soft clipping single bases introduces a bias: any variation at that position in the reference cannot be observed. For minimap2, it’s not so important because it is primarily (AFAIK) for longer reads.
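To make the BWA-MEM arithmetic above concrete, here is a minimal sketch of the trade-off between keeping a mismatching first base and soft-clipping it. The function `preferred_cigar` is hypothetical and only for illustration; it is not how BWA-MEM or StrobeAlign are implemented. It assumes a match score of 1, treats the `N` as an ordinary mismatch with penalty 4 (as the text above does), and breaks ties in favor of the unclipped alignment.

```python
def preferred_cigar(read_len, clip_penalty=5, match=1, mismatch=4):
    """Pick the higher-scoring alignment for a read whose first base
    mismatches and whose remaining bases all match.

    Hypothetical helper for illustration; defaults mirror the
    penalties quoted in the issue text (clip 5, mismatch 4).
    """
    matching = read_len - 1
    # Option 1: align end to end and pay for one mismatch.
    score_keep = matching * match - mismatch
    # Option 2: soft-clip the first base and pay the clipping penalty.
    score_clip = matching * match - clip_penalty
    # Assumption: ties go to the end-to-end alignment.
    if score_keep >= score_clip:
        return f"{read_len}M"
    return f"1S{matching}M"


# With a clipping penalty of 5, the mismatch (296) beats the clip (295).
print(preferred_cigar(301))
# Without a clipping penalty (pure local alignment), clipping wins (300 vs 296).
print(preferred_cigar(301, clip_penalty=0))
```

With the defaults, the sketch selects the end-to-end alignment; setting `clip_penalty=0` mimics a purely local aligner and selects the soft-clipped one.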
This is probably not a high-priority issue, but I wanted to at least write it down because I was surprised when inspecting the test BAM output.