-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MAPQ scores #25
Comments
Link to mentioned paper: A tandem simulation framework for predicting mapping quality The current MAPQ computation function in StrobeAlign uses the formula from minimap2, see Section 2.3.1 in the paper. |
I don't see any section numbers in the linked paper, both links above point to the same one: A tandem simulation framework for predicting mapping quality |
Sorry, I shouldn’t comment so late at night – I have fixed the link now. |
@ksahlin sent me this comment by e-mail and asked me to post it here:
( |
Copying @PaulPyl’s comment from #79:
|
I temporarily changed |
Oh, made a mistake when computing the median, it’s actually ~1780. And with the fix in #84 merged, this went down to 378. |
After getting more evidence that MAPQ scores are not computed accurately from this benchmark I looked at how I describe the MAPQ score calculation in the paper. It is, in retrospect, obvious that we should not simply compute the MAPQ score from the seed-score ratio of the first and second best hit. We should compute the score from the ratio of the alignment scores whenever possible. Guiding anecdata: IIRC, strobealign's variant calling sensitivity improved significantly without similar decrease in precision when simply putting all non-zero mapq scores values to highest possible confidence (60?). While this is not a good idea, it could suggest that the current mapq values may be too conservative. |
First, there was a bug in MAPQ computation due to initializing best_score to -1000. In the first iteration, min_mapq_diff is updated thus: min_mapq_diff = std::max(0, alignment.score - best_score); This is just alignment.score + 1000, so most of the time, min_mapq_diff ends up being something like 1600 or so for typical read lengths. (This is clamped to 60 in the output, but still.) Here, we keep track of the highest and second-highest alignment score in best_score and second_best_score. Then, mapq is computed such that it is * 60 for second_best_score=0 and * 0 for second_best_score=best_score, where anything in between is interpolated. See #25
First, there was a bug in MAPQ computation due to initializing best_score to -1000. In the first iteration, min_mapq_diff is updated thus: min_mapq_diff = std::max(0, alignment.score - best_score); This is just alignment.score + 1000, so most of the time, min_mapq_diff ends up being something like 1600 or so for typical read lengths. (This is clamped to 60 in the output, but still.) Here, we keep track of the highest and second-highest alignment score in best_score and second_best_score. Then, mapq is computed such that it is * 60 for second_best_score=0 and * 0 for second_best_score=best_score, where anything in between is interpolated. See #25
First, there was a bug in MAPQ computation due to initializing best_score to -1000. In the first iteration, min_mapq_diff is updated thus: min_mapq_diff = std::max(0, alignment.score - best_score); This is just alignment.score + 1000, so most of the time, min_mapq_diff ends up being something like 1600 or so for typical read lengths. (This is clamped to 60 in the output, but still.) Here, we keep track of the highest and second-highest alignment score in best_score and second_best_score. Then, mapq is computed such that it is * 60 for second_best_score=0 and * 0 for second_best_score=best_score, where anything in between is interpolated. See #25
First, there was a bug in MAPQ computation due to initializing best_score to -1000. In the first iteration, min_mapq_diff is updated thus: min_mapq_diff = std::max(0, alignment.score - best_score); This is just alignment.score + 1000, so most of the time, min_mapq_diff ends up being something like 1600 or so for typical read lengths. (This is clamped to 60 in the output, but still.) Here, we keep track of the highest and second-highest alignment score in best_score and second_best_score. Then, mapq is computed such that it is * 60 for second_best_score=0 and * 0 for second_best_score=best_score, where anything in between is interpolated. See #25
First, there was a bug in MAPQ computation due to initializing best_score to -1000. In the first iteration, min_mapq_diff is updated thus: min_mapq_diff = std::max(0, alignment.score - best_score); This is just alignment.score + 1000, so most of the time, min_mapq_diff ends up being something like 1600 or so for typical read lengths. (This is clamped to 60 in the output, but still.) Here, we keep track of the highest and second-highest alignment score in best_score and second_best_score. Then, mapq is computed such that it is * 60 for second_best_score=0 and * 0 for second_best_score=best_score, where anything in between is interpolated. See #25
First, there was a bug in MAPQ computation due to initializing best_score to -1000. In the first iteration, min_mapq_diff is updated thus: min_mapq_diff = std::max(0, alignment.score - best_score); This is just alignment.score + 1000, so most of the time, min_mapq_diff ends up being something like 1600 or so for typical read lengths. (This is clamped to 60 in the output, but still.) Here, we keep track of the highest and second-highest alignment score in best_score and second_best_score. Then, mapq is computed such that it is * 60 for second_best_score=0 and * 0 for second_best_score=best_score, where anything in between is interpolated. See #25
Note to developer:
It is not clear if strobealign has optimal mapq scores for variant calling. For example, having an accurate MAPQ score is crucial for SNP calling. See this tweet thread and this subthread.
For info, a relevant paper about this is found here.
The text was updated successfully, but these errors were encountered: