Allow Recovery of Ambiguous Sites for Fasta-input #280
Using `ancestral` with VCF-input includes a call to TreeTime's `recover_var_ambigs` function, which recalculates mutations on tip branches to restore ambiguous bases ('N's) at positions where they existed in the original sequence.

TreeTime reconstructs bases at ambiguous sites (where it can), but recovering these back onto tip sequences allows users to accurately see what proportion of sequences actually have information at a given site, as reconstructions could be misleading.
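To illustrate the idea of recalculating tip-branch mutations against the original (ambiguous) sequence, here is a minimal sketch. It is not TreeTime's implementation of `recover_var_ambigs`; the function name `tip_branch_mutations` and its parameters are hypothetical, and sequences are assumed to be plain aligned strings.

```python
def tip_branch_mutations(parent_seq, tip_original, tip_reconstructed,
                         keep_ambiguous=True):
    """List mutations on a tip branch as (ancestral, 1-based position, derived).

    Hypothetical helper for illustration only. With keep_ambiguous=True the
    parent is compared against the original tip sequence, so sites that were
    'N' in the input show up as mutations to 'N' instead of being silently
    replaced by the reconstructed base.
    """
    if not (len(parent_seq) == len(tip_original) == len(tip_reconstructed)):
        raise ValueError("all sequences must be aligned to the same length")
    tip = tip_original if keep_ambiguous else tip_reconstructed
    return [
        (anc, pos + 1, der)
        for pos, (anc, der) in enumerate(zip(parent_seq, tip))
        if anc != der
    ]


parent = "ACGTAG"
original_tip = "ACNTTG"       # site 3 was ambiguous in the input
reconstructed_tip = "ACGTTG"  # the reconstruction filled site 3 with 'G'

with_ambiguous = tip_branch_mutations(parent, original_tip, reconstructed_tip)
without_ambiguous = tip_branch_mutations(
    parent, original_tip, reconstructed_tip, keep_ambiguous=False
)
```

In this toy example the branch carries an extra `G3N` entry when ambiguous sites are kept, which is what lets a viewer show the tip as having no data at that site.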
This PR updates the `ancestral` function with a new flag, `--keep-ambiguous`, which can be used to extend the same functionality to Fasta-input sequences.

For example, at a site of interest in Enterovirus...
Without recovering ambiguous sites:
This looks fairly straightforward, and one may be tempted to look and see if there are other associations with clusters where a mutation occurred.
With recovering ambiguous sites (note the colour change):
It becomes clear that for most sequences we have no data at this site, and there are entire clusters without a sequence at this site which have coloured branches just because of ancestral reconstruction. One should be cautious interpreting mutations at this site.
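As a rough way to quantify how much data there really is at a site, one could compute the fraction of input sequences with an unambiguous character there. The sketch below is illustrative only: `site_coverage` is a hypothetical helper, and treating `-` as missing alongside `N` is an assumption.

```python
def site_coverage(sequences, site, missing="N-"):
    """Fraction of aligned sequences with an informative base at a 0-based site.

    Hypothetical helper; `missing` lists the characters counted as "no data"
    (treating gaps as missing is an assumption made for this example).
    """
    if not sequences:
        raise ValueError("need at least one sequence")
    informative = sum(1 for seq in sequences if seq[site].upper() not in missing)
    return informative / len(sequences)


alignment = ["ACGT", "ANGT", "A-GT", "ACGT"]
coverage = site_coverage(alignment, 1)  # two of four sequences have data at this site
```

A low value at a site of interest is a hint that coloured branches there mostly reflect reconstruction and not observed data.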
This update includes a TreeTime version check

This is because a bug in older TreeTime versions causes the `--keep-ambiguous` flag to return nonsense with Fasta-input. Users are therefore not allowed to use the flag unless they are running TreeTime 0.5.6 or newer.

To make this update most useful, it would be great to implement standardised nucleotide/AA colouring in `auspice`! (Particularly so that -/N/X are always grey.)
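The TreeTime version check described above could look something like the following sketch. It is not the code in this PR: the helper names are hypothetical, and the version is passed in as a plain string so that no assumption is made about how the installed TreeTime exposes its version.

```python
import re

MIN_TREETIME = (0, 5, 6)  # minimum version stated in this PR


def parse_version(version):
    """Turn a string such as '0.5.6' into a comparable (major, minor, patch) tuple."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)  # leading digits only, so '6rc1' -> 6
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:  # pad short versions such as '0.6'
        parts.append(0)
    return tuple(parts)


def keep_ambiguous_supported(treetime_version, minimum=MIN_TREETIME):
    """Return True if the given TreeTime version is new enough for the flag."""
    return parse_version(treetime_version) >= minimum
```

Comparing integer tuples avoids the string-comparison pitfall where `"0.5.10"` would sort before `"0.5.6"`.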