Allow Recovery of Ambiguous Sites for Fasta-input #280

Merged: 2 commits, May 11, 2019

emmahodcroft
Member

Running ancestral with VCF-input calls TreeTime's recover_var_ambigs function, which recalculates mutations on tip branches so that ambiguous bases ('N's) are restored at the positions where they existed in the original sequence.
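
The idea of restoring ambiguous bases on a tip can be sketched in a few lines of Python. This is not TreeTime's `recover_var_ambigs` implementation: `restore_ambiguous` and its parameters are hypothetical names, and the sketch assumes the original and reconstructed tip sequences are aligned strings of equal length.

```python
def restore_ambiguous(original, reconstructed, ambiguous="N"):
    """Return the reconstructed tip sequence with ambiguous bases put back.

    Hypothetical helper for illustration only; it is not part of TreeTime.
    """
    if len(original) != len(reconstructed):
        raise ValueError("sequences must be aligned to the same length")
    # Keep the original character wherever it was ambiguous,
    # otherwise keep the reconstructed character.
    return "".join(
        orig if orig.upper() in ambiguous else recon
        for orig, recon in zip(original, reconstructed)
    )


original = "ACGNNTA"       # tip sequence as supplied, with two unknown bases
reconstructed = "ACGTCTA"  # same tip after reconstruction filled in the Ns
print(restore_ambiguous(original, reconstructed))  # ACGNNTA
```

Mutations on the tip branch would then need to be recomputed against the parent sequence, which this sketch leaves out.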

TreeTime reconstructs bases at ambiguous sites (where it can), but restoring the original ambiguity on tip sequences lets users see what proportion of sequences actually have information at a given site; reconstructed bases on their own could be misleading.

This PR updates the ancestral function with a new flag --keep-ambiguous which can be used to extend the same functionality to Fasta-input sequences.

For example, at a site of interest in Enterovirus...
Without recovering ambiguous sites:
image
This looks fairly straightforward, and one may be tempted to look and see if there are other associations with clusters where a mutation occurred.

With recovering ambiguous sites (note the colour change):
image
It becomes clear that for most sequences we have no data at this site, and there are entire clusters without a sequence at this site which have coloured branches just because of ancestral reconstruction. One should be cautious interpreting mutations at this site.
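
A quick way to check how much data a site really has is to count the tips with an unambiguous base there. The sketch below is an illustration under stated assumptions: `site_coverage` is a hypothetical helper, tip sequences are given as a dict of aligned strings, positions are 0-based, and gaps ('-') are counted as missing alongside 'N'.

```python
def site_coverage(tip_sequences, position, missing="N-"):
    """Fraction of tips with an unambiguous base at a 0-based position.

    Hypothetical helper; treating '-' as missing is an assumption.
    """
    if not tip_sequences:
        raise ValueError("no sequences given")
    informative = sum(
        1
        for seq in tip_sequences.values()
        if seq[position].upper() not in missing
    )
    return informative / len(tip_sequences)


tips = {"tip_a": "ACGT", "tip_b": "ACNT", "tip_c": "AC-T", "tip_d": "ACNT"}
print(site_coverage(tips, 2))  # 0.25
```

A low value at a site of interest is a signal that most branch colours there come from reconstruction rather than from observed data.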

This update includes a TreeTime version check.
A bug in older TreeTime versions means that using the --keep-ambiguous flag with Fasta-input returns nonsense, so the flag can only be used with TreeTime 0.5.6 or newer.
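
A version guard of this kind might look roughly like the sketch below. It is not the code from this PR: `version_tuple` and `check_keep_ambiguous` are hypothetical names, the installed version is passed in as a plain string rather than read from the TreeTime package, and only the leading numeric components of the version are compared.

```python
import re

MIN_TREETIME = (0, 5, 6)  # minimum version stated in this PR


def version_tuple(version):
    """Parse the leading numeric components of a version string like '0.5.6'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    # A missing third component (e.g. '0.5') is treated as 0.
    return tuple(int(part) if part is not None else 0 for part in match.groups())


def check_keep_ambiguous(installed_version, keep_ambiguous, is_vcf):
    """Refuse the flag for Fasta-input when TreeTime is too old (hypothetical)."""
    if keep_ambiguous and not is_vcf:
        if version_tuple(installed_version) < MIN_TREETIME:
            raise RuntimeError(
                "keeping ambiguous sites with Fasta-input requires "
                "TreeTime 0.5.6 or newer"
            )


try:
    check_keep_ambiguous("0.5.5", keep_ambiguous=True, is_vcf=False)
except RuntimeError as err:
    print(f"refused: {err}")
```

Comparing integer tuples rather than raw strings avoids ordering mistakes such as "0.10.0" sorting before "0.5.6".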

To make this update most useful, it would be great to implement standardised nucleotide/AA colouring in auspice! (Particularly so that -/N/X are always grey.)

@emmahodcroft requested a review from trvrb May 6, 2019 12:54
@jameshadfield
Member

This looks great @emmahodcroft

> To make this update most useful, it would be great to implement standardised nucleotide/AA colouring in auspice! (Particularly so that -/N/X are always grey.)

Good idea!

@emmahodcroft
Member Author

@jameshadfield I had a little look into this (standard colouring), but it was a little more complicated than I'd anticipated, so I haven't attempted anything yet! Are there any particular reasons for why this couldn't be done, in theory, or anything we'd want to preserve about how this works?

@jameshadfield
Member

> Are there any particular reasons for why this couldn't be done, in theory, or anything we'd want to preserve about how this works?

Not that I can think of, but we'd want to apply it to genotype colouring only, as some datasets may have demes or trait values named X, N, etc.
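
The genotype-only restriction can be illustrated with a small sketch. This is not auspice code (auspice is written in JavaScript); `colour_for`, the `"genotype"` label, the grey value and the palettes are all hypothetical and only show the intended behaviour.

```python
GREY = "#AAAAAA"                   # hypothetical grey for missing states
MISSING_STATES = {"-", "N", "X"}   # gap, unknown nucleotide, unknown amino acid


def colour_for(value, colour_by, palette):
    """Pick a colour; -/N/X are forced to grey only for genotype colourings."""
    if colour_by == "genotype" and value in MISSING_STATES:
        return GREY
    return palette[value]


nucleotides = {"A": "#1f77b4", "C": "#ff7f0e", "G": "#2ca02c", "T": "#d62728"}
regions = {"N": "#9467bd", "S": "#8c564b"}  # a trait whose values include "N"

print(colour_for("N", "genotype", nucleotides))  # grey: no data at this site
print(colour_for("N", "region", regions))        # the trait's own colour
```

Keying the rule on the colouring type rather than on the value alone keeps a deme or trait that happens to be called N or X from being greyed out.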

…o facilitate adding another command-line argument to control handling of gaps at both ends of the alignment.
@rneher merged commit 1439563 into master May 11, 2019
@rneher deleted the no_reconst_N branch May 11, 2019 16:49
@trvrb
Member

trvrb commented May 26, 2019

Chiming in to say that

> To make this update most useful, it would be great to implement standardised nucleotide/AA colouring in auspice! (Particularly so that -/N/X are always grey.)

is a great idea. I've made an auspice issue to preserve it: nextstrain/auspice#727.
