Find smallest subtree with all visible data when using genotype filters #1276

Merged: @jameshadfield merged 1 commit into `filter-by-genotype` from `mrca-of-visible` on Jan 26, 2021


@jameshadfield (Member) commented Jan 26, 2021

Previously we did not recompute the MRCA (most recent common ancestor) of the filtered nodes when genotype filters were applied, so the "zoom to selected" button behaved as if the genotype filters did not exist.

For non-genotype ("normal") filters, given a set `X` of visible nodes, we simultaneously find the MRCA of `X` and add the paths from the MRCA to `X`, i.e. the nodes along those paths become part of the visible set.
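To make the contrast with genotype filtering concrete, here is a minimal Python sketch of the normal-filter behaviour. It assumes a hypothetical `Node` class with a `visible` flag and a `children` list (Auspice's real node objects and function names differ), and for readability it uses two passes rather than doing both steps at once. It finds the MRCA of the visible set and then marks every node on the paths from the MRCA down to the visible nodes as visible, so `X` grows.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(eq=False)
class Node:
    # Hypothetical minimal tree node; not Auspice's actual data structure.
    name: str
    visible: bool = False
    children: List["Node"] = field(default_factory=list)


def mark_subtrees(node: Node, has_visible: Dict[int, bool]) -> bool:
    """Post-order pass: record whether each subtree contains a visible node."""
    below = False
    for child in node.children:
        # Recurse into every child (no short-circuit) so each gets an entry.
        if mark_subtrees(child, has_visible):
            below = True
    has_visible[id(node)] = node.visible or below
    return has_visible[id(node)]


def mrca_and_fill_paths(root: Node) -> Optional[Node]:
    """Find the MRCA of the visible set X, then make every node on a path
    from the MRCA down to X visible (X is modified in place)."""
    has_visible: Dict[int, bool] = {}
    if not mark_subtrees(root, has_visible):
        return None  # nothing is visible
    # Descend while the current node is hidden and exactly one child leads
    # to visible nodes; where that stops is the MRCA.
    mrca = root
    while not mrca.visible:
        leading = [c for c in mrca.children if has_visible[id(c)]]
        if len(leading) != 1:
            break
        mrca = leading[0]
    # Fill in the paths: every node below the MRCA that has visible
    # descendants (or is visible itself) becomes visible.
    stack = [mrca]
    while stack:
        node = stack.pop()
        if has_visible[id(node)]:
            node.visible = True
            stack.extend(node.children)
    return mrca


# Example: only the tips A1 and A2x are visible before filtering.
root = Node("root")
a, b = Node("A"), Node("B")
a1, a2 = Node("A1", visible=True), Node("A2")
a2x = Node("A2x", visible=True)
root.children = [a, b]
a.children = [a1, a2]
a2.children = [a2x]
b.children = [Node("B1")]

mrca = mrca_and_fill_paths(root)
print(mrca.name)  # A
print(sorted(n.name for n in (root, a, b, a1, a2, a2x) if n.visible))
# ['A', 'A1', 'A2', 'A2x']
```

Note that the intermediate nodes `A` and `A2` end up visible even though they were not in the original `X`.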

For genotype filtering, we wish to find the MRCA[1] of `X` _without_ modifying `X`. Note that this allows situations where the MRCA is not itself in `X` (`MRCA ∉ X`). This commit introduces this via the function `findFilteredMRCA`, which uses a three-step process:

1. Find the basal-most nodes of each (potentially non-monophyletic) visible cluster.
2. Identify the paths from the root to the nodes in (1).
3. Find the first fork in this set of paths.
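The three steps can be sketched in Python as follows. This is an illustration under assumptions, not the code in this PR: `Node`, `basal_visible_nodes`, `path_from_root` and `find_filtered_mrca` are hypothetical names (the PR's function is `findFilteredMRCA`), and each node is assumed to carry a `visible` flag plus `parent`/`children` links.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    # Hypothetical minimal tree node with parent links; not Auspice's real shape.
    name: str
    visible: bool = False
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list)

    def add_child(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child


def basal_visible_nodes(root: Node) -> List[Node]:
    """Step 1: the top-most visible node of each visible cluster."""
    found: List[Node] = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.visible:
            # Stop here: visible nodes further down cannot change the MRCA.
            found.append(node)
        else:
            stack.extend(node.children)
    return found


def path_from_root(node: Node) -> List[Node]:
    """Step 2: the nodes from the root down to `node`, inclusive."""
    path: List[Node] = []
    current: Optional[Node] = node
    while current is not None:
        path.append(current)
        current = current.parent
    path.reverse()
    return path


def find_filtered_mrca(root: Node) -> Optional[Node]:
    """Root of the smallest subtree containing every visible node.
    Visibility flags are only read, never changed."""
    basal = basal_visible_nodes(root)
    if not basal:
        return None
    paths = [path_from_root(n) for n in basal]
    # Step 3: walk down the paths in lockstep; the last node they all share
    # is the first fork (or the end of the path if there is only one).
    mrca = root
    for nodes_at_depth in zip(*paths):
        if all(n is nodes_at_depth[0] for n in nodes_at_depth):
            mrca = nodes_at_depth[0]
        else:
            break
    return mrca


# Example: genotype filtering leaves only the tips A1 and A2x visible.
root = Node("root")
a = root.add_child(Node("A"))
b = root.add_child(Node("B"))
a1 = a.add_child(Node("A1", visible=True))
a2 = a.add_child(Node("A2"))
a2x = a2.add_child(Node("A2x", visible=True))
b.add_child(Node("B1"))

mrca = find_filtered_mrca(root)
print(mrca.name, mrca.visible)  # A False
```

Step 3 is written here as the longest common prefix of the root-to-node paths: the last shared node is where the paths first fork, or the basal node itself when there is only one cluster. In the example the returned MRCA `A` is not visible and no visibility flag is changed, which is exactly the `MRCA ∉ X` case described above.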

[Image] Filtering to clade 20C and genotype S 452 R. Left: "Zoom to selected" from PR #1265 wrongly zooms to the MRCA of all of clade 20C and does not take into account the genotype filtering. Right: this PR.

[Image] Filtering to the homoplastic mutation 501Y. Left: in PR #1265, filtering by genotype alone did not allow one to "zoom to selected". Right: we can now zoom into the subtree containing all samples with 501Y (whose root happens to be quite close to the root of the tree).

[1] Is there a better term than MRCA? I'm not suggesting that this is the node where the genotypes originated, but rather the root of the smallest subtree that contains all of the filtered nodes.

@trvrb (Member) commented Jan 26, 2021

This behavior looks great to me; it functions exactly as I'd expect. I think that "MRCA" is appropriate: it's the MRCA of the set of tips in question, and you can have an MRCA for an arbitrary set of tips.

@jameshadfield jameshadfield merged commit 6b58c32 into filter-by-genotype Jan 26, 2021
@jameshadfield jameshadfield deleted the mrca-of-visible branch January 26, 2021 04:45