
Model is predicting low numbers for nearly all lineages #497

Closed
ArtPoon opened this issue Dec 5, 2023 · 12 comments

@ArtPoon
Contributor

ArtPoon commented Dec 5, 2023

  • Nearly all lineages are being coloured as having low numbers of unsampled infections.
  • We should switch the default tree colouration from this model back to Divergence (the residual from the clock prediction).
  • Need to go back and check the fit and outputs of the number-of-infections model.
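"Divergence" here is described only as the residual from the clock prediction. The sketch below illustrates that idea under stated assumptions. It assumes an ordinary least-squares root-to-tip regression of divergence on sampling time and defines residual = observed minus predicted. The function name clock_residuals and its inputs are hypothetical, not the project's actual implementation.

```python
def clock_residuals(times, divergences):
    """Hypothetical helper: fit divergence = rate * time + intercept by
    ordinary least squares and return (rate, intercept, residuals),
    where each residual is observed minus predicted divergence."""
    n = len(times)
    if n < 2 or n != len(divergences):
        raise ValueError("need at least two paired observations")
    mean_t = sum(times) / n
    mean_d = sum(divergences) / n
    # Sum of squares of times and cross-products about the means
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("sampling times must not all be equal")
    sxy = sum((t - mean_t) * (d - mean_d) for t, d in zip(times, divergences))
    rate = sxy / sxx                      # slope = estimated clock rate
    intercept = mean_d - rate * mean_t
    residuals = [d - (rate * t + intercept) for t, d in zip(times, divergences)]
    return rate, intercept, residuals


# Illustrative (made-up) data: sampling times in decimal years,
# divergence in number of substitutions from the root.
rate, intercept, res = clock_residuals([2020.0, 2021.0, 2022.0], [10, 22, 30])
print(rate, res)
```

A positive residual marks a sequence that is more divergent than the fitted clock predicts for its sampling date.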
@ArtPoon
Contributor Author

ArtPoon commented Dec 12, 2023

Looks like the molecular clock model residuals are off as well.

@ArtPoon ArtPoon self-assigned this Jan 23, 2024
@ArtPoon
Contributor Author

ArtPoon commented Jan 23, 2024

Default display has been switched back to divergence by @GopiGugan

@ArtPoon
Contributor Author

ArtPoon commented Mar 12, 2024

  • Run pipeline to capture intermediate outputs if necessary
  • See whether the regression model needs to be updated, or whether a change in the inputs is causing the model to go awry

@ArtPoon
Contributor Author

ArtPoon commented Mar 14, 2024

  • Downloaded recoded.json and lineage-specific Newick files from Paphlagon to my local machine for some testing
  • The make_beadplots function in batch_utils.py badly needs refactoring; I will at least split it into multiple functions.

@ArtPoon
Contributor Author

ArtPoon commented Mar 14, 2024

New branch iss497 (56a0786)

@ArtPoon
Contributor Author

ArtPoon commented Mar 26, 2024

I'll try to wrap up some of this refactoring, and then I may need to hand this off to someone else.

@ArtPoon
Contributor Author

ArtPoon commented Apr 23, 2024

@GopiGugan can you please retrieve by_lineages from the database and run it through the make_beadplots function, and send me a CSV of the summary stats and predicted number of infections for each lineage?

@ArtPoon
Contributor Author

ArtPoon commented May 7, 2024

Obtained the CSV from @GopiGugan; will analyze.

@ArtPoon
Contributor Author

ArtPoon commented May 7, 2024

Summary stats seem reasonable:

Scatterplot of the predicted number of infections (from the HUNePi model) against sample size (number of sequences):

@ArtPoon
Contributor Author

ArtPoon commented May 7, 2024

The issue seems to be that a small number of outlier lineages have very high predicted numbers of infections. In the diagnostic summary_stats.csv data that @GopiGugan sent me, the maximum predicted number is about 4.4 million, but most predicted numbers fall between 10 and 100,000. On a linear colour scale, these outliers compress most lineages into the low end, so they are coloured purple/blue:
[Screenshot: tree view with most lineages coloured purple/blue]

@ArtPoon
Contributor Author

ArtPoon commented May 7, 2024

Applying a log-transform when mapping predicted numbers of infections to the colour scale should resolve this on the front end.
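The effect of the log-transform can be sketched by comparing where a value lands on a linear versus a log10 colour scale. In both cases 0 is the low end of the colour ramp and 1 is the high end. This is a minimal illustration, not the front-end code. The helper names linear_fraction and log_fraction are hypothetical. The scale endpoints (10 and 4.4 million) are taken from the ranges reported above and stand in for the real data.

```python
import math


def linear_fraction(x, lo, hi):
    """Hypothetical helper: position of x on a linear scale spanning
    [lo, hi], clamped to [0, 1]."""
    t = (x - lo) / (hi - lo)
    return min(max(t, 0.0), 1.0)


def log_fraction(x, lo, hi):
    """Hypothetical helper: position of x on a log10 scale spanning
    [lo, hi], clamped to [0, 1]. Requires x, lo, hi > 0."""
    t = (math.log10(x) - math.log10(lo)) / (math.log10(hi) - math.log10(lo))
    return min(max(t, 0.0), 1.0)


# Illustrative endpoints based on the ranges reported in this thread
lo, hi = 10, 4.4e6
for n in (10, 1_000, 100_000, 4.4e6):
    print(f"{n:>12,.0f}  linear={linear_fraction(n, lo, hi):.3f}  "
          f"log={log_fraction(n, lo, hi):.3f}")
```

With these endpoints, a lineage with 100,000 predicted infections sits at roughly 2% of the linear scale but roughly 71% of the log scale. Mid-range lineages are therefore spread across the colour ramp instead of being squeezed into its low end by one or two outliers.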

@ArtPoon
Contributor Author

ArtPoon commented May 7, 2024

OK, this is fixed; tree colours look much better:
[Screenshot: tree view after applying the log-transform to the colour scale]
