Running numbat in multiple samples from same patient #176

Closed
ccruizm opened this issue Mar 12, 2024 · 1 comment

ccruizm commented Mar 12, 2024

Good day!

I have read that it is possible to run Numbat on several samples (#111, #142). I have been trying to apply it to my data, where different samples come from the same patient (different tumor areas). I ran pileup_and_phase.R on each sample individually and then merged the allele_counts.tsv.gz files into one data frame. However, when running run_numbat I get:
Error in check_allele_df(df_allele): Inconsistent SNP genotypes; Are cells from two different individuals mixed together?.

I checked by merging both df_allele data frames and running:

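library(dplyr)

# df: merged allele counts from all samples
snps_test = df %>%
    filter(GT != '') %>%
    group_by(snp_id) %>%
    summarise(
        n = n_distinct(GT)  # number of distinct genotypes recorded per SNP
    )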

I do see that some snp_id values have n = 2 (which causes the error). Could you please tell me how I should use Numbat with multiple samples? Do I need to run pileup_and_phase.R for all the samples of interest together? I have overlapping barcodes among samples. Do I need to rename them in the BAM files? Or is there an easier way that I am missing?

Thanks in advance!

Collaborator

teng-gao commented Apr 15, 2024

The right thing to do is to run pileup_and_phase.R jointly for all samples belonging to the same patient. Please refer to the docs for how to do this:
https://kharchenkolab.github.io/numbat/articles/numbat.html#preparing-data
Renaming clashing barcodes is not necessary, as the bulk aggregate is used for genotyping.
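
For illustration, a joint run for two samples from one patient might look roughly like the sketch below. This is a minimal dry run under stated assumptions, not a verified invocation: the sample names, the label, and every path are hypothetical placeholders, and the flag names follow the "Preparing data" section linked above as best recalled, so please check them against that page. The key point is that --samples, --bams, and --barcodes each take a comma-delimited list, in the same order, for all samples of the patient.

```shell
#!/bin/sh
# Dry-run sketch: builds and prints a joint pileup_and_phase.R command.
# Sample names, label, and all paths are hypothetical placeholders.
SAMPLES="tumor_A,tumor_B"                    # one name per tumor area
BAMS="/data/tumor_A.bam,/data/tumor_B.bam"   # same order as SAMPLES
BARCODES="/data/tumor_A_barcodes.tsv,/data/tumor_B_barcodes.tsv"

# Flag names are assumed from the Numbat docs; verify before use.
CMD="Rscript /numbat/inst/bin/pileup_and_phase.R \
--label patient1 \
--samples $SAMPLES \
--bams $BAMS \
--barcodes $BARCODES \
--outdir /data/patient1 \
--gmap /Eagle_v2.4.1/tables/genetic_map_hg38_withX.txt.gz \
--snpvcf /data/snp_vcf/genome1K.phase3.SNP_AF5e2.chr1toX.hg38.vcf \
--paneldir /data/1000G_hg38 \
--ncores 4"

# Print the command for inspection; once the paths are real,
# it can be executed with: eval "$CMD"
echo "$CMD"
```

Because genotyping and phasing then happen once on the pooled data, each SNP should get a single genotype across all samples, which is what check_allele_df expects.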
