Phylogeny on merged samples #130

freddie090 · 2023-06-23T11:16:32Z

Hi,

I have multiple samples from an experiment that have ~6000 cells of 10X scRNA data each. If I were to try and run NUMBAT on the entire merged experiment the BAMs would be too big.

Is it possible to run NUMBAT on each sample independently and then merge the samples for the phylogeny part of the analysis (for example, as was done in Figure 5a of the NUMBAT paper)?

Additionally, if I were to subset the BAMs to a set of high-quality cells, merge the subsetted samples, and then run NUMBAT on the merged BAMs/expression matrices, would this improve the robustness of the analysis? I.e., is there any advantage to processing the samples simultaneously rather than independently?

Thanks

teng-gao · 2023-06-28T16:52:18Z

Hi @freddie090 ,

You can genotype the samples (from the same individual) using the multi-sample mode of `pileup_and_phase` (you can provide a list of BAMs), and then provide the combined `count_mat` and `allele_df` in a single Numbat run. Numbat should be able to handle 6000 cells fine.
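To illustrate what "combined `count_mat`" involves: the per-sample gene-by-cell matrices are column-bound, with the gene set taken as the union across samples. Numbat itself expects an R sparse matrix, so the Python below is only a sketch of the merge logic on a toy `{gene: {barcode: count}}` dictionary. The `merge_count_mats` helper and the optional `"<sample>_<barcode>"` prefix are hypothetical; the prefix is an assumption that only matters if the same 10x barcode can occur in more than one sample, and whatever renaming you choose has to be applied identically to the allele table.

```python
from typing import Dict, List, Tuple

# Toy stand-in for a sparse gene-by-cell count matrix: {gene: {barcode: count}}.
# Missing entries are zeros. Numbat itself takes an R sparse matrix instead.
CountMat = Dict[str, Dict[str, int]]


def merge_count_mats(samples: Dict[str, Tuple[List[str], CountMat]],
                     prefix: bool = False) -> Tuple[List[str], CountMat]:
    """Column-bind per-sample matrices, taking the union of genes.

    samples maps a sample name to (all barcodes in that sample, its matrix).
    A gene absent from one sample just has no entries (zero counts) for that
    sample's cells. With prefix=True (hypothetical convention), barcodes are
    renamed to "<sample>_<barcode>" so identical barcodes stay distinct.
    """
    cells: List[str] = []
    seen = set()
    merged: CountMat = {}
    for sample, (barcodes, mat) in samples.items():
        rename = {bc: (f"{sample}_{bc}" if prefix else bc) for bc in barcodes}
        for new in rename.values():
            if new in seen:
                raise ValueError(f"cell {new!r} appears in more than one sample")
            seen.add(new)
            cells.append(new)
        for gene, row in mat.items():
            out_row = merged.setdefault(gene, {})
            for bc, count in row.items():
                # KeyError here means the matrix has a barcode not listed for the sample.
                out_row[rename[bc]] = count
    return cells, merged


# Two toy samples that share the barcode "AAAC-1".
s1 = (["AAAC-1", "AAAG-1"], {"GAPDH": {"AAAC-1": 5}, "ACTB": {"AAAG-1": 2}})
s2 = (["AAAC-1"], {"GAPDH": {"AAAC-1": 3}, "MYC": {"AAAC-1": 1}})
cells, mat = merge_count_mats({"samp_1": s1, "samp_2": s2}, prefix=True)
```

With `prefix=False` the same call raises `ValueError`, which is the kind of silent barcode collision worth checking for before a joint run.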

The advantage is that you get consistent CNV and clone calls across samples and get an integrated phylogeny. Genotyping using multiple samples can also improve phasing accuracy.

Best,
Teng

freddie090 · 2023-07-07T14:04:43Z

Hi Teng,

Ah great, okay - my hunch was the inference would be more robust if it had access to information from all samples at once. I'll give it a go!

Thanks -
Freddie

freddie090 · 2023-07-26T07:55:56Z

Hi @teng-gao - sorry, just to clarify:

After running `pileup_and_phase`, where I provide a list of BAM files and corresponding sample names (as comma-separated values in a single argument, e.g. `--samples samp_1,samp_2,samp_3 --bams samp_1.bam,samp_2.bam,samp_3.bam`), the script produces an `allele_counts.tsv` file for each sample.
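A small sketch of assembling the two comma-separated values shown above, assuming (as in the example) that the i-th sample name pairs with the i-th BAM. Only the `--samples` and `--bams` options from this thread are built; every other option `pileup_and_phase` needs is deliberately left out, and `multi_sample_args` is a hypothetical helper, not part of Numbat.

```python
from typing import List


def multi_sample_args(samples: List[str], bams: List[str]) -> List[str]:
    """Build the --samples/--bams values for a single multi-sample run.

    Assumes positional pairing: samples[i] belongs to bams[i]. Rejects
    inputs that would silently misalign or break the comma-separated format.
    """
    if len(samples) != len(bams):
        raise ValueError("need exactly one BAM per sample name")
    if len(set(samples)) != len(samples):
        raise ValueError("sample names must be unique")
    for value in samples + bams:
        if not value or "," in value:
            raise ValueError(f"invalid entry {value!r}")
    return ["--samples", ",".join(samples), "--bams", ",".join(bams)]


args = multi_sample_args(["samp_1", "samp_2", "samp_3"],
                         ["samp_1.bam", "samp_2.bam", "samp_3.bam"])
print(" ".join(args))
# prints: --samples samp_1,samp_2,samp_3 --bams samp_1.bam,samp_2.bam,samp_3.bam
```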

Has the multi-sample mode worked? I wasn't sure whether to expect a single combined allele counts table for all samples. If not, do you suggest manually merging the expression matrices and allele counts for each sample before running Numbat?

Best
Freddie

teng-gao · 2023-07-27T15:20:18Z

Yes, you should get a separate allele count df for each sample. You can then concatenate them (and likewise the expression count matrices) before feeding them to `run_numbat`.
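A minimal sketch of the concatenation step for the per-sample allele tables, assuming they are tab-separated with identical headers and a barcode column named `cell` (the other columns in the demo are toy placeholders, not the real Numbat schema). The optional `"<sample>_<barcode>"` prefix is a hypothetical convention for when barcodes could collide across samples; if you use it here, apply exactly the same renaming to the expression matrix columns. Compressed outputs can be opened with `gzip.open(path, "rt")` and passed in the same way.

```python
import csv
import io
from typing import Dict, TextIO


def concat_allele_tables(tables: Dict[str, TextIO], out: TextIO,
                         cell_col: str = "cell", prefix: bool = False) -> int:
    """Row-bind per-sample allele count TSVs that share one header.

    Returns the number of data rows written. Raises if a table is empty,
    lacks cell_col, or has a header different from the first table.
    """
    writer = None
    header = None
    n = 0
    for sample, handle in tables.items():
        reader = csv.DictReader(handle, delimiter="\t")
        if header is None:
            header = reader.fieldnames
            if header is None or cell_col not in header:
                raise ValueError(f"{sample}: no {cell_col!r} column")
            writer = csv.DictWriter(out, fieldnames=header,
                                    delimiter="\t", lineterminator="\n")
            writer.writeheader()
        elif reader.fieldnames != header:
            raise ValueError(f"{sample}: header differs from the first table")
        for row in reader:
            if prefix:
                row[cell_col] = f"{sample}_{row[cell_col]}"
            writer.writerow(row)
            n += 1
    return n


# Toy in-memory tables standing in for two samples' allele count files.
a = io.StringIO("cell\tsnp_id\tAD\tDP\nAAAC-1\tsnp1\t1\t2\n")
b = io.StringIO("cell\tsnp_id\tAD\tDP\nAAAC-1\tsnp1\t0\t3\n")
merged = io.StringIO()
n_rows = concat_allele_tables({"samp_1": a, "samp_2": b}, merged, prefix=True)
```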

freddie090 · 2023-07-27T19:00:31Z

Okay - and sorry, one final question @teng-gao: are the sample identities preserved somewhere so they can be distinguished in the phylogeny plots later?

teng-gao · 2023-07-31T21:50:09Z

> Okay - and sorry final Q @teng-gao - are the sample identities preserved somewhere for distinguishing in the phylogeny plots later?

You can plot the sample identities associated with the cell barcodes as a sidebar using the `annot` argument of `plot_phylo_heatmap`.
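Since the merged run only sees barcodes, the sample labels have to come from a cell-to-sample table built while merging. The sketch below writes such a table as a TSV that could then be read into R as a data frame. The two-column layout (`cell`, `sample`) is an assumption; check the `plot_phylo_heatmap` documentation for the exact columns `annot` expects. The helpers are hypothetical and must use the same barcode renaming (if any) as the merged inputs.

```python
import csv
import io
from typing import Dict, List, TextIO


def sample_annotation(barcodes: Dict[str, List[str]],
                      prefix: bool = False) -> List[Dict[str, str]]:
    """One row per cell: the (optionally prefixed) barcode and its sample."""
    rows = []
    seen = set()
    for sample, cells in barcodes.items():
        for bc in cells:
            cell = f"{sample}_{bc}" if prefix else bc
            if cell in seen:
                raise ValueError(f"barcode {cell!r} occurs in more than one sample")
            seen.add(cell)
            rows.append({"cell": cell, "sample": sample})
    return rows


def write_annotation(rows: List[Dict[str, str]], out: TextIO) -> None:
    """Write the annotation rows as a tab-separated table with a header."""
    writer = csv.DictWriter(out, fieldnames=["cell", "sample"],
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


rows = sample_annotation({"samp_1": ["AAAC-1"], "samp_2": ["AAAC-1"]}, prefix=True)
buf = io.StringIO()
write_annotation(rows, buf)
```

Building the table from the known per-sample barcode lists, rather than splitting prefixed barcodes afterwards, avoids ambiguity when sample names themselves contain underscores.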
