"No CNV remains after filtering" using default settings on 10x lymphoma data #55

Closed
juliashay opened this issue Nov 8, 2022 · 4 comments


@juliashay

I wanted to use Numbat to investigate copy number variations in a publicly available 10x lymphoma dataset. My attempts ended with Numbat finding no copy number variations, even though this dataset presumably should have some.

Steps to reproduce:

  1. Start a Docker container with Numbat: docker run -v /home/jupyter/docker_test:/mnt/mydata -it pkharchenkolab/numbat-rbase:latest /bin/bash

  2. Download and extract the following files from the 10x Genomics download page: save the feature matrix along with the barcodes and features from lymph_node_lymphoma_14k_filtered_feature_bc_matrix.tar.gz as lymphoma_matrix.mtx, lymphoma_barcodes.tsv, and lymphoma_features.tsv; save the BAM file and its index from lymph_node_lymphoma_14k_gex_possorted_bam.bam as lymphoma.bam and lymphoma.bam.bai.

  3. Inside the container, run

Rscript /numbat/inst/bin/pileup_and_phase.R --label lymphoma --samples lymphoma --bams /mnt/mydata/lymphoma.bam --barcodes /mnt/mydata/lymphoma_barcodes.tsv --outdir /mnt/mydata/lymphoma --gmap /Eagle_v2.4.1/tables/genetic_map_hg38_withX.txt.gz --snpvcf /data/genome1K.phase3.SNP_AF5e2.chr1toX.hg38.vcf --paneldir /data/1000G_hg38 --ncores 16

to generate an allele count file.

  4. Prepare the matrix and allele count file to work with Numbat (a quick sanity check of these objects is sketched after this list):
# Load the gene x cell count matrix and convert it to a sparse dgCMatrix
mtx <- as(Matrix::readMM("/mnt/mydata/lymphoma_matrix.mtx"), "dgCMatrix")
# Allele counts generated by pileup_and_phase.R in step 3
df_allele_cnt <- read.csv("/mnt/mydata/lymphoma/lymphoma_allele_counts.tsv", sep = "\t")
features <- read.csv("/mnt/mydata/lymphoma_features.tsv", sep = "\t", header = FALSE)
barcodes <- read.csv("/mnt/mydata/lymphoma_barcodes.tsv", sep = "\t", header = FALSE)
# Name rows by gene symbol and columns by cell barcode
dimnames(mtx) <- list(features$V2, barcodes$V1)

# Keep only the first occurrence of each gene symbol so row names are unique
duplicate_genes <- duplicated(features$V2)
features_new <- features[!duplicate_genes, ]
mtx_new <- mtx[!duplicate_genes, ]
  5. Run Numbat:
library(numbat)
run_numbat(mtx_new, ref_hca, df_allele_cnt, genome = "hg38", t = 1e-5, ncores = 4, plot = TRUE, out_dir = "/mnt/mydata/lymphoma_results")
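
For completeness, a minimal sanity check of the prepared objects (a sketch; it assumes the allele count table has a cell barcode column named cell, as in current Numbat output):

# Genes x cells after dropping duplicated gene symbols
dim(mtx_new)
# Barcodes in the allele count table should overlap the matrix column names
length(intersect(unique(df_allele_cnt$cell), colnames(mtx_new)))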

Expected result:
I expected Numbat to find some copy number variations in this dataset.

Actual result:
Numbat runs for 1 iteration, then terminates with a "No CNV after filtering - terminating" message (see the attached screenshot).

[screenshot attached]

Please help me understand the results I am getting: are there settings that should be different? Is the result expected for this dataset?

@teng-gao (Collaborator) commented Nov 8, 2022

Thanks for the detailed report. We have analyzed this sample before and Numbat should find some CNVs. Could you attach the outputs (bulk_subtrees_1.png, exp_roll_clust.png, and joint_post_1.tsv) from the results folder?

@juliashay (Author)

Thanks for the detailed report. We have analyzed this sample before and Numbat should find some CNVs.

That's good to know.

Could you attach the outputs (bulk_subtrees_1.png, exp_roll_clust.png, and joint_post_1.tsv) from the results folder?

In my run, Numbat terminated before it generated the results folder and any of these files.

@teng-gao (Collaborator) commented Nov 8, 2022

That would be surprising, because the output folder is created before any of the analysis steps are executed:

https://github.com/kharchenkolab/numbat/blob/main/R/main.R#L109-L112

Please double-check; I'm fairly sure the results folder shouldn't be empty. The output files are also generated at each step, not all at once at the end.

@teng-gao (Collaborator) commented Nov 9, 2022

Hi @juliashay ,

I was able to reproduce your results. The CNVs are filtered out because of the max_entropy threshold. You can check with:

library(data.table); library(dplyr)
fread('~/lymphoma/lymphoma_results/joint_post_1.tsv') %>% distinct(seg, avg_entropy) %>% count(avg_entropy < 0.5)

This is also documented in the FAQ section.

The uncertainty in the single-cell posterior is high because of the high noise in the expression logFC (ref_hca is not a great match for this dataset; a custom reference would work better). Alternatively, set max_entropy = 0.8 and the workflow should succeed.
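
To illustrate, here is a minimal sketch of the rerun (it assumes the objects from step 4 above are still in the session; the relaxed threshold comes from the suggestion above, and the annot data frame used in the custom-reference option is hypothetical):

library(numbat)

# Option 1: relax the single-cell entropy filter and rerun
run_numbat(mtx_new, ref_hca, df_allele_cnt, genome = "hg38", t = 1e-5,
           ncores = 4, plot = TRUE, max_entropy = 0.8,
           out_dir = "/mnt/mydata/lymphoma_results_maxent")

# Option 2 (sketch): build a custom expression reference from annotated
# normal cells instead of ref_hca. annot is a hypothetical data frame with
# columns cell (barcode) and group (cell type label).
# ref_custom <- aggregate_counts(mtx_new, annot)
# run_numbat(mtx_new, ref_custom, df_allele_cnt, genome = "hg38", t = 1e-5,
#            ncores = 4, plot = TRUE,
#            out_dir = "/mnt/mydata/lymphoma_results_customref")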

teng-gao closed this as completed Nov 9, 2022