Error during 2. Iteration [multiplexing] #49

Closed
Thapeachydude opened this issue Oct 30, 2022 · 9 comments

Comments

@Thapeachydude

Hello,

I'm running into an error during the second iteration. Curiously, the error doesn't occur for other samples in the dataset, nor for this one when segs_loh = NULL. So it appears to be specific to this combination of sample, reference, and segs_loh.

Iteration 2
Mem used: 10.3Gb
Running HMMs on 5 cell groups..
Running HMMs on 4 cell groups..                                                                                                      
less than 5% of genome is in neutral region - including LOH in baseline
Testing for multi-allelic CNVs ..
0 multi-allelic CNVs found: 
Evaluating CNV per cell ..
Mem used: 10.5Gb
Excluding clonal LOH regions .. 
All cells succeeded
Expanding allelic states..
No multi-allelic CNVs, skipping ..
No multi-allelic CNVs, skipping ..
No multi-allelic CNVs, skipping ..
Building phylogeny ..
Mem used: 10.6Gb
Using 6 CNVs to construct phylogeny
Using UPGMA tree as seed..
Mem used: 10.6Gb
Iter 2 -1496.78014009687 0.16s
Iter 3 -1496.78014009687 0.11s
Found 1 normal cells..
Error:
! Tibble columns must have compatible sizes.
• Size 46923: Column `2`.
• Size 94963: Column `3`.
ℹ Only values of size one are recycled.
Run `rlang::last_error()` to see where the error occurred.
There were 16 warnings (use warnings() to see them)

The subtrees_2.rds and clones_2.rds files can be found in the output directory, but not bulk_clones_final.tsv.gz. Therefore, I assume the error occurs somewhere in between.

I'm running numbat using a custom reference (normalized expression values) from a public dataset.

ref_internal <- aggregate_counts(refCountMat, refAnnoTab)

bulk <- get_bulk(
  count_mat = patMat,
  lambdas_ref = ref_internal,
  df_allele = alleleCounts,
  gtf = gtf_hg38
)

segs_loh <- bulk %>% detect_clonal_loh(t = 1e-4)

run_numbat(
  count_mat = patMat,
  lambdas_ref = ref_internal,
  df_allele = alleleCounts,
  segs_loh = segs_loh,
  genome = "hg38", gamma = 20, min_cells = 50, multi_allelic = TRUE,
  t = 1e-5, max_iter = 2, max_nni = 100, min_LLR = 5,
  max_entropy = 0.5, tau = 0.3,
  ncores = 10,
  plot = TRUE
)

Any help would be appreciated. : )

@teng-gao
Collaborator

teng-gao commented Oct 30, 2022

Hi @Thapeachydude,

This is probably because running the HMM on a pseudobulk consisting of only 1 cell triggers an error. Please feel free to send me a reproducible example and I'll try to fix it.

Best,
Teng

PS: is panel_2.png also plotted? If so, you can just use all the other outputs as usual.

@Thapeachydude
Author

Thapeachydude commented Nov 1, 2022

Hi @teng-gao

thanks for the quick reply! Yes, the panel_2.png is plotted.

An RDS object with the normalized count matrix, reference pseudobulks and allele counts can be found here.

Many thanks!

@teng-gao
Collaborator

teng-gao commented Nov 2, 2022

Thanks, I will take a look. It seems that only the last step (plotting subclone-level HMM profiles) is affected and all the other outputs should still be fine.

@teng-gao
Collaborator

teng-gao commented Nov 7, 2022

Hi @Thapeachydude,

I took a look at your bulk HMM profile aggregating all cells, and it seems that the allele dataframe is contaminated by many homozygous SNPs. This usually happens when multiple individuals' SNPs are mixed together in the pileup_and_phase step. How did you produce the allele count dataframe, and are you sure that only cells from one individual are included?

[Attached image: bulk HMM profile]

Thanks,
Teng
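
A rough way to look for this kind of contamination before running the full pipeline is to aggregate the allele counts into a pseudobulk and see how many SNPs sit at the extremes of the allele fraction. The sketch below is only an illustration: the toy data, the function name `pseudobulk_baf`, the depth cutoff, and the 0.05/0.95 thresholds are all hypothetical, and it assumes the allele dataframe has `snp_id`, `AD` (alternate-allele depth), and `DP` (total depth) columns. Genuine clonal LOH also pushes allele fractions to the extremes, so a high fraction is a hint rather than proof of mixed individuals.

```r
# Toy allele counts (hypothetical values), one row per cell x SNP.
# Assumed columns: AD = alternate-allele depth, DP = total depth.
df_allele <- data.frame(
  cell   = rep(c("c1", "c2", "c3"), times = 3),
  snp_id = rep(c("1_1000_A_G", "1_2000_C_T", "1_3000_G_A"), each = 3),
  AD     = c(2, 1, 3,  0, 0, 0,  4, 5, 3),
  DP     = c(4, 3, 5,  4, 3, 5,  4, 5, 3)
)

# Sum counts per SNP across all cells and compute the pseudobulk allele fraction.
# min_depth is an illustrative cutoff, not a recommended value.
pseudobulk_baf <- function(df_allele, min_depth = 10) {
  bulk <- aggregate(cbind(AD, DP) ~ snp_id, data = df_allele, FUN = sum)
  bulk <- bulk[bulk$DP >= min_depth, ]
  bulk$BAF <- bulk$AD / bulk$DP
  bulk
}

bulk <- pseudobulk_baf(df_allele)

# Fraction of well-covered SNPs that look homozygous in the pseudobulk.
frac_extreme <- mean(bulk$BAF <= 0.05 | bulk$BAF >= 0.95)
print(bulk)
print(frac_extreme)
```

If most SNPs across the whole genome look homozygous, rather than only those inside a few segments, mixed genotypes are a more likely explanation than LOH.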

@Thapeachydude
Author

Thapeachydude commented Nov 11, 2022

Hi @teng-gao,

we have a multiplexed 10x lane (8-10 individuals). For each individual I provided the bam file and barcodes for that 10x lane and then filtered the resulting allele counts for the cells of that individual.

I guess this is where the SNP mixes come from. I can try to re-run the allele counting with only the barcodes for this individual. Would the bam files also have to be split or can I provide them per 10x lane?

Cheers

@teng-gao
Collaborator

teng-gao commented Nov 11, 2022

Hi @Thapeachydude,

Yes, that explains it. Providing only the barcodes for each individual while using the same bam should be fine. You also need to run the script separately for each individual, rather than providing all the barcode files at once.
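
As a minimal sketch of the barcode-splitting step described above, the following writes one barcode file per individual from a demultiplexing table. The table, its column names (`barcode`, `individual`), the function name, and the output file naming are all hypothetical; the real assignments would come from whatever demultiplexing tool was used for the lane.

```r
# Hypothetical demultiplexing result: one row per cell barcode in the 10x lane.
assignments <- data.frame(
  barcode    = c("AAACCTGAGAAACCAT-1", "AAACCTGAGAAACCGC-1",
                 "AAACCTGAGAAACCTA-1", "AAACCTGAGAAACGAG-1"),
  individual = c("donor1", "donor2", "donor1", "donor3"),
  stringsAsFactors = FALSE
)

# Write one plain-text barcode file per individual and return the file paths,
# named by individual.
write_barcodes_per_individual <- function(assignments, outdir) {
  dir.create(outdir, showWarnings = FALSE, recursive = TRUE)
  by_individual <- split(assignments$barcode, assignments$individual)
  vapply(names(by_individual), function(id) {
    path <- file.path(outdir, paste0(id, "_barcodes.tsv"))
    writeLines(by_individual[[id]], path)
    path
  }, character(1))
}

barcode_files <- write_barcodes_per_individual(
  assignments, file.path(tempdir(), "barcodes")
)
print(barcode_files)
```

Each resulting file would then go, together with the shared lane bam, into its own separate allele-counting run, so that genotyping and phasing only ever see cells from one individual.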

@teng-gao changed the title from "Error during 2. Iteration" to "Error during 2. Iteration [multiplexing]" on Nov 15, 2022
@teng-gao changed the title from "Error during 2. Iteration [multiplexing]" to "Error during 2. Iteration" on Nov 15, 2022
@Thapeachydude
Author

Hi @teng-gao,

I re-ran things per individual, with only the barcodes for that specific individual, and it worked. The output looks a lot better (I was wondering about the high rate of CNLoH). Thanks for taking a look!

Only a minor question remains: I spotted some calls of "bamp". What exactly does that mean? Bi-allelic amplification?

@teng-gao changed the title from "Error during 2. Iteration" to "Error during 2. Iteration [multiplexing]" on Nov 15, 2022
@teng-gao
Collaborator

Yes, BAMP denotes a balanced gain of both homologous chromosomes, such as 2:2 (tetraploid), 3:3, and so on.
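
To illustrate why a balanced gain looks different from an ordinary amplification, here is a small arithmetic sketch. It assumes a pure population of cells carrying the given copy state, and the function name `allelic_signal` is made up for this example; it is not part of numbat.

```r
# Expected signals for a segment with `major` and `minor` copies of the two
# homologs, relative to a diploid 1:1 baseline (idealized, no noise or admixture).
allelic_signal <- function(major, minor) {
  total <- major + minor
  c(baf = major / total,        # fraction of reads from the major haplotype
    copy_ratio = total / 2)     # total copies relative to diploid
}

states <- rbind(
  neutral_1_1 = allelic_signal(1, 1),
  amp_2_1     = allelic_signal(2, 1),
  bamp_2_2    = allelic_signal(2, 2),
  bamp_3_3    = allelic_signal(3, 3)
)
print(states)
```

In this idealized picture, the balanced states keep the allele fraction at 0.5 just like a neutral segment, so only the increase in total copy number distinguishes them.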

@Thapeachydude
Author

Cool, again many thanks!
