long vectors not supported yet #32

josegarciamanteiga · 2022-05-13T15:41:17Z

Hi,
Thanks for the package. It gives spectacular results with single-cell RNA-seq in tumors. I'd like to publish the identification of CAFs as normal cells in my tumors using it. It works smoothly on single 10X datasets, but when I tried to combine three samples I got this error:

..../....
Retesting CNVs..
Retesting CNVs..
Retesting CNVs..
Retesting CNVs..
Retesting CNVs..
Finishing..
Finishing..
Finishing..
Finishing..
Finishing..
Error in vec_slice(x_out, x_slicer) :
long vectors not supported yet: ../../src/include/Rinlinedfuns.h:535

Error: Tibble columns must have compatible sizes.

Size 70731: Column 3.
Size 84967: Column 1.
Size 104417: Column 2.
Size 112067: Column 0.
ℹ Only values of size one are recycled.
Backtrace:
     █
  1. ├─numbat::run_numbat(...)
  2. │ └─%>%(...)
  3. ├─numbat::run_group_hmms(...)
  4. │ └─%>%(...)
  5. ├─dplyr::ungroup(.)
  6. ├─dplyr::mutate(., seg_start_index = min(snp_index), seg_end_index = max(snp_index))
  7. ├─dplyr::group_by(., seg, sample)
  8. └─dplyr::bind_rows(.)
  9.   ├─tibble::as_tibble(dots)
 10.   └─tibble:::as_tibble.list(dots)
 11.     └─tibble:::lst_to_tibble(x, .rows, .name_repair, col_lengths(x))
 12.       └─tibble:::recycle_columns(x, .rows, lengths)

Warning message:
In mclapply(bulks %>% split(.$sample), mc.cores = ncores, function(bulk) { :
scheduled core 2 encountered error in user code, all values of the job will be affected
Execution halted

I ran pileup_and_phase.R without problems on a BAM merged from the cellranger BAMs. To avoid barcode collisions, I replaced the "-1" suffix of the barcodes after using cellranger aggr to generate the barcodes.
The error is thrown by run_numbat, which I ran with 64 GB and 12 cores.
Thanks for the help
Jose


evanbiederstedt · 2022-05-13T21:39:09Z

Hi @josegarciamanteiga

The error is actually from R itself: https://github.com/wch/r-source/blob/trunk/src/include/Rinlinedfuns.h

This used to be a more common error in R before version 3.0.0, I believe, which is when long-vector support was introduced.

There's possibly something we could do to fix this. We'll investigate.

For context:
https://stackoverflow.com/questions/24335692/large-matrices-in-r-long-vectors-not-supported-yet
https://support.bioconductor.org/p/118016/

Best, Evan
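
For readers unfamiliar with the message: a minimal sketch of the R limit it refers to. It only illustrates the size threshold for long vectors and says nothing about what triggered the error inside numbat. The helper name `exceeds_vector_limit` is made up for illustration.

```r
# Atomic vectors with more than 2^31 - 1 elements are "long vectors" in R.
# Code paths that do not support them raise "long vectors not supported yet".
max_len <- .Machine$integer.max  # 2147483647

# Hypothetical helper: would a dense n_rows x n_cols object exceed the limit?
# Doubles are used so the product itself cannot overflow integer arithmetic.
exceeds_vector_limit <- function(n_rows, n_cols) {
  as.numeric(n_rows) * as.numeric(n_cols) > max_len
}

exceeds_vector_limit(50000, 50000)
exceeds_vector_limit(1000, 1000)
```

A quick check like this can help tell apart a genuinely oversized object from a symptom of malformed input.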

teng-gao · 2022-05-14T01:59:39Z

Hi @josegarciamanteiga,

Thanks for the issue! Are the three samples from the same individual (so that they have the same germline SNP profile)? If so, there's no need to merge the BAMs manually; you can supply multiple BAMs and barcode files to pileup_and_phase.R and it will produce a consensus VCF for the individual and the allele data frames for each sample. More details here:
https://kharchenkolab.github.io/numbat/articles/numbat.html#preparing-data

Best,
Teng

josegarciamanteiga · 2022-05-16T11:11:21Z

Dear Teng,
Thanks for the reply! Two of the three samples are indeed from the same individual. I have now run pileup_and_phase.R on them as you advised, and it produced the data without errors. But now, with run_numbat.R, how should I supply the two gene x UMI matrices and the allele data tables? My goal is to have the posteriors and all the numbat output take both samples into account, so that I can load them onto a Seurat/Pagoda scRNA-seq object that contains an integration of both datasets.

As for the 'long vectors' error, it is strange, since I am running R 4.0.3. Here is the sessionInfo() for further details:

library(numbat)
sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS: /home/garciamanteiga.jose/.conda/envs/numbat/lib/libblas.so.3.8.0
LAPACK: /home/garciamanteiga.jose/.conda/envs/numbat/lib/liblapack.so.3.8.0

locale:
[1] LC_CTYPE=en_US.utf-8 LC_NUMERIC=C
[3] LC_TIME=en_US.utf-8 LC_COLLATE=en_US.utf-8
[5] LC_MONETARY=en_US.utf-8 LC_MESSAGES=en_US.utf-8
[7] LC_PAPER=en_US.utf-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.utf-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] numbat_0.1.0

loaded via a namespace (and not attached):
[1] treeio_1.14.4 tidyselect_1.1.1 purrr_0.3.4
[4] graphlayouts_0.8.0 lattice_0.20-45 ggfun_0.0.5
[7] colorspace_2.0-3 vctrs_0.3.8 generics_0.1.2
[10] viridisLite_0.4.0 utf8_1.2.2 gridGraphics_0.5-1
[13] rlang_0.4.12 pillar_1.7.0 glue_1.6.2
[16] DBI_1.1.2 tweenr_1.0.2 rvcheck_0.1.8
[19] lifecycle_1.0.1 stringr_1.4.0 munsell_0.5.0
[22] gtable_0.3.0 parallel_4.0.3 fansi_1.0.2
[25] tidygraph_1.2.0 Rcpp_1.0.7 scales_1.1.1
[28] BiocManager_1.30.16 jsonlite_1.8.0 farver_2.1.0
[31] gridExtra_2.3 ggforce_0.3.3 ggplot2_3.3.2
[34] digest_0.6.29 aplot_0.1.2 stringi_1.7.6
[37] dplyr_1.0.7 ggrepel_0.9.1 polyclip_1.10-0
[40] grid_4.0.3 ggtree_2.4.2 tools_4.0.3
[43] yulab.utils_0.0.4 logger_0.2.2 magrittr_2.0.2
[46] lazyeval_0.2.2 patchwork_1.1.1 tibble_3.1.6
[49] ggraph_2.0.5 crayon_1.5.0 ape_5.6-2
[52] tidyr_1.1.2 pkgconfig_2.0.3 tidytree_0.3.9
[55] MASS_7.3-55 ellipsis_0.3.2 data.table_1.14.2
[58] ggplotify_0.1.0 extraDistr_1.9.1 assertthat_0.2.1
[61] viridis_0.6.2 R6_2.5.1 igraph_1.2.11
[64] nlme_3.1-155 compiler_4.0.3

teng-gao · 2022-05-17T03:52:05Z

Hi @josegarciamanteiga,

The error occurred because the allele data contained genotypes from more than one individual. Only data from the same individual should be provided to pileup_and_phase.R and run_numbat. If you have two samples from the same individual, you can concatenate the gene count matrices (e.g. cbind) and the allele dataframes (e.g. rbind) as input to run_numbat. If the third sample belongs to a separate individual, I would run it separately. If you want to plot the single-cell posteriors in an integrated expression embedding from different samples/individuals, you can combine the posterior dataframes (e.g. nb$joint_post, nb$clone_post) after reading in the results for each individual separately. For more info on the output, please see this tutorial: https://kharchenkolab.github.io/numbat/articles/visualization.html#single-cell-cnv-calls

Thanks,
Teng
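
For anyone landing here later, the cbind/rbind concatenation for two samples from the same individual might look roughly like the sketch below. It uses toy data in base R. The sample tags, barcodes, gene names, and the reduced set of allele columns are all assumptions for illustration, and restricting to shared genes is one possible choice rather than something the thread prescribes.

```r
# Toy gene-by-cell count matrices for two samples from the SAME individual.
# All names and values below are made up for illustration.
count_mat_a <- matrix(
  c(5, 0, 2, 1, 3, 0), nrow = 3,
  dimnames = list(c("GENE1", "GENE2", "GENE3"), c("AAAC-1", "AAAG-1"))
)
count_mat_b <- matrix(
  c(1, 4, 0, 2, 2, 7), nrow = 3,
  dimnames = list(c("GENE2", "GENE3", "GENE4"), c("AAAC-1", "TTTG-1"))
)

# Toy allele data frames (one row per cell x SNP); real ones have more columns.
df_allele_a <- data.frame(
  cell = c("AAAC-1", "AAAG-1"), snp_id = c("1_100_A_G", "1_100_A_G"),
  AD = c(1, 0), DP = c(2, 1)
)
df_allele_b <- data.frame(
  cell = c("AAAC-1", "TTTG-1"), snp_id = c("1_100_A_G", "2_500_C_T"),
  AD = c(0, 3), DP = c(1, 3)
)

# Make barcodes unique across samples, consistently in both inputs.
tag <- function(barcodes, sample) paste(sample, barcodes, sep = "_")
colnames(count_mat_a) <- tag(colnames(count_mat_a), "A")
colnames(count_mat_b) <- tag(colnames(count_mat_b), "B")
df_allele_a$cell <- tag(df_allele_a$cell, "A")
df_allele_b$cell <- tag(df_allele_b$cell, "B")

# cbind needs identical rows in the same order; keep the shared genes here.
genes <- intersect(rownames(count_mat_a), rownames(count_mat_b))
count_mat <- cbind(
  count_mat_a[genes, , drop = FALSE],
  count_mat_b[genes, , drop = FALSE]
)

# Allele data frames stack row-wise.
df_allele <- rbind(df_allele_a, df_allele_b)

# Sanity checks: unique barcodes, and every allele row maps to a counted cell.
stopifnot(
  !anyDuplicated(colnames(count_mat)),
  all(df_allele$cell %in% colnames(count_mat))
)
```

The combined `count_mat` and `df_allele` would then be the inputs to run_numbat. If both samples were aligned against the same reference, the gene intersection should leave the matrices unchanged.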

josegarciamanteiga · 2022-05-17T14:24:17Z

OK, thanks for the info. I thought something like that (cbind/rbind) would be the solution. As for showing the posteriors of different samples combined on an integrated object, that was my first working hypothesis, but I was not sure I could then interpret the results well, since the posteriors for normal vs. tumor are intra-sample. Indeed, my reason for analyzing them together was that one sample had very few normal cells, and I wondered whether running phasing/numbat on all of them together could help. Now I see that this is not possible, because the individual's germline in normal cells is key. I think I will then visualize the single-dataset calls (for samples coming from different individuals) on an integrated object. Thanks again for a great package and help! Jose
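
A rough sketch of what combining per-individual calls for display on an integrated object could look like, in base R. The data frames are toy stand-ins for `nb$clone_post` from two separate runs; the columns shown (`cell`, `clone_opt`, `p_cnv`), the individual labels, and the embedding cell IDs are assumptions for illustration only.

```r
# Toy stand-ins for nb$clone_post from two separate runs (one per individual).
# Column names are an assumed subset, used only for illustration.
clone_post_p1 <- data.frame(
  cell = c("AAAC-1", "AAAG-1"),
  clone_opt = c(1, 2),
  p_cnv = c(0.02, 0.98)
)
clone_post_p2 <- data.frame(
  cell = c("AAAC-1", "TTTG-1"),
  clone_opt = c(1, 1),
  p_cnv = c(0.10, 0.05)
)

# Record which run each row came from, and build a cell ID that is unique
# across runs (the same barcode can occur in different samples).
posts <- list(P1 = clone_post_p1, P2 = clone_post_p2)
combined <- do.call(rbind, lapply(names(posts), function(id) {
  df <- posts[[id]]
  df$individual <- id
  df$cell_id <- paste(id, df$cell, sep = "_")
  df
}))

# Hypothetical cell IDs of an integrated embedding, in embedding order.
embedding_cells <- c("P2_TTTG-1", "P1_AAAC-1", "P1_AAAG-1", "P2_AAAC-1")

# Align the posteriors to the embedding order; unmatched cells become NA.
p_cnv_for_plot <- combined$p_cnv[match(embedding_cells, combined$cell_id)]
```

Keeping the `individual` column alongside the posterior makes it easy to remember that each value was estimated within its own run, which matters when reading the combined plot.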

--
Jose M. Garcia Manteiga, PhD
Computational Biologist
Center for Translational Genomics and BioInformatics
Dibit2-Basilica, 4A3
San Raffaele Scientific Institute
Via Olgettina 58, 20132 Milano (MI), Italy
Tel: +39-02-2643-9211


teng-gao closed this as completed May 17, 2022

teng-gao pushed a commit that referenced this issue May 19, 2022

adding checks for allele df #32

b81b51e
