Error: Must group by variables found in `.data`. * Column `seg` is not found. * Column `sample` is not found. When analyzing two samples bound together #33

josegarciamanteiga · 2022-05-18T14:12:08Z

Hi again,
As you suggested, I ran pileup_and_phase.R on two samples, passing them separated by ",". I then combined the count matrices with cbind and the allele dataframes with rbind, after replacing the "-1" barcode suffix with "-2" in the second sample's files.
The run went smoothly up to the fifth "Retesting CNVs.." message, where it threw this:

Error: Must group by variables found in `.data`.
* Column `seg` is not found.
* Column `sample` is not found.
Backtrace:
█

├─numbat::run_numbat(...)
│ └─%>%(...)
├─numbat::run_group_hmms(...)
│ └─%>%(...)
├─dplyr::ungroup(.)
├─dplyr::mutate(., seg_start_index = min(snp_index), seg_end_index = max(snp_index))
├─dplyr::group_by(., seg, sample)
└─dplyr:::group_by.data.frame(., seg, sample)
└─dplyr::group_by_prepare(.data, ..., .add = .add, caller_env = caller_env())

Thanks again!
Jose
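
For readers following along, the barcode relabeling and merging described above can be sketched roughly as follows. This is a minimal illustration on toy data, not the code used in this thread: the helper `relabel_barcodes`, the object names, and the `cell` / `snp_id` column names are assumptions made for the example.

```r
# Hypothetical helper (not part of Numbat): swap the trailing "-1" of a
# 10x-style barcode for another sample suffix such as "-2".
relabel_barcodes <- function(barcodes, suffix) {
  sub("-1$", paste0("-", suffix), barcodes)
}

# Toy inputs; both samples deliberately share the same raw barcodes.
genes <- c("GeneA", "GeneB")
cells <- c("AAAC-1", "AAAG-1")
count_mat_1 <- matrix(1:4, nrow = 2, dimnames = list(genes, cells))
count_mat_2 <- matrix(5:8, nrow = 2, dimnames = list(genes, cells))
df_allele_1 <- data.frame(cell = cells, snp_id = c("1_100_A_G", "1_200_C_T"),
                          stringsAsFactors = FALSE)
df_allele_2 <- data.frame(cell = cells, snp_id = c("1_100_A_G", "2_300_G_A"),
                          stringsAsFactors = FALSE)

# Relabel the second sample in BOTH the count matrix and the allele table,
# so the two inputs stay consistent with each other.
colnames(count_mat_2) <- relabel_barcodes(colnames(count_mat_2), 2)
df_allele_2$cell <- relabel_barcodes(df_allele_2$cell, 2)

# cbind needs identical row order, so restrict to the shared genes first.
shared_genes <- intersect(rownames(count_mat_1), rownames(count_mat_2))
count_mat <- cbind(count_mat_1[shared_genes, , drop = FALSE],
                   count_mat_2[shared_genes, , drop = FALSE])
df_allele <- rbind(df_allele_1, df_allele_2)

# Sanity check: no barcode may appear twice after merging.
stopifnot(!anyDuplicated(colnames(count_mat)))
```

Restricting to shared genes before `cbind` is a design choice for the sketch; with real data the two matrices may differ in their gene sets.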

josegarciamanteiga · 2022-05-18T14:39:27Z

Let me give the complete log:

Running under parameters:
t = 0.001
alpha = 1e-04
gamma = 20
min_cells = 20
init_k = 3
sample_size = 1e+05
max_cost = 2316.3
max_iter = 2
min_depth = 0
use_loh = auto
multi_allelic = TRUE
min_LLR = 50
max_entropy = 0.6
skip_nj = FALSE
exclude_normal = FALSE
diploid_chroms =
ncores = 12
common_diploid = TRUE
Input metrics:
7721 cells
Approximating initial clusters using smoothed expression ..
number of genes left: 10579
Iteration 1
Fitted reference proportions:
NK:5.3e-07,Fibroblast:0.24,Macrophage:0.21,CD4+T:0.019,CD8+T:5.3e-07,Endothelial:0.089,Myeloid:0.013,Monocyte:0.015,Dendritic:0.0022,Plasma:0.031,B:0.053,Epithelial:0.33
number of genes left: 10475
Fitted reference proportions:
NK:0.0017,Fibroblast:0.24,Macrophage:0.18,CD4+T:0.046,CD8+T:5.4e-07,Endothelial:0.11,Myeloid:0.016,Monocyte:0.013,Dendritic:0.004,Plasma:0.019,B:0.1,Epithelial:0.27
Fitted reference proportions:
NK:0.0011,Fibroblast:0.27,Macrophage:0.2,CD4+T:0.018,CD8+T:5.5e-07,Endothelial:0.087,Myeloid:0.015,Monocyte:0.013,Dendritic:0.0026,Plasma:0.03,B:0.07,Epithelial:0.3
number of genes left: 10703
number of genes left: 10487
Fitted reference proportions:
NK:0.0014,Fibroblast:0.25,Macrophage:0.19,CD4+T:0.028,CD8+T:5.5e-07,Endothelial:0.094,Myeloid:0.015,Monocyte:0.014,Dendritic:0.003,Plasma:0.029,B:0.071,Epithelial:0.3
Fitted reference proportions:
NK:0.0011,Fibroblast:0.25,Macrophage:0.2,CD4+T:0.019,CD8+T:5.4e-07,Endothelial:0.089,Myeloid:0.014,Monocyte:0.014,Dendritic:0.0026,Plasma:0.031,B:0.06,Epithelial:0.31
number of genes left: 10474
number of genes left: 10579
Retesting CNVs..
Retesting CNVs..
Retesting CNVs..
Retesting CNVs..
Retesting CNVs..
Finishing..
Finishing..
Finishing..
Finishing..
Finishing..
Fitted reference proportions:
NK:0.0017,Fibroblast:0.24,Macrophage:0.18,CD4+T:0.046,CD8+T:5.4e-07,Endothelial:0.11,Myeloid:0.016,Monocyte:0.013,Dendritic:0.004,Plasma:0.019,B:0.1,Epithelial:0.27
Fitted reference proportions:
NK:5.3e-07,Fibroblast:0.24,Macrophage:0.21,CD4+T:0.019,CD8+T:5.3e-07,Endothelial:0.089,Myeloid:0.013,Monocyte:0.015,Dendritic:0.0022,Plasma:0.031,B:0.053,Epithelial:0.33
number of genes left: 10475
number of genes left: 10703
Fitted reference proportions:
NK:0.0011,Fibroblast:0.27,Macrophage:0.2,CD4+T:0.018,CD8+T:5.5e-07,Endothelial:0.087,Myeloid:0.015,Monocyte:0.013,Dendritic:0.0026,Plasma:0.03,B:0.07,Epithelial:0.3
number of genes left: 10487
Finishing..
Finishing..
Finishing..
Making plots..
Evaluating CNV per cell ..
Expanding allelic states..
Building phylogeny ..
Using 10 CNVs to construct phylogeny
Using UPGMA tree as seed..
Iter 2 -40022.5589230836 140
Iter 3 -39408.3836591589 130
Iter 4 -39408.3836591589 100
Found 159 normal cells..
Iteration 2
Fitted reference proportions:
NK:5.1e-07,Fibroblast:0.26,Macrophage:0.19,CD4+T:0.022,CD8+T:5.1e-07,Endothelial:0.088,Myeloid:0.014,Monocyte:0.012,Dendritic:0.0021,Plasma:0.026,B:0.068,Epithelial:0.31
number of genes left: 10506
Fitted reference proportions:
NK:0.00063,Fibroblast:0.26,Macrophage:0.19,CD4+T:0.022,CD8+T:5.2e-07,Endothelial:0.089,Myeloid:0.015,Monocyte:0.013,Dendritic:0.0023,Plasma:0.028,B:0.068,Epithelial:0.31
number of genes left: 10497
Fitted reference proportions:
NK:0.001,Fibroblast:0.25,Macrophage:0.19,CD4+T:0.024,CD8+T:5.3e-07,Endothelial:0.09,Myeloid:0.015,Monocyte:0.014,Dendritic:0.0026,Plasma:0.028,B:0.07,Epithelial:0.31
number of genes left: 10522
Fitted reference proportions:
NK:0.0013,Fibroblast:0.25,Macrophage:0.19,CD4+T:0.025,CD8+T:5.5e-07,Endothelial:0.094,Myeloid:0.015,Monocyte:0.014,Dendritic:0.0029,Plasma:0.029,B:0.07,Epithelial:0.31
number of genes left: 10569
Fitted reference proportions:
NK:0.00076,Fibroblast:0.26,Macrophage:0.19,CD4+T:0.022,CD8+T:5.2e-07,Endothelial:0.088,Myeloid:0.015,Monocyte:0.013,Dendritic:0.0026,Plasma:0.028,B:0.069,Epithelial:0.31
number of genes left: 10490
Retesting CNVs..
Retesting CNVs..
Retesting CNVs..
Retesting CNVs..
Retesting CNVs..
Error: Must group by variables found in `.data`.
* Column `seg` is not found.
* Column `sample` is not found.
Backtrace:
█

├─numbat::run_numbat(...)
│ └─%>%(...)
├─numbat::run_group_hmms(...)
│ └─%>%(...)
├─dplyr::ungroup(.)
├─dplyr::mutate(., seg_start_index = min(snp_index), seg_end_index = max(snp_index))
├─dplyr::group_by(., seg, sample)
└─dplyr:::group_by.data.frame(., seg, sample)
└─dplyr::group_by_prepare(.data, ..., .add = .add, caller_env = caller_env())
Warning messages:
1: In mclapply(mc.cores = ncores, neighbours, function(tree) { :
scheduled core 2 did not deliver a result, all values of the job will be affected
2: In mclapply(mc.cores = ncores, neighbours, function(tree) { :
scheduled cores 1, 2, 3, 8 did not deliver results, all values of the jobs will be affected
3: The x argument of as_tibble.matrix() must have unique column names if .name_repair is omitted as of tibble 2.0.0.
Using compatibility .name_repair.
This warning is displayed once every 8 hours.
Call lifecycle::last_lifecycle_warnings() to see where this warning was generated.
4: In mclapply(bulks %>% split(.$sample), mc.cores = ncores, function(bulk) { :
scheduled cores 1, 2, 3, 4, 5 did not deliver results, all values of the jobs will be affected
Execution halted
slurmstepd: error: Detected 68192 oom-kill event(s) in step 805718.batch cgroup. Some of your processes may have been killed by the cgroup out-of-memory handler.

teng-gao · 2022-05-18T14:44:38Z

Hmm. The last line seems to indicate that the jobs ran out of memory .. how much memory did you use for 12 cores? One caveat with the current implementation of Numbat is that it's quite memory intensive.
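
As a rough illustration of why an out-of-memory kill can show up as an unrelated-looking error: when a forked `mclapply` worker dies, the corresponding entries of the returned list are typically left as `NULL` (hence the "did not deliver a result" warnings in the log), and later code that expects data frames may then fail on missing columns. Whether that is exactly what happened here is an assumption. The helper below, `check_parallel_results`, is hypothetical and not part of Numbat; it only sketches how such results could be checked before use.

```r
# Hypothetical helper (not part of Numbat): fail early and clearly if any
# parallel job returned nothing or returned an error object.
check_parallel_results <- function(results) {
  failed <- vapply(
    results,
    function(x) is.null(x) || inherits(x, "try-error"),
    logical(1)
  )
  if (any(failed)) {
    stop(sprintf(
      "%d of %d parallel jobs returned no result (indices: %s); a worker may have been killed, e.g. by an out-of-memory handler.",
      sum(failed), length(failed), paste(which(failed), collapse = ", ")
    ), call. = FALSE)
  }
  invisible(results)
}

# Simulated outcome: the second job's worker was killed, leaving NULL.
results <- list(data.frame(seg = "1a", sample = "1"), NULL)
outcome <- tryCatch(
  check_parallel_results(results),
  error = function(e) conditionMessage(e)
)
print(outcome)
```

A check like this would be applied to the list returned by `mclapply` before the results are combined, so the failure is reported at its source rather than several steps downstream.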

josegarciamanteiga · 2022-05-18T14:46:50Z

124GB! I thought the last line was a consequence of the error above occurring in one of the cores.
I could try allocating more, or request a fixed amount of memory per core. What do you recommend?
It does finish the first iteration.
Thanks again,
J

teng-gao · 2022-05-18T17:08:21Z

Yes, please check if giving it more memory would solve the issue. We will look into optimizing the memory usage soon.

josegarciamanteiga · 2022-05-18T17:39:25Z

Thanks. I'm trying with 20 cores and 32GB mem-per-cpu. I'll let you know.
Jose

teng-gao · 2022-07-05T14:03:02Z

Hello @josegarciamanteiga,

We made some improvements to the runtime and memory usage in Version 0.1.3, which should be overall twice as fast and less memory intensive. Do let me know if the memory problem persists since there’s still a step that uses mclapply for parallelization.

Thanks,
Teng

josegarciamanteiga · 2022-07-05T15:31:30Z

Thanks! I solved it by requesting more memory in total rather than per CPU, and letting the job manage it. I will try the new version next for new results.
Best,
Jose

teng-gao closed this as completed Jul 5, 2022
