
Error: Must group by variables found in .data. * Column seg is not found. * Column sample is not found. When analyzing two samples bound together #33

Closed
josegarciamanteiga opened this issue May 18, 2022 · 7 comments


@josegarciamanteiga

Hi again,
As you suggested, I ran pileup_and_phase.R on two samples, separated by ",". I then used cbind and rbind on the count matrices and allele dataframes, after first replacing the "-1" suffix on the barcodes in the second sample's files with "-2".
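For readers following along, the merging step described above might look roughly like the sketch below. The toy inputs, the `relabel` helper, and the `cell` column name in the allele dataframes are assumptions made for illustration, not code taken from this thread.

```r
# Toy stand-ins for the two samples' inputs (hypothetical data).
count_mat_1 <- matrix(c(1, 0, 2, 3), nrow = 2,
                      dimnames = list(c("GeneA", "GeneB"), c("AAAC-1", "AAAG-1")))
count_mat_2 <- matrix(c(0, 4, 1, 1), nrow = 2,
                      dimnames = list(c("GeneA", "GeneB"), c("AAAC-1", "TTTG-1")))
df_allele_1 <- data.frame(cell = c("AAAC-1", "AAAG-1"),
                          snp_id = c("1_100_A_G", "1_200_C_T"))
df_allele_2 <- data.frame(cell = c("AAAC-1", "TTTG-1"),
                          snp_id = c("1_100_A_G", "2_300_G_A"))

# Replace the trailing "-1" with "-2" so barcodes from sample 2 stay unique.
relabel <- function(barcodes, suffix = "-2") sub("-1$", suffix, barcodes)

colnames(count_mat_2) <- relabel(colnames(count_mat_2))
df_allele_2$cell <- relabel(df_allele_2$cell)

# Align on the genes present in both matrices before binding columns.
shared_genes <- intersect(rownames(count_mat_1), rownames(count_mat_2))
count_mat <- cbind(count_mat_1[shared_genes, , drop = FALSE],
                   count_mat_2[shared_genes, , drop = FALSE])

# Allele dataframes are stacked row-wise.
df_allele <- rbind(df_allele_1, df_allele_2)

# Sanity check: no barcode should appear twice after relabelling.
stopifnot(anyDuplicated(colnames(count_mat)) == 0)
```

The same barcode ("AAAC-1") appears in both toy samples on purpose: without the relabelling step the combined matrix would contain duplicate column names.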
The code ran smoothly up to the fifth "Retesting CNVs.." message, where it threw this:

Error: Must group by variables found in .data.

  • Column seg is not found.
  • Column sample is not found.
    Backtrace:
  1. ├─numbat::run_numbat(...)
  2. │ └─%>%(...)
  3. ├─numbat::run_group_hmms(...)
  4. │ └─%>%(...)
  5. ├─dplyr::ungroup(.)
  6. ├─dplyr::mutate(., seg_start_index = min(snp_index), seg_end_index = max(snp_index))
  7. ├─dplyr::group_by(., seg, sample)
  8. └─dplyr:::group_by.data.frame(., seg, sample)
  9. └─dplyr::group_by_prepare(.data, ..., .add = .add, caller_env = caller_env())

Thanks again!
Jose

@josegarciamanteiga
Author

Let me give the complete log:

Running under parameters:
t = 0.001
alpha = 1e-04
gamma = 20
min_cells = 20
init_k = 3
sample_size = 1e+05
max_cost = 2316.3
max_iter = 2
min_depth = 0
use_loh = auto
multi_allelic = TRUE
min_LLR = 50
max_entropy = 0.6
skip_nj = FALSE
exclude_normal = FALSE
diploid_chroms =
ncores = 12
common_diploid = TRUE
Input metrics:
7721 cells
Approximating initial clusters using smoothed expression ..
number of genes left: 10579
Iteration 1
Fitted reference proportions:
NK:5.3e-07,Fibroblast:0.24,Macrophage:0.21,CD4+T:0.019,CD8+T:5.3e-07,Endothelial:0.089,Myeloid:0.013,Monocyte:0.015,Dendritic:0.0022,Plasma:0.031,B:0.053,Epithelial:0.33
number of genes left: 10475
Fitted reference proportions:
NK:0.0017,Fibroblast:0.24,Macrophage:0.18,CD4+T:0.046,CD8+T:5.4e-07,Endothelial:0.11,Myeloid:0.016,Monocyte:0.013,Dendritic:0.004,Plasma:0.019,B:0.1,Epithelial:0.27
Fitted reference proportions:
NK:0.0011,Fibroblast:0.27,Macrophage:0.2,CD4+T:0.018,CD8+T:5.5e-07,Endothelial:0.087,Myeloid:0.015,Monocyte:0.013,Dendritic:0.0026,Plasma:0.03,B:0.07,Epithelial:0.3
number of genes left: 10703
number of genes left: 10487
Fitted reference proportions:
NK:0.0014,Fibroblast:0.25,Macrophage:0.19,CD4+T:0.028,CD8+T:5.5e-07,Endothelial:0.094,Myeloid:0.015,Monocyte:0.014,Dendritic:0.003,Plasma:0.029,B:0.071,Epithelial:0.3
Fitted reference proportions:
NK:0.0011,Fibroblast:0.25,Macrophage:0.2,CD4+T:0.019,CD8+T:5.4e-07,Endothelial:0.089,Myeloid:0.014,Monocyte:0.014,Dendritic:0.0026,Plasma:0.031,B:0.06,Epithelial:0.31
number of genes left: 10474
number of genes left: 10579
Retesting CNVs..
Retesting CNVs..
Retesting CNVs..
Retesting CNVs..
Retesting CNVs..
Finishing..
Finishing..
Finishing..
Finishing..
Finishing..
Fitted reference proportions:
NK:0.0017,Fibroblast:0.24,Macrophage:0.18,CD4+T:0.046,CD8+T:5.4e-07,Endothelial:0.11,Myeloid:0.016,Monocyte:0.013,Dendritic:0.004,Plasma:0.019,B:0.1,Epithelial:0.27
Fitted reference proportions:
NK:5.3e-07,Fibroblast:0.24,Macrophage:0.21,CD4+T:0.019,CD8+T:5.3e-07,Endothelial:0.089,Myeloid:0.013,Monocyte:0.015,Dendritic:0.0022,Plasma:0.031,B:0.053,Epithelial:0.33
number of genes left: 10475
number of genes left: 10703
Fitted reference proportions:
NK:0.0011,Fibroblast:0.27,Macrophage:0.2,CD4+T:0.018,CD8+T:5.5e-07,Endothelial:0.087,Myeloid:0.015,Monocyte:0.013,Dendritic:0.0026,Plasma:0.03,B:0.07,Epithelial:0.3
number of genes left: 10487
Finishing..
Finishing..
Finishing..
Making plots..
Evaluating CNV per cell ..
Expanding allelic states..
Building phylogeny ..
Using 10 CNVs to construct phylogeny
Using UPGMA tree as seed..
Iter 2 -40022.5589230836 140
Iter 3 -39408.3836591589 130
Iter 4 -39408.3836591589 100
Found 159 normal cells..
Iteration 2
Fitted reference proportions:
NK:5.1e-07,Fibroblast:0.26,Macrophage:0.19,CD4+T:0.022,CD8+T:5.1e-07,Endothelial:0.088,Myeloid:0.014,Monocyte:0.012,Dendritic:0.0021,Plasma:0.026,B:0.068,Epithelial:0.31
number of genes left: 10506
Fitted reference proportions:
NK:0.00063,Fibroblast:0.26,Macrophage:0.19,CD4+T:0.022,CD8+T:5.2e-07,Endothelial:0.089,Myeloid:0.015,Monocyte:0.013,Dendritic:0.0023,Plasma:0.028,B:0.068,Epithelial:0.31
number of genes left: 10497
Fitted reference proportions:
NK:0.001,Fibroblast:0.25,Macrophage:0.19,CD4+T:0.024,CD8+T:5.3e-07,Endothelial:0.09,Myeloid:0.015,Monocyte:0.014,Dendritic:0.0026,Plasma:0.028,B:0.07,Epithelial:0.31
number of genes left: 10522
Fitted reference proportions:
NK:0.0013,Fibroblast:0.25,Macrophage:0.19,CD4+T:0.025,CD8+T:5.5e-07,Endothelial:0.094,Myeloid:0.015,Monocyte:0.014,Dendritic:0.0029,Plasma:0.029,B:0.07,Epithelial:0.31
number of genes left: 10569
Fitted reference proportions:
NK:0.00076,Fibroblast:0.26,Macrophage:0.19,CD4+T:0.022,CD8+T:5.2e-07,Endothelial:0.088,Myeloid:0.015,Monocyte:0.013,Dendritic:0.0026,Plasma:0.028,B:0.069,Epithelial:0.31
number of genes left: 10490
Retesting CNVs..
Retesting CNVs..
Retesting CNVs..
Retesting CNVs..
Retesting CNVs..
Error: Must group by variables found in .data.

  • Column seg is not found.
  • Column sample is not found.
    Backtrace:
  1. ├─numbat::run_numbat(...)
  2. │ └─%>%(...)
  3. ├─numbat::run_group_hmms(...)
  4. │ └─%>%(...)
  5. ├─dplyr::ungroup(.)
  6. ├─dplyr::mutate(., seg_start_index = min(snp_index), seg_end_index = max(snp_index))
  7. ├─dplyr::group_by(., seg, sample)
  8. └─dplyr:::group_by.data.frame(., seg, sample)
  9. └─dplyr::group_by_prepare(.data, ..., .add = .add, caller_env = caller_env())
    Warning messages:
    1: In mclapply(mc.cores = ncores, neighbours, function(tree) { :
    scheduled core 2 did not deliver a result, all values of the job will be affected
    2: In mclapply(mc.cores = ncores, neighbours, function(tree) { :
    scheduled cores 1, 2, 3, 8 did not deliver results, all values of the jobs will be affected
    3: The x argument of as_tibble.matrix() must have unique column names if .name_repair is omitted as of tibble 2.0.0.
    Using compatibility .name_repair.
    This warning is displayed once every 8 hours.
    Call lifecycle::last_lifecycle_warnings() to see where this warning was generated.
    4: In mclapply(bulks %>% split(.$sample), mc.cores = ncores, function(bulk) { :
    scheduled cores 1, 2, 3, 4, 5 did not deliver results, all values of the jobs will be affected
    Execution halted
    slurmstepd: error: Detected 68192 oom-kill event(s) in step 805718.batch cgroup. Some of your processes may have been killed by the cgroup out-of-memory handler.

@teng-gao
Collaborator

Hmm. The last line seems to indicate that the jobs ran out of memory. How much memory did you use for 12 cores? One caveat with the current implementation of Numbat is that it's quite memory intensive.
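A note on the warnings in the log: when mclapply reports that scheduled cores "did not deliver results", the affected entries of its return value are typically NULL, and a later dplyr step may then fail with a less obvious message, possibly like the one above. A minimal sketch of a check that surfaces this earlier is shown below; `safe_mclapply` is a hypothetical helper written for illustration and is not part of Numbat.

```r
library(parallel)

# Hypothetical helper: run FUN over X in parallel and stop with a clear message
# if any job came back as NULL (worker killed, e.g. by the OOM handler) or as a
# "try-error" object (R-level error inside the worker).
safe_mclapply <- function(X, FUN, ..., mc.cores = 2L) {
  results <- mclapply(X, FUN, ..., mc.cores = mc.cores)
  failed <- vapply(
    results,
    function(r) is.null(r) || inherits(r, "try-error"),
    logical(1)
  )
  if (any(failed)) {
    stop(sprintf(
      "%d of %d parallel jobs did not return a result; check memory limits",
      sum(failed), length(failed)
    ))
  }
  results
}

# Usage: mc.cores = 1L runs serially, which also works on platforms without forking.
squares <- safe_mclapply(1:4, function(i) i^2, mc.cores = 1L)
```

One limitation of this sketch is that a job which legitimately returns NULL is also counted as failed, so it only suits functions that always return a value.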

@josegarciamanteiga
Author

124GB! I thought the last line was caused by the error above occurring in one of the cores.
I could try increasing the memory further, or even request a fixed amount of memory per core. What do you recommend?
It does finish the first iteration.
Thanks again
J

@teng-gao
Collaborator

Yes, please check if giving it more memory would solve the issue. We will look into optimizing the memory usage soon.

@josegarciamanteiga
Author

josegarciamanteiga commented May 18, 2022 via email

@teng-gao
Collaborator

teng-gao commented Jul 5, 2022

Hello @josegarciamanteiga,

We made some improvements to the runtime and memory usage in Version 0.1.3, which should be overall twice as fast and less memory intensive. Do let me know if the memory problem persists since there’s still a step that uses mclapply for parallelization.

Thanks,
Teng

@josegarciamanteiga
Author

josegarciamanteiga commented Jul 5, 2022 via email

@teng-gao teng-gao closed this as completed Jul 5, 2022