
Ferocious memory usage #23

Closed
cartographerJ opened this issue Mar 11, 2022 · 10 comments

Comments

@cartographerJ

During iteration 1, memory usage is pretty standard, but during iteration 2, in the step "Evaluating CNV per cell ..", memory usage goes through the roof. Is there any sort of workaround for this? Is this behavior expected?

@evanbiederstedt
Contributor

evanbiederstedt commented Mar 11, 2022

Hi @cartographerJ

Thanks for using numbat. Yes, we've been investigating this elsewhere: #13

We hope to have a fix soon; we'll keep you updated.

Best, Evan

(Sorry, the close was a mistake with my cursor)

@cartographerJ
Author

Great, thanks for looking into it.

Just for reference, even for a low number of cells:

```
Running under parameters:
t = 0.001
alpha = 1e-04
gamma = 20
min_cells = 20
init_k = 3
sample_size = 1e+05
max_cost = 442.5
max_iter = 2
min_depth = 0
use_loh = auto
multi_allelic = TRUE
min_LLR = 50
max_entropy = 0.6
skip_nj = FALSE
exclude_normal = FALSE
diploid_chroms =
ncores = 16
common_diploid = TRUE

Input metrics:
1475 cells

Approximating initial clusters using smoothed expression ..
number of genes left: 9964
```

I am getting the following memory usage during the "Evaluating CNVs ..." step:

on iteration 1:
[screenshot: memory usage during iteration 1]

on iteration 2:
[screenshot: memory usage during iteration 2]

I was able to force this to run with ~300 GB of memory.

@teng-gao
Collaborator

Thanks, you're using the development version, correct?

@cartographerJ
Author

I am using main; should I try the devel branch?

@teng-gao
Collaborator

I think they're the same in terms of memory usage. We're currently trying to track down why this happens.

@teng-gao
Collaborator

Hi @cartographerJ, it seems we've found what causes this ferocious behavior. It turned out to be a bad interaction between OpenMP and mclapply (no clue why..). What solved the problem for us was setting `export OMP_NUM_THREADS=1` in the shell before starting R/Rscript. Could you try this and let us know if it solves the problem on your end?
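For reference, a minimal sketch of the workaround; the in-R variant via `Sys.setenv()` is an alternative and is assumed to work only if it runs before any OpenMP-backed library initializes its thread pool (`your_script.R` is just a placeholder name):

```r
# Option 1: set the variable in the shell before launching R/Rscript
#   export OMP_NUM_THREADS=1
#   Rscript your_script.R

# Option 2 (assumption): set it from within R, *before* loading numbat
# or any other OpenMP-backed library, so thread pools start with 1 thread
Sys.setenv(OMP_NUM_THREADS = "1")
library(numbat)
```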

@evanbiederstedt
Contributor

Hi @cartographerJ

Yes, please try again after reinstalling the package. I'm guessing it's the BLAS/LAPACK operations (especially with OpenBLAS) and data.table operations that are causing the memory to surge.

We've added some function calls that should explicitly set the OMP thread count to 1.
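For illustration, capping the thread counts explicitly can look like the sketch below (using RhpcBLASctl and data.table; whether numbat uses exactly these calls is an assumption):

```r
# Sketch only: cap OpenMP / BLAS / data.table threads at 1 before the
# mclapply-parallelized steps (the exact calls used in numbat may differ)
RhpcBLASctl::omp_set_num_threads(1)   # OpenMP worker threads
RhpcBLASctl::blas_set_num_threads(1)  # OpenBLAS / LAPACK threads
data.table::setDTthreads(1)           # data.table internal threading
```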

Let us know if this helps at all.
Best, Evan

@cartographerJ
Author

Will test tomorrow and report back.

@cartographerJ
Author

I ran the devel version and it seems to have fixed the issue!

@teng-gao
Collaborator

Terrific! Thanks for testing this. Credit to @evanbiederstedt for the fix.
