Reduce memory consumption in prelim_map #393
The largest sample I found that stayed under 6GB of memory was D62195-HCV_S5 from the 26 Feb 2016 batch. Its steps had the following memory usage:
Another interesting sample is 67182A-HIV_S5 from 20 Sep 2016. It is a smaller sample, but it used more than 6GB of memory on the v7.7 pipeline.
prelim_map is loading all of the reads into memory before it writes them out.

I'm pushing this back to the near future milestone, because Richard H. asked for #394 to get done before this.
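The fix suggested by the comment above is to stream each read to the output as it is parsed instead of collecting them all in a list first. A minimal sketch of the streaming pattern, assuming a CSV output; the function name and column choices are hypothetical, not MiCall's actual API:

```python
import csv


def write_prelim_rows_streaming(sam_lines, out_path):
    """Write one CSV row per SAM alignment as it is parsed.

    Memory stays roughly constant because no list of reads is
    kept; only the current line is held at any time.
    """
    with open(out_path, 'w', newline='') as out_file:
        writer = csv.writer(out_file)
        writer.writerow(['qname', 'flag', 'rname', 'pos'])
        for line in sam_lines:
            if line.startswith('@'):  # skip SAM header lines
                continue
            fields = line.rstrip('\n').split('\t')
            writer.writerow(fields[:4])  # qname, flag, rname, pos
```

The trade-off is exactly the one reported later in this thread: more, smaller writes to disk in exchange for bounded memory.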
The largest sample in the 26 Feb 2016 run had a compressed FASTQ file of 800MB. As a workaround, we configured Kive to request 20GB for every driver script, and that worked. The most memory used was just over 10GB for the 800MB sample's aln2counts step. The longest elapsed time was 5h50m for the same sample's remap step. However, it used less than 1GB of memory.
Here are a range of sample sizes to experiment with from the 13-Jun-2017.M04401 run:
The sizes are the sum of the two compressed FASTQ files. Presumably, the V3LOOP samples will behave differently from the HCV samples, because V3LOOP reads get mapped with the pairwise alignment script.
Memory-consuming steps:
Add a utility for sorting SAM files. Add third-party license information to README.
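A SAM-sorting utility that keeps memory bounded usually means an external merge sort: sort fixed-size chunks, spill each sorted chunk to a temporary file, then merge the chunks lazily. A sketch of that idea; the function name, key, and chunk size are illustrative assumptions, not the actual utility added in the commit:

```python
import heapq
import itertools
import tempfile


def sort_sam_lines(lines, key=lambda line: line.split('\t')[0],
                   chunk_size=100000):
    """Sort SAM body lines without holding them all in memory.

    Sorts chunk_size lines at a time, spills each sorted chunk
    to a temporary file, then merges the chunks lazily with
    heapq.merge, so peak memory is one chunk plus merge buffers.
    """
    chunk_files = []
    it = iter(lines)
    while True:
        chunk = sorted(itertools.islice(it, chunk_size), key=key)
        if not chunk:
            break
        spill = tempfile.TemporaryFile('w+')
        spill.writelines(chunk)  # each line must end with '\n'
        spill.seek(0)
        chunk_files.append(spill)
    yield from heapq.merge(*chunk_files, key=key)
```

For genuinely large files this is the standard trade: O(n log n) work is unchanged, but memory drops from the whole file to one chunk.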
Reduced memory usage.

Here's the memory that each sample uses, after the changes:

Unsurprisingly, writing to disk is slower by up to a minute, but the memory usage now stays below 100MB.
Next to tackle:
Reduced memory usage.

Anyway, here's the memory that each sample now uses, along with the slightly slower times.
Next to tackle:
Reduced memory usage.
Also fix a bunch of warnings in remap.py.
Next to tackle:
Made some small improvements.
Part of issue #393. Slight performance improvement to merge_pairs.
Part of #393. Fix some problems with QAI upload, such as removing HLA variants files. Start runs sorted by sample number, reversed.
The Mixed HCV pipeline looks challenging. A lot of memory gets used by bowtie2 in the first step, but the actual failures when I ran some large samples came from sam2aln using more than 6GB. It seems like I might be able to improve some of the steps, and then configure the rest with a higher memory limit.
Here are the memory levels used by each step in the Mixed HCV pipeline on the 88160AMIDI_MidHCV sample:
Here are the memory levels used by the bigger 88160A_HCV sample:
We might look at removing the human genome in the future, but for this issue, I'm just going to work on the other steps.
After removing the sorting and changing the grouping:
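One way to drop a sort entirely is to pair mates in a single pass: unmatched reads wait in a dict keyed by read name, and each pair is yielded as soon as its second mate arrives. This is a sketch of that general technique, not necessarily the exact change made here; the function name is mine:

```python
def pair_mates(sam_lines):
    """Pair forward and reverse reads in one pass, without sorting.

    Unmatched reads wait in a dict keyed by read name (qname).
    When the mate arrives, the pair is yielded and the entry
    removed, so memory is bounded by the number of reads still
    awaiting a mate rather than by the whole file.
    """
    waiting = {}
    for line in sam_lines:
        if line.startswith('@'):  # skip SAM header lines
            continue
        qname = line.split('\t', 1)[0]
        mate = waiting.pop(qname, None)
        if mate is None:
            waiting[qname] = line
        else:
            yield mate, line
```

Compared with sort-then-groupby, this avoids both the O(n log n) sort and the second in-memory copy of the file.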
We're trying to allocate enough memory for the pipeline steps when we run them under Slurm, and some of them failed: prelim_map used more than 3GB on a large sample. I ran prelim_map on a small sample and tracked the memory usage reported by `top`. It reported 100MB used by `bowtie2` and 350MB used by the Python process. The `bowtie2` memory was stable, but the Python memory steadily climbed.
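Eyeballing `top` by hand can also be replaced with an in-process check. A minimal sketch using Python's standard `resource` module (Unix-only; the helper name is mine, not part of the pipeline):

```python
import resource
import sys


def peak_memory_mb():
    """Return this process's peak resident set size in megabytes.

    ru_maxrss is reported in kilobytes on Linux but in bytes on
    macOS, so normalize before converting to MB.
    """
    rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    if sys.platform == 'darwin':
        rss //= 1024  # bytes -> kilobytes
    return rss // 1024  # kilobytes -> megabytes
```

Logging this at the end of each step would turn the manual observation above into a number recorded on every run.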