Valid V3 reads ending up in the unmapped files #228

ArtPoon · 2015-07-15T17:29:06Z

See sample 63249A-V3.3 from run 150529, it has too many unmapped reads. When it was processed in production under Micall 6.8, it had 136199 unmapped reads. When Art processed it on his workstation, it only had 43125 unmapped reads. Try to see what the difference is, and fix the problem.

Updated Summary

There were several bugs in the G2P clipping, so we fixed those. The Micall 7.0 pipeline now calls 19% X4 compared to the 454 calling 23% in this sample.

compare 454 data with Micall
compare Ruby G2P with Python, and fix a bunch of bugs
Run the whole 150529 run with Micall 7.0
Get Conan to compare 454 calls with new Micall results.
Remove the is_ruby_compatible flag from sam_g2p.py.
Process a bunch of other runs and get Conan to compare results.
- 150130
- 150311
- 150316
- 150415
- 150612

The text was updated successfully, but these errors were encountered:

ArtPoon · 2015-07-15T18:06:19Z

I was unable to reproduce this when re-running this sample locally.
The original remap_counts file (pipeline release 6.8) read:

sample_name,type,count,filtered_count
63249A-V3-3-V3LOOP_S93.censored1.fastq,raw,697420,
63249A-V3-3-V3LOOP_S93.censored1.fastq,prelim HLA-B-seed,2,2
63249A-V3-3-V3LOOP_S93.censored1.fastq,prelim HCV-6V-seed,1,0
63249A-V3-3-V3LOOP_S93.censored1.fastq,prelim HIV1B-env-seed,12149,12129
63249A-V3-3-V3LOOP_S93.censored1.fastq,prelim HCV-1A-H77-seed,2,2
63249A-V3-3-V3LOOP_S93.censored1.fastq,remap HIV1B-env-seed,561221,
63249A-V3-3-V3LOOP_S93.censored1.fastq,unmapped,136199,

Using master branch on my workstation (commit 67566cd) I obtained the following:

type,count,filtered_count
raw,697420,
prelim HLA-B-seed,2,2
prelim HCV-6V-seed,1,0
prelim HIV1B-env-seed,12149,12129
prelim HCV-1A-H77-seed,2,2
remap HIV1B-env-seed,654295,
unmapped,43125,

ArtPoon · 2015-07-15T18:10:39Z

I just checked out release 6.8.1 on my workstation (commit aa76593) and still could not reproduce this. Hopefully it is not stochastic. I will repeat to check.

ArtPoon · 2015-07-15T18:14:35Z

On Don's suggestion, copying the required sample files to a temp folder on cluster and running 6.8 from there. It may be a samtools or bowtie2 versioning issue.

ArtPoon · 2015-07-15T19:57:42Z

Reproduced issue on cluster. Either samtools or bowtie2 is at fault.
Attempt re-run with different version of samtools version 0.1.19. If this works, then reprocess all MOTIVATE V3 samples with pipeline release 6.8.1 but patched to call this version of samtools.

ArtPoon · 2015-07-15T22:30:39Z

I ran the development pipeline with remap.py using samtools-0.1.19 but the same number of reads remained unmapped. Now trying with samtools-1.1

ArtPoon · 2015-07-15T23:01:18Z

samtools-1.1 failed to remedy this issue. Now checking if it is a bowtie2 versioning problem.

ArtPoon · 2015-07-16T18:35:17Z

Running bowtie2-2.2.3 (same as my workstation) did not resolve the problem either: 136199 reads unmapped.

ArtPoon · 2015-07-16T21:19:38Z

Calling remap.py directly on the command line rather than through run_processor.py gets the correct result. This does not appear to be due to the latter running processes on a compute node -- if I call remap.py via bpsh on a compute node then I still get the correct result.

ArtPoon · 2015-07-23T20:56:37Z

I just pulled the master branch in the development folder and now calling remap.py directly gives me 136199 unmapped reads. I have tried adjusting the maximum pileup size and this has no effect.

donkirkby · 2015-07-24T17:01:27Z

I just reproduced the problem on my workstation. I ran the code from the v6.8.1 tag with bowtie2 2.2.1 and samtools 0.1.18. I saw 136199 unmapped reads in sample 63249A-V3-3.

ArtPoon · 2015-07-24T21:06:08Z

Funny - I've reverted back to 6.8.1 in the development folder and now calling remap.py directly yields 136199 unmapped reads.

donkirkby · 2015-08-20T17:18:33Z

I ran it once on my workstation with standalone Micall 7.0 and bowtie2 2.2.1, and only saw 2800 unmapped reads, so that's a good sign. However, when I ran it with the data that had been censored for high error rates in the phiX data, I saw 108352 unmapped reads.

ArtPoon · 2015-08-20T20:56:14Z

Going back to the pipeline 6.8 results: the X4 variant of interest appears 5185 times in the file 63249A-V3-3-V3LOOP_S93.unmapped1.fastq (exact match), which contains 18905 reads in total (27%). It appears in 104107 reads in the raw FASTQ R1 file (348710 reads total, so 30%).

donkirkby · 2015-08-20T21:49:55Z

I ran the 63249A sample with Micall 7.0, and looked at all the variants that were successfully called in the g2p.csv file. Both with the censored data and the uncensored data, I counted 73% called as X4 (fpr score ≤ 3.5).

With the uncensored data, 121838 reads were called X4 out of 166530 successful calls.
With the censored data, 101903 reads were called X4 out of 139697 successful calls.

It looks like the censoring is affecting the X4 reads at the same rate as the other reads, so the only remaining question is why the Micall results disagree with the 454 results. According to Conan, the 454 called 23% of the reads X4.

When I calculated the same totals for the g2p.csv file from Micall 6.8 using censored data, I found 534 reads called X4 out of 99612 successful calls. That's less than 1%.

donkirkby · 2015-09-03T22:24:13Z

After a few bug fixes, I reran the 63249A sample with Micall 7.0 as above. I counted the following:

19% called as X4 in the uncensored data (35215 out of 182600 successful calls)
19% called as X4 in the censored data (24259 out of 128574 successful calls)

ArtPoon added the bug label Jul 15, 2015

ArtPoon self-assigned this Jul 15, 2015

ArtPoon added this to the 7.0 - standalone milestone Jul 15, 2015

ArtPoon added the critical label Jul 15, 2015

donkirkby mentioned this issue Jul 15, 2015

Drop in coverage #219

Closed

2 tasks

donkirkby added the in progress label Jul 15, 2015

donkirkby assigned donkirkby and unassigned ArtPoon Aug 20, 2015

donkirkby added a commit that referenced this issue Sep 3, 2015

Fix bug in G2P clipping range for issue #228.

42f5a02

donkirkby added a commit that referenced this issue Sep 18, 2015

Remove Ruby-compatible code from sam_g2p.py for issue #228.

fa2c6e5

donkirkby closed this as completed Sep 21, 2015

donkirkby removed the in progress label Sep 21, 2015

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Valid V3 reads ending up in the unmapped files #228

Valid V3 reads ending up in the unmapped files #228

ArtPoon commented Jul 15, 2015

ArtPoon commented Jul 15, 2015

ArtPoon commented Jul 15, 2015

ArtPoon commented Jul 15, 2015

ArtPoon commented Jul 15, 2015

ArtPoon commented Jul 15, 2015

ArtPoon commented Jul 15, 2015

ArtPoon commented Jul 16, 2015

ArtPoon commented Jul 16, 2015

ArtPoon commented Jul 23, 2015

donkirkby commented Jul 24, 2015

ArtPoon commented Jul 24, 2015

donkirkby commented Aug 20, 2015

ArtPoon commented Aug 20, 2015

donkirkby commented Aug 20, 2015

donkirkby commented Sep 3, 2015

Valid V3 reads ending up in the unmapped files #228

Valid V3 reads ending up in the unmapped files #228

Comments

ArtPoon commented Jul 15, 2015

Updated Summary

ArtPoon commented Jul 15, 2015

ArtPoon commented Jul 15, 2015

ArtPoon commented Jul 15, 2015

ArtPoon commented Jul 15, 2015

ArtPoon commented Jul 15, 2015

ArtPoon commented Jul 15, 2015

ArtPoon commented Jul 16, 2015

ArtPoon commented Jul 16, 2015

ArtPoon commented Jul 23, 2015

donkirkby commented Jul 24, 2015

ArtPoon commented Jul 24, 2015

donkirkby commented Aug 20, 2015

ArtPoon commented Aug 20, 2015

donkirkby commented Aug 20, 2015

donkirkby commented Sep 3, 2015