Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Variant calling with paired-end reads fails #5

Closed
rene-rex opened this issue Aug 18, 2014 · 3 comments
Closed

Variant calling with paired-end reads fails #5

rene-rex opened this issue Aug 18, 2014 · 3 comments

Comments

@rene-rex
Copy link

Hello

I tried to follow the "Workflow Examples" in the wiki to do variant calling with two plant genotypes based on paired end reads. Here is the sequence of commands I used:

ctx63 build -m 150G -t 10 -k 61 --sample genotype1 --seq2 genotype1_r1.fastq:genotype1_r2.fastq genotype1.ctx

ctx63 build -m 150G -t 10 -k 61 --sample genotype2 --seq2 genotype2_r1.fastq:genotype2_r2.fastq genotype2.ctx

ctx63 build -m 150G -t 10 -k 61 --sample reference --seq reference.fa reference.ctx

ctx63 clean -m 150G -t 10 --out genotype1.clean.ctx genotype1.ctx
ctx63 clean -m 150G -t 10 --out genotype2.clean.ctx genotype2.ctx

ctx63 join -m 150G --out refAndSamples.clean.ctx reference.ctx genotype1.clean.ctx genotype2.clean.ctx

ctx63 inferedges -m 150G -t 10 refAndSamples.clean.ctx

ctx63 thread -m 150G -t 10 --seq2 genotype1_r1.fastq:genotype1_r2.fastq --out genotype1.ctp.gz refAndSamples.clean.ctx:1
ctx63 thread -m 150G -t 10 --seq2 genotype2_r1.fastq:genotype2_r2.fastq --out genotype2.ctp.gz refAndSamples.clean.ctx:2

Unfortunately, both read threading steps failed with this error:

[src/alignment/correct_aln_stats.c:137] Assert Failed correct_aln_stats_print_summary(): num_reads > 0
[15 Aug 2014 16:22:40-keZ] Assert Error

I tried to "fix" this by adding this code above line 135 in alignment/correct_aln_stats.c:

if(num_reads == 0)
{
status("[CorrectAln] Didn't see any SE reads");
}

After recompiling the threading runs smooth. However, now I am stuck at the contigs command

ctx63 contigs -m 150G -t 10 -p 1:genotype1.ctp.gz -p 2:genotype2.ctp.gz refAndSamples.clean.ctx

which yields:

[src/graph_paths/gpath_checks.c:143] Error gpath_load_sample_pop(): No point loading paths for colours other than --colour 0
[18 Aug 2014 09:42:17-gOC] Fatal Error

Any clues what went wrong?

@noporpoise
Copy link
Member

I'm travelling at the moment - will look at fixing num_reads > 0 issue on Monday. Sorry about that.

For the second issue, the contig command only pulls out contigs for a single sample at a time. So you probably want to do:

ctx63 contigs -m 150G -t 10 -c 1 -p 1:genotype1.ctp.gz --out contigs1.fa refAndSamples.clean.ctx
ctx63 contigs -m 150G -t 10 -c 2 -p 2:genotype1.ctp.gz --out contigs2.fa refAndSamples.clean.ctx

Hope that helps. Thank you for trying cortex.

Isaac

@rene-rex
Copy link
Author

Hi

Thanks for the answer. I tried the first command

ctx63 contigs -m 150G -t 10 -c 1 -p 1:genotype1.ctp.gz --out contigs1.fa refAndSamples.clean.ctx

but it produced another error:

[src/graph_paths/gpath_checks.c:107] Error graphs_gpaths_compatible(): Sample names don't match
refAndSamples.clean.ctx:0reference
refAndSamples.clean.ctx:2genotype2.tipclean.supclean7
[22 Aug 2014 08:19:32-JOC] Fatal Error

Is it comparing the names of different samples in the same file?

@noporpoise
Copy link
Member

I'm trying to resolve this issue at the moment. For now you should be able to work around it by only loading one sample at a time. This is not ideal, but I'll put up a fix shortly.

ctx63 contigs -m 150G -t 10 -c 0 -p genotype1.ctp.gz --out contigs1.fa refAndSamples.clean.ctx:1

noporpoise added a commit that referenced this issue Sep 2, 2014
Fixes issue #5

1. Only use seed kmers that are in the sample we are pulling out contigs for
2. Allow pooling colours when checking sample names match
3. graph_file_reader.c, gpath_reader.c: paths are const char* not char* now
4. db_alignment.c: keep track of kmer origin position or -1 if inferred kmer
5. correct_alignment.c: operates on internal dBAlignment and takes reads as input
6. Print VCF header from JSON input headers
7. gpath_load_sample_pop() now won't complain if you give multi-colour graph file
   and only ask to load a single colour from it.
8. assemble_contigs.c: When pulling out a contig, give end status as HitRepeat
   if GraphWalker ok but RepeatWalker stopped.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants