Error TLEN information at hic sam module #139

Open
slbai01 opened this issue Aug 31, 2023 · 3 comments

slbai01 commented Aug 31, 2023

I am running chromap 0.2.5-r473 on a highly repetitive genome, but the TLEN values in the SAM output look wrong.

Because only the single best match is output by default, I suspect that an error in handling multiple matches means the alignment that TLEN actually corresponds to is never written out.

Below are the records for two affected read pairs.

SRR12034698.8   115     chr1D   165570658       20      150M    =       164309919       -15705  GAATATTTTTTCTGACCATACATGCTCGGTCCGCCGAAGTTCTACGAGGGTAGCACTGTCCACTCGGACGATCGCCCAAATCATTACCTGAAGTCATCTTCAGGACTGCAAAAGGGTGAAAACGACACTCCTCTACGGATACACTTGGCA  -7A--FF-F<7-<F7AAAF7A-<A<F---<A-AF7FFF<-A--AAJJFJJJJJFFFJFJ<A-7AAAJJJJ7<JJA7AFJFFJF7FA<-JJJJJA<A7FA-AJJJ<FJFJJJJFJJJJJ<JAAFA-JJJF-7F<FAFJF-JFJJFJFFFAA  NM:i:1  MD:Z:39A110
SRR12034698.8   179     chr1D   164309919       20      48M     =       165570658       -15705  TTCTCGATGTGATCAACAGGTTGATNNAATGGNTGGANNNNCTNAGNG        7-A7--777---<FF7<7<J<<7JJ##JJJAJ#FAJF####7-#-)#A        NM:i:1  MD:Z:A47
SRR12034698.15  115     chr2B   457324039       15      150M    =       468582229       -14152  AAGTCTCAATCTGGATACATATTGAACCTGGGAGCAATTAGCTAGAGTAGCTCCGTGTAGAGCATTGTAGACATAGAATATTTGCAAAATGCATACGGCTCTGAATGTGGCAGACCCGTTGACTAAACTTCTCTCACGAGCAAAACATGA  JJA7FJJA<-A<-<JJJFJFFJJJA--<FJAF7F<JJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJAJJJJFJJJJJJJJJJJJJJJJJJFFJJFJJJJJJFFFAA  NM:i:1  MD:Z:26A123
SRR12034698.15  179     chr2B   468582229       15      104M    =       457324039       -14152  TTGGGACACTTCTATTTATATGCATNNAGGTANCGACNCNNTCNAGNAANCCAAATTAGATGTGCTTCAAAGTCAACTTGACAAGTTCAAGATGAAGGACGGTG        AJFF<-JF<JF7FFAFA<<--JF<-##<-F<<#)<7-#J##AA#JA#-F#JJJJJJJJJJJFAFFJJJJJJJJJJJJJJJJJJJJJF-JFJJJJJJAJFFJJJJ        NM:i:2  MD:Z:G2T100
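
For comparison, here is a minimal sketch of the template length that the SAM specification's definition (leftmost mapped base to rightmost mapped base) would give for these two pairs, using only the POS and CIGAR fields shown above. The helper names `reference_span` and `expected_tlen_magnitude` are made up for illustration and are not part of chromap.

```python
import re

def reference_span(cigar: str) -> int:
    """Count the reference bases consumed by a CIGAR string (M, D, N, =, X)."""
    ops = re.findall(r"(\d+)([MIDNSHP=X])", cigar)
    return sum(int(length) for length, op in ops if op in "MDN=X")

def expected_tlen_magnitude(pos_a: int, cigar_a: str, pos_b: int, cigar_b: str) -> int:
    """Distance from the leftmost mapped base to the rightmost mapped base, inclusive."""
    start = min(pos_a, pos_b)
    # Exclusive end coordinate of each mate; the larger one closes the template.
    end = max(pos_a + reference_span(cigar_a), pos_b + reference_span(cigar_b))
    return end - start

# (POS, CIGAR) of each mate and the TLEN reported in the records above.
pairs = {
    "SRR12034698.8": (165570658, "150M", 164309919, "48M", -15705),
    "SRR12034698.15": (457324039, "150M", 468582229, "104M", -14152),
}

for name, (pos_a, cigar_a, pos_b, cigar_b, reported) in pairs.items():
    expected = expected_tlen_magnitude(pos_a, cigar_a, pos_b, cigar_b)
    print(f"{name}: expected |TLEN| = {expected}, reported TLEN = {reported}")
```

Both mates of each pair also carry the same negative TLEN, whereas the SAM specification assigns a positive value to the leftmost segment and a negative value to the rightmost one.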

command line:

chromap -i -k 27 -w 14 -r $contigsFasta -o contigs.index
chromap --preset hic -r $contigsFasta -x $contigsChromapIndex -1 $r1Reads -2 $r2Reads --SAM -o aligned.$library_name.sam -t $thread
Owner

haowenz commented Sep 1, 2023

Is there a reason you changed the default k and w values for the index?

For the TLEN issue, @mourisl can you take a look when you get time? Thanks!

Author

slbai01 commented Sep 1, 2023

The genome is too large, and indexing cannot complete with the default parameters. Issue #9 describes this problem.

Collaborator

mourisl commented Sep 6, 2023

Hi @slbai01, sorry for the delayed reply. The TLEN issue could be due to a data type overflow. I have pushed an update to the li_dev5 branch; could you please check out that branch and give it a try? In the split-alignment mode for Hi-C data, I think the sign of TLEN does not make much sense, and the length may be off by the read length depending on the strand. But at least the value of TLEN is now reasonable.
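
The overflow explanation can be checked against the reported records with a small sketch. Two things here are assumptions inferred from the numbers alone, not taken from the chromap source: that the value was stored in a signed 16-bit integer, and that it was computed as `-(POS - PNEXT + read_length)`. The function names `wrap_int16` and `guessed_tlen` are hypothetical.

```python
def wrap_int16(value: int) -> int:
    """Truncate an integer to a signed 16-bit value using two's-complement wraparound."""
    return (value + 2**15) % 2**16 - 2**15

def guessed_tlen(pos: int, mate_pos: int, read_len: int) -> int:
    """Guessed formula (an assumption): negated span, then stored in 16 bits."""
    return wrap_int16(-(pos - mate_pos + read_len))

# First record of each pair from the report: (POS, PNEXT, read length, reported TLEN).
records = [
    (165570658, 164309919, 150, -15705),  # SRR12034698.8
    (457324039, 468582229, 150, -14152),  # SRR12034698.15
]

for pos, mate_pos, read_len, reported in records:
    print(f"guessed {guessed_tlen(pos, mate_pos, read_len)}, reported {reported}")
```

Under these assumptions both reported values are reproduced, which fits the overflow explanation; the real computation in chromap may still differ in detail.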
