Mask homologous sequences in query fasta(s) by kmers (k≤31) of subject fasta
git clone https://github.com/slw287r/kmsk.git
cd kmsk
make
- Output masked query fastas
kmsk -s t2t.fna -p t2tmsk -o . -t 8 562.fna 1280.fna 5476.fna
# output
562.t2tmsk.fna
1280.t2tmsk.fna
5476.t2tmsk.fna
- Output masked region bed file
kmsk -r -s t2t.fna -p t2tmsk -o . -t 8 562.fna 1280.fna 5476.fna
# output
562.t2tmsk.bed
1280.t2tmsk.bed
5476.t2tmsk.bed
* ~40GB memory is used for the Homo sapiens subject.