SB BLAST
BLAST is a local alignment algorithm commonly used to search large collections of sequences for likely homologs to a query sequence. The SeqBuddy blast tool can search a pre-existing blast database or a set of subject sequences in any supported format, returning any matches in their entirety. This is a departure from the normal BLAST output, which generally returns local alignment fragments. For reference, the actual BLAST statistics are streamed to standard error so you know how significant and extensive matches are.
The blastn, blastp, makeblastdb, and blastdbcmd binaries from the [NCBI C++ toolkit](http://www.ncbi.nlm.nih.gov/IEB/ToolBox/CPP_DOC/) must be present in your system PATH.
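Before running a search, it can help to confirm that the four binaries are actually discoverable. The following is a minimal sketch using only the Python standard library; the function name `missing_binaries` is hypothetical and is not part of SeqBuddy.

```python
import shutil

# The four binaries named in the text above.
REQUIRED_BINARIES = ("blastn", "blastp", "makeblastdb", "blastdbcmd")


def missing_binaries(names=REQUIRED_BINARIES):
    """Return the names that cannot be found in the system PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_binaries()
    if missing:
        print("Not found in PATH: " + ", ".join(missing))
    else:
        print("All BLAST binaries found.")
```

An empty list means every binary was found; any names returned need to be installed or added to PATH.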
If searching pre-made BLAST databases, the -parse_seqids option must have been used when calling the makeblastdb program, for example:
$: makeblastdb -in path/to/fasta_file -out db_name -dbtype {nucl, prot} -parse_seqids
A BLAST database consists of several separate files that share a base name; provide a relative or absolute path to any one of these files, or to the shared base name.
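To illustrate why a path to any database file works as well as the base name, here is a rough sketch of how such a path could be reduced to the base name. It assumes database files end in a three-letter extension beginning with "p" (protein) or "n" (nucleotide), such as `.pin` or `.nsq`; the helper `blast_db_base` is hypothetical and does not reflect how SeqBuddy resolves paths internally.

```python
import re

# Assumption: database files end in a three-letter extension that starts
# with "p" (protein) or "n" (nucleotide), e.g. ".pin", ".psq", ".nhr".
_DB_EXTENSION = re.compile(r"\.[pn][a-z]{2}$")


def blast_db_base(path):
    """Strip a trailing BLAST database extension, if there is one."""
    return _DB_EXTENSION.sub("", path)


if __name__ == "__main__":
    print(blast_db_base("/path/to/blastdb/Abacion_magnum.pin"))
    print(blast_db_base("/path/to/blastdb/Abacion_magnum"))
```

Both calls print the same base name. A base name that itself ends in something like `.nuc` would be trimmed by mistake, so this pattern is only a starting point.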
SeqBuddy calls blastn/p with the following parameters:
$: blastn -db database -query in_file.fa -out temp.txt -num_threads 4 -evalue 0.01 -outfmt 6
The BLAST programs have a large number of optional parameters, however, and these can be injected into the command used by SeqBuddy. The only parameters that you cannot change are "db", "query", "subject", "out", and "outfmt"; pass all others in as a single double-quoted argument (see the third example below).
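The injection rule can be sketched as follows: split the quoted string into tokens, refuse the five protected parameters, and let everything else extend or override the defaults. This is an illustrative approximation only. The names `PROTECTED`, `DEFAULTS`, and `merge_blast_args` are hypothetical, and SeqBuddy's own handling may differ.

```python
import re
import shlex

# Parameters the text says cannot be changed.
PROTECTED = {"db", "query", "subject", "out", "outfmt"}
# Defaults taken from the command shown above.
DEFAULTS = {"num_threads": "4", "evalue": "0.01"}

# Matches negative numbers such as -3 or -1.5e-3, which are values, not flags.
_NEGATIVE_NUMBER = re.compile(r"-\d+(\.\d+)?([eE][-+]?\d+)?")


def _is_flag(token):
    """A flag starts with '-' and is not a negative number."""
    return token.startswith("-") and _NEGATIVE_NUMBER.fullmatch(token) is None


def merge_blast_args(extra=""):
    """Merge a string of extra parameters into the defaults, as an argument list."""
    params = dict(DEFAULTS)
    tokens = shlex.split(extra)
    i = 0
    while i < len(tokens):
        if not _is_flag(tokens[i]):
            raise ValueError("expected a flag, got %r" % tokens[i])
        name = tokens[i].lstrip("-")
        if name in PROTECTED:
            raise ValueError("-%s cannot be changed" % name)
        # A flag followed by a non-flag token takes that token as its value;
        # otherwise it is treated as a switch with no value.
        if i + 1 < len(tokens) and not _is_flag(tokens[i + 1]):
            params[name] = tokens[i + 1]
            i += 2
        else:
            params[name] = None
            i += 1
    args = []
    for name, value in params.items():
        args.append("-" + name)
        if value is not None:
            args.append(value)
    return args


if __name__ == "__main__":
    print(" ".join(["blastp"] + merge_blast_args("-max_target_seqs 1 -gapopen 10")))
```

In this sketch, a repeated default such as `-evalue 1e-5` replaces the default value rather than appearing twice, and a protected parameter raises `ValueError`.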
In the following examples, an assembled transcriptome from the millipede species Abacion magnum is being searched for pannexin sequences, using the known Drosophila sequences as a query.
#NEXUS
begin data;
dimensions ntax=8 nchar=316;
format datatype=protein missing=? gap=-;
matrix
'Dme-Panxδ3' -----GFI---K----IDNMVFRCHYRITAILFTC-CIIVTANNLIGDPISCI--IPMHVINTFCWITYTYTV---A--GPGLE-K--HSYYQWVPFVLFFQGLMFYVPHWVWKM-D-GKIRMITG--VDDRDRIL-KYFVNNT--HNGYSFYFFCELLNFINVIVNIFMVDKFLGGAFMSYGTDVLKFSNMDQ-DRFDPMIEIFPRLTKCTFHKFGPSGSVQKHDTLCVLALNILNEKIYIFLWFWFIILATISGVAVLYSVVI---TR-TIR----------K--EGDFLILHFLSQNLSTRSYSDML-Q----
'Dme-Panxδ7' --L--SV----R-Q-RIDNIVFKLHYRWTVILLVA-TLLITSRQYIGEHIQCL--VVSPVINTFCFFTPTF-VD--P---PGI--D-RHAYYQWVPFVLFFQALCFYIPHALWKW-EGGRIKALVK--LG-MERVKD---IRDM--RLNWG-HVFAEVLNLINLLLQITWTNRFLGGQFLTLG------HALKN-RSDEVV---FPKITKCKFHKFGDSGSIQMHDALCVMALNIMNEKIYIILWFWYAFLLIVTVLGLLWRLCF---VR-WSL----------P-LASNWMFLFFLRSNLS-----E-L----DN
'Dme-Panxδ2' MDVFGSVKGLLKIDQV-DNNVFRMHYKATVIILIAFSLLVTSRQYIGDPIDCIVEIPLGVMDTYCWIYSTFTVPEGRDVQP--GSEKYHKYYQWVCFVLFFQAILFYVPRYLWKSWEGGRLKMLVDLSVNDKDRKIVDYFG-NLNRHNFYAFFFVCEALNFVNVIGQIYFVDFFLDGEFSTYGSDVLKFTELEPDERIDPMARVFPKVTKCTFHKYGPSGSVQTHDGLCVLPLNIVNEKIYVFLWFWFIILSIMSI-SLIYRIAVAPKLRHLLLRARSRAESEVEVAIGDWFLLYQLGKNIDPLIYKEVISDLEMG
'Dme-Panxδ5' MSAVKPLSKYLQFKIRIYDSVFTIHSRCTVVILLTCSLLLSARQYFGDPIQCI-S-EEKNIESYCWTMGTYYNEASIAE--GVEIRQYLRYYQWVIILLLFQSFVFYFPSCLWKVWEGRRLKQLCEVDNTRRM--LVKYFDMHFC----YMAYVFCEVLNFLISVVNIIVLEVFLNGFWSKYLRALW-------DRWV-SV---FPKIAKCELKF-GGSGTANVMDNLCILPLNILNEKIFVFLWAWFL-LALMSGLNLLCRLAICSRLREQMIRTKRHVKRALDLTIGDWFLMMKVSVNVNPMLFRDLMQEL---
'Dme-Panxδ6' MAAVKPLSNYLRLKVRIYDPIFTLHSKCTIVILLTCTFLLSAKQYFGEPILCL-S-SERQADSYCWTMGTYWNEQSIAE--GVETRMYLRYYQWVFMILLFQSLLFYFPSFLWKVWEGQRMEQLCEVDRTRQM--LTRYFPIHWC----YSIYAFCELLNVFISILNFWLMDVVFNGFWYKYIHALW-------NLWM-RV---FPKVAKCEFVY-GPSGTPNIMDILCVLPLNILNEKIFAVLYVWFL-FALLAIMNILYRLLICCPLRLQLLNPKSHVREVLSAGYGDWFVLMCVSINVNPTLFRELLEQL--D
'Dme-Panxδ4' MAAVKPLSKYLQFKVHIYDAIFTLHSKVTVALLLACTFLLSSKQYFGDPIQCF-G-D-KDMDAFCWIYGAYL-QCAVSK--VVEN--YITYYQWVVLVLLLESFVFYMPAFLWKIWEGGRLKHLCDFKRTHRV--LVNYFETHFR----YFVYVFCEILNLSISILNFLLLDVFFGGFWGRYRNALY-------NQWI-AV---FPKCAKCEYKG-GPSGSSNIYDYLCLLPLNILNEKIFAFLWIWFI-LAMLISLKFLYRLAVLYPMRLQLLRPKKHLQVALNCSFGDWFVLMRVGNNISPELFRKLLEEL---
'Dme-Panxδ1' YKLLGSLKSYLKWQIQTDNAVFRLHNSFTTVLLLTCSLIITATQYVGQPISCIVGVP-HVVNTFCWIHSTFTMPDRREVHPGVDF-KYYTYYQWVCFVLFFQAMACYTPKFLWNKFEGGLMRMIVGLNITRKRDALLDYLIKHVKRHKLY-AYWACEFLCCINIIVQMYLMNRFFDGEFLSYGTNIMKLSDVPQEQRVDPMVYVFPRVTKCTFHKYGPSGSLQKHDSLCILPLNIVNEKTYVFIWFWFWILLVLLGL--VFRCIIFPKFRPRLLNASNRIPMECRLDIGDWWLIYMLGRNLDPVIYKDVMSEFQVP
'Dme-Panxδ8' LDIFRGLKNLVKVSVKTDSIVFRLHYSITVMILMSFSLIITTRQYVGNPIDCVTDIP-DVLNTYCWIQSTYTLKSLVSVYPGIGNKKHYKYYQWVCFCLFFQAILFYTPRWLWKSWEGGKIHALIDLDISEKKKLLLDYLWENLRYHNWW-AYYVCELLALINVIGQMFLMNRFFDGEFITFGLKVIDYMETDQEDRMDPMIYIFPRMTKCTFFKYGSSGEVEKHDAICILPLNVVNEKIYIFLWFWFILLTFLTLLTLIYRVIIFPRMRVYLFRMRFRVRRDIEIKMGDWFLLYLLGENIDTVIFRDVVQDLRL-
;
end;
$: ls path/to/blastdb
>>> Abacion_magnum.pex Abacion_magnum.phr Abacion_magnum.pin Abacion_magnum.pog
>>> Abacion_magnum.psd Abacion_magnum.psi Abacion_magnum.psq
Use a pre-made blast database to search for matches
$: sb Drosophila.nex -bl /path/to/blastdb/Abacion_magnum
Running...
blastp -num_threads 4 -evalue 0.01
# ######################## BLAST results ######################## #
Dme-Panxδ3 4086 38.344 326 129 10 3 256 18 343 3.12e-73 226
Dme-Panxδ3 5440 50.279 179 71 5 74 234 1 179 5.55e-57 180
Dme-Panxδ7 4086 39.273 275 107 9 7 221 20 294 8.45e-56 180
Dme-Panxδ7 5440 45.506 178 65 7 75 221 1 177 6.20e-38 130
Dme-Panxδ2 4086 46.821 346 140 11 2 303 3 348 1.12e-105 310
Dme-Panxδ2 5440 51.055 237 95 6 94 309 1 237 2.66e-78 236
Dme-Panxδ5 4086 32.448 339 168 10 8 285 10 348 1.41e-49 166
Dme-Panxδ5 5440 33.755 237 115 7 94 290 2 236 5.12e-32 116
Dme-Panxδ6 4086 32.378 349 171 12 8 291 10 358 2.48e-48 162
Dme-Panxδ6 5440 35.169 236 115 9 94 291 2 237 1.65e-30 112
Dme-Panxδ4 4086 33.038 339 162 11 8 281 10 348 4.12e-48 162
Dme-Panxδ4 5440 36.596 235 111 10 90 286 2 236 1.85e-33 120
Dme-Panxδ1 5440 50.211 237 96 7 95 309 1 237 4.23e-80 241
Dme-Panxδ1 4086 38.040 347 171 10 1 303 2 348 2.17e-79 243
Dme-Panxδ8 4086 42.486 346 158 9 2 306 3 348 5.37e-99 293
Dme-Panxδ8 5440 51.271 236 95 4 96 311 1 236 6.26e-84 251
# ############################################################### #
Warning: Alignment format detected but sequences are different lengths. Format changed to fasta to accommodate proper printing of records.
>4086 comp4411_c0_seq1|m.4086
MFDVLGSLKSVFLRLKTISVDNSIFKLHYRLTTIILAVFSILVTSKQYLGDPIDCTTSST
TIRAELLDQYCWVSSTYSLPKAFDQKVGRFGHVSHPGIATYHEGDQVIYHQYYQWVCFVL
FLQSMMFYLPHYLWKIWECGRLKALADDIQGPLTSDETKKGKLAAISAYFSTSLFHHNFY
ATRYSICEVLNFANVVGQMFLTNRFLGGTFLTYGTEVIEFSESNQLNRTDPMIKVFPRVT
KCSFFTYGSSGDMQNHDALCVLPVNIINEKIYIVLWFWFIILAVLSGLAIIYRLIVTFSV
RARYLALRSRANSVSRSEIEKIAYNTEFGDWFVLYLLSKNVNSYVFKEVVDVVVKQLDNS
DYVPKEKHGLFKKLPL*
>5440 comp6054_c0_seq1|m.5440
FVLFFQAMLFYIPRFLWKMWEGKRLETIVLGMHVGILTEEEKNNRKKVLLEYLTRHFRRH
TFYAIKYYICELLCLVNVIGQMYLMNKFLGGEFMDYGSRVLEFSEQNQDSRTDPMIYVFP
RMTKCTFHKFGTSGDIQRHDALCVLPLNIVNEKIYIFLWFWFIILATLTALVLCYRILII
AFPKFRPQILHARCRLTPMKTINSVLRNADLGDWFLFYLLGKNMDPCIFREVCIELSKKL
ETAESNNP*
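The statistics table printed between the `#` banner lines is BLAST tabular output (`-outfmt 6`), whose default columns are query ID, subject ID, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, E-value, and bit score. A minimal sketch for reading those rows from captured standard error follows. The names `Hit`, `parse_hits`, and `best_hit_per_query` are hypothetical, and the parser assumes sequence IDs contain no whitespace.

```python
from collections import namedtuple

# Default column order for BLAST tabular output (-outfmt 6).
Hit = namedtuple(
    "Hit",
    "qseqid sseqid pident length mismatch gapopen "
    "qstart qend sstart send evalue bitscore",
)


def parse_hits(text):
    """Parse tabular BLAST rows, skipping banners and any other lines."""
    hits = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) != 12:
            continue  # e.g. "Running..." or the echoed blastp command
        integers = [int(value) for value in fields[3:10]]
        hits.append(
            Hit(fields[0], fields[1], float(fields[2]), *integers,
                float(fields[10]), float(fields[11]))
        )
    return hits


def best_hit_per_query(hits):
    """Keep the hit with the lowest E-value for each query."""
    best = {}
    for hit in hits:
        current = best.get(hit.qseqid)
        if current is None or hit.evalue < current.evalue:
            best[hit.qseqid] = hit
    return best
```

Applied to the table above, `best_hit_per_query` would select subject 5440 for Dme-Panxδ1 (E-value 4.23e-80) and subject 4086 for the other queries.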
Search a plain sequence file for blast matches (note that 'Abacion_magnum.fa' contains thousands of sequences, so it is not provided here).
$: sb Drosophila.nex -bl /path/to/Abacion_magnum.fa
Building a new DB with makeblastdb, current time: 01/26/2017 10:43:02
Sequence type: Protein
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 7193 sequences in 0.650279 seconds.
Running...
blastp -num_threads 4 -evalue 0.01
# ######################## BLAST results ######################## #
Dme-Panxδ3 4086 38.344 326 129 10 3 256 18 343 3.12e-73 226
Dme-Panxδ3 5440 50.279 179 71 5 74 234 1 179 5.55e-57 180
Dme-Panxδ7 4086 39.273 275 107 9 7 221 20 294 8.45e-56 180
Dme-Panxδ7 5440 45.506 178 65 7 75 221 1 177 6.20e-38 130
Dme-Panxδ2 4086 46.821 346 140 11 2 303 3 348 1.12e-105 310
Dme-Panxδ2 5440 51.055 237 95 6 94 309 1 237 2.66e-78 236
Dme-Panxδ5 4086 32.448 339 168 10 8 285 10 348 1.41e-49 166
Dme-Panxδ5 5440 33.755 237 115 7 94 290 2 236 5.12e-32 116
Dme-Panxδ6 4086 32.378 349 171 12 8 291 10 358 2.48e-48 162
Dme-Panxδ6 5440 35.169 236 115 9 94 291 2 237 1.65e-30 112
Dme-Panxδ4 4086 33.038 339 162 11 8 281 10 348 4.12e-48 162
Dme-Panxδ4 5440 36.596 235 111 10 90 286 2 236 1.85e-33 120
Dme-Panxδ1 5440 50.211 237 96 7 95 309 1 237 4.23e-80 241
Dme-Panxδ1 4086 38.040 347 171 10 1 303 2 348 2.17e-79 243
Dme-Panxδ8 4086 42.486 346 158 9 2 306 3 348 5.37e-99 293
Dme-Panxδ8 5440 51.271 236 95 4 96 311 1 236 6.26e-84 251
# ############################################################### #
Warning: Alignment format detected but sequences are different lengths. Format changed to fasta to accommodate proper printing of records.
>4086 comp4411_c0_seq1|m.4086
MFDVLGSLKSVFLRLKTISVDNSIFKLHYRLTTIILAVFSILVTSKQYLGDPIDCTTSST
TIRAELLDQYCWVSSTYSLPKAFDQKVGRFGHVSHPGIATYHEGDQVIYHQYYQWVCFVL
FLQSMMFYLPHYLWKIWECGRLKALADDIQGPLTSDETKKGKLAAISAYFSTSLFHHNFY
ATRYSICEVLNFANVVGQMFLTNRFLGGTFLTYGTEVIEFSESNQLNRTDPMIKVFPRVT
KCSFFTYGSSGDMQNHDALCVLPVNIINEKIYIVLWFWFIILAVLSGLAIIYRLIVTFSV
RARYLALRSRANSVSRSEIEKIAYNTEFGDWFVLYLLSKNVNSYVFKEVVDVVVKQLDNS
DYVPKEKHGLFKKLPL*
>5440 comp6054_c0_seq1|m.5440
FVLFFQAMLFYIPRFLWKMWEGKRLETIVLGMHVGILTEEEKNNRKKVLLEYLTRHFRRH
TFYAIKYYICELLCLVNVIGQMYLMNKFLGGEFMDYGSRVLEFSEQNQDSRTDPMIYVFP
RMTKCTFHKFGTSGDIQRHDALCVLPLNIVNEKIYIFLWFWFIILATLTALVLCYRILII
AFPKFRPQILHARCRLTPMKTINSVLRNADLGDWFLFYLLGKNMDPCIFREVCIELSKKL
ETAESNNP*
Inject several additional blast parameters
$: sb Drosophila.nex -bl /path/to/blastdb/Abacion_magnum "-max_target_seqs 1 -gapopen 10"
Running...
blastp -num_threads 4 -evalue 0.01 -max_target_seqs 1 -gapopen 10
# ######################## BLAST results ######################## #
Dme-Panxδ3 4086 38.344 326 129 10 3 256 18 343 2.68e-69 210
Dme-Panxδ7 4086 39.273 275 107 9 7 221 20 294 3.49e-53 168
Dme-Panxδ2 4086 46.821 346 140 11 2 303 3 348 4.11e-99 287
Dme-Panxδ5 4086 33.333 339 165 13 8 285 10 348 5.58e-48 156
Dme-Panxδ6 4086 32.951 349 169 14 8 291 10 358 7.03e-47 153
Dme-Panxδ4 4086 33.923 339 159 14 8 281 10 348 8.81e-47 153
Dme-Panxδ1 4086 38.329 347 170 11 1 303 2 348 4.45e-75 226
Dme-Panxδ8 4086 42.486 346 158 9 2 306 3 348 9.06e-93 271
# ############################################################### #
#NEXUS
begin data;
dimensions ntax=1 nchar=377;
format datatype=protein missing=? gap=-;
matrix
4086 MFDVLGSLKSVFLRLKTISVDNSIFKLHYRLTTIILAVFSILVTSKQYLGDPIDCTTSSTTIRAELLDQYCWVSSTYSLPKAFDQKVGRFGHVSHPGIATYHEGDQVIYHQYYQWVCFVLFLQSMMFYLPHYLWKIWECGRLKALADDIQGPLTSDETKKGKLAAISAYFSTSLFHHNFYATRYSICEVLNFANVVGQMFLTNRFLGGTFLTYGTEVIEFSESNQLNRTDPMIKVFPRVTKCSFFTYGSSGDMQNHDALCVLPVNIINEKIYIVLWFWFIILAVLSGLAIIYRLIVTFSVRARYLALRSRANSVSRSEIEKIAYNTEFGDWFVLYLLSKNVNSYVFKEVVDVVVKQLDNSDYVPKEKHGLFKKLPL*
;
end;