Steve Bond edited this page Mar 4, 2016 · 1 revision

--hash_ids, -hi

Description

Rename every sequence ID to a randomly generated hash string made of ASCII letters and digits.

A hash table mapping each new hash to its original ID is written to stderr, ahead of the alignments, which are written to stdout. The hash table can be silenced with the -q flag. Every sequence gets a unique hash, even if the same original ID is used in multiple alignments, and the hash table lists sequences in the same order they appear in the output.
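
The renaming step can be sketched in Python as follows. This is a minimal illustration, not alb's own code: the function names `assign_hash_ids` and `format_hash_table` are hypothetical, and the character pool (ASCII letters plus digits) is an assumption based on the characters seen in the example output. Each sequence gets a freshly drawn hash that is not already in use, so a repeated original ID still receives its own hash, and the map preserves output order.

```python
import random
import string
import sys
from collections import OrderedDict

# Assumed pool: ASCII letters plus digits, matching the characters in the example output
HASH_CHARS = string.ascii_letters + string.digits


def random_hash(length):
    """Draw one random string of `length` characters from HASH_CHARS."""
    return "".join(random.choice(HASH_CHARS) for _ in range(length))


def assign_hash_ids(ids, hash_length=10):
    """Return (new_ids, hash_map), with hash_map an OrderedDict of {hash: original_id}.

    Each position in `ids` gets its own hash, even if an ID is repeated.
    Assumes the pool holds at least len(ids) distinct hashes; otherwise the
    retry loop never ends.
    """
    hash_map = OrderedDict()
    new_ids = []
    for original_id in ids:
        # Keep drawing until the hash is not already in use
        new_hash = random_hash(hash_length)
        while new_hash in hash_map:
            new_hash = random_hash(hash_length)
        hash_map[new_hash] = original_id
        new_ids.append(new_hash)
    return new_ids, hash_map


def format_hash_table(hash_map):
    """Render hash_map in the '# Hash table' / 'hash,original_id' layout shown in the examples."""
    lines = ["# Hash table"]
    lines.extend("%s,%s" % (new_hash, original_id) for new_hash, original_id in hash_map.items())
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    names = ["Bfo-Panxα1", "Hca-Panxα1", "Bfo-Panxα1"]  # a repeated ID still gets a new hash
    hashed, table = assign_hash_ids(names)
    sys.stderr.write(format_hash_table(table))  # the table goes to stderr
    print("\n".join(hashed))                    # the renamed IDs go to stdout
```

Keying the map by hash rather than by original ID is what allows duplicate IDs to coexist without overwriting each other.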

For developers: an attribute named hash_map is appended to the AlignBuddy object. It is an OrderedDict of the form {hash: original_id}.
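
As a sketch of how that attribute might be used, assuming it holds {hash: original_id} pairs as described, the hypothetical helper `restore_ids` below swaps hashes in some downstream text (for example a Newick tree built from the hashed alignment) back to their original IDs. Matching only whole words keeps a short hash such as `SY` from being replaced inside a longer token.

```python
import re
from collections import OrderedDict


def restore_ids(text, hash_map):
    """Replace each whole-word hash in `text` with its original ID.

    `hash_map` is assumed to be {hash: original_id}, like the hash_map attribute.
    """
    if not hash_map:
        return text
    # \b on both sides: a hash only matches when it is a complete token
    alternatives = "|".join(re.escape(h) for h in hash_map)
    pattern = re.compile(r"\b(?:%s)\b" % alternatives)
    return pattern.sub(lambda match: hash_map[match.group(0)], text)


if __name__ == "__main__":
    example_map = OrderedDict([("4CKTDKOBu3", "Bfo-Panxα1"), ("KXz9xKCs46", "Hca-Panxα1")])
    print(restore_ids("(4CKTDKOBu3:0.1,KXz9xKCs46:0.2);", example_map))
```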

Argument

Hash length (int)

Optional. Specify the length of the new hash IDs (default = 10). If the number of possible hashes is smaller than twice the total number of sequences, a warning is printed to stderr and the hash length is increased automatically until it meets this criterion.
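
The length rule can be sketched as follows. The number of possible hashes of length L over an alphabet of size A is A ** L, and the sketch grows L until A ** L is at least twice the number of sequences. The function name is hypothetical, and because this page does not state the alphabet size alb uses for the check, `alphabet_size` is left as an explicit parameter rather than a documented value. The warning text mirrors the message shown in the usage examples.

```python
import sys


def adjust_hash_length(hash_length, num_sequences, alphabet_size):
    """Return the smallest length >= hash_length whose hash space covers 2 * num_sequences.

    alphabet_size is the number of characters a hash can be built from; it is an
    assumption of this sketch, not a documented alb setting.
    """
    if hash_length < 1:
        raise ValueError("hash_length must be at least 1")
    if alphabet_size < 2:
        raise ValueError("alphabet_size must be at least 2")
    new_length = hash_length
    # Possible hashes = alphabet_size ** length; grow until that is >= 2 * num_sequences
    while alphabet_size ** new_length < 2 * num_sequences:
        new_length += 1
    if new_length != hash_length:
        sys.stderr.write(
            "Warning: The hash_length parameter was passed in with the value %s. This is too "
            "small to properly cover all sequences, so it has been increased to %s.\n"
            % (hash_length, new_length)
        )
    return new_length
```

Requiring twice as many possible hashes as sequences leaves headroom, so the random redraws in the renaming step stay cheap.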

Examples

Input file: Panx_C-term.physr

 3 62
Bfo-Panxα1   DPHYKKVYYKIGTSGRVILNVLASSISPACFQEIMNNVCPRLIRAHVSRKGRNLGDDPNL--
Hca-Panxα1   --HYKKVYYKIGTSGRVILNVIASSIAPSAFQEIMNNVCPRLIRTHVSKKGRNLIDDPDLIS
Mle-Panxα1  DPHYKKVYYKIGTSGRVILNMLAASISPTCFQEIMNNVCPRLIRAHVSKKGRNLGDDPLL--

 3 68
Bfo-Panxα4  -----EIIQVMTDNTNPLFFSKIFNELTNLLIETSSDQAGKVVENLAMQG-DEDTIVDLDTSSSRT--
Hca-Panxα4  -------LQVLMANTHPVIFTRIFDELTFRLVTKASMD-CEAVKNLQAEGQIGETAIDLEPNLGKAVG
Mle-Panxα4  GAGGREIVQILTDNSNPLLFSKIFDDLTNLLITTSKN--ADVIENLSKL---DSSVIELGSKDSI---

 3 61
Bfo-Panxα8  GDSKLKYIYFNCGTTGRTYLHLIAKNINPRIFEQLIIKLKNDLVEEKNKQHLKQTK-EMPV
Hca-Panxα8  -DNKLKYIYFNCGTTGRTYLHLIANNVNPRVFEQLVIRLSKDLVEEKNKAHLKKAEGEANV
Mle-Panxα8  ENSKLKFIYFNCGTTGRTYLHLIAKNVNPRIFEQLIIKLSADLVEEKNKQHLKGSK-DILV

Usage example 1

$: alb Panx_C-term.physr -hi

Output

# Hash table
4CKTDKOBu3,Bfo-Panxα1
KXz9xKCs46,Hca-Panxα1
3XQEvSkBZo,Mle-Panxα1
ru9eV9aFaW,Bfo-Panxα4
tP1nsbNt35,Hca-Panxα4
wSKIW6vQpX,Mle-Panxα4
JnCuUwvDHe,Bfo-Panxα8
1PKGTIaOsD,Hca-Panxα8
x4B8BdeTRW,Mle-Panxα8

 3 62
4CKTDKOBu3  DPHYKKVYYKIGTSGRVILNVLASSISPACFQEIMNNVCPRLIRAHVSRKGRNLGDDPNL--
KXz9xKCs46  --HYKKVYYKIGTSGRVILNVIASSIAPSAFQEIMNNVCPRLIRTHVSKKGRNLIDDPDLIS
3XQEvSkBZo  DPHYKKVYYKIGTSGRVILNMLAASISPTCFQEIMNNVCPRLIRAHVSKKGRNLGDDPLL--

 3 68
ru9eV9aFaW  -----EIIQVMTDNTNPLFFSKIFNELTNLLIETSSDQAGKVVENLAMQG-DEDTIVDLDTSSSRT--
tP1nsbNt35  -------LQVLMANTHPVIFTRIFDELTFRLVTKASMD-CEAVKNLQAEGQIGETAIDLEPNLGKAVG
wSKIW6vQpX  GAGGREIVQILTDNSNPLLFSKIFDDLTNLLITTSKN--ADVIENLSKL---DSSVIELGSKDSI---

 3 61
JnCuUwvDHe  GDSKLKYIYFNCGTTGRTYLHLIAKNINPRIFEQLIIKLKNDLVEEKNKQHLKQTK-EMPV
1PKGTIaOsD  -DNKLKYIYFNCGTTGRTYLHLIANNVNPRVFEQLVIRLSKDLVEEKNKAHLKKAEGEANV
x4B8BdeTRW  ENSKLKFIYFNCGTTGRTYLHLIAKNVNPRIFEQLIIKLSADLVEEKNKQHLKGSK-DILV
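
Because the hash table and the alignment go to different streams, they can be captured separately with ordinary shell redirection, e.g. `alb Panx_C-term.physr -hi 2> hash_table.txt > Panx_hashed.physr` (both file names are hypothetical). A minimal sketch for reading the captured table back into an OrderedDict, assuming the `# Hash table` header and `hash,original_id` lines shown above and a blank line after the table, might look like this. It only reads lines after the header, so a preceding warning message (which can itself contain commas) is ignored.

```python
from collections import OrderedDict


def parse_hash_table(text):
    """Parse the 'hash,original_id' lines that follow '# Hash table' into an OrderedDict."""
    hash_map = OrderedDict()
    in_table = False
    for line in text.splitlines():
        line = line.strip()
        if line == "# Hash table":
            in_table = True
            continue
        if in_table:
            if not line:
                break  # a blank line ends the table
            # Split on the first comma only; the hash itself never contains one
            new_hash, original_id = line.split(",", 1)
            hash_map[new_hash] = original_id
    return hash_map


if __name__ == "__main__":
    with open("hash_table.txt", encoding="utf-8") as handle:
        print(parse_hash_table(handle.read()))
```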

Usage example 2

$: alb Panx_C-term.physr Panx_C-term.physr Panx_C-term.physr -hi 1

Output

Warning: The hash_length parameter was passed in with the value 1. This is too small to properly cover all sequences, so it has been increased to 2.

# Hash table
bV,Bfo-Panxα1
R6,Hca-Panxα1
A6,Mle-Panxα1
........

 3 62
bV  DPHYKKVYYKIGTSGRVILNVLASSISPACFQEIMNNVCPRLIRAHVSRKGRNLGDDPNL--
R6  --HYKKVYYKIGTSGRVILNVIASSIAPSAFQEIMNNVCPRLIRTHVSKKGRNLIDDPDLIS
A6  DPHYKKVYYKIGTSGRVILNMLAASISPTCFQEIMNNVCPRLIRAHVSKKGRNLGDDPLL--

 3 68
BV  -----EIIQVMTDNTNPLFFSKIFNELTNLLIETSSDQAGKVVENLAMQG-DEDTIVDLDTSSSRT--
1x  -------LQVLMANTHPVIFTRIFDELTFRLVTKASMD-CEAVKNLQAEGQIGETAIDLEPNLGKAVG
Ly  GAGGREIVQILTDNSNPLLFSKIFDDLTNLLITTSKN--ADVIENLSKL---DSSVIELGSKDSI---

 3 61
WA  GDSKLKYIYFNCGTTGRTYLHLIAKNINPRIFEQLIIKLKNDLVEEKNKQHLKQTK-EMPV
SY  -DNKLKYIYFNCGTTGRTYLHLIANNVNPRVFEQLVIRLSKDLVEEKNKAHLKKAEGEANV
ya  ENSKLKFIYFNCGTTGRTYLHLIAKNVNPRIFEQLIIKLSADLVEEKNKQHLKGSK-DILV
........
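
In this example each original ID occurs three times, so each one maps to three different hashes. Inverting the {hash: original_id} map with a plain dict comprehension would silently keep only one hash per ID. A small sketch that groups hashes by original ID instead (the helper name is hypothetical, and the hashes in the usage lines are illustrative) could be:

```python
from collections import OrderedDict, defaultdict


def hashes_by_original_id(hash_map):
    """Invert {hash: original_id} into {original_id: [hash, ...]} without losing duplicates."""
    grouped = defaultdict(list)
    for new_hash, original_id in hash_map.items():
        grouped[original_id].append(new_hash)  # hashes stay in output order
    return dict(grouped)


if __name__ == "__main__":
    example_map = OrderedDict([("bV", "Bfo-Panxα1"), ("R6", "Hca-Panxα1"), ("BV", "Bfo-Panxα1")])
    print(hashes_by_original_id(example_map))
```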
