*****************************************************************************
EVALYN -- EVolved ALYNments
*****************************************************************************
Copyright (C) 2006 Luke Sheneman
A genetic algorithm (GA) that iteratively refines guide trees by
evolutionary computation, for use in progressive multiple sequence
alignment, as presented in:
Sheneman, L., J.A. Foster (2004) Evolving Better Multiple Sequence
Alignments, Proceedings of the Genetic and Evolutionary Computation
Conference (GECCO 2004), Seattle, WA.
*****************************************************************************
This version of EVALYN is not quite ready for public consumption. It has
no useful help messages or documentation, and some critical functionality
is still missing, such as proper handling of ambiguity codes in input
sequences. These will be added in future releases.

EVALYN reads DNA or protein sequences in FASTA format and outputs an
alignment in Clustal W (*.ALN) format, along with the best guide tree in
Newick format. Internally, EVALYN maintains a population of guide trees
and iteratively evolves them to improve the resulting alignments, as
measured by a sum-of-pairs fitness function.
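
To illustrate the kind of fitness measure mentioned above, the following
Python sketch scores an alignment by summing column scores over every pair
of rows. It is not EVALYN's actual code: the function name sum_of_pairs,
the constant match/mismatch scores, and the single per-column gap penalty
are assumptions made for illustration. EVALYN itself takes a substitution
matrix and separate gap-open and gap-extend costs.

```python
from itertools import combinations


def sum_of_pairs(alignment, match=1.0, mismatch=-1.0, gap=-1.0):
    """Score an alignment given as a list of equal-length gapped strings.

    Toy scoring (assumed, not EVALYN's): the match/mismatch constants stand
    in for a substitution matrix, and each residue-vs-gap column costs `gap`.
    Columns where both rows have a gap contribute nothing.
    """
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all aligned rows must have the same length")

    total = 0.0
    # Visit every unordered pair of rows once.
    for row_a, row_b in combinations(alignment, 2):
        for a, b in zip(row_a, row_b):
            if a == "-" and b == "-":
                continue
            if a == "-" or b == "-":
                total += gap
            elif a == b:
                total += match
            else:
                total += mismatch
    return total


example = ["ACG-T", "ACGGT", "A-GGT"]
print(sum_of_pairs(example))
```

A higher score means the rows agree in more columns, so a guide tree that
yields a higher-scoring alignment would be considered fitter under this
simplified measure.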
TO BUILD EVALYN:
----------------
    cd ./ltree
    make

The executable is called "ltree" and resides in the <evalyndist>/ltree
directory.
EXAMPLE USAGE:
--------------
./ltree --infile=proteins.fasta --population=1000 --iterations=10000 \
--matrix=blosum62.txt --gapopen=-1.0 --gapextend=-0.1 --rnj
Example substitution matrices are provided in <evalyndist>/ltree as
"def_dna_matrix.txt" and "def_pro_matrix.txt". These are also the default
substitution matrices used if none is explicitly specified.
COMMON COMMAND-LINE FLAGS:
--------------------------
** Inputs requiring arguments:
--infile     = <FASTA-formatted input file>
--outfile    = <name of output file>
--treefile   = <name of output Newick-formatted tree file>
--matrix     = <name of substitution matrix file>
--population = <population size, ex. "--population=500">
--iterations = <number of iterations to run, ex. "--iterations=1000">
--converge   = <stop once there has been no improvement in x steps>
--mutation   = <mutation rate, ex. "--mutation=0.01">
--gapopen    = <cost of opening a gap region, ex. "--gapopen=-4.0">
--gapextend  = <cost of extending a gap region, ex. "--gapextend=-1.0">
--seed       = <random number generator seed, ex. "--seed=1000">
** Inputs requiring NO arguments:
--rnj        : seeds the population with a relaxed neighbor-joining tree
--dna        : specifies that the input is DNA sequence
--protein    : specifies that the input is protein sequence
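
For scripted runs, the flags listed above can be assembled
programmatically. The sketch below is a hypothetical Python helper
(build_ltree_command is not part of EVALYN) that builds an argument list
using only flags documented in this README. The file names are
placeholders, and it assumes "ltree" has been built in the current
directory.

```python
import shlex


def build_ltree_command(infile, population=None, iterations=None,
                        matrix=None, gapopen=None, gapextend=None,
                        seed=None, rnj=False, executable="./ltree"):
    """Return an argv list for ltree (hypothetical helper, not part of EVALYN)."""
    argv = [executable, f"--infile={infile}"]
    # Flags that take a value; only those explicitly set are emitted,
    # so ltree's own defaults apply to the rest.
    valued = {
        "population": population,
        "iterations": iterations,
        "matrix": matrix,
        "gapopen": gapopen,
        "gapextend": gapextend,
        "seed": seed,
    }
    for name, value in valued.items():
        if value is not None:
            argv.append(f"--{name}={value}")
    if rnj:
        argv.append("--rnj")
    return argv


cmd = build_ltree_command("proteins.fasta", population=1000,
                          iterations=10000, matrix="blosum62.txt",
                          gapopen=-1.0, gapextend=-0.1, rnj=True)
print(shlex.join(cmd))
# To actually run it (requires the built executable):
# import subprocess; subprocess.run(cmd, check=True)
```

Passing an explicit --seed value this way is one option for making
repeated runs comparable.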
****************************************************************************
Please direct questions to:
Luke Sheneman
sheneman@uidaho.edu
University of Idaho
****************************************************************************