Microbial pan-genome analysis toolkit (MPA)

Authors: Wei Ding and Richard Neher

Overview: MPAM is based on an automated pan-genome identification pipeline that determines clusters of orthologous genes. The pipeline starts with a set of annotated sequences (e.g. NCBI RefSeq) of a bacterial species. The genomes are split into individual genes, and all genes from all strains are compared to each other with the fast protein alignment tool DIAMOND and then clustered into orthologous groups using orthAgogue and MCL. Once the gene clusters are constructed, the genes within each cluster are aligned and a phylogenetic tree is computed per cluster. Mutations are then mapped onto each tree and various summary statistics are calculated.
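
To make the clustering idea concrete, here is a toy sketch of turning pairwise similarity hits into gene clusters. It is not what the pipeline does: it uses plain connected components as a stand-in for orthAgogue and MCL, and the `strain|gene` naming scheme, the function names and the example hits are all made up for illustration.

```python
from collections import defaultdict


def cluster_genes(hits):
    """Group genes so that any two genes linked by a chain of hits share a cluster.

    `hits` is an iterable of (gene_a, gene_b) pairs. This is single-linkage
    grouping via union-find, a simplification and NOT the MCL algorithm.
    """
    parent = {}

    def find(gene):
        parent.setdefault(gene, gene)
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]  # path halving
            gene = parent[gene]
        return gene

    for gene_a, gene_b in hits:
        root_a, root_b = find(gene_a), find(gene_b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = defaultdict(set)
    for gene in parent:
        clusters[find(gene)].add(gene)
    return sorted(sorted(members) for members in clusters.values())


def strains_in(cluster):
    """Return the set of strains in a cluster (hypothetical 'strain|gene' IDs)."""
    return {gene.split("|")[0] for gene in cluster}


# Made-up hits between genes of three hypothetical strains.
hits = [
    ("strainA|g1", "strainB|g7"),
    ("strainB|g7", "strainC|g2"),
    ("strainA|g5", "strainC|g9"),
]
clusters = cluster_genes(hits)
all_strains = set().union(*(strains_in(c) for c in clusters))

# A cluster present in every strain is a candidate core gene.
core_clusters = [c for c in clusters if strains_in(c) == all_strains]
print(len(clusters), "clusters,", len(core_clusters), "found in all strains")
```

Counting how many strains each cluster covers is one simple example of a summary statistic that can be derived from the clusters.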

Dependencies:

1.1 Required software:

* DIAMOND (fast protein alignment tool)
  - Install: (source: https://github.com/bbuchfink/diamond)
  - wget http://github.com/bbuchfink/diamond/releases/download/v0.7.12/diamond-linux64.tar.gz
  - tar xzf diamond-linux64.tar.gz

* orthAgogue: Please install it together with all required dependencies, as specified [here](https://code.google.com/archive/p/orthagogue/)

* [MCL Markov Cluster Algorithm](http://micans.org/mcl/)
  - sudo apt-get install mcl

* mafft (multiple alignment program)
  - Download and install from http://mafft.cbrc.jp/alignment/software/linux.html
  - OR sudo apt-get install mafft

* [fasttree](http://www.microbesonline.org/fasttree/)
  - sudo apt-get install fasttree

* [raxml](https://github.com/stamatak/standard-RAxML)
  - sudo apt-get install raxml
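
Before running the pipeline it can help to check that the tools above are reachable. The following is a small sketch, not part of the toolkit: the executable names in `REQUIRED_TOOLS` are assumptions, since they depend on how each tool was installed (for example, RAxML and FastTree binaries are often named differently), so adjust the list to your system.

```python
import shutil

# Assumed executable names; adapt these to your installation.
REQUIRED_TOOLS = ["diamond", "orthAgogue", "mcl", "mafft", "fasttree", "raxml"]


def find_missing(tools):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = find_missing(REQUIRED_TOOLS)
if missing:
    print("Not found on PATH: " + ", ".join(missing))
else:
    print("All listed tools were found on PATH.")
```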

1.2 Required Python packages:

- pip install numpy scipy biopython ete2
- treetime

How to run:

sh run.sh

    Description:
    This calls run-pipeline.py to run each step using scripts located in folder ./scripts/
    run-pipeline.py [-h] -fn folder_name -sl strain_list
                       [-st steps [steps ...]] [-rt raxml_max_time]
                       [-t threads] [-bp blast_file_path]

    mandatory parameters: -fn folder_name / -sl strain_list / [-st steps [steps ...]]
    NOTICE: strain_list format should be species_name+'-RefSeq', e.g.: Saureus-RefSeq.txt
    Example: python ./scripts/run-pipeline.py  -fn /ebio/ag-neher/share/users/wding/mpam/data/Pat3 -sl Pat3-RefSeq.txt -st 11 -t 64 > Pat3-11.log 2>&1
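
For reference, the usage line above can be reproduced with a short `argparse` sketch. This is a reconstruction from the usage text only and may differ from the actual run-pipeline.py: the `dest` names, the integer types, the thread default, and the helper `species_from_strain_list` are assumptions, as is the `.txt` extension taken from the example file name.

```python
import argparse
import os


def build_parser():
    """Build a parser consistent with the documented usage line (assumed types)."""
    parser = argparse.ArgumentParser(prog="run-pipeline.py")
    parser.add_argument("-fn", dest="folder_name", metavar="folder_name",
                        required=True, help="folder holding the input data")
    parser.add_argument("-sl", dest="strain_list", metavar="strain_list",
                        required=True, help="file listing the strains")
    parser.add_argument("-st", dest="steps", metavar="steps", nargs="+",
                        type=int, help="pipeline steps to run")
    parser.add_argument("-rt", dest="raxml_max_time", metavar="raxml_max_time",
                        type=int, help="time limit for RAxML")
    parser.add_argument("-t", dest="threads", metavar="threads",
                        type=int, default=1, help="number of threads")
    parser.add_argument("-bp", dest="blast_file_path", metavar="blast_file_path",
                        help="path to an existing blast result")
    return parser


def species_from_strain_list(strain_list):
    """Check the species_name + '-RefSeq' convention and return species_name."""
    name = os.path.basename(strain_list)
    suffix = "-RefSeq.txt"  # extension assumed from the example file name
    if not name.endswith(suffix) or len(name) == len(suffix):
        raise ValueError("expected a name like Saureus-RefSeq.txt, got " + name)
    return name[:-len(suffix)]


# Parse the arguments from the example command above.
args = build_parser().parse_args(
    ["-fn", "data/Pat3", "-sl", "Pat3-RefSeq.txt", "-st", "11", "-t", "64"]
)
print(args.steps, args.threads, species_from_strain_list(args.strain_list))
```

Validating the strain list name up front gives a clear error message early, rather than a failure in a later pipeline step.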
