About oclust

A pipeline for clustering long 16S rRNA sequencing reads, or any other sequences, into Operational Taxonomic Units (OTUs).

Requirements

Linux v.2.6.x
Perl v.5.10.1
R (must be on the PATH)
- the seqinr package must be installed:

      > install.packages("seqinr")

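A quick way to confirm that the required programs are on the PATH is sketched below. The function name missing_tools is hypothetical and not part of oclust; it relies only on the standard `command -v` shell builtin.

```shell
# Report which of the named programs cannot be found on PATH.
# Returns 0 when all are present, 1 otherwise.
# (Hypothetical helper for illustration; not shipped with oclust.)
missing_tools() {
    rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            rc=1
        fi
    done
    return "$rc"
}

# Example: missing_tools perl R
```
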
Note on data

The pipeline is designed for PacBio CCS reads; it will not work on raw PacBio reads.

Input files

The only input file to oclust is a file in FASTA format containing the sequencing reads to be clustered.
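
Before starting a run it can help to inspect the input file. The following is a minimal sketch, assuming a plain-text FASTA file whose headers start with `>`; the function name fasta_summary is hypothetical and not part of oclust. It prints the number of records followed by the shortest and longest sequence length, which may be useful when choosing values for -minl and -maxl.

```shell
# Print "<records> <min length> <max length>" for a FASTA file.
# Sequences may span several lines; lengths are summed per record.
# (Hypothetical helper for illustration; not shipped with oclust.)
fasta_summary() {
    awk '
        function update(l) {
            if (min == "" || l < min) min = l
            if (l > max) max = l
        }
        /^>/ { if (n > 0) update(len); n++; len = 0; next }
             { len += length($0) }
        END  { if (n > 0) update(len); printf "%d %d %d\n", n, min, max }
    ' "$1"
}

# Example: fasta_summary file.fasta
```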

FASTQ files can be converted to FASTA:

   $ cd utils
   $ chmod +x fastq_to_fasta.pl
   $ ./fastq_to_fasta.pl file.fastq > file.fasta
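
For readers curious what such a conversion involves, here is a minimal awk sketch. It is not the implementation of fastq_to_fasta.pl; it assumes every read occupies exactly four lines (header, sequence, `+`, qualities), and the function name fastq_to_fasta_sketch is made up for this example.

```shell
# Convert four-line-per-record FASTQ to FASTA.
# Reads from the named files, or from standard input when none are given.
# Assumption: no sequence or quality string is wrapped across lines.
fastq_to_fasta_sketch() {
    awk 'NR % 4 == 1 { print ">" substr($0, 2) }   # "@id" becomes ">id"
         NR % 4 == 2 { print }                     # sequence line is kept as-is
        ' "$@"
}

# Example: fastq_to_fasta_sketch file.fastq > file.fasta
```

The quality lines are simply dropped, since FASTA has no place to store them.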

Installation

Get the repository:

   $ git clone https://github.com/oscar-franzen/oclust.git oclust

Make the scripts executable (might not be necessary):
```
$ cd oclust
$ chmod +x *.pl
```
Decide whether to compute distances with Needleman-Wunsch or Infernal. Infernal is substantially faster.

The first time it is run, oclust_pipeline.pl downloads the human genome sequence and formats it.

   $ ./oclust_pipeline.pl -x <method> -f <input file> -o <output directory> -p <number of CPUs>

   General settings:
   -x PW or MSA               Can be PW for pairwise alignments (based on Needleman-Wunsch)
                               or MSA for multiple sequence alignment (based on
                               Infernal). [MSA]
   -t local or cluster        If -x is PW, should it be parallelized by running it locally
                               on multiple cores or by submitting jobs to a cluster
                               (requires a system with the LSF scheduler). [local]
   -a complete, average or    The desired clustering algorithm. [complete]
       single
   -f [string]                Input FASTA file.
   -o [string]                Name of output directory (must not exist); use the full path.
   -R HMM, BLAST, or none     Method to use for reverse complementing sequences. [HMM]
   -p [integer]               Number of processor cores to use for BLAST. [4]
   -minl [integer]            Minimum sequence length. [optional]
   -maxl [integer]            Maximum sequence length. [optional]
   -rand [integer]            Randomly sample a specified number of sequences. [optional]
   -human Y or N              If Y, run a BLAST-based contamination screen
                               against the human genome. [Y]
   -chimera Y or N            Run chimera check. Can be Y or N. [Y]

   LSF settings (only valid with -x PW and -t cluster):
   -lsf_queue [string]       Name of the LSF queue to use. [scavenger]
   -lsf_account [string]     Name of the account to use. [optional]
   -lsf_time [integer]       Runtime per job, in hours. [1]
   -lsf_memory [integer]     Requested amount of RAM in MB. [3000]
   -lsf_nb_jobs [integer]    Number of jobs. [20]
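
As an illustration, the constraints on -o listed above (full path, directory must not exist) can be checked before launching a run. The sketch below is a hypothetical helper, oclust_cmd, which is not part of oclust. It only prints a command line built from the documented -x, -f, -o and -p options; the file and directory names in the example are placeholders, and paths containing spaces are not handled.

```shell
# Print an oclust command line after checking the documented -o constraints:
# the output directory must be given as a full path and must not exist yet.
# (Hypothetical helper for illustration; not shipped with oclust.)
oclust_cmd() {
    input=$1
    outdir=$2
    cpus=${3:-4}    # 4 is the documented default for -p
    case $outdir in
        /*) ;;      # absolute path, as required
        *)  echo "output directory must be a full path: $outdir" >&2
            return 1 ;;
    esac
    if [ -e "$outdir" ]; then
        echo "output directory already exists: $outdir" >&2
        return 1
    fi
    echo "./oclust_pipeline.pl -x MSA -f $input -o $outdir -p $cpus"
}

# Example: oclust_cmd /data/reads.fasta /data/oclust_run1 8
```

The printed command can be inspected and then run from the oclust directory. Swapping `-x MSA` for `-x PW` selects the Needleman-Wunsch route described above.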

Dependencies

The oclust pipeline bundles together the following open source/public domain software:

- R, compiled with: $ ./configure --prefix=~/R/ --enable-static=yes --with-x=no --with-tcltk=no
- The seqinr R package
- Perl and BioPerl
- Parallel::ForkManager
- Memory::Usage
- NCBI BLAST
- uchime (public domain version)
- HMMER (hmmscan)
- vrevcomp
- Infernal
- EMBOSS Needleman-Wunsch implementation (needle), compiled with: $ ./configure --prefix=~/e/ --disable-shared --without-mysql --without-postgresql --without-axis2c --without-hpdf --without-x --without-pngdriver

Reference

Improved OTU-picking using long-read 16S rRNA gene amplicon sequencing and generic hierarchical clustering, Franzén et al. Microbiome 2015

Contact

p.oscar.franzen at gmail.com
