Bioinformatics Cheat Sheet

Hello! This repo contains several sample code snippets that might be useful for someone beginning to learn bioinformatics. I will try to keep it up to date as I come across tools that I use particularly often.

Setting up your environment

To keep these installs from interfering with your existing setup, the steps below create a separate conda environment to hold the bioinformatics tools:

Install Miniforge:

  1. Download the Miniforge installer for your platform from the Miniforge releases page: Miniforge Releases
  2. Open a terminal or command prompt.
  3. Navigate to the directory containing the downloaded installer.
  4. Run the installer script:
bash Miniforge3-latest-MacOSX-x86_64.sh  # Replace with the actual filename

Create Conda Environment and Install tools:

We will call this environment "bioinformatics_env":

conda create -n bioinformatics_env 
conda activate bioinformatics_env
mamba install -c bioconda samtools bedtools biopython gtfparse star sra-tools gseapy

Verify Tools:

Here are the tools that the code in this repository requires, installed in the previous step:

| Package | Description | Link to Documentation |
| --- | --- | --- |
| Biopython | Manipulating, translating, and reverse-complementing sequences, among others | https://biopython.org/wiki/Documentation |
| samtools | Sorting, viewing, and otherwise performing analyses involving .sam or .bam files, which are the primary outputs for most RNA-seq pipelines | http://www.htslib.org/doc/samtools.html#DESCRIPTION |
| bedtools | bedtools intersect in particular finds overlaps between two sets of genetic features, in .bed format | https://bedtools.readthedocs.io/en/latest/# |
| gtfparse | Reads .gtf files, which are the primary format for annotating the locations of genes and transcripts | https://pypi.org/project/gtfparse/ |
| STAR | Creates an index for your genome of interest, and aligns RNA-seq reads to it | https://github.com/alexdobin/STAR |
| fastq-dump | Downloads .fastq files from the Sequence Read Archive (SRA) | https://github.com/ncbi/sra-tools |
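To make "finding overlaps" concrete, here is a small pure-Python sketch of the idea behind bedtools intersect. It is only an illustration and not how bedtools is implemented; the function names `parse_bed` and `find_overlaps` and the example coordinates are made up for this example. It relies on the fact that .bed coordinates are 0-based and half-open (the end position is not included).

```python
def parse_bed(lines):
    """Parse BED lines into (chrom, start, end) tuples (0-based, half-open)."""
    features = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and the optional header lines BED files may carry.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        features.append((chrom, int(start), int(end)))
    return features


def find_overlaps(a, b):
    """Return every (feature_a, feature_b) pair sharing at least one base."""
    pairs = []
    for fa in a:
        for fb in b:
            same_chrom = fa[0] == fb[0]
            # Half-open intervals overlap when each starts before the other ends.
            if same_chrom and fa[1] < fb[2] and fb[1] < fa[2]:
                pairs.append((fa, fb))
    return pairs


# Made-up example features, not real annotations.
genes = parse_bed(["chr1\t100\t200", "chr1\t500\t600"])
peaks = parse_bed(["chr1\t150\t250", "chr2\t100\t200"])
print(find_overlaps(genes, peaks))
# [(('chr1', 100, 200), ('chr1', 150, 250))]
```

This nested loop compares every pair of features, which is fine for a handful of intervals but far too slow for whole-genome files; that is where bedtools itself comes in.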

To confirm that the tools installed correctly, run the following commands; each one should print a version number:

samtools --version
bedtools --version
STAR --version
fastq-dump --version
python -c "import Bio; print(Bio.__version__)"
python -c "import gtfparse; print(gtfparse.__version__)"
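If you would rather check everything in one pass, the same idea can be scripted. This is a minimal sketch and not part of this repository: the function name `check_tools` is made up, and the lists of executables and modules are assumptions based on the table above (for example, that the STAR binary is called `STAR` and that sra-tools provides `fastq-dump`). It only checks that each tool can be found, not that it runs correctly.

```python
import importlib.util
import shutil

# Assumed names: command-line executables and Python modules from the table.
EXECUTABLES = ["samtools", "bedtools", "STAR", "fastq-dump"]
MODULES = ["Bio", "gtfparse"]


def check_tools(executables=EXECUTABLES, modules=MODULES):
    """Return a dict mapping each tool name to True if it was found."""
    found = {}
    for exe in executables:
        # shutil.which returns the full path, or None if not on PATH.
        found[exe] = shutil.which(exe) is not None
    for mod in modules:
        # find_spec returns None when a top-level module cannot be located.
        found[mod] = importlib.util.find_spec(mod) is not None
    return found


if __name__ == "__main__":
    for name, ok in check_tools().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

Run it with the environment activated; anything reported as MISSING most likely did not install, or the environment is not active.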

Clone this repository:

GitHub itself is a very useful tool and worth learning! I added some code to this repository to help read some files. You can download a local copy by cloning the repository:

git clone https://github.com/bpt26/bioinformatics_cheat_sheet.git

With this setup, you'll need to activate the environment (conda activate bioinformatics_env) in each new terminal session before using these tools and packages.
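A script can guard against a forgotten activation by checking which environment is active before doing any work. The sketch below assumes the environment was created by name as "bioinformatics_env" (as in the setup steps above) and relies on the `CONDA_DEFAULT_ENV` variable that conda sets on activation; the function name `env_is_active` is made up for this example.

```python
import os

ENV_NAME = "bioinformatics_env"  # environment name used in the setup steps


def env_is_active(name=ENV_NAME):
    """Return True if conda reports `name` as the active environment."""
    # conda sets CONDA_DEFAULT_ENV when an environment is activated;
    # the variable is absent if conda has not activated anything.
    return os.environ.get("CONDA_DEFAULT_ENV") == name


if __name__ == "__main__":
    if env_is_active():
        print(f"{ENV_NAME} is active")
    else:
        print(f"Run 'conda activate {ENV_NAME}' first")
```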