Bioinformatics Cheat Sheet

Hello! This repo contains several sample code snippets that might be useful for someone beginning to learn bioinformatics. I will try to keep it up to date as I come across tools that I use particularly often.

Setting up your environment

To keep these installs from interfering with your current working environment, follow the steps below to create a separate conda environment for the bioinformatics tools:

Install Miniforge:

Download the Miniforge installer for your platform from the Miniforge releases page: https://github.com/conda-forge/miniforge/releases
Open a terminal or command prompt.
Navigate to the directory containing the downloaded installer.
Run the installer script:

bash Miniforge3-latest-MacOSX-x86_64.sh  # Replace with the actual filename
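
If you are unsure which installer filename applies to your machine, the minimal sketch below builds a candidate name from Python's `platform` module. It assumes the `Miniforge3-<system>-<machine>.sh` naming pattern and the `releases/latest/download` location described in the Miniforge project's install instructions; `installer_name` and `BASE_URL` are hypothetical names introduced only for this example, so check the result against the releases page before downloading.

```python
import platform

# Assumed download location; verify it against the Miniforge releases page.
BASE_URL = "https://github.com/conda-forge/miniforge/releases/latest/download"


def installer_name(system=None, machine=None):
    """Return the assumed .sh installer filename for a Unix-like platform."""
    system = system or platform.system()     # e.g. "Linux" or "Darwin"
    machine = machine or platform.machine()  # e.g. "x86_64" or "arm64"
    if system not in ("Linux", "Darwin"):
        # Windows uses a different (.exe) installer, which this sketch skips.
        raise ValueError(f"No .sh installer assumed for {system!r}")
    return f"Miniforge3-{system}-{machine}.sh"


if __name__ == "__main__":
    try:
        print(f"{BASE_URL}/{installer_name()}")
    except ValueError as err:
        print(err)
```

Running the script prints a candidate download URL for the current machine; the filename at the end is what you would pass to `bash` in the command above.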

Create Conda Environment and Install tools:

We will call this environment "bioinformatics_env":

conda create -n bioinformatics_env 
conda activate bioinformatics_env
mamba install -c bioconda samtools bedtools biopython gtfparse star sra-tools gseapy

Verify Tools:

Here are the tools that the code in this repository requires, installed in the previous step:

Package	Description	Link to Documentation
Biopython	Manipulating, translating, and reverse-complementing sequences, among other tasks	https://biopython.org/wiki/Documentation
samtools	Sorting, viewing, and otherwise analyzing .sam and .bam alignment files, which are the primary outputs of most RNA-seq pipelines	http://www.htslib.org/doc/samtools.html#DESCRIPTION
bedtools	Genome interval operations; bedtools intersect in particular finds overlaps between two sets of genomic features in .bed format	https://bedtools.readthedocs.io/en/latest/#
gtfparse	Reads .gtf files, which are the primary format for annotating the locations of genes and transcripts	https://pypi.org/project/gtfparse/
STAR	Creates an index for your genome of interest, and aligns RNA-seq reads to it	https://github.com/alexdobin/STAR
sra-tools (fastq-dump)	Downloads sequencing runs from the Sequence Read Archive (SRA) as .fastq files	https://github.com/ncbi/sra-tools

To confirm that the tools are properly installed, run the following commands:

samtools --version
bedtools --version
STAR --version
fastq-dump --version
python -c "import Bio; print(Bio.__version__)"
python -c "import gtfparse; print(gtfparse.__version__)"
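
The same check can be done in one pass from Python. The sketch below is a minimal example, not part of this repository: it looks up each command-line program on your PATH and asks `importlib.metadata` for the installed version of each Python package. The names `CLI_TOOLS`, `PY_PACKAGES`, `check_cli`, and `check_packages` are hypothetical, and the package list assumes the distribution names match the conda package names used in the install step.

```python
import shutil
from importlib import metadata

# Command-line programs expected on PATH once the environment is active.
CLI_TOOLS = ["samtools", "bedtools", "STAR", "fastq-dump"]
# Python distributions installed in the previous step (names are assumed).
PY_PACKAGES = ["biopython", "gtfparse", "gseapy"]


def check_cli(tools):
    """Map each program name to its resolved path, or None if not found."""
    return {tool: shutil.which(tool) for tool in tools}


def check_packages(packages):
    """Map each distribution name to its installed version, or None."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    results = {**check_cli(CLI_TOOLS), **check_packages(PY_PACKAGES)}
    for name, found in results.items():
        print(f"{name:12s} {found if found else 'MISSING'}")
```

Anything reported as MISSING was either not installed or is not visible because the environment is not active.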

Clone this repository:

Git and GitHub are useful tools in their own right and worth learning! This repository includes some code to help read some files. You can download it by running the following command:

git clone https://github.com/bpt26/bioinformatics_cheat_sheet.git

Under this setup, you'll need to activate the environment (conda activate bioinformatics_env) every time you want to use these tools and packages.
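
If you often forget this step, a script can check for it up front. The sketch below relies on the `CONDA_DEFAULT_ENV` environment variable, which conda sets to the environment name on activation (for environments created by path it holds the path instead, which this sketch does not handle). `EXPECTED_ENV`, `active_conda_env`, and `env_is_active` are hypothetical names introduced for this example.

```python
import os

EXPECTED_ENV = "bioinformatics_env"  # environment name used in this guide


def active_conda_env(environ=None):
    """Return the name of the active conda environment, or None."""
    environ = os.environ if environ is None else environ
    return environ.get("CONDA_DEFAULT_ENV")


def env_is_active(expected=EXPECTED_ENV, environ=None):
    """True if the expected conda environment is the active one."""
    return active_conda_env(environ) == expected


if __name__ == "__main__":
    if env_is_active():
        print(f"{EXPECTED_ENV} is active")
    else:
        print(f"Run 'conda activate {EXPECTED_ENV}' first "
              f"(currently active: {active_conda_env()})")
```

Passing `environ` explicitly is only there to make the functions easy to test; in normal use you would call `env_is_active()` with no arguments.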
