bpt26/bioinformatics_cheat_sheet

Bioinformatics Cheat Sheet

Hello! This repo contains several sample code snippets that might be useful for someone beginning to learn bioinformatics. I will try to keep it up to date as I come across tools that I use particularly often.

Setting up your environment

To keep the installs here from interfering with your current working environment, the steps below create a separate conda environment to hold these bioinformatics tools:

Install Miniforge:

  1. Download the Miniforge installer for your platform from the Miniforge releases page.
  2. Open a terminal or command prompt.
  3. Navigate to the directory containing the downloaded installer.
  4. Run the installer script:
bash Miniforge3-latest-MacOSX-x86_64.sh  # Replace with the actual filename

Create Conda Environment and Install tools:

We will call this environment "bioinformatics_env":

conda create -n bioinformatics_env 
conda activate bioinformatics_env
mamba install -c bioconda samtools bedtools biopython gtfparse star sra-tools gseapy

Verify Tools:

Here are the tools that the code in this repository requires, installed in the previous step:

| Package | Description | Link to Documentation |
| --- | --- | --- |
| Biopython | Manipulating, translating, and reverse-complementing sequences, among others | https://biopython.org/wiki/Documentation |
| samtools | Sorting, viewing, and otherwise analyzing .sam or .bam files, which are the primary alignment outputs of most RNA-seq pipelines | http://www.htslib.org/doc/samtools.html#DESCRIPTION |
| bedtools | bedtools intersect in particular finds overlaps between two sets of genomic features in .bed format | https://bedtools.readthedocs.io/en/latest/# |
| gtfparse | Reads .gtf files, which are the primary format for annotating the locations of genes and transcripts | https://pypi.org/project/gtfparse/ |
| STAR | Creates an index for your genome of interest and aligns RNA-seq reads to it | https://github.com/alexdobin/STAR |
| fastq-dump (part of sra-tools) | Downloads .fastq files from the Sequence Read Archive (SRA) | https://github.com/ncbi/sra-tools |

You'll also want to confirm that these are properly installed. With the environment active, each of the following commands should print a version number:

samtools --version
bedtools --version
STAR --version
fastq-dump --version
python -c "import Bio; print(Bio.__version__)"
python -c "import gtfparse; print(gtfparse.__version__)"
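If you would rather run one check than several, the same verification can be sketched as a short Python script. This is only an illustration, not part of the repository: the function name `find_missing` is hypothetical, and the script assumes the executables are named `samtools`, `bedtools`, `STAR`, and `fastq-dump` and that the Python packages import as `Bio`, `gtfparse`, and `gseapy`.

```python
import importlib
import shutil

# Command-line tools and Python modules installed in the previous step.
COMMANDS = ["samtools", "bedtools", "STAR", "fastq-dump"]
MODULES = ["Bio", "gtfparse", "gseapy"]


def find_missing(commands, modules):
    """Return (commands not found on PATH, modules that fail to import)."""
    missing_commands = [c for c in commands if shutil.which(c) is None]
    missing_modules = []
    for name in modules:
        try:
            importlib.import_module(name)
        except ImportError:
            missing_modules.append(name)
    return missing_commands, missing_modules


if __name__ == "__main__":
    missing_commands, missing_modules = find_missing(COMMANDS, MODULES)
    for name in missing_commands:
        print(f"Command not found on PATH: {name}")
    for name in missing_modules:
        print(f"Python module failed to import: {name}")
    if not missing_commands and not missing_modules:
        print("All tools found.")
```

Run it with the environment active. Anything it reports as missing can usually be fixed by re-running the install command for that package.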

Clone this repository:

GitHub itself is a very useful tool that is worth learning! This repository also includes some code to help read some files. You can download a copy by cloning the repository:

git clone https://github.com/bpt26/bioinformatics_cheat_sheet.git

Under this setup, you'll need to activate the environment (conda activate bioinformatics_env) every time you want to use these tools and packages.
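A script can guard against running outside the environment by checking which one is active. The sketch below assumes conda sets the `CONDA_DEFAULT_ENV` environment variable when an environment is activated; the helper names `active_conda_env` and `env_is_active` are hypothetical and not part of this repository.

```python
import os

# Name of the environment created earlier in this README.
EXPECTED_ENV = "bioinformatics_env"


def active_conda_env(environ=None):
    """Return the active conda environment's name, or None if none is set."""
    environ = os.environ if environ is None else environ
    return environ.get("CONDA_DEFAULT_ENV")


def env_is_active(expected=EXPECTED_ENV, environ=None):
    """Return True if the expected conda environment is the active one."""
    return active_conda_env(environ) == expected


if __name__ == "__main__":
    if env_is_active():
        print(f"{EXPECTED_ENV} is active.")
    else:
        print(
            f"Active environment is {active_conda_env()!r}; "
            f"run: conda activate {EXPECTED_ENV}"
        )
```

Passing a plain dictionary as `environ` makes the check easy to try without changing your shell.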
