A repository for the second version of our LTEE gene expression paper

shahlab/LTEE_gene_expression_2

This repo contains the code and associated files for "The landscape of transcriptional and translational changes over 22 years of bacterial adaptation", currently available on bioRxiv (https://www.biorxiv.org/content/10.1101/2021.01.12.426406v1).

Repo organization

The repo is organized into folders with self-descriptive names. For example, code contains the code used to process the data, perform analysis, and make figures. Likewise, data_frames contains the various data files created or used during the analysis. Note that some data files are missing because they are too large to place here and must be generated by running the code.

Running the code

Our analysis was performed on a server with two Intel Xeon E5-2660 v4 CPUs @ 2.00 GHz (14 cores per CPU and 2 threads per core, 56 threads in total) and 264 GB of RAM, running the following software versions:

Software        Version
cutadapt        2.8
python          3.6.9
hisat2          2.1.0
kallisto        0.46.2
samtools        1.10
BBmap           37
fastX toolkit   0.0.14
R               4.2.0
Ubuntu          18.04.5 LTS
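Before starting, it can help to confirm that the command-line tools in the table are on your PATH. The following is a minimal sketch, not part of the repository. The executable names (for example `bbmap.sh` for BBmap and `fastx_trimmer` for the fastX toolkit) are assumptions about a typical installation and may differ on your system. It only reports whether each program can be found; it does not verify the installed version.

```python
import shutil

# Expected versions come from the table above.
# Executable names are ASSUMPTIONS; adjust them to match your installation.
EXPECTED_TOOLS = {
    "cutadapt": "2.8",
    "python3": "3.6.9",
    "hisat2": "2.1.0",
    "kallisto": "0.46.2",
    "samtools": "1.10",
    "bbmap.sh": "37",            # assumed launcher script name for BBmap
    "fastx_trimmer": "0.0.14",   # assumed: one program from the fastX toolkit
    "Rscript": "4.2.0",
}


def find_missing(tools, which=shutil.which):
    """Return the sorted names in `tools` that `which` cannot locate on PATH."""
    return sorted(name for name in tools if which(name) is None)


if __name__ == "__main__":
    missing = find_missing(EXPECTED_TOOLS)
    for name, version in EXPECTED_TOOLS.items():
        status = "MISSING" if name in missing else "found"
        print(f"{name:15s} expected {version:8s} {status}")
```

The `which` parameter is injectable only so the lookup can be exercised without the tools installed.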

Additionally, when knitted to HTML, the bottom of each Rmd displays the versions of the packages used. Many of the steps make use of multiple threads but do not require them. If needed, you can change the thread usage; the only consequence is that the code will take longer to run. There are 3 main phases to the analysis:

  1. Sequencing data processing - process the raw sequencing data such that it can be aligned and quantified.
  2. Analysis - run the various analyses that are based on the sequencing data.
  3. Interpretation - make visualizations of the data acquired during the analysis phase.

To recreate the analysis, clone the repository. The resulting local directory structure should match the following:

.
├── alignment
│   ├── hisat2
│   │   ├── indices
│   │   └── output
│   └── kallisto
│       ├── indices
│       └── output
├── biocyc_files
├── code
│   ├── analysis
│   ├── data_processing
│   └── figures
├── data_frames
├── fastas
├── figures
├── gffs
└── seqdata
    ├── 1-original
    ├── 2-adapter_removed
    ├── 3-demultiplexed
    ├── 4-deduplicated
    │   ├── deduped_files
    │   └── duplicates
    ├── 5-trimmed_ends
    └── 6-rrna_depleted
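Git does not track empty directories, so some of the folders above may be absent after cloning. The following is a minimal sketch, not part of the repository, that creates any missing directories from the tree. The function name `ensure_layout` is hypothetical, and the script assumes you pass it the path to your local clone.

```python
from pathlib import Path

# Leaf directories copied from the tree above; parents are created as needed.
DIRECTORIES = [
    "alignment/hisat2/indices",
    "alignment/hisat2/output",
    "alignment/kallisto/indices",
    "alignment/kallisto/output",
    "biocyc_files",
    "code/analysis",
    "code/data_processing",
    "code/figures",
    "data_frames",
    "fastas",
    "figures",
    "gffs",
    "seqdata/1-original",
    "seqdata/2-adapter_removed",
    "seqdata/3-demultiplexed",
    "seqdata/4-deduplicated/deduped_files",
    "seqdata/4-deduplicated/duplicates",
    "seqdata/5-trimmed_ends",
    "seqdata/6-rrna_depleted",
]


def ensure_layout(root):
    """Create any missing directories under `root`; return those created."""
    root = Path(root)
    created = []
    for rel in DIRECTORIES:
        path = root / rel
        if not path.is_dir():
            path.mkdir(parents=True, exist_ok=True)
            created.append(rel)
    return created


if __name__ == "__main__":
    # Run from the top level of the cloned repository.
    for rel in ensure_layout("."):
        print("created", rel)
```

Running it a second time creates nothing, so it is safe to rerun after a partial setup.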

Then, download the sequencing data (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE164308) and place it in /seqdata/1-original. After downloading, rename each file as follows:

GSM accession   Sample name          New file name
GSM5006206      rep1 ribo-seq araM   rep1-ribo-am.fq.gz
GSM5006207      rep1 ribo-seq araP   rep1-ribo-ap.fq.gz
GSM5006208      rep1 RNAseq araM     rep1-rna-am.fq.gz
GSM5006209      rep1 RNAseq araP     rep1-rna-ap.fq.gz
GSM5006210      rep2 ribo-seq araM   rep2-ribo-am.fq.gz
GSM5006211      rep2 ribo-seq araP   rep2-ribo-ap.fq.gz
GSM5006212      rep2 RNAseq araM     rep2-rna-am.fq.gz
GSM5006213      rep2 RNAseq araP     rep2-rna-ap.fq.gz
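The renaming in the table above can be scripted. The following is a minimal sketch, not part of the repository. It assumes each downloaded file name begins with its GSM accession followed by `_` or `.` (for example `GSM5006206_something.fastq.gz`), which may not match what your download tool produces, and that the files are already gzipped FASTQ. The function names are hypothetical.

```python
from pathlib import Path

# Accession-to-filename mapping copied from the table above.
NEW_NAMES = {
    "GSM5006206": "rep1-ribo-am.fq.gz",
    "GSM5006207": "rep1-ribo-ap.fq.gz",
    "GSM5006208": "rep1-rna-am.fq.gz",
    "GSM5006209": "rep1-rna-ap.fq.gz",
    "GSM5006210": "rep2-ribo-am.fq.gz",
    "GSM5006211": "rep2-ribo-ap.fq.gz",
    "GSM5006212": "rep2-rna-am.fq.gz",
    "GSM5006213": "rep2-rna-ap.fq.gz",
}


def plan_renames(filenames):
    """Map each file name to its new name, keyed on its leading GSM accession.

    ASSUMPTION: the accession is the text before the first '_' or '.'.
    Files without a known accession are left out of the plan.
    """
    plan = {}
    for name in filenames:
        accession = name.split("_")[0].split(".")[0]
        if accession in NEW_NAMES:
            plan[name] = NEW_NAMES[accession]
    return plan


def rename_all(directory):
    """Apply the rename plan to every matching file in `directory`."""
    directory = Path(directory)
    plan = plan_renames([p.name for p in directory.iterdir() if p.is_file()])
    for old, new in plan.items():
        (directory / old).rename(directory / new)
    return plan


if __name__ == "__main__":
    for old, new in rename_all("seqdata/1-original").items():
        print(old, "->", new)
```

Calling `plan_renames` on its own shows what would be renamed without touching any files, which is a useful check before running `rename_all`.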

After that, it should just be a matter of running the code in the specified order. To ensure the code runs smoothly, start with a clean R environment for each Rmd. Unless you need to modify the code, the safest way to run each Rmd is to knit it from within RStudio. Knitting won't work for Rmds that use shell code, namely the first few that process the data and perform alignment, and any others that clone repositories. We recommend copying and pasting those commands into the command line, as they may not execute properly from inside RStudio.

Sequencing data processing

  1. /code/data_processing/seq_data_processing.Rmd
  2. /code/data_processing/alignment.Rmd
  3. /code/data_processing/data_cleaning.Rmd

Analysis

Only the following three scripts must be run in a particular order. After that, the order of the remaining analysis scripts does not matter.

  1. /code/analysis/DEseq2.Rmd
  2. /code/analysis/riborex.Rmd
  3. /code/analysis/combine_data_frames.Rmd
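The recommended way to run these is to knit each one from within RStudio. As an alternative, the following is a minimal sketch, not part of the repository, that knits the three ordered scripts from the command line. Each call starts a separate `Rscript` process, so every Rmd gets a clean R environment. It assumes the `rmarkdown` package is installed, that pandoc is available on your PATH, and that you run it from the top level of the clone. The function names are hypothetical.

```python
import subprocess

# The three analysis scripts whose order matters, as listed above.
ORDERED_RMDS = [
    "code/analysis/DEseq2.Rmd",
    "code/analysis/riborex.Rmd",
    "code/analysis/combine_data_frames.Rmd",
]


def render_command(rmd_path):
    """Build an Rscript command that knits one Rmd in a new R session."""
    return ["Rscript", "-e", f"rmarkdown::render('{rmd_path}')"]


def render_in_order(rmd_paths, run=subprocess.run):
    """Knit each Rmd in turn; stop at the first one that fails."""
    for path in rmd_paths:
        run(render_command(path), check=True)


# Usage, from the top level of the cloned repository:
#     render_in_order(ORDERED_RMDS)
```

Because `check=True` raises on a non-zero exit status, a failure in one script prevents the later ones from running on incomplete inputs.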

Interpretation

The code to make the figures is in /code/figures. The order of these does not matter, but some may require you to run code in analysis before a figure can be generated. Knitting each document will place a pdf, png, and rds file in the /figures directory.
