shahlab/LTEE_gene_expression_2: A repository for the second version of our LTEE gene expression paper

This repo contains the code and associated files for "The landscape of transcriptional and translational changes over 22 years of bacterial adaptation", currently available on bioRxiv (https://www.biorxiv.org/content/10.1101/2021.01.12.426406v1).

Repo organization

The repo is organized into folders with self-descriptive names. For example, code contains the code used to process the data, perform analysis, and make figures. Likewise, data_frames contains the various data files created or used during the analysis. Note that some data files are missing because they are too large to place here and must be generated by running the code.

Running the code

Our analysis was performed on a server with two Intel Xeon E5-2660 v4 CPUs @ 2.00GHz (14 cores each, 2 threads per core, 56 threads in total) and 264 GB of RAM, running the following versions of software:

Software	version
cutadapt	2.8
python	3.6.9
hisat2	2.1.0
kallisto	0.46.2
samtools	1.10
BBmap	37
fastX toolkit	0.0.14
R	4.2.0
Ubuntu	18.04.5 LTS
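
Before starting, it may help to confirm that the command-line tools from the table above are on your PATH. The following is a minimal Python sketch, not part of the repository. The executable names are assumptions (for example, `bbmap.sh` for BBmap and `fastx_trimmer` for the fastX toolkit); the Rmds may call different programs from those suites. It only checks presence, not versions.

```python
import shutil

# Executable names are assumptions, not taken from the repository;
# adjust them to whichever programs the Rmds actually call.
ASSUMED_EXECUTABLES = {
    "cutadapt": "cutadapt",
    "python": "python3",
    "hisat2": "hisat2",
    "kallisto": "kallisto",
    "samtools": "samtools",
    "BBmap": "bbmap.sh",
    "fastX toolkit": "fastx_trimmer",
    "R": "Rscript",
}


def find_tools(executables=ASSUMED_EXECUTABLES):
    """Map each tool name to the path of its executable, or None if it is not on PATH."""
    return {name: shutil.which(exe) for name, exe in executables.items()}


if __name__ == "__main__":
    for name, path in find_tools().items():
        print(f"{name:15s} {path or 'NOT FOUND'}")
```

Any tool reported as NOT FOUND would need to be installed, or its assumed executable name corrected, before running the data processing steps.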

Additionally, when knitted to HTML, each Rmd displays the versions of the packages used at the bottom of the document. Many of the steps make use of multiple threads but do not require them. If needed, you can reduce the thread count; the only consequence is that the code will take longer to run. There are 3 main phases to the analysis:

Sequencing data processing - process the raw sequencing data such that it can be aligned and quantified.
Analysis - run the various analyses that are based on the sequencing data.
Interpretation - make visualizations of the data acquired during the analysis phase.

In order to recreate the analysis, you simply need to clone the repository, creating a local directory structure that should match the following:

.
├── alignment
│   ├── hisat2
│   │   ├── indices
│   │   └── output
│   └── kallisto
│       ├── indices
│       └── output
├── biocyc_files
├── code
│   ├── analysis
│   ├── data_processing
│   └── figures
├── data_frames
├── fastas
├── figures
├── gffs
└── seqdata
    ├── 1-original
    ├── 2-adapter_removed
    ├── 3-demultiplexed
    ├── 4-deduplicated
    │   ├── deduped_files
    │   └── duplicates
    ├── 5-trimmed_ends
    └── 6-rrna_depleted
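
Git does not track empty directories, so a fresh clone may lack some of the folders shown above. The following Python sketch (not part of the repository) lists the leaf directories from the tree, reports which ones are missing, and can create them. The function names are hypothetical, and it assumes you pass the path to the repository root.

```python
from pathlib import Path

# Leaf directories of the tree shown above, relative to the repository root.
EXPECTED_DIRS = [
    "alignment/hisat2/indices",
    "alignment/hisat2/output",
    "alignment/kallisto/indices",
    "alignment/kallisto/output",
    "biocyc_files",
    "code/analysis",
    "code/data_processing",
    "code/figures",
    "data_frames",
    "fastas",
    "figures",
    "gffs",
    "seqdata/1-original",
    "seqdata/2-adapter_removed",
    "seqdata/3-demultiplexed",
    "seqdata/4-deduplicated/deduped_files",
    "seqdata/4-deduplicated/duplicates",
    "seqdata/5-trimmed_ends",
    "seqdata/6-rrna_depleted",
]


def missing_dirs(repo_root="."):
    """Return the expected directories that do not exist under repo_root."""
    root = Path(repo_root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


def create_missing(repo_root="."):
    """Create any missing directories and return the list of those created."""
    created = missing_dirs(repo_root)
    for d in created:
        (Path(repo_root) / d).mkdir(parents=True, exist_ok=True)
    return created
```

Calling `missing_dirs(".")` from the repository root shows what is absent without changing anything; `create_missing(".")` then fills the gaps.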

Then, download the sequencing data (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE164308) and place it in /seqdata/1-original. After downloading the data, rename the files as follows:

GSM accession	Sample name	New file name
GSM5006206	rep1 ribo-seq araM	rep1-ribo-am.fq.gz
GSM5006207	rep1 ribo-seq araP	rep1-ribo-ap.fq.gz
GSM5006208	rep1 RNAseq araM	rep1-rna-am.fq.gz
GSM5006209	rep1 RNAseq araP	rep1-rna-ap.fq.gz
GSM5006210	rep2 ribo-seq araM	rep2-ribo-am.fq.gz
GSM5006211	rep2 ribo-seq araP	rep2-ribo-ap.fq.gz
GSM5006212	rep2 RNAseq araM	rep2-rna-am.fq.gz
GSM5006213	rep2 RNAseq araP	rep2-rna-ap.fq.gz
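
The renaming in the table above can be scripted. The following Python sketch is not part of the repository. It assumes each downloaded file name begins with its GSM accession (for example, a hypothetical `GSM5006206_something.fq.gz`); check your actual download names first, since this README does not state them. The function names are likewise hypothetical.

```python
from pathlib import Path

# GSM accession -> new file name, copied from the table above.
NEW_NAMES = {
    "GSM5006206": "rep1-ribo-am.fq.gz",
    "GSM5006207": "rep1-ribo-ap.fq.gz",
    "GSM5006208": "rep1-rna-am.fq.gz",
    "GSM5006209": "rep1-rna-ap.fq.gz",
    "GSM5006210": "rep2-ribo-am.fq.gz",
    "GSM5006211": "rep2-ribo-ap.fq.gz",
    "GSM5006212": "rep2-rna-am.fq.gz",
    "GSM5006213": "rep2-rna-ap.fq.gz",
}


def plan_renames(directory):
    """Pair each file whose name starts with a GSM accession with its new path.

    Assumption: the accession is the part of the name before the first
    underscore or dot.
    """
    directory = Path(directory)
    plan = []
    for path in sorted(directory.iterdir()):
        accession = path.name.split("_")[0].split(".")[0]
        if path.is_file() and accession in NEW_NAMES:
            plan.append((path, directory / NEW_NAMES[accession]))
    return plan


def apply_renames(plan):
    """Rename the files, refusing to overwrite an existing target."""
    for old, new in plan:
        if new.exists():
            raise FileExistsError(new)
        old.rename(new)
```

Inspecting the result of `plan_renames("seqdata/1-original")` before passing it to `apply_renames` gives a chance to catch a wrong naming assumption before any file is touched.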

After that, it should just be a matter of running the code in the specified order. To ensure smooth running of the code, start with a clean R environment for each Rmd. Unless you need to modify the code, the safest way to run each Rmd is to simply knit it from within RStudio. Knitting won't work for certain Rmds that use shell code, namely the first few that process the data and perform alignment, and others that clone repositories. It's recommended that you copy and paste these commands into the command line to execute them, as they may not execute properly from inside RStudio.

Sequencing data processing

/code/data_processing/seq_data_processing.Rmd
/code/data_processing/alignment.Rmd
/code/data_processing/data_cleaning.Rmd

Analysis

Only three of the scripts must be run in a particular order; after that, the order does not matter.

/code/analysis/DEseq2.Rmd
/code/analysis/riborex.Rmd
/code/analysis/combine_data_frames.Rmd
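
If you prefer the command line to RStudio for these three documents, one option is to render each in its own `Rscript` process, which also gives every Rmd a fresh R session. The following Python sketch is not part of the repository. It assumes the order listed above is the required one, that `Rscript` and the `rmarkdown` package (plus pandoc) are installed, and that it is run from the repository root. Whether the Rmds resolve their file paths the same way outside RStudio has not been verified.

```python
import subprocess

# The three analysis documents, in the order listed above.
ORDERED_RMDS = [
    "code/analysis/DEseq2.Rmd",
    "code/analysis/riborex.Rmd",
    "code/analysis/combine_data_frames.Rmd",
]


def knit_command(rmd_path):
    """Build an Rscript call that renders one Rmd in a fresh R session."""
    return ["Rscript", "-e", f"rmarkdown::render('{rmd_path}')"]


def knit_in_order(rmds=ORDERED_RMDS, dry_run=True):
    """Print (dry run) or execute the render commands one after another."""
    commands = [knit_command(p) for p in rmds]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # stop at the first failure
    return commands
```

By default `knit_in_order()` only prints the commands; passing `dry_run=False` executes them and stops at the first document that fails to render.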

Interpretation

The code to make the figures is in /code/figures. The order of these does not matter, but some may require you to run code in analysis before a figure can be generated. Knitting each document will place a pdf, png, and rds file in the /figures directory.
