Skip to content

Commit

Permalink
Update README.md
Browse files Browse the repository at this point in the history
  • Loading branch information
tli14 committed Aug 21, 2022
1 parent 89bed8f commit 58c081e
Showing 1 changed file with 31 additions and 9 deletions.
40 changes: 31 additions & 9 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -17,18 +17,18 @@ This repository provides codes and files to reproduce data and figures from the

## Scripts
* **Python_Shell_scripts**
- 1. Genome_Data_Collection: collect and analyze genome data.
- **Genome_Data_Collection: collect and analyze genome data.**
- [download_all_complete_genome_fasta.sh](https://github.com/tli14/PanMAGs/blob/main/Python_Shell_scripts/1.Genome_Data_Collection/download_all_complete_genome_fasta.sh): Download complete bacteria genomes from assembly_summary.txt.
- [download_genus_contaminaton_genomes.sh](https://github.com/tli14/PanMAGs/blob/main/Python_Shell_scripts/1.Genome_Data_Collection/download_genus_contaminaton_genomes.sh): Download bacteria genomes as contamination datasets.
- [fastANI.sh](https://github.com/tli14/PanMAGs/blob/main/Python_Shell_scripts/1.Genome_Data_Collection/fastANI.sh): Calculate average nucleotide identity (ANI) for bacteria species.
- [fastANI.sh](https://github.com/tli14/PanMAGs/blob/main/Python_Shell_scripts/1.Genome_Data_Collection/fastANI.sh): Calculate average nucleotide identity (ANI) for bacteria species.<br>

- 2. 17_species: pan-genome analysis for 17 species.
- **17_species: pan-genome analysis for 17 species.**
- [prokka.sh](https://github.com/tli14/PanMAGs/blob/main/Python_Shell_scripts/2.17_species/prokka.sh): Genome annotation by using Prokka.
- [gen_gff.sh](https://github.com/tli14/PanMAGs/blob/main/Python_Shell_scripts/2.17_species/gen_gff.sh): Rename .gff files from Prokka results.
- [roary_species.sh](https://github.com/tli14/PanMAGs/blob/main/Python_Shell_scripts/2.17_species/roary_species.sh): Pan-genome analysis by using Roary.
- [sbatch_roary.sh](https://github.com/tli14/PanMAGs/blob/main/Python_Shell_scripts/2.17_species/sbatch_roary.sh): Run multiple jobs for pan-genome analysis.
- [sbatch_roary.sh](https://github.com/tli14/PanMAGs/blob/main/Python_Shell_scripts/2.17_species/sbatch_roary.sh): Run multiple jobs for pan-genome analysis.<br>

- 3. MAG_Simulation: simulate MAGs from complete genomes.
- **MAG_Simulation: simulate MAGs from complete genomes.**
- [fragmentation.py](https://github.com/tli14/PanMAGs/blob/main/Python_Shell_scripts/3.MAG_Simulation/fragmentation.py): Fragmentation simulation - random cut the genome to fragments (random number of fragments).
- [fragmentation_avrg_length.py](https://github.com/tli14/PanMAGs/blob/main/Python_Shell_scripts/3.MAG_Simulation/fragmentation_avrg_length.py): Fragmentation simulation - random cut the genome to fragments (random length of fragments).
- [incompleteness.py](https://github.com/tli14/PanMAGs/blob/main/Python_Shell_scripts/3.MAG_Simulation/incompleteness.py): Incompleteness simulation - remove a percentage of sequence length from each fragment.
Expand All @@ -38,13 +38,35 @@ This repository provides codes and files to reproduce data and figures from the
- [generate_numbers.sh](https://github.com/tli14/PanMAGs/blob/main/Python_Shell_scripts/3.MAG_Simulation/generate_numbers.sh): Generate numbers for genome list to assign random fragmentation/incompleteness/contamination numbers.
- [simulation.sh](https://github.com/tli14/PanMAGs/blob/main/Python_Shell_scripts/3.MAG_Simulation/simulation.sh): Automatic simulation scripts.
- [batch_files.sh](https://github.com/tli14/PanMAGs/blob/main/Python_Shell_scripts/3.MAG_Simulation/batch_files.sh): Batch files for simulation.
- [multiple_dataset.sh](https://github.com/tli14/PanMAGs/blob/main/Python_Shell_scripts/3.MAG_Simulation/multiple_dataset.sh): Generate multiple datasets for testing the dataset variations.
- [multiple_dataset.sh](https://github.com/tli14/PanMAGs/blob/main/Python_Shell_scripts/3.MAG_Simulation/multiple_dataset.sh): Generate multiple datasets for testing the dataset variations.<br>

- 4. Mixed_datasets: generate mxied datasets contain MAGs and complete genomes.
- **Mixed_datasets: generate mxied datasets contain MAGs and complete genomes.**
- [rad_combine.sh](https://github.com/tli14/PanMAGs/blob/main/Python_Shell_scripts/4.Mixed_datasets/rad_combine.sh): Generate mixed datasets with different percentage of MAGs.
- [copy_ori_file.py](https://github.com/tli14/PanMAGs/blob/main/Python_Shell_scripts/4.Mixed_datasets/copy_ori_file.py): Generate mixed datasets by combining original and simulated MAG dataset.
- [Pan-genome_and_summary.sh](https://github.com/tli14/PanMAGs/blob/main/Python_Shell_scripts/4.Mixed_datasets/Pan-genome_and_summary.sh): Perform pan-genome analysis for mixed datasets.
- [loop_rad_combine.sh](https://github.com/tli14/PanMAGs/blob/main/Python_Shell_scripts/4.Mixed_datasets/loop_rad_combine.sh): Run rad_combine.sh for multiple times.
- [roary_sum.py](https://github.com/tli14/PanMAGs/blob/main/Python_Shell_scripts/4.Mixed_datasets/roary_sum.py): Summary Roary results for multiple mixed datasets.
- [roary_sum.py](https://github.com/tli14/PanMAGs/blob/main/Python_Shell_scripts/4.Mixed_datasets/roary_sum.py): Summary Roary results for multiple mixed datasets.<br>

- 5. Three_tools:
- **Three_tools: pan-genome analysis using three different tools.**
- **[Anvi'o](https://github.com/tli14/PanMAGs/tree/main/Python_Shell_scripts/5.Three_tools/Anvi'o):**
- [Anvi'o.sh](https://github.com/tli14/PanMAGs/blob/main/Python_Shell_scripts/5.Three_tools/Anvi'o/Anvi'o.sh): Use Anvi'o for pan-genome analysis.
- [genbank-parser.py](https://github.com/tli14/PanMAGs/blob/main/Python_Shell_scripts/5.Three_tools/Anvi'o/genbank-parser.py): To generate a tab-delimited file to define external gene calls from Genbank files given by Prokka.
- **[BPGA](https://github.com/tli14/PanMAGs/tree/main/Python_Shell_scripts/5.Three_tools/BPGA):**
- [gen_faa.sh](https://github.com/tli14/PanMAGs/blob/main/Python_Shell_scripts/5.Three_tools/BPGA/gen_faa.sh): Rename the .faa files.
- [sort_faa.sh](https://github.com/tli14/PanMAGs/blob/main/Python_Shell_scripts/5.Three_tools/BPGA/sort_faa.sh): Prepare .faa files for BPGA pan-genome analysis.
- [summary.sh](https://github.com/tli14/PanMAGs/blob/main/Python_Shell_scripts/5.Three_tools/BPGA/summary.sh): BPGA pan-genome result summary.
- **[Roary](https://github.com/tli14/PanMAGs/tree/main/Python_Shell_scripts/5.Three_tools/Roary):**
- [gen_gff.sh](https://github.com/tli14/PanMAGs/blob/main/Python_Shell_scripts/5.Three_tools/Roary/gen_gff.sh): Rename the .gff files.
- [prokka_generate.sh](https://github.com/tli14/PanMAGs/blob/main/Python_Shell_scripts/5.Three_tools/Roary/prokka_generate.sh): Generate prokka annotation script.
- [bash_prokka.sh](https://github.com/tli14/PanMAGs/blob/main/Python_Shell_scripts/5.Three_tools/Roary/bash_prokka.sh): PROKKA annotation.
- [roary.sh](https://github.com/tli14/PanMAGs/blob/main/Python_Shell_scripts/5.Three_tools/Roary/roary.sh): Generate Roary pan-genome analysis script.
- [sbatch_roary.sh](https://github.com/tli14/PanMAGs/blob/main/Python_Shell_scripts/5.Three_tools/Roary/sbatch_roary.sh): Run Roary analysis.
- [rad_roary_sum.py](https://github.com/tli14/PanMAGs/blob/main/Python_Shell_scripts/5.Three_tools/Roary/rad_roary_sum.py): Summarize roary results.<br>

- **COG_analysis: The Clusters of Orthologous Genes (COG) analysis.**
- [prepare_files.sh](https://github.com/tli14/PanMAGs/blob/main/Python_Shell_scripts/6.COG_analysis/prepare_files.sh): Prepare for COG functional analysis for core gene families after pan-genome analysis.
- [rpsblast_batch.sh](https://github.com/tli14/PanMAGs/blob/main/Python_Shell_scripts/6.COG_analysis/rpsblast_batch.sh): Use rps-blast to perform domian search and then determine COG categories.
- [rps_select.py](https://github.com/tli14/PanMAGs/blob/main/Python_Shell_scripts/6.COG_analysis/rps_select.py): Select non-overlap domains for each genes.
- [rps2COG.py](https://github.com/tli14/PanMAGs/blob/main/Python_Shell_scripts/6.COG_analysis/rps2COG.py): Extract COG information from rpsblast results after the selection of non-overlap domains.
- [core_rps.sh](https://github.com/tli14/PanMAGs/blob/main/Python_Shell_scripts/6.COG_analysis/core_rps.sh): Run rps-blast.
- [rpsblast_to_COG.sh](https://github.com/tli14/PanMAGs/blob/main/Python_Shell_scripts/6.COG_analysis/rpsblast_to_COG.sh): Select non-overlap domains for rps-blast results and assign the COG categories.<br>

0 comments on commit 58c081e

Please sign in to comment.