Update README.md
tli14 committed Aug 21, 2022
1 parent 18275c2 commit 89bed8f
Showing 1 changed file with 19 additions and 5 deletions.
24 changes: 19 additions & 5 deletions README.md
@@ -17,20 +17,34 @@ This repository provides codes and files to reproduce data and figures from the

## Scripts
* **Python_Shell_scripts**
- 1. Genome_Data_Collection:
- 1. Genome_Data_Collection: collect and analyze genome data.
- [download_all_complete_genome_fasta.sh](https://github.com/tli14/PanMAGs/blob/main/Python_Shell_scripts/1.Genome_Data_Collection/download_all_complete_genome_fasta.sh): Download complete bacterial genomes listed in assembly_summary.txt.
- [download_genus_contaminaton_genomes.sh](https://github.com/tli14/PanMAGs/blob/main/Python_Shell_scripts/1.Genome_Data_Collection/download_genus_contaminaton_genomes.sh): Download bacterial genomes as contamination datasets.
- [fastANI.sh](https://github.com/tli14/PanMAGs/blob/main/Python_Shell_scripts/1.Genome_Data_Collection/fastANI.sh): Calculate average nucleotide identity (ANI) for bacterial species.
- 2. 17_species:

- 2. 17_species: pan-genome analysis for 17 species.
- [prokka.sh](https://github.com/tli14/PanMAGs/blob/main/Python_Shell_scripts/2.17_species/prokka.sh): Genome annotation using Prokka.
- [gen_gff.sh](https://github.com/tli14/PanMAGs/blob/main/Python_Shell_scripts/2.17_species/gen_gff.sh): Rename .gff files from Prokka results.
- [roary_species.sh](https://github.com/tli14/PanMAGs/blob/main/Python_Shell_scripts/2.17_species/roary_species.sh): Pan-genome analysis using Roary.
- [sbatch_roary.sh](https://github.com/tli14/PanMAGs/blob/main/Python_Shell_scripts/2.17_species/sbatch_roary.sh): Run multiple jobs for pan-genome analysis.
- 3. MAG_Simulation:

- 3. MAG_Simulation: simulate MAGs from complete genomes.
- [fragmentation.py](https://github.com/tli14/PanMAGs/blob/main/Python_Shell_scripts/3.MAG_Simulation/fragmentation.py): Fragmentation simulation - randomly cut the genome into fragments (random number of fragments).
- [fragmentation_avrg_length.py](https://github.com/tli14/PanMAGs/blob/main/Python_Shell_scripts/3.MAG_Simulation/fragmentation_avrg_length.py): Fragmentation simulation - randomly cut the genome into fragments (random length of fragments).
- [incompleteness.py](https://github.com/tli14/PanMAGs/blob/main/Python_Shell_scripts/3.MAG_Simulation/incompleteness.py): Incompleteness simulation - remove a percentage of sequence length from each fragment.
- [contamination.py](https://github.com/tli14/PanMAGs/blob/main/Python_Shell_scripts/3.MAG_Simulation/contamination.py): Contamination simulation - add fragments from other genomes in the same species (intraspecies).
- [contamination_genus.py](https://github.com/tli14/PanMAGs/blob/main/Python_Shell_scripts/3.MAG_Simulation/contamination_genus.py): Contamination simulation - add fragments from other genomes in the same genus (interspecies).
- [random_distribution](https://github.com/tli14/PanMAGs/tree/main/Python_Shell_scripts/3.MAG_Simulation/random_distribution):
Generate random numbers following an F distribution for simulation.
- [random_distribution](https://github.com/tli14/PanMAGs/tree/main/Python_Shell_scripts/3.MAG_Simulation/random_distribution): Generate random numbers following an F distribution for simulation.
- [generate_numbers.sh](https://github.com/tli14/PanMAGs/blob/main/Python_Shell_scripts/3.MAG_Simulation/generate_numbers.sh): Generate numbers for the genome list to assign random fragmentation, incompleteness, and contamination values.
- [simulation.sh](https://github.com/tli14/PanMAGs/blob/main/Python_Shell_scripts/3.MAG_Simulation/simulation.sh): Automatic simulation scripts.
- [batch_files.sh](https://github.com/tli14/PanMAGs/blob/main/Python_Shell_scripts/3.MAG_Simulation/batch_files.sh): Batch files for simulation.
- [multiple_dataset.sh](https://github.com/tli14/PanMAGs/blob/main/Python_Shell_scripts/3.MAG_Simulation/multiple_dataset.sh): Generate multiple datasets for testing the dataset variations.

- 4. Mixed_datasets: generate mixed datasets containing MAGs and complete genomes.
- [rad_combine.sh](https://github.com/tli14/PanMAGs/blob/main/Python_Shell_scripts/4.Mixed_datasets/rad_combine.sh): Generate mixed datasets with different percentages of MAGs.
- [copy_ori_file.py](https://github.com/tli14/PanMAGs/blob/main/Python_Shell_scripts/4.Mixed_datasets/copy_ori_file.py): Generate mixed datasets by combining the original and simulated MAG datasets.
- [Pan-genome_and_summary.sh](https://github.com/tli14/PanMAGs/blob/main/Python_Shell_scripts/4.Mixed_datasets/Pan-genome_and_summary.sh): Perform pan-genome analysis for mixed datasets.
- [loop_rad_combine.sh](https://github.com/tli14/PanMAGs/blob/main/Python_Shell_scripts/4.Mixed_datasets/loop_rad_combine.sh): Run rad_combine.sh multiple times.
- [roary_sum.py](https://github.com/tli14/PanMAGs/blob/main/Python_Shell_scripts/4.Mixed_datasets/roary_sum.py): Summarize Roary results for multiple mixed datasets.

- 5. Three_tools:
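
The three simulation steps listed under 3. MAG_Simulation (fragmentation, incompleteness, contamination) can be illustrated with a minimal Python sketch. It is not the code from `fragmentation.py`, `incompleteness.py`, or `contamination.py`: the function names, the toy sequences, and the choice to remove one contiguous stretch at a random position are all assumptions made for illustration, and the F-distributed random numbers used by the repository are replaced here by fixed values.

```python
import random


def fragment(seq, n_fragments, rng):
    """Cut seq into n_fragments pieces at random, distinct cut points."""
    if not 1 <= n_fragments <= len(seq):
        raise ValueError("n_fragments must be between 1 and len(seq)")
    cuts = sorted(rng.sample(range(1, len(seq)), n_fragments - 1))
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


def remove_fraction(frag, fraction, rng):
    """Drop a share of the fragment's length.

    Assumption: the removed bases form one contiguous stretch at a
    random position; the README does not say where bases are removed.
    """
    n_remove = int(len(frag) * fraction)
    if n_remove == 0:
        return frag
    start = rng.randrange(len(frag) - n_remove + 1)
    return frag[:start] + frag[start + n_remove:]


def contaminate(fragments, donor_fragments, n_added, rng):
    """Add n_added randomly chosen fragments from a donor genome."""
    return fragments + rng.sample(donor_fragments, n_added)


rng = random.Random(0)                    # fixed seed for repeatability
genome = "ACGT" * 250                     # hypothetical 1,000 bp genome
donor = "TTGACA" * 100                    # hypothetical contaminant genome

frags = fragment(genome, 10, rng)
incomplete = [remove_fraction(f, 0.1, rng) for f in frags]
simulated_mag = contaminate(incomplete, fragment(donor, 5, rng), 2, rng)
```

Passing a seeded `random.Random` instance into every function keeps a simulated dataset reproducible, which matters when the same genome list is simulated several times to compare dataset variation.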
