MetazoaPhylogenomicsLab/FANTASIA

FANTASIA: Functional ANnoTAtion based on embedding space SImilArity

FANTASIA (Functional ANnoTAtion based on embedding space SImilArity) is a pipeline for annotating protein sequence files with GO terms using GOPredSim with the protein language model ProtT5. FANTASIA takes a proteome file as input (either the longest isoform per gene or the full set of isoforms for all genes), removes identical sequences using CD-HIT (ref), removes sequences longer than 5000 amino acids (due to a length constraint in the model), and runs GOPredSim-ProtT5 on the remaining sequences. It then converts the standard GOPredSim output file into the input file format for topGO (ref) so that the results can be used in a wider biological workflow.
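
To make the two filtering steps concrete, here is a minimal Python sketch of what they amount to. It is only an illustration under stated assumptions: FANTASIA itself uses CD-HIT for duplicate removal, whereas this sketch uses a plain exact-match set, and the names `read_fasta`, `filter_proteome` and `MAX_LENGTH` are hypothetical and not part of FANTASIA.

```python
from typing import Iterable, Iterator, List, Tuple

# Length limit taken from the description above; the constant name is hypothetical.
MAX_LENGTH = 5000


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_proteome(
    records: Iterable[Tuple[str, str]], max_length: int = MAX_LENGTH
) -> List[Tuple[str, str]]:
    """Keep the first copy of each distinct sequence that is at most max_length long.

    Assumption: exact string matching stands in for CD-HIT here.
    """
    seen = set()
    kept = []
    for header, seq in records:
        if len(seq) > max_length or seq in seen:
            continue
        seen.add(seq)
        kept.append((header, seq))
    return kept


# Toy input: p2 duplicates p1, and p3 exceeds the length limit.
example = [">p1", "MKTAYIAK", ">p2", "MKTAYIAK", ">p3", "A" * 5001, ">p4", "MSLLE"]
kept = filter_proteome(read_fasta(example))
print([header for header, _ in kept])  # ['p1', 'p4']
```

In this toy input only `p1` and `p4` survive: `p2` is an exact copy of `p1`, and `p3` is one residue over the limit.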

This pipeline is a joint effort, with equal contribution, between Ana Rojas's lab (Andalusian Center for Developmental Biology, CSIC) and Rosa Fernández's lab (Metazoa Phylogenomics Lab, Institute of Evolutionary Biology, CSIC-UPF), and shows that synergistic collaboration between labs with different expertise can produce great outcomes. We thank LifeHUB-CSIC for being the catalyst of this project and for encouraging us to 'think big'.

Cite FANTASIA

Martínez-Redondo, G. I., Barrios, I., Vázquez-Valls, M., Rojas, A. M., & Fernández, R. (2024). Illuminating the functional landscape of the dark proteome across the Animal Tree of Life. https://doi.org/10.1101/2024.02.28.582465

For our work on the performance of the different methods in model organisms, see: Barrios-Núñez, I., Martínez-Redondo, G. I., Medina-Burgos, P., Cases, I., Fernández, R., & Rojas, A. M. (2024). Decoding proteome functional information in model organisms using protein language models. https://doi.org/10.1101/2024.02.14.580341

Contact information: Gemma I. Martínez-Redondo (gemma.martinez@ibe.upf-csic.es), Ana M. Rojas (a.rojas.m@csic.es), Rosa Fernández (rosa.fernandez@ibe.upf-csic.es).

Before using FANTASIA

To reduce the environmental impact of this pipeline, check whether your species of interest has already been functionally annotated with FANTASIA, and use that file instead of running the pipeline from scratch. Annotations for 970 animals and some closely related outgroups have already been computed and can be found in MATEdb2.

How to use FANTASIA

FANTASIA singularity image (only CPUs)

Download the singularity image from here.

Once downloaded, you can execute it as follows (make sure that you have singularity installed!):

Syntax: ./fantasia --infile protein.fasta [--outpath output_path] [--allisoforms gene_isoform_conversion.txt] [--keepintermediate]
options:
-i/--infile           Input protein fasta file.
-h/--help             Print this Help.
-o/--outpath          (Optional) Output directory. If not provided, input file directory will be used.
-a/--allisoforms      (Optional) Tab-separated conversion file specifying the correspondence between gene and isoform IDs for obtaining a per-gene annotation using all isoforms.
-p/--prefix           (Optional) Prefix to add to output folders and files (e.g. the species code). If not provided, input file name will be used.
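
If you want to launch FANTASIA from a script, for example to process several proteomes in a row, the options above can be assembled programmatically. The following is a minimal Python sketch that only uses the flags documented in the help text. The helper names `build_fantasia_command` and `run_fantasia`, the output directory `results` and the prefix `SPEC1` are hypothetical, and the sketch assumes the downloaded image is available as `./fantasia` in the working directory.

```python
import subprocess
from typing import List, Optional


def build_fantasia_command(
    infile: str,
    outpath: Optional[str] = None,
    allisoforms: Optional[str] = None,
    prefix: Optional[str] = None,
    executable: str = "./fantasia",  # assumed location of the downloaded image
) -> List[str]:
    """Build the argument list for one FANTASIA run from the documented options."""
    cmd = [executable, "--infile", infile]
    if outpath is not None:
        cmd += ["--outpath", outpath]
    if allisoforms is not None:
        cmd += ["--allisoforms", allisoforms]
    if prefix is not None:
        cmd += ["--prefix", prefix]
    return cmd


def run_fantasia(cmd: List[str]) -> None:
    """Run the command; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(cmd, check=True)


# Hypothetical example values; nothing is executed until run_fantasia is called.
cmd = build_fantasia_command("protein.fasta", outpath="results", prefix="SPEC1")
print(" ".join(cmd))
# ./fantasia --infile protein.fasta --outpath results --prefix SPEC1
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with file names that contain spaces.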

At some point during execution, FANTASIA may raise a warning if your system only has CPUs or if the GPU CUDA library version differs from 11.0. You can safely ignore this message, as the singularity container is prepared to run on CPUs.


Local installation and execution

1.- Download the files and scripts from here.

2.- Open installation_guide_FANTASIA.sh (you can download it from this GitHub repository) and follow the instructions.

3.- Execute FANTASIA (you can check the files and options required for each script by adding -h).


Galaxy implementation
Work in progress...
