Proteins with no genes #31

Open
stuppie opened this issue Mar 5, 2018 · 2 comments

Comments

stuppie commented Mar 5, 2018

I think it would be useful for mygene to also store information about proteins with no associated Entrez record. For example:
http://www.uniprot.org/uniprot/A2NXD2
http://www.uniprot.org/uniprot/Q5NV61

Member

sirloon commented May 10, 2018

@newgene this issue would require adjusting the ID conversion in the uniprot parser. Currently it tries to convert uniprot_acc to an Entrez ID or, if that is not possible, to an Ensembl ID. If neither is available, the document is skipped. The fix is probably somewhere around here: https://github.com/biothings/mygene.info/blob/master/src/hub/dataload/sources/uniprot/parser.py#L53. What do you think?
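
To make the behavior described above concrete, here is a minimal sketch of the "Entrez first, then Ensembl, otherwise skip" logic. It is not the actual parser code: `resolve_gene_id`, `convert_docs`, and the mapping dictionaries are hypothetical names, and the accession `P00001` and both gene IDs are made-up illustration values. Only `A2NXD2` comes from this issue.

```python
def resolve_gene_id(uniprot_acc, entrez_map, ensembl_map):
    """Return an Entrez ID if one is mapped, else an Ensembl ID, else None."""
    if uniprot_acc in entrez_map:
        return entrez_map[uniprot_acc]
    if uniprot_acc in ensembl_map:
        return ensembl_map[uniprot_acc]
    return None


def convert_docs(docs, entrez_map, ensembl_map):
    """Re-key documents by gene ID; collect accessions that cannot be mapped."""
    kept, skipped = {}, []
    for uniprot_acc, doc in docs.items():
        gene_id = resolve_gene_id(uniprot_acc, entrez_map, ensembl_map)
        if gene_id is None:
            # Described current behavior: no gene ID means no document.
            skipped.append(uniprot_acc)
            continue
        kept[gene_id] = doc
    return kept, skipped


# Hypothetical input: one mappable accession, one protein without a gene.
docs = {
    "P00001": {"uniprot": "P00001"},
    "A2NXD2": {"uniprot": "A2NXD2"},
}
kept, skipped = convert_docs(docs, entrez_map={"P00001": "1001"}, ensembl_map={})
print(kept)
print(skipped)
```

In this sketch the protein without an Entrez or Ensembl mapping ends up in `skipped`, which is the case this issue asks to handle.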

Member

newgene commented May 10, 2018

We need to give this one more thought. MyGene.info is supposed to be all about genes: if something is not a gene, it gets no record in MyGene.info. But I agree that including those UniProt IDs is useful, as genes and proteins are so closely tied together. A protein with no associated gene ID just means the corresponding gene has not been identified yet; there should still be a gene somewhere in the genome encoding this protein.

With this in mind, I am not against the idea of giving a document a "fake" gene ID placeholder and putting the corresponding UniProt ID within that document (so that the UniProt ID will be searchable).

One way of making this "fake" gene id is like this:

"_id": "NO_GENE_ID_FOR_A2NXD2"

This expands the gene _id priority list to three tiers: NCBI Gene ID --> Ensembl Gene ID --> NO_GENE_ID for UniProt-only genes.
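
As a rough sketch of how the three-tier priority could be expressed (not an agreed design), the function below picks the `_id` for a document. `choose_doc_id` and `PLACEHOLDER_PREFIX` are hypothetical names, and the Entrez and Ensembl values in the usage lines are made up. Only the `NO_GENE_ID_FOR_` format and `A2NXD2` come from this thread.

```python
PLACEHOLDER_PREFIX = "NO_GENE_ID_FOR_"  # format proposed in this thread


def choose_doc_id(uniprot_acc, entrez_id=None, ensembl_id=None):
    """Pick a document _id: NCBI Gene ID, then Ensembl Gene ID, then placeholder."""
    if entrez_id:
        return str(entrez_id)
    if ensembl_id:
        return ensembl_id
    # Tier 3: UniProt-only record, keyed by a placeholder built from the accession.
    return PLACEHOLDER_PREFIX + uniprot_acc


# Hypothetical IDs for the first two tiers; the last call mirrors the example above.
print(choose_doc_id("P00001", entrez_id=1001, ensembl_id="ENSG_FAKE"))
print(choose_doc_id("P00001", ensembl_id="ENSG_FAKE"))
print(choose_doc_id("A2NXD2"))
```

A fixed prefix would also make placeholder documents easy to find later, for example to replace them once a real gene ID becomes available.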

Your opinions? @stuppie @sirloon @cyrus0824 @andrewsu
