get metabolic pathways and metabolites into the KG #86

Closed
saramsey opened this issue Apr 23, 2018 · 12 comments

@saramsey
Member

No description provided.

@saramsey
Member Author

rationale: many diseases of interest to NCATS pertain to metabolic disorders.

@DeqingQu
Member

DeqingQu commented May 9, 2018

note from Steve: for starters we just need a TSV file with four columns, in which column 2 is the metabolite's KEGG ID (like KEGG:C12345) and column 3 is the human-readable name of the metabolite. So this file can be prepared any way you like; it doesn't need to come from RESTful querying of a web API.

@saramsey
Member Author

saramsey commented May 9, 2018

Deqing, this REST API looks better than humancyc:
http://www.kegg.jp/kegg/rest/keggapi.html

@saramsey
Member Author

saramsey commented May 9, 2018

to get a list of all metabolites, you would do:

curl http://rest.kegg.jp/list/compound

(note that in the resulting TSV, instead of identifiers like "KEGG:C00022" you'll have "CPD:C00022"; this is fine).
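A minimal Python sketch of fetching and parsing that list, assuming each row is an identifier and a description separated by a tab. The function names are hypothetical and only the URL comes from this thread; the identifier prefix is kept exactly as KEGG returns it.

```python
import urllib.request

# URL quoted in this thread.
KEGG_LIST_COMPOUND_URL = "http://rest.kegg.jp/list/compound"


def parse_kegg_list(text):
    """Parse KEGG list output into {identifier: description}.

    Assumes one 'ID<TAB>description' row per line; blank lines are skipped.
    """
    compounds = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        compound_id, _, description = line.partition("\t")
        compounds[compound_id.strip()] = description.strip()
    return compounds


def fetch_kegg_compounds(url=KEGG_LIST_COMPOUND_URL, timeout=30):
    """Download the compound list and parse it (needs network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_kegg_list(response.read().decode("utf-8"))
```

Keeping the parser separate from the download makes it testable on a saved copy of the TSV.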

to get a list of proteins that are associated with a specific compound like KEGG:C00022, you would do:

curl http://rest.kegg.jp/link/ec/C00022

that will return a TSV result like this:
cpd:C00022 ec:4.4.1.25
cpd:C00022 ec:4.4.1.28
cpd:C00022 ec:4.4.1.35
cpd:C00022 ec:4.4.1.36
cpd:C00022 ec:4.4.1.6
cpd:C00022 ec:4.4.1.8
cpd:C00022 ec:4.5.1.2
cpd:C00022 ec:4.3.3.7
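The rows above can be reduced to a list of EC identifiers with a few lines of Python. This is only a sketch: the function names are hypothetical, and it assumes every row has exactly two whitespace-separated fields (compound, then EC identifier).

```python
import urllib.request

# Base URL taken from the curl example in this thread.
KEGG_LINK_EC_URL = "http://rest.kegg.jp/link/ec/"


def bare_compound_id(compound_id):
    """Strip an optional prefix: 'KEGG:C00022' or 'cpd:C00022' -> 'C00022'."""
    return compound_id.split(":")[-1]


def parse_kegg_link(text):
    """Return the second field (the EC identifier) of each two-field row."""
    ec_ids = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2:
            ec_ids.append(fields[1])
    return ec_ids


def kegg_compound_to_ec_ids(compound_id, timeout=30):
    """Query KEGG for the EC identifiers linked to one compound (needs network)."""
    url = KEGG_LINK_EC_URL + bare_compound_id(compound_id)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_kegg_link(response.read().decode("utf-8"))
```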

the "ec:" identifiers are enzyme identifiers in a format called "Enzyme Commission" (EC) numbers. To map an EC number like "ec:4.4.1.35" to UniProt protein IDs, we would do:

curl 'http://www.uniprot.org/uniprot/?query=(ec%3A+4.4.1.35)&format=tab&columns=id'
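The same query can be assembled in Python so that the `&` and parentheses never pass through a shell. This is a sketch under assumptions: the endpoint and parameters are copied from the curl example above and not verified here, the function names are hypothetical, and the response is assumed to be a header line followed by one UniProt ID per line.

```python
import urllib.request
from urllib.parse import urlencode

# Endpoint as quoted in this thread.
UNIPROT_URL = "http://www.uniprot.org/uniprot/"


def build_uniprot_ec_query_url(ec_id):
    """Build the query URL for an identifier such as 'ec:4.4.1.35'."""
    ec_number = ec_id.split(":")[-1]  # drop the optional "ec:" prefix
    params = {"query": "(ec: %s)" % ec_number, "format": "tab", "columns": "id"}
    # safe="()" keeps the parentheses literal, matching the URL in the thread.
    return UNIPROT_URL + "?" + urlencode(params, safe="()")


def parse_uniprot_ids(text):
    """Return the IDs, skipping the first non-empty line (assumed header)."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    return lines[1:]


def ec_id_to_uniprot_ids(ec_id, timeout=30):
    """Fetch the UniProt IDs for one EC number (needs network access)."""
    url = build_uniprot_ec_query_url(ec_id)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_uniprot_ids(response.read().decode("utf-8"))
```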

Can you code this up please?

@saramsey
Member Author

saramsey commented May 9, 2018

So you will need to create a Python module QueryKEGG.py to expose a method "map_kegg_compound_to_enzyme_commission_ids"

You will then need to add a method "map_enzyme_commission_id_to_uniprot_ids" to the module QueryUniprot.py

@saramsey
Member Author

saramsey commented May 9, 2018

@dkoslicki OK we have a plan for this. I'm 95% sure we can get this done before the hackathon.

@dkoslicki
Member

Awesome!

@DeqingQu
Member

DeqingQu commented May 9, 2018

Thank you, @saramsey. Your instructions are very clear and I will finish the task ASAP.

I have two questions about the list of all metabolites.

  1. The descriptions of some entries are quite long. For example,
    cpd:C00006 NADP+; NADP; Nicotinamide adenine dinucleotide phosphate; beta-Nicotinamide adenine dinucleotide phosphate; TPN; Triphosphopyridine nucleotide; beta-NADP+
    It seems that the description consists of many synonyms separated by semicolons. Should I keep only the first one, before the first semicolon, or keep all of them?
  2. Many of them are not related to humans. Do I need to filter them out? If so, is there any advice about how to do it?

DeqingQu added a commit that referenced this issue May 9, 2018
@saramsey
Member Author

just to get this in the comment history: I like your idea of using the text up to the first semicolon as the "name". Then the complete text (including the text before and after the semicolon) can be the node "description".
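A short Python sketch of that convention, using a hypothetical helper name:

```python
def split_name_and_description(kegg_description):
    """Return (name, description) for one KEGG compound entry.

    The name is the text before the first semicolon; the description is
    the complete synonym string, unchanged apart from outer whitespace.
    """
    description = kegg_description.strip()
    name = description.split(";", 1)[0].strip()
    return name, description
```

For the NADP+ entry quoted earlier in the thread, this gives the name "NADP+" and keeps the full synonym list as the description. An entry with no semicolon gets the same text for both.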

DeqingQu added a commit that referenced this issue May 10, 2018
@DeqingQu
Member

@saramsey The two methods in QueryKEGG.py and QueryUniprot.py are done and all test cases pass.
BTW, I fixed a bug in the uniprot_id_to_reactome_pathways method in the QueryUniprot.py module that raised 'Assertion Error: assert 200 == res.status_code'.

saramsey added a commit that referenced this issue May 11, 2018
saramsey added a commit that referenced this issue May 11, 2018
saramsey added a commit that referenced this issue May 12, 2018
saramsey added a commit that referenced this issue May 15, 2018
@saramsey
Member Author

screen shot 2018-05-21 at 2 47 28 pm

@saramsey
Member Author

screen shot 2018-05-21 at 2 47 48 pm
