get metabolic pathways and metabolites into the KG #86

saramsey · 2018-04-23T17:18:47Z

No description provided.

saramsey · 2018-04-23T17:19:36Z

rationale: many diseases of interest to NCATS pertain to metabolic disorders.

DeqingQu · 2018-05-09T19:32:23Z

note from Steve: for starters we just need a TSV file with four columns, in which column 2 is the metabolite's KEGG ID (like KEGG:C12345) and column 3 is the human-readable name of the metabolite. So this file can be prepared any way you like; it doesn't need to come from RESTful querying of a web API.

saramsey · 2018-05-09T20:23:38Z

Deqing, this REST API looks better than humancyc:
http://www.kegg.jp/kegg/rest/keggapi.html

saramsey · 2018-05-09T21:09:29Z

to get a list of all metabolites, you would do:

curl http://rest.kegg.jp/list/compound

(note that in the resulting TSV, instead of identifiers like "KEGG:C00022" you'll have "CPD:C00022"; this is fine).
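A minimal sketch of reading that list in Python, using only the standard library. The function names `parse_kegg_list` and `fetch_kegg_compounds` are hypothetical (not from this thread), and the sketch assumes each line of the response is an identifier followed by whitespace and then the free-text description:

```python
import urllib.request

# URL taken from the curl command above.
KEGG_LIST_COMPOUND_URL = "http://rest.kegg.jp/list/compound"


def parse_kegg_list(text):
    """Parse 'ID<whitespace>description' lines into a dict of ID -> description."""
    entries = {}
    for line in text.splitlines():
        parts = line.strip().split(None, 1)  # split on the first whitespace run only
        if not parts:
            continue  # skip blank lines
        kegg_id = parts[0]
        description = parts[1].strip() if len(parts) > 1 else ""
        entries[kegg_id] = description
    return entries


def fetch_kegg_compounds(url=KEGG_LIST_COMPOUND_URL, timeout=60):
    """Download the compound list and parse it (requires network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_kegg_list(response.read().decode("utf-8"))
```

Keeping the parsing separate from the download makes it possible to test the parser on a saved copy of the response without calling the API.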

to get a list of proteins that are associated with a specific compound like KEGG:C00022, you would do:

curl http://rest.kegg.jp/link/ec/C00022

that will return a TSV result like this:
cpd:C00022 ec:4.4.1.25
cpd:C00022 ec:4.4.1.28
cpd:C00022 ec:4.4.1.35
cpd:C00022 ec:4.4.1.36
cpd:C00022 ec:4.4.1.6
cpd:C00022 ec:4.4.1.8
cpd:C00022 ec:4.5.1.2
cpd:C00022 ec:4.3.3.7
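
A rough sketch of pulling the "ec:" identifiers out of a result like the one above. The function name `parse_kegg_link_ec` is hypothetical, and the sketch assumes every useful line has exactly two whitespace-separated fields (compound ID, then EC ID):

```python
def parse_kegg_link_ec(text):
    """Return the unique 'ec:' identifiers from a KEGG link result, in input order."""
    ec_ids = []
    for line in text.splitlines():
        fields = line.split()  # works for tab- or space-separated fields
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        _compound_id, ec_id = fields
        if ec_id.startswith("ec:") and ec_id not in ec_ids:
            ec_ids.append(ec_id)
    return ec_ids


# Two lines copied from the sample result above.
sample = "cpd:C00022 ec:4.4.1.25\ncpd:C00022 ec:4.4.1.28\n"
print(parse_kegg_link_ec(sample))
```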

the "ec:" identifiers are Enzyme Commission (EC) numbers, which identify enzymes by the reaction they catalyze rather than individual proteins. To convert an EC number like "ec:4.4.1.35" to UniProt IDs, we would do:

curl "http://www.uniprot.org/uniprot/?query=(ec%3A+4.4.1.35)&format=tab&columns=id"

Can you code this up please?
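
One possible shape for the EC-to-UniProt step, as a sketch only. The endpoint and query parameters are copied from the curl command above and may have changed since this thread was written. The function names are hypothetical, and the parser assumes the tab-format response starts with a one-line column header followed by one UniProt ID per line:

```python
from urllib.parse import urlencode

# Endpoint as written in the curl command above; treat it as an assumption.
UNIPROT_URL = "http://www.uniprot.org/uniprot/"


def build_uniprot_ec_query_url(ec_id):
    """Build the query URL for an EC number such as 'ec:4.4.1.35'."""
    ec_number = ec_id[3:] if ec_id.startswith("ec:") else ec_id
    params = {"query": "(ec: " + ec_number + ")", "format": "tab", "columns": "id"}
    return UNIPROT_URL + "?" + urlencode(params)  # urlencode percent-escapes the query


def parse_uniprot_tab_ids(text, has_header=True):
    """Return the IDs from a one-column tab response, optionally skipping a header line."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if has_header and lines:
        lines = lines[1:]
    return lines


print(build_uniprot_ec_query_url("ec:4.4.1.35"))
```

Building the URL with `urlencode` avoids the shell-quoting problems that the `&` characters cause in an unquoted curl command.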

saramsey · 2018-05-09T21:10:46Z

So you will need to create a Python module QueryKEGG.py to expose a method "map_kegg_compound_to_enzyme_commission_ids"

You will then need to add a method "map_enzyme_commission_id_to_uniprot_ids" to the module QueryUniprot.py

saramsey · 2018-05-09T21:11:51Z

@dkoslicki OK we have a plan for this. I'm 95% sure we can get this done before the hackathon.

dkoslicki · 2018-05-09T21:13:20Z

Awesome!

DeqingQu · 2018-05-09T21:38:22Z

Thank you, @saramsey. Your instructions are very clear and I will finish the task ASAP.

I have two questions about the list of all metabolites.

1. The descriptions of some entries are quite long. For example,
cpd:C00006 NADP+; NADP; Nicotinamide adenine dinucleotide phosphate; beta-Nicotinamide adenine dinucleotide phosphate; TPN; Triphosphopyridine nucleotide; beta-NADP+
It seems that the description consists of many synonyms separated by semicolons. Should I pick only the first one, before the first semicolon, or keep all of them?
2. Many of the entries are not related to humans. Do I need to filter them? If so, is there any advice about how to do it?

saramsey · 2018-05-10T04:37:15Z

just to get this in the comment history: I like your idea of using the text up to the semicolon, as the "name". Then the complete text (including the text before and after the semicolon) can be the node "description".
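
A small sketch of that rule, with a hypothetical helper name `split_name_and_description`; it assumes the first semicolon always separates the preferred name from the remaining synonyms:

```python
def split_name_and_description(text):
    """Return (name, description): name is the text before the first semicolon,
    description is the complete text."""
    description = text.strip()
    name = description.split(";", 1)[0].strip()
    return name, description


# Example entry quoted earlier in this thread (cpd:C00006).
entry = ("NADP+; NADP; Nicotinamide adenine dinucleotide phosphate; "
         "beta-Nicotinamide adenine dinucleotide phosphate; TPN; "
         "Triphosphopyridine nucleotide; beta-NADP+")
name, description = split_name_and_description(entry)
print(name)  # NADP+
```

An entry with no semicolon gets the same text for both the name and the description.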

DeqingQu · 2018-05-10T17:29:36Z

@saramsey The two methods in QueryKEGG.py and QueryUniprot.py are done and all test cases pass.
BTW, I fixed a bug in the uniprot_id_to_reactome_pathways method in the QueryUniprot.py module that was causing 'Assertion Error: assert 200 == res.status_code'.

saramsey · 2018-05-21T21:47:35Z

saramsey · 2018-05-21T21:47:59Z

saramsey self-assigned this Apr 26, 2018

saramsey added the high priority label May 1, 2018

saramsey assigned DeqingQu May 9, 2018

DeqingQu added a commit that referenced this issue May 9, 2018

#86 generate tsv file

a07c324

DeqingQu added a commit that referenced this issue May 10, 2018

#86, all test cases are passed

55e500e

saramsey added a commit that referenced this issue May 11, 2018

#135 and #86

7dcf3dc

saramsey added a commit that referenced this issue May 11, 2018

#86

e61f053

saramsey added a commit that referenced this issue May 12, 2018

#86

0f10ae9

saramsey added a commit that referenced this issue May 15, 2018

#86

de6cce8

saramsey closed this as completed May 21, 2018
