get metabolic pathways and metabolites into the KG #86
Comments
rationale: many diseases of interest to NCATS pertain to metabolic disorders.
note from Steve: for starters we just need a TSV file with four columns, in which column 2 is the metabolite's KEGG ID (like KEGG:C12345) and column 3 is the human-readable name of the metabolite. This file can be prepared any way you like; it doesn't need to come from RESTful querying of a web API.
Deqing, this REST API looks better than HumanCyc:
To get a list of all metabolites, you would do:
curl http://rest.kegg.jp/list/compound
(note that in the resulting TSV, instead of identifiers like "KEGG:C00022" you'll have "CPD:C00022"; this is fine).
To get a list of proteins that are associated with a specific compound like KEGG:C00022, you would do:
curl http://rest.kegg.jp/link/ec/C00022
That will return a TSV result in which the "ec:" identifiers are protein identifiers, but in a format called "Enzyme Commission". To convert from "ec:4.4.1.35" to UniProt IDs, we would do (the URL is quoted so the shell does not interpret the parentheses and "&"):
curl "http://www.uniprot.org/uniprot/?query=(ec%3A+4.4.1.35)&format=tab&columns=id"
Can you code this up please?
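As a rough illustration of the identifier note above, the following Python sketch turns the text returned by the list/compound call into a mapping keyed by "KEGG:"-style IDs. The function name `parse_kegg_compound_list` and the sample text are hypothetical, and the sketch assumes each line holds an ID, a tab, and the name text; it has not been checked against a live response.

```python
def parse_kegg_compound_list(tsv_text):
    """Map 'KEGG:Cxxxxx' IDs to the raw name text of each compound line.

    Assumes each line looks like '<id><TAB><names>', where <id> may carry
    a database prefix such as 'CPD:' or 'cpd:'.
    """
    compounds = {}
    for line in tsv_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        raw_id, _, names = line.partition("\t")
        # Drop any database prefix and keep only the accession, e.g. 'C00022'.
        accession = raw_id.strip().split(":")[-1]
        compounds["KEGG:" + accession] = names.strip()
    return compounds


# Made-up sample text in the assumed format (not copied from a real response).
sample = "CPD:C00022\tPyruvate; Pyruvic acid\nCPD:C00031\tD-Glucose; Grape sugar\n"
compounds = parse_kegg_compound_list(sample)
```

Keeping the parsing separate from the HTTP call means the same function works whether the text comes from curl output saved to a file or from a request made in Python.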
So you will need to create a Python module QueryKEGG.py that exposes a method "map_kegg_compound_to_enzyme_commission_ids". You will then need to add a method "map_enzyme_commission_id_to_uniprot_ids" to the module QueryUniprot.py.
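One possible shape for the two requested methods is sketched below, shown in a single block for brevity even though the thread places them in separate modules. This is not the code that was eventually committed: the helper `fetch_text`, the `fetch` parameter, the return types, and the assumed response layouts (two tab-separated columns from KEGG, a header line followed by one ID per line from UniProt) are all assumptions. The UniProt URL is the one quoted in the thread and may have changed since.

```python
import urllib.parse
import urllib.request

KEGG_LINK_EC_URL = "http://rest.kegg.jp/link/ec/{}"
UNIPROT_QUERY_URL = "http://www.uniprot.org/uniprot/?{}"  # URL as quoted in the thread


def fetch_text(url):
    """Hypothetical helper: GET a URL and return the response body as text."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")


def map_kegg_compound_to_enzyme_commission_ids(kegg_id, fetch=fetch_text):
    """Return the set of EC numbers (without the 'ec:' prefix) for a compound."""
    accession = kegg_id.split(":")[-1]  # "KEGG:C00022" -> "C00022"
    text = fetch(KEGG_LINK_EC_URL.format(accession))
    ec_ids = set()
    for line in text.splitlines():
        fields = line.split("\t")
        # Assumed layout: '<compound id><TAB>ec:<number>'.
        if len(fields) == 2 and fields[1].lower().startswith("ec:"):
            ec_ids.add(fields[1][3:].strip())
    return ec_ids


def map_enzyme_commission_id_to_uniprot_ids(ec_id, fetch=fetch_text):
    """Return the set of UniProt accessions annotated with an EC number."""
    ec_number = ec_id.split(":")[-1].strip()  # accept "ec:4.4.1.35" or "4.4.1.35"
    query = urllib.parse.urlencode(
        {"query": "(ec: {})".format(ec_number), "format": "tab", "columns": "id"}
    )
    text = fetch(UNIPROT_QUERY_URL.format(query))
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    # Assumption: the first non-empty line is a column header, not an ID.
    return set(lines[1:])
```

Passing `fetch` as a parameter lets test cases substitute canned responses for the two web services, so the parsing logic can be checked without network access.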
@dkoslicki OK we have a plan for this. I'm 95% sure we can get this done before the hackathon.
Awesome!
Thank you, @saramsey. Your instructions are very clear and I will finish the task ASAP. I have two questions about the list of all metabolites.
just to get this in the comment history: I like your idea of using the text up to the semicolon as the "name". Then the complete text (including the text before and after the semicolon) can be the node "description".
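The name/description convention above could be sketched in Python roughly as follows. The function name `split_name_and_description` and the sample string are hypothetical, chosen only to illustrate the rule.

```python
def split_name_and_description(kegg_names):
    """Split KEGG name text into (name, description).

    name: the text up to the first semicolon.
    description: the complete text, including everything after the semicolon.
    """
    description = kegg_names.strip()
    name = description.split(";", 1)[0].strip()
    return name, description


# Made-up sample in the 'name; synonym; synonym' style described above.
name, description = split_name_and_description("Pyruvate; Pyruvic acid; 2-Oxopropanoate")
```

When the text contains no semicolon, the name and the description come out identical.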
@saramsey The two methods in QueryKEGG.py and QueryUniprot.py are done, and all test cases pass.