Ingest SIDER via mychem #90

Closed
dkoslicki opened this issue Apr 24, 2018 · 22 comments

@dkoslicki
Member

dkoslicki commented Apr 24, 2018

Apparently some of the SMEs are going to be interested in side effects of drugs. These can be obtained from mychem:
NCATS-Tangerine/translator-knowledge-graph#3 (comment)
http://mychem.info/v1/chem/KWHRDNMACVLHCE-UHFFFAOYSA-N

The R&I group already has some interesting (and within our reach!) questions that could be asked if we ingested this KS.

@dkoslicki
Member Author

Priority before the hackathon is TBA, depending on what the questions from the SMEs are (Christine will send these to us soon).

@dkoslicki
Member Author

UMLS is used to identify terms. We would probably need to map these to HP phenotypes. (Chris Bizon mentioned this)

@saramsey
Member

Jared has expressed the opinion that this may introduce "noise" into the KG because of the high multiplicity of side effects reported for drugs, and that this may make reasoning difficult.

@saramsey
Member

Maybe start by exposing a Python method "get_side_effects_for_drug".

@dkoslicki
Member Author

Christine says that this is a priority for the proto-MVP, and not so much for the SMEs.

@tylerperyea

tylerperyea commented Jun 13, 2018

This is really cool! Does mychem support non small molecule substances as well? Such as monoclonal antibodies, mixtures or other large/disperse drug-like entities?

@dkoslicki
Member Author

@saramsey Can you answer this question from @tylerperyea?

@dkoslicki
Member Author

@tylerperyea In the Data Matrix Workflow you list Purple team as having a "3" for What are the common side effects of [drug]? A few questions:

  1. Is this information accessible via an API, or is it from a manually constructed query?
  2. If we were to ingest SIDER via MyChem, we would be using particular identifiers (HPO, ChEMBL, etc.). Do you want us to be able to return this information in any particular format/identifiers?

@dkoslicki
Member Author

@tylerperyea per your question about non small molecule substances: @saramsey is looking into this now.

@dkoslicki
Member Author

See also #243, #244, and #245

@saramsey
Member

@flashkicker can you try to add a method to QueryMyChem.py:

map_chembl_compound_to_side_effects_umls(chembl_id)

that will take a ChEMBL compound ID (like "CHEMBL521") as an argument and return a set of UMLS IDs for side effects for the drug. In the response JSON object from mychem, you will want to look for "sider".

[Screenshot, 2018-07-18]
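
A minimal sketch of the requested method, assuming the mychem response is JSON with a top-level "sider" field and that UMLS IDs are stored under keys named "umls_id" somewhere inside it. The exact layout of "sider" is not spelled out in this thread, so the helper walks the whole field instead of relying on a fixed path. `extract_umls_ids` and the `timeout` parameter are illustrative names, not existing parts of QueryMyChem.py.

```python
import json
import urllib.error
import urllib.request

# URL pattern taken from the example queries in this thread.
MYCHEM_URL = "http://mychem.info/v1/chem/{}?fields=sider"


def extract_umls_ids(node):
    """Recursively collect every string stored under a 'umls_id' key (assumed key name)."""
    found = set()
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "umls_id" and isinstance(value, str):
                found.add(value)
            else:
                found |= extract_umls_ids(value)
    elif isinstance(node, list):
        for item in node:
            found |= extract_umls_ids(item)
    return found


def map_chembl_compound_to_side_effects_umls(chembl_id, timeout=30):
    """Return the set of UMLS IDs found in the 'sider' field for a ChEMBL compound ID."""
    try:
        with urllib.request.urlopen(MYCHEM_URL.format(chembl_id), timeout=timeout) as resp:
            doc = json.load(resp)
    except urllib.error.HTTPError:
        # Treat an HTTP error (for example, an unknown ID) as "no side effects found".
        return set()
    # The service may return one document or a list of documents.
    docs = doc if isinstance(doc, list) else [doc]
    ids = set()
    for d in docs:
        if isinstance(d, dict):
            ids |= extract_umls_ids(d.get("sider", []))
    return ids
```

Calling `map_chembl_compound_to_side_effects_umls("CHEMBL521")` would then return a (possibly empty) set of UMLS ID strings. Keeping the parsing in a separate function makes it testable without network access.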

@flashkicker
Contributor

We just need to add code to BioNetExpander.py that can map a drug (i.e., a node with label "chemical_substance") to a side effect phenotype (i.e., a node with label "phenotypic_feature") using the method DrugMapper.map_drug_to_hp_with_side_effects. The relationship name that it should use is "causes_or_contributes_to", and the sourcedb argument to the Orangeboard.add_rel method can be "SIDER".
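
A rough sketch of that expansion step. The real `Orangeboard.add_rel` signature and the return type of `DrugMapper.map_drug_to_hp_with_side_effects` are not shown in this thread, so both are assumptions here: `StubOrangeboard` is a hypothetical stand-in that only records relationships, the argument order of `add_rel` is guessed, and the mapper is passed in as a callable assumed to return a collection of HP IDs.

```python
class StubOrangeboard:
    """Hypothetical stand-in for Orangeboard; records relationships in a list."""

    def __init__(self):
        self.rels = []

    def add_rel(self, rel_type, sourcedb, source_node, target_node):
        # Argument order is an assumption, not the real Orangeboard API.
        self.rels.append((source_node, rel_type, target_node, sourcedb))


def expand_chemical_substance(orangeboard, drug_id, map_drug_to_hp_with_side_effects):
    """Link one drug node to each side-effect phenotype returned by the mapper."""
    # Sort so that relationships are added in a deterministic order.
    for hp_id in sorted(map_drug_to_hp_with_side_effects(drug_id)):
        orangeboard.add_rel("causes_or_contributes_to", "SIDER", drug_id, hp_id)
```

In BioNetExpander.py, the stub would be replaced by the real Orangeboard instance and the callable by `DrugMapper.map_drug_to_hp_with_side_effects`.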

@saramsey
Member

@flashkicker any update on this issue?

@flashkicker
Contributor

@saramsey Almost done, just testing it out.

@saramsey
Member

@tylerperyea yes, ChEMBL includes some biologics, like bevacizumab:

[Screenshot, 2018-07-26]

@dkoslicki
Member Author

@tylerperyea Just FYI (see the above comment), it does appear that ChEMBL has biologics!

@saramsey
Member

saramsey commented Aug 1, 2018

@flashkicker can you give me an update on this issue, please?

@saramsey
Member

saramsey commented Aug 8, 2018

Sent by @DeqingQu to Chunlei Wu:

Hi Chunlei,

This is Deqing Qu from Oregon State University, and I am working with Steve Ramsey on the NCATS Translator project. Could you help us answer a question about the MyChem API?

We are trying to use MyChem to map between a drug (specified by its CHEMBL ID) and its side effects (which are specified by UMLS IDs). For example, we can get the expected side effects by querying http://mychem.info/v1/chem/KWHRDNMACVLHCE-UHFFFAOYSA-N?fields=sider

But when we tried many other drugs (like Penicillin or Cetirizine), the MyChem API didn't return any side effects as we expected, regardless of whether we used the ChEMBL ID or the InChIKey as the parameter. Is the API designed not to return any side effects for these drugs? Is it possible to add the 'sider' field for them?

Here are some examples.

Penicillin V
https://www.ebi.ac.uk/chembldb/compound/inspect/CHEMBL615
http://mychem.info/v1/chem/CHEMBL615?fields=sider
http://mychem.info/v1/chem/BPLBGHOLXOTWMN-MBNYWOFBSA-N?fields=sider

Cetirizine
https://www.ebi.ac.uk/chembldb/compound/inspect/CHEMBL1000
http://mychem.info/v1/chem/CHEMBL1000?fields=sider
http://mychem.info/v1/chem/ZKLPARSLTMPFCP-UHFFFAOYSA-N?fields=sider

Amoxicillin
https://www.ebi.ac.uk/chembldb/compound/inspect/CHEMBL1082
http://mychem.info/v1/chem/CHEMBL1082?fields=sider
http://mychem.info/v1/chem/LSQZJLSUYDQPKJ-NJBDSQKTSA-N?fields=sider

Domperidone
https://www.ebi.ac.uk/chembldb/compound/inspect/CHEMBL219916
http://mychem.info/v1/chem/CHEMBL219916?fields=sider
http://mychem.info/v1/chem/FGXWKSZFVQUSTL-UHFFFAOYSA-N?fields=sider

Cheers,
Deqing Qu
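
The checks listed in the email above can be reproduced with a short script. This is only a sketch: it assumes the service returns a JSON object and that a missing or empty "sider" key means no side-effect data. The helper names (`sider_url`, `has_sider`, `check`) are illustrative.

```python
import json
import urllib.error
import urllib.request

BASE = "http://mychem.info/v1/chem/"

# (name, ChEMBL ID, InChIKey) triples copied from the email above.
DRUGS = [
    ("Penicillin V", "CHEMBL615", "BPLBGHOLXOTWMN-MBNYWOFBSA-N"),
    ("Cetirizine", "CHEMBL1000", "ZKLPARSLTMPFCP-UHFFFAOYSA-N"),
    ("Amoxicillin", "CHEMBL1082", "LSQZJLSUYDQPKJ-NJBDSQKTSA-N"),
    ("Domperidone", "CHEMBL219916", "FGXWKSZFVQUSTL-UHFFFAOYSA-N"),
]


def sider_url(identifier):
    """Build the query URL used in the email for one identifier."""
    return BASE + identifier + "?fields=sider"


def has_sider(doc):
    """True if a decoded response carries a non-empty 'sider' field."""
    return isinstance(doc, dict) and bool(doc.get("sider"))


def check(identifier, timeout=30):
    """Query the service and report whether side-effect data came back."""
    try:
        with urllib.request.urlopen(sider_url(identifier), timeout=timeout) as resp:
            return has_sider(json.load(resp))
    except urllib.error.HTTPError:
        return False
```

Looping over `DRUGS` and calling `check` on both the ChEMBL ID and the InChIKey of each entry shows which identifiers return side-effect data.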

@saramsey
Member

saramsey commented Aug 9, 2018

Kevin Xin replies:

Hi Steve,

Thanks for pointing that out!

I looked into the source file from SIDER (http://sideeffects.embl.de/download/) and found where the problem is. SIDER uses PubChem compound IDs as its compound IDs, and in MyChem.info we also use the PubChem compound IDs provided by SIDER for merging. However, the PubChem compound IDs provided in the SIDER source file are all wrong. (The ones shown on their web interface are correct.)

For example, 'Amoxicillin' should correspond to CID2171. However, the SIDER source file lists it as CID100002171, and CID100002171 refers to a totally different compound: https://pubchem.ncbi.nlm.nih.gov/compound/100002171. Moreover, 'Amoxicillin' is not an isolated case. They prepended '100000*' to all the PubChem compound IDs in SIDER, so the current SIDER mapping in MyChem.info is all wrong.

We will add some regex in our MyChem.info parser to fix this issue. That's pretty straightforward. But in order for that change to be reflected on the API, we need more time to do the data reindexing. So just be aware you might not see the new changes instantly, but we will keep you updated!

Thank you and Deqing for reporting that! Really appreciate it!

Best,
Kevin
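
The ID fix Kevin describes could look roughly like the sketch below. It is not the MyChem.info parser code. The pattern is an assumption based only on the single example in the email ("CID100002171" should map to CID 2171): "CID", a leading "1", then the PubChem CID zero-padded to eight digits. The function name is hypothetical.

```python
import re

# Assumed layout, inferred from the one example in the email:
# "CID" + "1" + PubChem CID zero-padded to eight digits.
_SIDER_CID = re.compile(r"^CID1(\d{8})$")


def sider_cid_to_pubchem_cid(sider_cid):
    """Convert a SIDER source-file compound ID to a plain PubChem CID integer."""
    match = _SIDER_CID.match(sider_cid)
    if match is None:
        # Fail loudly on IDs that do not fit the assumed layout.
        raise ValueError("unexpected SIDER compound ID: %r" % (sider_cid,))
    return int(match.group(1))  # int() drops the zero padding
```

With this, `sider_cid_to_pubchem_cid("CID100002171")` returns `2171`, matching the example in the email. IDs that do not fit the assumed layout raise an error instead of being silently mis-mapped.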

@saramsey
Member

I looked into eHealthMe.com as a possible alternative. Their website claims to have a public REST API for drug adverse event data (which they aggregate from the FDA), but the API seems to be down; I just get an HTTP 301 redirect.

@saramsey
Member

Kevin Xin has provided an update:

Hi Stephen,

Sorry for the delay. Our expectation is that there will be a full new release of MyChem.info before the Portland hackathon. We are currently also working on a couple of other data issues in MyChem.info related to drugbank, chebi, etc.

I will keep you updated on this!

Thanks!

Best,
Kevin

saramsey added a commit that referenced this issue Sep 18, 2018
@saramsey
Member

Hi Steve,

We have just put out the new MyChem release. It fixes the previous issue regarding SIDER. Please give it a try.

Thanks!

Best,
Kevin
