Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Results not returning with UMLS identifiers #1127

Closed
MarkDWilliams opened this issue Nov 19, 2020 · 22 comments
Closed

Results not returning with UMLS identifiers #1127

MarkDWilliams opened this issue Nov 19, 2020 · 22 comments

Comments

@MarkDWilliams
Copy link

MarkDWilliams commented Nov 19, 2020

In attempting to recreate the results of the TIDBIT regarding cyclic vomiting, I sent out several queries for connections to various identifiers related to vomiting, nausea, etc. There are connections to these identifiers present in SemMedDB, but no results are returned from ARAX. This query graph was submitted to ARAX through the ARS.

{"message":{
  
  "query_graph": {
    "nodes": [
      {
        "id": "n0",
        "set": false,
        "curie":"UMLS:C0520909",
        "type": "ChemicalSubstance"
      },
      {
        "id": "n1",
        "type": "named_thing",
        "set": false
      }
    ],
    "edges": [
      {
        "id": "e0",
        "source_id": "n1",
        "target_id": "n0"
      }
    ]
  }
}}

The following identifiers were also used without results being returned:
UMLS:C0027497
UMLS:C0027498
UMLS:C0718572
UMLS:C0722001
UMLS:C0520904
UMLS:C0520909

@saramsey
Copy link
Member

saramsey commented Nov 19, 2020

OK, checking into this now.

As a first step, I see that UMLS:C0027497 is in KG2 v2.3.4, which is the current version of the KG2 knowledge graph that ARAX uses:

Screen Shot 2020-11-19 at 9 01 13 AM

@saramsey
Copy link
Member

saramsey commented Nov 19, 2020

Five out of six of the UMLS concept IDs are in KG2.3.4:

Screen Shot 2020-11-19 at 9 09 52 AM

@saramsey
Copy link
Member

saramsey commented Nov 19, 2020

Confirmed -- UMLS:C0178572 is not in KG2.3.4, but it does appear to be a valid CUI:

Screen Shot 2020-11-19 at 9 15 53 AM

Also, confirmed that UMLS:C0178572 is not in KG2.3.5.

@saramsey
Copy link
Member

saramsey commented Nov 19, 2020

OK, we can try using endpoint

https://arax.ncats.io/test/

and we can use the query JSON

{
  "edges": [
    {
      "id": "e0",
      "source_id": "n0",
      "target_id": "n1"
    }
  ],
  "nodes": [
    {
      "curie": "UMLS:C0520909",
      "id": "n0"
    },
    {
      "id": "n1"
    }
  ]
}

(note, for query node n1, we did not specify a type in this case). Then we get a bunch of results:
Screen Shot 2020-11-19 at 9 23 43 AM

h/t @amykglen and @edeutsch for their quick assistance

@edeutsch
Copy link
Collaborator

ah, so one issue then is this line in the OP:
"type": "ChemicalSubstance"

In our KG, we still use chemical_substance from TRAPI 0.9.2. (ChemicalSubstance is proposed for TRAPI 1.0)

But, also, if you specify a CURIE, then I suggest not specifying a type, in case the type is different.

@amykglen
Copy link
Member

amykglen commented Nov 19, 2020

hmm - yeah, the query graph interpreter seems to hang at 2020-11-19 17:29:47.289621 INFO: Found input query_graph. Interpreting it and generating ARAXi processing plan to answer it when I run the original JSON query on /test.

If I omit the type of ChemicalSubstance on qnode n0 (I think our system still expects snake case for node types):

{
  "edges": [
    {
      "id": "e0",
      "source_id": "n0",
      "target_id": "n1"
    }
  ],
  "nodes": [
    {
      "curie": "UMLS:C0520909",
      "id": "n0"
    },
    {
      "id": "n1",
      "type": "named_thing"
    }
  ]
}

then the query makes it further, but it tries to use KG1, which doesn't have any results.

if I force the query to use KG2 via a DSL version of the query:

add_qnode(curie=UMLS:C0520909, id=n0)
add_qnode(type=named_thing, id=n1)
add_qedge(source_id=n0, target_id=n1, id=e0)
expand(kp=ARAX/KG2)
resultify()

a bunch of results are returned: https://arax.ncats.io/test/?m=3224

so maybe the query graph interpreter needs to be updated to use KG2 in this case?

@saramsey
Copy link
Member

Hi Amy, thanks, just saw your reply now. Yes, I think we need an endpoint where the QG interpreter defaults to KG2.

@saramsey
Copy link
Member

saramsey commented Nov 19, 2020

OK, I am going to let @amykglen and @edeutsch and @isbluis handle this. Thanks for jumping in to help!

@isbluis
Copy link
Member

isbluis commented Nov 19, 2020 via email

@saramsey
Copy link
Member

Good catch, @isbluis

@amykglen
Copy link
Member

Hi Amy, thanks, just saw your reply now. Yes, I think we need an endpoint where the QG interpreter defaults to KG2.

yeah, eventually I suppose we want to move to KG2 as our default for all queries. though one way the QG interpreter could easily tell it shouldn't use KG1 for this query is that KG1 doesn't even have a node type of named_thing (see: https://github.com/RTXteam/RTX/blob/master/code/ARAX/KnowledgeSources/KG1_allowed_node_labels.csv). maybe it would make sense for the QG interpreter to consider that information in routing queries?

@saramsey
Copy link
Member

Hi Amy, thanks, just saw your reply now. Yes, I think we need an endpoint where the QG interpreter defaults to KG2.

yeah, eventually I suppose we want to move to KG2 as our default for all queries. though one way the QG interpreter could easily tell it shouldn't use KG1 for this query is that KG1 doesn't even have a node type of named_thing (see: https://github.com/RTXteam/RTX/blob/master/code/ARAX/KnowledgeSources/KG1_allowed_node_labels.csv). maybe it would make sense for the QG interpreter to consider that information in routing queries?

Sure!

@saramsey
Copy link
Member

Hey @edeutsch what's your take on how the ARS should tell ARAX (via the API) to use KG2?

@edeutsch
Copy link
Collaborator

My take is: how could the ARS possibly decide which to use?
I think we need to make that decision for the ARS.
Not too long ago we had a discussion about whether we were ready to retire KG1 as a separate entity. the last time we discussed it, I think the answer was: not yet. But maybe today we are closer to that? Or are we ready to switch all the defaults to KG2 for the QueryGraphInterpreter?

It would be useful to discuss the original query_graph at the next AHM or mini-hackathon because there are several reasoning policies revealed by it that we should make decisions on.

@amykglen
Copy link
Member

I'm in favor of making KG2 the default for the QueryGraphInterpreter at this point. I think that would be a good first step towards retiring KG1.

the only way KG1 is still proving useful that I'm aware of is for its speed (since it's so much smaller); it's used a lot in our pytest suite for this reason. this of course isn't a reason to keep it around indefinitely, but it might be nice to phase it out rather than completely retire it off the bat.

@saramsey
Copy link
Member

Good topic for the Dec. 2 mini-hackathon.

Agree with @amykglen , maybe we could make KG2 the default in the meantime, as a stopgap and we could let MarkW know about it.

@edeutsch
Copy link
Collaborator

Shall I push the current master branch to production and everywhere and then switch the QueryGraphInterperter to use kg2 always in /beta? Would someone have time to try that out?

@amykglen
Copy link
Member

sounds good to me! I have time to try it out.

@edeutsch
Copy link
Collaborator

okay, latest master is rolled out everywhere now. All tests are passing. I will now break stuff in /test

@amykglen
Copy link
Member

looks like Mark's JSON query runs smoothly now on /test as long as the type is omitted from qnode n0:

{
  "edges": [
    {
      "id": "e0",
      "source_id": "n0",
      "target_id": "n1"
    }
  ],
  "nodes": [
    {
      "curie": "UMLS:C0520909",
      "id": "n0"
    },
    {
      "id": "n1",
      "type": "named_thing"
    }
  ]
}

https://arax.ncats.io/test/?m=3237

@amykglen
Copy link
Member

amykglen commented Dec 2, 2020

@MarkDWilliams - we believe this issue is fixed on production now. your original query graph (unmodified) is returning results in testing. can you verify from your end?

@MarkDWilliams
Copy link
Author

Appears to be resolved on my end as well. Thanks everyone!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants