Explore hosting KG2 in Plater/Automat #2200

amykglen · 2023-11-13T17:52:29Z

have been working on this for a few weeks and realized we don't yet have an issue for it

current status is that Evan Morris got a preliminary version of KG2 up using Plater (https://automat.renci.org/#/rtx-kg2/).

but it currently doesn't do category reasoning because Plater expects each node's categories to be pre-expanded to include their ancestors in the JSON Lines files it ingests.

I'm going to make that tweak to our JSON Lines files and then play around with the re-deployed Plater KG2 to see how it does.
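A minimal sketch of what that pre-expansion could look like, not the actual script from this repo. The `PARENT_OF` map is a small hand-written, illustrative slice of the Biolink hierarchy (a real script would derive it from the Biolink Model), and the `category` field name is an assumption about the node JSON Lines layout.

```python
import json

# Illustrative child -> parent slice of the Biolink category hierarchy.
# Hypothetical and hand-written; not loaded from the Biolink Model.
PARENT_OF = {
    "biolink:Drug": "biolink:MolecularMixture",
    "biolink:MolecularMixture": "biolink:ChemicalMixture",
    "biolink:ChemicalMixture": "biolink:ChemicalEntity",
    "biolink:ChemicalEntity": "biolink:NamedThing",
}


def expand_categories(categories, parent_of=PARENT_OF):
    """Return the categories plus all of their ancestors, without duplicates."""
    expanded = []
    for category in categories:
        current = category
        # Walk up the hierarchy until we run out of known parents.
        while current is not None:
            if current not in expanded:
                expanded.append(current)
            current = parent_of.get(current)
    return expanded


def expand_jsonl_line(line, field="category"):
    """Rewrite one node line so its category list includes ancestors.

    `field` is an assumed name for the category property.
    """
    node = json.loads(line)
    node[field] = expand_categories(node.get(field, []))
    return json.dumps(node)
```

Applied to each line of the nodes file, this leaves every other property untouched and only widens the category list.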

one interesting difference vs. our KG2 API is that Plater expects queries to come in using only canonical node identifiers. I don't think this should cause a problem for ARAX Expand, which I believe only queries using canonical identifiers anyway, but need to check on that...

this is an example query that produces answers from the dev Plater KG2:

curl -X 'POST' \
  'https://automat.renci.org/rtx-kg2/1.4/query' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
   "message":{
      "query_graph":{
         "nodes":{
            "n0":{
               "ids":[
                  "PUBCHEM.COMPOUND:1983"
               ],
               "categories": ["biolink:Drug"]
            },
            "n1":{
               "categories": ["biolink:Protein"]
            }
         },
         "edges":{
            "e01":{
               "subject":"n0",
               "object":"n1",
               "predicates":[
                  "biolink:physically_interacts_with"
               ]
            }
         }
      }
   }
}'
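For scripting, the same request could be built and sent from Python. This is a sketch using only the standard library; the function names `build_one_hop_query` and `post_trapi_query` are made up for illustration, and the URL would be the one from the curl command above.

```python
import json
import urllib.request


def build_one_hop_query(subject_id, subject_category, object_category, predicate):
    """Build a one-hop TRAPI query graph shaped like the curl example."""
    return {
        "message": {
            "query_graph": {
                "nodes": {
                    "n0": {"ids": [subject_id], "categories": [subject_category]},
                    "n1": {"categories": [object_category]},
                },
                "edges": {
                    "e01": {
                        "subject": "n0",
                        "object": "n1",
                        "predicates": [predicate],
                    }
                },
            }
        }
    }


def post_trapi_query(url, query, timeout=60):
    """POST the query as JSON and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(query).encode("utf-8"),
        headers={"accept": "application/json", "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `post_trapi_query("https://automat.renci.org/rtx-kg2/1.4/query", build_one_hop_query("PUBCHEM.COMPOUND:1983", "biolink:Drug", "biolink:Protein", "biolink:physically_interacts_with"))` would send the same payload as the curl command, assuming that endpoint is still up.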

saramsey · 2023-11-13T23:38:54Z

Thanks for the update. Great progress! Can the PloverDB regression (pytest) suite, perhaps with a bit of modification, be run against the Plater/KG2?

amykglen · 2023-11-14T01:01:59Z

yes, with slight modification! that may be a good starting point for tests.
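As a rough idea of how one suite could target either service, the sketch below times a query and summarizes the TRAPI response. It is not the PloverDB suite itself: `send_query` is a hypothetical callable (any function that takes a query dict and returns the response dict), so the same helper works against Plater, Plover, or a stub.

```python
import time


def summarize_trapi_response(response):
    """Count nodes, edges and results in a TRAPI response dict."""
    message = response.get("message") or {}
    knowledge_graph = message.get("knowledge_graph") or {}
    return {
        "num_nodes": len(knowledge_graph.get("nodes") or {}),
        "num_edges": len(knowledge_graph.get("edges") or {}),
        "num_results": len(message.get("results") or []),
    }


def timed_query(send_query, query):
    """Run send_query(query) and return its summary plus elapsed seconds."""
    start = time.perf_counter()
    response = send_query(query)
    elapsed = time.perf_counter() - start
    summary = summarize_trapi_response(response)
    summary["seconds"] = elapsed
    return summary
```

In a pytest setting, `send_query` could be supplied by a fixture that reads the endpoint URL from a command-line option or environment variable, so switching between the two services would need no changes to the tests.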

amykglen · 2024-02-22T19:12:00Z

An update here for the record:

- KG2.8.4c has been successfully hosted in Plater (for now on kg2cplover2.rtx.ai), using the new v1.5.0 Plater code, which seems to be much faster than the previous version (I'm told this is due to changes unrelated to Neo4j, i.e., just to the 'wrapper' sort of code that converts answers into TRAPI format and such)
- In preliminary testing, Plater is faster than our KG2 API, maybe by about 25%, for single-curie queries (haven't looked into multi-curie queries yet)
  - The Plater system does seem to do some caching (I think in Neo4j), but the 25% estimate is from queries where caching isn't at play
- I have scripts (verified working) for automating the whole setup/building/hosting of KG2 Plater, starting from KG2c TSV files; for now that code is in this repo, but I'll move it wherever makes sense if we end up using Plater to host KG2
- I also have a pytest suite set up that can easily be pointed to whatever endpoint (Plater vs. Plover) and records query times as well as other data like number of nodes/edges returned (it also saves responses locally for later analysis)
  - The testing framework is in place (again in this repo) but I need to go through and actually select tests to use for our official comparison
- Interestingly, Plater returns a lot more nodes/results per single-curie query than our KG2 API does (on the order of 5-10x as many)
  - Looking into this, a lot of it seems to be due to erroneous subclass_of reasoning on Plater's part, which I think is because Plater considers all subclass_of edges in KG2c, whereas we only consider such edges from certain trusted sources. For instance, Plater thinks that "Placental Growth Factor" and "magnesium" are descendants of Aspirin
  - I think this problem is significant enough to make KG2 Plater kind of useless in its current state. To get around it, I'm thinking of maybe changing the predicate of the subclass_of edges that come from non-trusted sources to related_to_at_concept_level (the parent of subclass_of in Biolink) when loading KG2c into Plater. This would at least allow for a much more fair/useful comparison of the two tools.
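The predicate change floated in the last point could be sketched roughly as follows. This only illustrates the idea and is not code from this repo: the `TRUSTED_SOURCES` contents and the `primary_knowledge_source` field name are placeholders, since the thread does not say which sources are trusted or how the edge records are laid out.

```python
# Hypothetical placeholder; the real list of trusted sources is not given here.
TRUSTED_SOURCES = {"infores:example-trusted-source"}

SUBCLASS_OF = "biolink:subclass_of"
DEMOTED_PREDICATE = "biolink:related_to_at_concept_level"


def demote_untrusted_subclass_edge(
    edge,
    trusted_sources=TRUSTED_SOURCES,
    source_field="primary_knowledge_source",  # assumed field name
):
    """Return the edge with subclass_of demoted unless its source is trusted.

    Edges with other predicates, and subclass_of edges from trusted
    sources, are returned unchanged. The input dict is never mutated.
    """
    if edge.get("predicate") != SUBCLASS_OF:
        return edge
    if edge.get(source_field) in trusted_sources:
        return edge
    demoted = dict(edge)
    demoted["predicate"] = DEMOTED_PREDICATE
    return demoted
```

Run over the edges file before loading, this would keep the untrusted edges in the graph while taking them out of Plater's subclass_of reasoning.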

amykglen added the kg2c label Nov 13, 2023

amykglen self-assigned this Nov 13, 2023

amykglen added a commit that referenced this issue Nov 13, 2023

Add script for converting KG2c TSVs to Plater jsonl #2200

97f4dd9

amykglen added a commit that referenced this issue Nov 13, 2023

Pre-expand categories for Plater #2200

702f2bb

amykglen added a commit that referenced this issue Nov 13, 2023

Don't change name of "all_categories" in Plater jsonl #2200

54a0ed3

amykglen added a commit that referenced this issue Nov 13, 2023

Don't include expanded categories in lite file #2200

650cc65

amykglen added a commit that referenced this issue Mar 5, 2024

Add script to get random sample of real (deduplicated) KG2 queries #2200

4c0b856

amykglen closed this as completed in 3a1aa73 Mar 5, 2024

dkoslicki mentioned this issue Apr 11, 2024

Changelog since 2024-03-09 deployment to TEST #2248

Closed
