Roll out KG2.7.4 (Biolink 2.2.6) #1728
Comments
alright, the synonymizer+KG2c build is ongoing on to kick off the build, all I did was 1) update (locally)
and 2) run:
(note: I made sure to create an (empty) |
if all goes well the build should be done this evening (at which point I'll take care of loading it into Plover) |
Thank you! |
Just an FYI that in KG2pre, the edge property formerly called |
the synonymizer build completed successfully and things seem fine so far with that, but the KG2c build errored out while using |
alright, the new KG2c is ready in Neo4j: http://kg2-7-4c.rtx.ai:7474/browser/ everything looks fine so far on spot checking. upload to PloverDB is in progress. |
KG2c has been loaded into Plover and all necessary downstream databases have been rebuilt. will test everything together tomorrow morning. |
actually, ran the ARAX test suite tonight and all I'll do some deeper testing (e.g., Expand's |
This is great! Thank you @amykglen !! |
Hi @finnagin, do we need slim databases for Travis at this point? Due to limited time, I have only built the refreshed database so far; the full databases might need more time. I just want to check whether you also need a slim version of the refreshed database. Thanks! |
@chunyuma We do still need those, but since they're only used for testing and not the actual system, I don't think the slim databases need to make the deadline. Though @amykglen, we will also need to come up with a way to generate slim KG2c and node synonymizer versions if we want Travis to run. |
ah, yeah, I dropped the ball on the slim database thing. I added an agenda item for this week's AHM to touch base on that! (not a blocker for this KG2 rollout) |
everything still looks good on further testing - one |
Hi @amykglen, sorry for the late response. For Here is the neo4j query for this test:
|
@amykglen, I think I figured out the problem. It seems that the function _get_cypher_for_query_edge is now deprecated. It might be an old function that Expand used to create the neo4j query, and perhaps other functions somewhere in Expand handle this now. I'm modifying this function to work around the error temporarily. Could you please let me know where I can find the function that replaces it, so that we can keep everything consistent? Thanks! |
Should we add "slim database" to the KG2c template checklist? (maybe it's already on there, I didn't check). |
yep, we have an item for slim databases already |
the rest of Expand doesn't use neo4j at all anymore, so there is no current |
@amykglen, the DTD querier has two modes, "fast mode" and "slow mode". "Fast mode" queries the DTD database directly, while "slow mode" calls the DTD model and computes the drug repurposing probability on the fly. So when we use "slow mode", we need Take
the "slow mode" needs to know what "n1" nodes should be paired with |
so you mean you need to run the one-hop query on KG2 to get diseases connected to acetaminophen? you can do that with Plover like so:
by default it will return answers in this format (only including node/edge IDs):
but if you want more info included in the results you can add |
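The request and response snippets from this comment did not survive in the thread, so the following is only a minimal sketch of what a one-hop query with one pinned node might look like, assuming a TRAPI-style query graph. The helper name `build_one_hop_query`, the qnode/qedge keys (`n00`, `n01`, `e00`), and the CURIE used for acetaminophen are illustrative assumptions, not taken from the original comment, and the sketch only builds the payload rather than sending it anywhere.

```python
import json


def build_one_hop_query(pinned_curie, target_categories):
    """Build a TRAPI-style one-hop query graph (hypothetical helper).

    One qnode is "pinned" to a specific ID; the other is constrained
    only by category.
    """
    return {
        "nodes": {
            "n00": {"ids": [pinned_curie]},
            "n01": {"categories": list(target_categories)},
        },
        "edges": {
            "e00": {"subject": "n00", "object": "n01"},
        },
    }


# Assumed CURIE for acetaminophen; swap in whatever identifier you use.
query = build_one_hop_query("CHEMBL.COMPOUND:CHEMBL112", ["biolink:Disease"])
print(json.dumps(query, indent=2))
```

Asking for several categories at once would just mean passing a longer list as `target_categories`.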
Thanks @amykglen. If I only want nodes with categories 'biolink:Disease', 'biolink:DiseaseOrPhenotypicFeature', or 'biolink:PhenotypicFeature', I think Plover can also do this by modifying the
Is it right? |
yep! |
or wait, so you're trying to get all disease-like nodes in KG2? (not just connected to acetaminophen?) not sure whether that would work... |
@amykglen, yes, I'm thinking that the DTD expand should be independent of RTX-KG2c, right? This means that some edges generated by DTD expand from the DTD model (those with probability above a certain threshold) might exist in RTX-KG2c. So back to the |
ah, ok, I didn't realize you're looking up all disease-like nodes. yeah, that won't work with Plover. so you really only need to get the list of all disease-like node IDs once, right? (for each KG2 version.) not on every query? could you do that during building of DTD? (and then just store the list of IDs in one of your DTD databases, or a separate database if you prefer, which could be added to the database manager) |
Actually, not just all disease-like node IDs. It is a list of all disease-like node IDs only because this query expands n0:'acetaminophen' to n1:'disease-like' nodes:
Perhaps in other queries, people are interested in expanding acetaminophen to other categories via DTD expand. (Note that slow mode currently doesn't check the category provided by the user, so users can provide any category via DTD expand.) So we actually need a function that can extract all nodes for the categories the user provides. Do you think that is feasible? I think we can pre-store the ID list for each category. However, I'm not sure whether we need to consider the hierarchical relations between categories. For example, if the user set |
I think that's right that you would want to do hierarchical reasoning for these category ID lists. if you query the KG2c neo4j by label (e.g., (Plover can't help here since it doesn't currently allow queries where no qnode is "pinned") |
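To make the idea of pre-stored, hierarchy-aware category ID lists concrete, here is a small sketch. It assumes a toy slice of the Biolink hierarchy (DiseaseOrPhenotypicFeature with children Disease and PhenotypicFeature) and made-up node IDs (`EX:1` and so on); the function names are hypothetical and nothing here reflects how DTD or KG2c actually store this.

```python
from collections import defaultdict

# Toy slice of the Biolink category hierarchy: parent -> direct children.
CHILDREN = {
    "biolink:DiseaseOrPhenotypicFeature": [
        "biolink:Disease",
        "biolink:PhenotypicFeature",
    ],
    "biolink:Disease": [],
    "biolink:PhenotypicFeature": [],
}


def descendants_and_self(category, children=CHILDREN):
    """Return the category plus every category beneath it."""
    seen = set()
    stack = [category]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(children.get(current, []))
    return seen


def build_category_index(nodes):
    """Map each category to the set of node IDs labeled with it.

    `nodes` is an iterable of (node_id, category) pairs; this would be
    computed once per KG2 version and stored.
    """
    index = defaultdict(set)
    for node_id, category in nodes:
        index[category].add(node_id)
    return index


def ids_for_category(category, index):
    """Collect IDs for a category and all of its descendants."""
    ids = set()
    for cat in descendants_and_self(category):
        ids |= index.get(cat, set())
    return sorted(ids)


# Made-up example nodes, for illustration only.
index = build_category_index([
    ("EX:1", "biolink:Disease"),
    ("EX:2", "biolink:PhenotypicFeature"),
    ("EX:3", "biolink:DiseaseOrPhenotypicFeature"),
])
print(ids_for_category("biolink:DiseaseOrPhenotypicFeature", index))
print(ids_for_category("biolink:Disease", index))
```

With this shape, a query for a parent category picks up nodes labeled with any of its child categories, while a query for a leaf category returns only its own nodes.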
hey @finnagin - have you updated the test triples (for the NCATS repo) for KG2.7.4 yet? |
The pull request for updating the test triples is now in the NCATSTranslator/testing repo. |
Closing, as the SmartAPI registry looks to be updated and everything else not marked as able to be skipped has been checked |
1. Build and load KG2c:
kg2integration branch)
kg2integration branch)
kg2c_lite_2.X.Y.json.gz file to the translator-lfs-artifacts repo

2. Rebuild downstream databases:
Copies of all of these should be put in /data/orangeboard/databases/KG2.X.Y on arax.ncats.io.
config_local.json, since we want it to be used over configv2.json during testing
/home/ubuntu/kg2-build/kg2c.dump)
NOTE: As databases are rebuilt, the new copy of config_local.json will need to be updated to point to their new paths. However, if the rollout of KG2 has already occurred, then you should update the master configv2.json directly.

3. Update the ARAX codebase:
Associated code changes should go in the kg2integration branch.
BiolinkHelper uses the right version)
config_local.json - must locally set force_local = True in ARAX_expander.py to avoid using the old KG2 API)

4. Do the rollout:
master into kg2integration
kg2integration into master
config_local.json the new master config file on araxconfig.rtx.ai (rename it to configv2.json)
master out to the various arax.ncats.io endpoints and delete their configv2.jsons

5. Final items/clean up:
config_local.json on arax.ncats.io to config_local.json_FROZEN_DO-NOT-EDIT-FURTHER (any additional edits to the config file should be made directly to the master configv2.json on araxconfig.rtx.ai going forward)
kg_config.json in the main branch of the Plover repo to point to the new kg2c_lite_2.X.Y.json.gz file (push this change)
config_local.json that points to it and locally set force_local = True in Expand
configv2.json on araxconfig.rtx.ai to point to this Plover endpoint (used by beta endpoints)
kg2 endpoint's configv2.json to force it to download the new copy and then verify it's working correctly by running a query
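
The checklist says config_local.json should be used over configv2.json during testing. As a rough illustration of that precedence rule only, here is a sketch that prefers a local override file when one exists. The function name `pick_config_path` and the directory layout are assumptions for the example; how ARAX actually resolves its config file is not shown in this issue.

```python
import json
import tempfile
from pathlib import Path


def pick_config_path(config_dir):
    """Prefer config_local.json over configv2.json (hypothetical helper)."""
    local = Path(config_dir) / "config_local.json"
    master = Path(config_dir) / "configv2.json"
    return local if local.exists() else master


# Demo in a throwaway directory so no real config files are touched.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "configv2.json").write_text(json.dumps({"source": "master"}))
    before = pick_config_path(tmp).name

    (Path(tmp) / "config_local.json").write_text(json.dumps({"source": "local"}))
    after = pick_config_path(tmp).name

print(before, after)
```

Under this assumed rule, deleting or renaming the local file (as in the freeze step of the checklist) is what makes the master config take effect again.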