Programmatically use SmartAPI registry in Expand to find KP APIs #1466

Closed
amykglen opened this issue May 15, 2021 · 49 comments

Comments

@amykglen
Member

start using @edeutsch's new ARAXQuery/Expand/smartapi.py!

@amykglen amykglen self-assigned this May 15, 2021
@amykglen amykglen added this to Ready to start in Expand dev May 15, 2021
@amykglen amykglen assigned rcpeene and unassigned amykglen Apr 4, 2022
@rcpeene rcpeene moved this from Ready to start to Underway in Expand dev Apr 6, 2022
@rcpeene rcpeene moved this from Underway to Waiting in Expand dev Apr 27, 2022
@rcpeene
Contributor

rcpeene commented Apr 29, 2022

While smartapi.py may continue to change as we identify a clearer use case, I have made some changes so that it returns more information about TRAPI endpoints, for more flexibility and convenience. Specifically, smartapi.py can now return all endpoints, all KPs (a subset of all endpoints), or all dev KPs or all prod KPs (subsets of all KPs). These could be used to distinguish endpoints for various purposes elsewhere in ARAX. Additionally, the functions that return these endpoints now return more information, including each endpoint's list of servers and its infores name. I will push these changes to a branch, Issue1466, shortly.
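
To make the subset relationship above concrete, here is a minimal sketch. It does not use smartapi.py itself: the record layout (`infores_name`, `component`, `servers`, `maturity`), the example URLs, and the function names are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical endpoint records. The field names and URLs are assumptions,
# not the actual structure returned by smartapi.py.
ENDPOINTS: List[Dict] = [
    {"infores_name": "infores:rtx-kg2", "component": "KP",
     "servers": [{"url": "https://example.org/kg2/prod", "maturity": "production"},
                 {"url": "https://example.org/kg2/dev", "maturity": "development"}]},
    {"infores_name": "infores:spoke", "component": "KP",
     "servers": [{"url": "https://example.org/spoke", "maturity": "production"}]},
    {"infores_name": "infores:aragorn", "component": "ARA",
     "servers": [{"url": "https://example.org/aragorn", "maturity": "production"}]},
]


def kps(endpoints: List[Dict]) -> List[Dict]:
    """KPs are the subset of endpoints whose component is 'KP'."""
    return [e for e in endpoints if e["component"] == "KP"]


def kps_with_maturity(endpoints: List[Dict], maturity: str) -> List[Dict]:
    """KPs with at least one server at the given maturity (a subset of all KPs)."""
    return [e for e in kps(endpoints)
            if any(s["maturity"] == maturity for s in e["servers"])]


dev_kps = kps_with_maturity(ENDPOINTS, "development")
prod_kps = kps_with_maturity(ENDPOINTS, "production")
```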

@rcpeene
Contributor

rcpeene commented Jun 1, 2022

After receiving a formal standard set by Translator for maturity levels in the SmartAPI registry, I continued to implement new functionality in smartapi.py on branch issue1466.

Because not every TRAPI endpoint's SmartAPI registry entry follows the standard, this code can only imperfectly return a list of endpoints and their relevant data. For instance, some KPs do not have infores names listed, some KPs have multiple registry entries, and some KPs have maturity levels that do not match the four maturity levels specified by Translator.

However, this module does well enough at the moment, especially for manual usage. It has been implemented with a CLI for this purpose. Users may get a list of all TRAPI endpoints, just endpoints which support workflows, or just endpoints which are labeled as KPs. Additionally, these lists can be filtered with a whitelist and a blacklist. KPs can be further filtered by specifying a required maturity level with strict or non-strict settings.

We may implement automatic usage of this in some capacity in Expand, such as updating a cached list of KPs with certain parameters daily, but as of yet we haven't decided.
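
As an illustration of the whitelist and blacklist filtering described above, here is a small sketch. The function name, the `infores_name` field, and the behaviour when both lists are given (apply both) are assumptions, not a description of smartapi.py's internals.

```python
from typing import Dict, Iterable, List, Optional


def filter_by_infores(entries: List[Dict],
                      whitelist: Optional[Iterable[str]] = None,
                      blacklist: Optional[Iterable[str]] = None) -> List[Dict]:
    """Keep entries named in the whitelist (if one is given) and drop blacklisted ones."""
    allowed = set(whitelist) if whitelist else None  # None means "no whitelist"
    blocked = set(blacklist or [])
    return [e for e in entries
            if (allowed is None or e["infores_name"] in allowed)
            and e["infores_name"] not in blocked]


entries = [{"infores_name": "infores:icees-dili"},
           {"infores_name": "infores:icees-asthma"},
           {"infores_name": "infores:rtx-kg2"}]

only_icees = filter_by_infores(entries, whitelist=["infores:icees-dili", "infores:icees-asthma"])
without_kg2 = filter_by_infores(entries, blacklist=["infores:rtx-kg2"])
```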
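
One possible shape for the daily cached list mentioned above is sketched below. Nothing here was decided in this thread: the cache file, the `fetch` callback, and the one-day expiry are assumptions used only to show the idea.

```python
import json
import os
import time
from typing import Callable, List, Optional

ONE_DAY_SECONDS = 24 * 60 * 60


def load_kp_list(cache_path: str,
                 fetch: Callable[[], List[str]],
                 max_age: float = ONE_DAY_SECONDS,
                 now: Optional[float] = None) -> List[str]:
    """Return the cached KP list if it is fresh enough, otherwise refetch and rewrite it."""
    now = time.time() if now is None else now
    if os.path.exists(cache_path) and now - os.path.getmtime(cache_path) < max_age:
        with open(cache_path) as cache_file:
            return json.load(cache_file)
    kp_list = fetch()  # hypothetical callback, e.g. a wrapper around a registry query
    with open(cache_path, "w") as cache_file:
        json.dump(kp_list, cache_file)
    return kp_list
```

The `now` parameter exists only so that expiry can be exercised without waiting a day.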

amykglen referenced this issue Jun 1, 2022
…ing of TRAPI endpoints and KPs usage is explained in the file
@rcpeene
Contributor

rcpeene commented Jun 7, 2022

Some example CLI commands:

python3 smartapi.py get_trapi_endpoints
python3 smartapi.py get_trapi_endpoints -p
python3 smartapi.py get_operations_endpoints -p

python3 smartapi.py get_kps -p
python3 smartapi.py get_kps -p -m testing
python3 smartapi.py get_kps -p -m development
python3 smartapi.py get_kps -p -m development -f
python3 smartapi.py get_kps -p -m testing -f -i production testing staging development
python3 smartapi.py get_kps -p --whitelist infores:icees-dili infores:icees-asthma infores:icees-pcd
python3 smartapi.py get_kps -p -m production --blacklist infores:rtx-kg2

@rcpeene
Contributor

rcpeene commented Jun 7, 2022

There should be sufficient usage documentation in the smartapi.py module, but as an extra measure I can describe the flags here. I believe @edeutsch and @isbluis intend to take a look at the module and provide feedback.

  • -p, --pretty is for pretty output. This also collates multiple SmartAPI registry entries that have the same infores name.
  • -m, --req_maturity specifies a required maturity level for the output. By default, the output contains only endpoints that have servers of the required maturity.
  • -f, --flexible is to be used when a required maturity is given. This was @edeutsch's idea from a few weeks ago. If no servers are found for a KP with the required maturity, it will defer to the next best option, and then the option after that, etc. By default, the hierarchy is [development, staging, testing, production].
  • -i, --hierarchy is to be used with -m and -f, and allows you to specify your own order for the hierarchy of four maturity levels.
  • -w, --whitelist and -b, --blacklist take lists of infores names and will filter the output accordingly.
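
The interplay of -m, -f, and -i can be sketched as follows. This is not the code in smartapi.py. In particular, the order in which fallbacks are tried (the required maturity first, then the remaining levels in hierarchy order) is an assumption about what "next best option" means, and the server records are hypothetical.

```python
from typing import Dict, List, Sequence

DEFAULT_HIERARCHY = ("development", "staging", "testing", "production")


def pick_servers(servers: List[Dict],
                 req_maturity: str,
                 flexible: bool = False,
                 hierarchy: Sequence[str] = DEFAULT_HIERARCHY) -> List[Dict]:
    """Return servers at the required maturity, optionally falling back to other levels."""
    order = [req_maturity]
    if flexible:
        # Assumed fallback rule: walk the rest of the hierarchy in the order given.
        order += [level for level in hierarchy if level != req_maturity]
    for level in order:
        matches = [s for s in servers if s["maturity"] == level]
        if matches:
            return matches
    return []  # strict mode, or no server at any known maturity


servers = [{"url": "https://example.org/staging", "maturity": "staging"},
           {"url": "https://example.org/prod", "maturity": "production"}]

strict = pick_servers(servers, "testing")
fallback = pick_servers(servers, "testing", flexible=True)
custom = pick_servers(servers, "testing", flexible=True,
                      hierarchy=("production", "testing", "staging", "development"))
```

With these inputs, strict mode finds nothing, the default hierarchy falls back to the staging server, and the custom hierarchy falls back to the production server.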

@amykglen
Member Author

amykglen commented Jun 8, 2022

awesome, thanks @rcpeene! this view is great:

(arax3.9) amys-mbp:Expand amyglen$ python smartapi.py get_kps -p
infores name                              component maturities                              n_entries
infores:automat-biolink                   KP        production                              1   
infores:automat-chem-norm                 KP        production                              1   
infores:automat-cord19                    KP        production                              1   
infores:automat-covidkop                  KP        production                              1   
infores:automat-ctd                       KP        production                              1   
infores:automat-drug-central              KP        production                              1   
infores:automat-foodb                     KP        production                              1   
infores:automat-gtex                      KP        production                              1   
infores:automat-gtopdb                    KP        production                              1   
infores:automat-gwas-catalog              KP        production                              1   
infores:automat-hetio                     KP        production                              1   
infores:automat-hgnc                      KP        production                              1   
infores:automat-hmdb                      KP        production                              1   
infores:automat-human-goa                 KP        production                              1   
infores:automat-intact                    KP        production                              1   
infores:automat-mole-pro-fda              KP        production                              1   
infores:automat-ontology-hierarchy        KP        production                              1   
infores:automat-panther                   KP        production                              1   
infores:automat-pharos                    KP        production                              1   
infores:automat-robokop                   KP        production                              1   
infores:automat-text-mining-provider      KP        production                              1   
infores:automat-uberongraph               KP        production                              1   
infores:automat-viral-proteome            KP        production                              1   
infores:biothings-explorer                KP        staging, production, development        1   
infores:cam-kp                            KP        production                              2   
infores:cohd                              KP        production, development                 1   
infores:connections-hypothesis            KP        dev, production, staging                1   
infores:genetics-data-provider            KP        staging, development, production        3   
infores:icees-asthma                      KP        production, development                 2   
infores:icees-dili                        KP        production, development                 2   
infores:icees-pcd                         KP        production, development                 2   
infores:knowledge-collaboratory           KP        production                              1   
infores:molepro                           KP        production, development, test           1   
infores:monarchinitiative                 KP        production                              1   
infores:openpredict                       KP        production                              1   
infores:rtx-kg2                           KP        production, development                 2   
infores:spoke                             KP        production                              1   
infores:sri-ontology                      KP        production                              1   
infores:text-mining-provider-cooccurrence KP        development                             1   

so n_entries captures the number of separate SmartAPI registrations that that KP has for their various instances, right? (since technically KPs can either list multiple maturities under one registration or have a separate registration for each maturity)

@amykglen amykglen removed this from Waiting in Expand dev Jun 23, 2022
@rcpeene
Contributor

rcpeene commented Jul 6, 2022

Yes, n_entries is the number of separate smartapi registry entries that exist with that infores name. This is only used when the -p argument is passed. Otherwise, the output just contains every individual smartapi registry entry without any such collating.
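
A rough sketch of that collating step is below, assuming each registry entry is a dict with `infores_name`, `component`, and a list of `servers` carrying a `maturity`. Those field names are assumptions for illustration only.

```python
from typing import Dict, List


def collate(entries: List[Dict]) -> List[Dict]:
    """Merge registry entries that share an infores name into one row each."""
    rows: Dict[str, Dict] = {}
    for entry in entries:
        row = rows.setdefault(entry["infores_name"], {
            "infores_name": entry["infores_name"],
            "component": entry["component"],
            "maturities": [],
            "n_entries": 0,
        })
        row["n_entries"] += 1  # one more separate registration with this name
        for server in entry["servers"]:
            if server["maturity"] not in row["maturities"]:
                row["maturities"].append(server["maturity"])
    return sorted(rows.values(), key=lambda r: r["infores_name"])


entries = [
    {"infores_name": "infores:rtx-kg2", "component": "KP",
     "servers": [{"maturity": "production"}]},
    {"infores_name": "infores:rtx-kg2", "component": "KP",
     "servers": [{"maturity": "development"}]},
    {"infores_name": "infores:spoke", "component": "KP",
     "servers": [{"maturity": "production"}]},
]
rows = collate(entries)
```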

isbluis added a commit that referenced this issue Jul 13, 2022
@edeutsch
Collaborator

I showed off this system (with Luis's front end) to everyone at the Architecture call and it was well received (some socks came clean off!) One suggestion was made:

suggestion -- can links to the SmartAPI registry entry be added? e.g., https://smart-api.info/registry?q=acca268be3645a24556b81cc15ce0e9a

@rcpeene are those URLs available in the query result from SmartAPI? Can you add those to your JSON output? (don't need it in the -p output)

thanks!

@rcpeene
Contributor

rcpeene commented Jul 20, 2022

It took a bit of investigation, but I was able to figure out how to fetch the smartapi registry urls for endpoints. Now a new field is returned with the endpoint dict named "smartapi_url".
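
For readers who want the general idea, the link format from the suggestion above can be built from a registry entry ID as in the sketch below. How smartapi.py actually obtains the ID is not shown in this thread, so the `registry_id` argument and the helper name are assumptions. Only the "smartapi_url" field name and the URL pattern come from the comments here.

```python
from typing import Dict

REGISTRY_URL_TEMPLATE = "https://smart-api.info/registry?q={}"


def add_smartapi_url(endpoint: Dict, registry_id: str) -> Dict:
    """Return a copy of the endpoint dict with a 'smartapi_url' field added."""
    result = dict(endpoint)
    result["smartapi_url"] = REGISTRY_URL_TEMPLATE.format(registry_id)
    return result


# Hypothetical endpoint name; the ID is the one quoted in the suggestion above.
endpoint = add_smartapi_url({"infores_name": "infores:example-kp"},
                            "acca268be3645a24556b81cc15ce0e9a")
```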

@edeutsch
Collaborator

@isbluis
Member

isbluis commented Jul 21, 2022

Forgot to tag the commit that includes this latest info in the UI: 9ad6f41
Thanks!

@amykglen
Member Author

amykglen commented Aug 3, 2022

dynamically selecting KPs from SmartAPI is high priority per 8/3 AHM - and TRAPI 1.3 ARAX should only use TRAPI 1.3 KPs

we need to add to KPSelector in Expand to do this dynamic selection
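
A minimal sketch of the version restriction is below, assuming each KP record carries a `trapi_version` string such as "1.3.0". The field name and the function are hypothetical and are not the KPSelector code.

```python
from typing import Dict, List, Tuple


def split_by_trapi_version(kps: List[Dict], required: str = "1.3") -> Tuple[List[str], List[str]]:
    """Split KP infores names into (accepted, excluded) by TRAPI major.minor version."""
    accepted: List[str] = []
    excluded: List[str] = []
    for kp in kps:
        version = kp.get("trapi_version") or ""
        major_minor = ".".join(version.split(".")[:2])  # "1.3.0" becomes "1.3"
        if major_minor == required:
            accepted.append(kp["infores_name"])
        else:
            excluded.append(kp["infores_name"])
    return accepted, excluded


kps = [{"infores_name": "infores:rtx-kg2", "trapi_version": "1.3.0"},
       {"infores_name": "infores:old-kp", "trapi_version": "1.2.0"},
       {"infores_name": "infores:unversioned-kp", "trapi_version": None}]
accepted, excluded = split_by_trapi_version(kps)
```

Entries with no version are excluded here rather than guessed at. That is a design choice of this sketch, not something the thread specifies.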

@amykglen
Member Author

amykglen commented Aug 4, 2022

so we don't forget: when we do dynamic KP selection we will also need to address:

  • our auto-documentation (we should no longer have a section per KP, but still give a 'latest' list of KPs, pulled when auto-documentation is generated)
  • our file with the list of KPs we use for the SRI Testing suite (currently hard-coded at RTX/code/ARAX/Documentation/arax_kps.json - should move to an auto-updating system, probably where we no longer store this file in our repo, but somewhere else)

@amykglen
Member Author

note to @rcpeene / myself: remember to regenerate DSL_Documentation.md once work on this issue is done (before it's merged into master)

rcpeene added a commit that referenced this issue Aug 11, 2022
…martapi module now. Made small changes to trapi_querier.py and ARAX_expander.py to accomodate this. issue #1466
rcpeene added a commit that referenced this issue Aug 11, 2022
…cally generated list of kps in expand and refactored kp parameter checking. Issue #1466
@edeutsch
Collaborator

okay, I think I have updated everything, how does this look:

image

image

What do you think?

@amykglen
Member Author

nice, thanks! that looks good to me!

@amykglen
Member Author

so arax.ci.transltr.io is now able to select KG2 as a KP successfully, but strangely, this error happens always at this same point in processing, without any message in the log:

Screen Shot 2022-08-18 at 8 58 14 PM

what's very odd is that no such error happens when running the same query locally, even when forcing a maturity of 'staging', like CI has. no idea what this could be... wondering if @edeutsch knows anything?

@edeutsch
Collaborator

I'm guessing that the live stream of JSON objects (that provides the logging and traffic light data) is corrupted somehow. If you can provide an exact repro, maybe @isbluis can easily tell us what the offending JSON object is and we can trace the source?

@amykglen
Member Author

ah, ok. I see one query plan update message that it might possibly have been unhappy about - just pushed a change to that. I'll give it a few minutes to see if that happens to fix it, otherwise post back here with a repro. thanks!

@amykglen
Member Author

huh, ok, it's still broken. the way I'm seeing the error is by going to https://arax.ci.transltr.io/ (which is running the current code in master), then running the standard JSON example acetaminophen query.

@edeutsch
Collaborator

So the stream and processing just appear to end there...
that's why the UI is confused. Here's a repro.

@edeutsch
Collaborator

cat > query_v1.3.json
{
  "stream_progress": true,
  "message": {
    "query_graph": {
      "edges": {
        "e00": {
          "subject": "n00",
          "object": "n01",
          "predicates": ["biolink:physically_interacts_with"]
        }
      },
      "nodes": {
        "n00": {
          "ids": ["CHEMBL.COMPOUND:CHEMBL112"]
        },
        "n01": {
          "categories": ["biolink:Protein"]
        }
      }
    }
  },
  "max_results": 100
}
^D

CONTENT=$(cat query_v1.3.json)
ENDPOINT="https://arax.ci.transltr.io/api/arax/v1.3"

curl -s -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' -d "$CONTENT" "$ENDPOINT/query"
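
For anyone who prefers Python to curl, the same request can be sketched with only the standard library. The endpoint and query are copied from the repro above. The helper names are hypothetical, and the assumption that the streamed response can be read line by line is untested against the server.

```python
import json
import urllib.request

ENDPOINT = "https://arax.ci.transltr.io/api/arax/v1.3"

QUERY = {
    "stream_progress": True,
    "message": {
        "query_graph": {
            "edges": {
                "e00": {
                    "subject": "n00",
                    "object": "n01",
                    "predicates": ["biolink:physically_interacts_with"],
                }
            },
            "nodes": {
                "n00": {"ids": ["CHEMBL.COMPOUND:CHEMBL112"]},
                "n01": {"categories": ["biolink:Protein"]},
            },
        }
    },
    "max_results": 100,
}


def build_request(query: dict, endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Build the same POST that the curl command sends, without sending it."""
    return urllib.request.Request(
        endpoint + "/query",
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


def stream_lines(request: urllib.request.Request, timeout: float = 600):
    """Send the request and yield each line of the streamed response as text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        for raw_line in response:
            yield raw_line.decode("utf-8").rstrip("\n")


# Usage (performs a real network call, so it is left commented out):
# for line in stream_lines(build_request(QUERY)):
#     print(line)
```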

@edeutsch
Collaborator

If you run that (you may need to adjust it for your shell dialect), you should see this:

...
{"timestamp": "2022-08-19T04:45:44.102565", "level": "INFO", "code": "", "message": "The KPs Expand decided to answer e00 with are: {'infores:rtx-kg2'}"}
{"timestamp": "2022-08-19T04:45:44.102606", "level": "DEBUG", "code": "", "message": "Will use asyncio to run KP queries concurrently"}
{"timestamp": "2022-08-19T04:45:44.103025", "level": "INFO", "code": "", "message": "Expanding qedge e00 using infores:rtx-kg2"}
$

i.e. right after it says it wants to use rtx-kg2, the connection suddenly drops without another word. weird.

@edeutsch
Collaborator

The UI is expecting a bunch of status objects like that, ultimately followed by the final TRAPI message. But there's no TRAPI message, so it acts weird. Maybe @isbluis can take the opportunity to make it a little more robust in the face of this, but ultimately it is unexpected output.
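
A consumer that tolerates this kind of truncated stream might look like the sketch below. It assumes one JSON object per line and treats any object with both "timestamp" and "level" keys as a status object, based on the log lines shown above. The actual UI code and the exact framing of the final TRAPI message may differ.

```python
import json
from typing import Dict, Iterable, List, Optional, Tuple


def read_stream(lines: Iterable[str]) -> Tuple[List[Dict], Optional[Dict]]:
    """Collect status objects and the final message; final is None if the stream ended early."""
    statuses: List[Dict] = []
    final: Optional[Dict] = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            break  # a partial line means the connection was cut mid-object
        if isinstance(obj, dict) and "timestamp" in obj and "level" in obj:
            statuses.append(obj)
        else:
            final = obj  # assumed to be the closing TRAPI response
    return statuses, final


truncated = [
    json.dumps({"timestamp": "2022-08-19T04:45:44.102606", "level": "DEBUG", "code": "",
                "message": "Will use asyncio to run KP queries concurrently"}),
    json.dumps({"timestamp": "2022-08-19T04:45:44.103025", "level": "INFO", "code": "",
                "message": "Expanding qedge e00 using infores:rtx-kg2"}),
]
statuses, final = read_stream(truncated)
if final is None:
    print("Stream ended after", len(statuses), "status objects with no TRAPI message")
```

A caller can then report a clear error when `final` is None instead of trying to render a response that never arrived.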

@edeutsch
Collaborator

@amykglen
Member Author

amykglen commented Aug 19, 2022

ah, that's very useful! I think I might've spotted the problem. looks like something is being added to the query plan log with a key that is None, instead of the KP's infores curie

@amykglen
Member Author

yep, that seems to have fixed it! arax.ci.transltr.io has finished rebuilding after my commit and now appears to be working.

@amykglen
Member Author

quite the list of KPs in the query summary table now! https://arax.ci.transltr.io/?r=57135

@amykglen
Member Author

I see infores:aragorn and infores:unsecret-agent are listed in the table, which I thought are only ARAs (not KPs). does the KP selector code currently search for KPs and ARAs, @rcpeene?

@amykglen
Member Author

this also seems odd - ARAX apparently tries to query itself? though it says there isn't a TRAPI 1.3 endpoint for ARAX, which isn't true. wonder why? (@rcpeene)

Screen Shot 2022-08-18 at 10 56 18 PM

wonder if maybe the logic that creates the 'excluded_by_version' list is off? (and is somehow including ARAs in that list)

@isbluis
Member

isbluis commented Aug 19, 2022

Hi @amykglen. Regarding infores being None, I also had to code around that in the UI. You can see that there is one entry that looks incomplete:
image

I ignore it, but maybe that's what is causing the issue?

Maybe we should inform the owners of that server registration, and add more checks on our end?
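
One form such a check could take is sketched below: separate registry entries with a usable infores name from incomplete ones, so the incomplete ones can be skipped and reported to their owners. The `infores_name` field and the "infores:" prefix test are assumptions for illustration.

```python
from typing import Dict, List, Tuple


def split_incomplete(entries: List[Dict]) -> Tuple[List[Dict], List[Dict]]:
    """Return (valid, incomplete) entries based on whether an infores name is present."""
    valid: List[Dict] = []
    incomplete: List[Dict] = []
    for entry in entries:
        name = entry.get("infores_name")
        if isinstance(name, str) and name.startswith("infores:"):
            valid.append(entry)
        else:
            incomplete.append(entry)  # e.g. None or a missing field
    return valid, incomplete


entries = [{"infores_name": "infores:rtx-kg2", "title": "RTX KG2"},
           {"infores_name": None, "title": "Unnamed registration"},
           {"title": "Registration with no infores field"}]
valid, incomplete = split_incomplete(entries)
```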

@amykglen
Member Author

yeah, good idea, @isbluis!

so I think we're in a working state except cicd.rtx.ai is still unhappy; some tests are failing, apparently because it's having trouble saving responses?

Screen Shot 2022-08-18 at 11 03 00 PM

not sure if @edeutsch knows anything about this. you can see output for its latest run of pytests here, by expanding the 'Run tests with pytest' section

@amykglen
Member Author

actually, I'm thinking that might resolve itself on cicd.rtx.ai's run for my latest commit (that fixed the None key issue in the response), which hasn't finished yet. will find out tomorrow I guess.

@amykglen
Member Author

yep, looks like it resolved itself. awesome. so the main issue remaining I'm aware of is ARAs appearing as KPs (for when @rcpeene has a chance).

@rcpeene
Contributor

rcpeene commented Aug 22, 2022

This bug, where ARAs were being considered in smartapi.kps_excluded_by_version, was caused by me forgetting to implement a check in smartapi.py's get_trapi_endpoints method. It has been resolved and the changes have been pushed.

@amykglen
Member Author

amykglen commented Aug 22, 2022

looks great, thanks! (on https://arax.ci.transltr.io/)

@amykglen
Member Author

are we good to close this issue, @rcpeene? everything seems to be working great as far as I can tell!

@rcpeene
Contributor

rcpeene commented Aug 31, 2022

Yup!

@rcpeene rcpeene closed this as completed Aug 31, 2022
Projects
Status: Done
Development

No branches or pull requests

4 participants