Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validator is working again and ARAX results are not valid #2046

Open
edeutsch opened this issue Jun 1, 2023 · 20 comments
Open

Validator is working again and ARAX results are not valid #2046

edeutsch opened this issue Jun 1, 2023 · 20 comments

Comments

@edeutsch
Copy link
Collaborator

edeutsch commented Jun 1, 2023

The TRAPI validator is on-line again and showing all the TRAPI problems everyone has. We should address these as soon as we can so the validator is showing green for us. Example:
https://arax.ci.transltr.io/?r=140793

"error.knowledge_graph.edge.attribute.type_id.unknown": {
"biolink:original_edge_information": [
I think this means that we are adorning our edges with an attribute "biolink:original_edge_information" but this is not a biolink concept.

  1.   "biolink:support_graphs": [
    

Not much we can do about this. I will inquire about this

  1. "error.knowledge_graph.edge.predicate.unknown": {
    "biolink:decreases_activity_of": [
    I don't think this is a valid predicate any more?

  2. "biolink:directly_interacts_with": [
    {
    "edge_id": "NCBIGene:2353--biolink:directly_interacts_with->UniProtKB:P0DPQ6"
    }
    ],

  3.  "biolink:entity_negatively_regulates_entity": [
     {
       "edge_id": "FMA:84050--biolink:entity_negatively_regulates_entity->FMA:264827"
     },
     {
       "edge_id": "MESH:D014150--biolink:entity_negatively_regulates_entity->UMLS:C0597357"
     }
    

    ],

  4.  "biolink:entity_positively_regulates_entity": [
    
  5. "biolink:increases_activity_of": [

  6. "error.knowledge_graph.node.categories.not_concrete": {
    "FMA:82762": [
    {
    "categories": "['biolink:BiologicalEntity']"
    I think we're not supposed to use abstract classes?

Can we make a plan to address these 8 items?

@amykglen
Copy link
Member

amykglen commented Jun 5, 2023

I think 3 - 7 will be fixed with the XDTD for KG2.8.0.1c (the current XDTD is still running an old KG2 version).

For 1, we should try to pick out something better than biolink:original_edge_information. This attribute captures the KG2pre edge IDs that a given KG2c edge was created from (for debugging purposes)

I don't understand 8... what's the definition of an abstract class? BiologicalEntity is at depth 2 in the Biolink category tree...

@edeutsch
Copy link
Collaborator Author

edeutsch commented Jun 5, 2023

How about instead of biolink:original_edge_information, we change it to biolink:original_predicate (https://github.com/biolink/biolink-model/blob/ea800f98f41f6e42134011573a4ce60cd39a9151/biolink-model.yaml#L4989)

It is not quite the right concept, but maybe close enough? and will quiet the error?

@amykglen
Copy link
Member

amykglen commented Jun 5, 2023

sounds good to me! I just changed that in master

@edeutsch
Copy link
Collaborator Author

edeutsch commented Jun 6, 2023

thanks!

@chunyuma can you confirm that you are on a path to solving items 3-7?

@edeutsch
Copy link
Collaborator Author

edeutsch commented Jun 6, 2023

Regarding 8: BiologicalEntity is a abstract: true
https://github.com/biolink/biolink-model/blob/ea800f98f41f6e42134011573a4ce60cd39a9151/biolink-model.yaml#L6770

and I am thinking that this means that there should (must?) not be instances of it.

So I suppose the solution is to update KG2 to not use this category any more.
Should we do that?
Or ask for more clarification from Sierra et al?
Or raise a stink?

@edeutsch
Copy link
Collaborator Author

Steve will create an issue over in KG2 repo

@saramsey
Copy link
Member

RTXteam/RTX-KG2#286

@chunyuma
Copy link
Collaborator

Yes, the new version of xDTD can solve items 3-7 if KG2c v2.8.0.1 has updated those predicates.

@edeutsch
Copy link
Collaborator Author

great, thanks, let's leave this open until we confirm that Which Drugs May Treat Disease X from UI is showing ARAX results that pass all validation, hopefully as soon as we roll out the next xDTD

@saramsey
Copy link
Member

Eric where in the code base is the TRAPI validator?

@saramsey
Copy link
Member

Regarding 8: BiologicalEntity is a abstract: true https://github.com/biolink/biolink-model/blob/ea800f98f41f6e42134011573a4ce60cd39a9151/biolink-model.yaml#L6770

and I am thinking that this means that there should (must?) not be instances of it.

So I suppose the solution is to update KG2 to not use this category any more. Should we do that? Or ask for more clarification from Sierra et al? Or raise a stink?

Yes, we are in discussions with Sierra about whether it can be made abstract: false or whatever.

@saramsey
Copy link
Member

I think it will not be easy to eliminate use of biolink:BiologicalEntity without having a lot of nodes wind up with category biolink:NamedThing. I suppose with enough curation effort we could eliminate all the resulting NamedThing-ness, but it would take a lot of effort.

@saramsey
Copy link
Member

saramsey commented Jul 10, 2023

Eric, since Sierra has agreed that the use of biolink:BiologicalEntity in RTX-KG2 is reasonable and suggested that maybe we could continue using it (while changing the resulting validator error to a warning), perhaps in the meantime, until Biolink is changed or the reasoner validator is changed, we could add some code around these two lines of code:

validator.check_compliance_of_trapi_response(envelope)
validation_messages_text = validator.dumps()

to catch any validator error corresponding to BiologicalEntity and turn it into a warning? @ecwood and I could do the work. (at this point we are just brainstorming ideas; we could code it in a branch and test it out, with the understanding that the workaround could be chucked if you decide it is not the right approach).

@edeutsch
Copy link
Collaborator Author

I am thinking that editing the output of the validator dumps() and also the json output would be quite an extensive hack. I think it would be much better to add that exception to the validator itself.

Is this only for KG2, or is Sierra's thinking that using BiologicalEntity should be fine for anyone?

I think it would be best to start a discussion with Sierra and Richard and ask Sierra if she would approve of the validator being updated to include this exception.

What do you think?

@edeutsch
Copy link
Collaborator Author

(part of the reason that it would be an extensive hack is that editing validation_messages_text would be one tricky thing, but then line 319 gets the json version of the validator output and then decides on PASS, ERROR, FAIL based on the contents of the json, so that would need to be hacked, too. although that would be easier to hack than the text)

@ecwood
Copy link
Collaborator

ecwood commented Jul 10, 2023

Is this only for KG2, or is Sierra's thinking that using BiologicalEntity should be fine for anyone?
I think it would be best to start a discussion with Sierra and Richard and ask Sierra if she would approve of the validator being updated to include this exception.

Here is a snippet of what Sierra told Lili: RTXteam/RTX-KG2#286 (comment)

the long answer to your question is that no (from a "best practice" approach - we should really consider loosening the validation error in this case for September), using abstract or mixin classes as node categories is not advised.

@edeutsch
Copy link
Collaborator Author

are there any objections to asking Sierra & Richard about adding a formal exception to the validator?

@ecwood
Copy link
Collaborator

ecwood commented Jul 10, 2023

I don't object.

@saramsey
Copy link
Member

saramsey commented Jul 10, 2023

@edeutsch understood. Thank you. We will take up the issue with Sierra and Richard. I think there is a good chance we can eventually get an exception in there.

In the meantime, please let us know if at some point the errors become sufficiently burdensome/problematic that immediate workarounds (e.g., hacking response_cache or using NamedThing) are on the table.

@edeutsch
Copy link
Collaborator Author

great, thanks. Updating the validator can often be quite rapid when there is a clear path forward. I suggest we wait for that until such a time as it becomes clear that it won't happen quickly. Only then should we contemplate hacking the validator output, I feel.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants