Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Throw an error when filter_kg removes all nodes of a particular qnode_id #845

Closed
chunyuma opened this issue Jun 20, 2020 · 10 comments
Closed
Assignees
Labels

Comments

@chunyuma
Copy link
Collaborator

chunyuma commented Jun 20, 2020

@amykglen, I found a DSL that makes me feel there might be a potential bug in add_qedge(), expand() or resultify().

Here is the DSL command:

add_qnode(curie=DOID:14330,id=n00,type=disease)
add_qnode(type=phenotypic_feature,is_set=true,id=n01)
add_qedge(source_id=n00,target_id=n01,id=e00,type=has_phenotype)
expand(edge_id=e00,kp=ARAX/KG1)
overlay(action=fisher_exact_test,source_qnode_id=n00,target_qnode_id=n01,virtual_relation_label=FET1,rel_edge_id=e00)
filter_kg(action=remove_edges_by_attribute,edge_attribute=fisher_exact_test_p-value,direction=above,threshold=0.0001,remove_connected_nodes=t,qnode_id=n01)
add_qnode(type=disease,id=n02)
add_qedge(source_id=n01,target_id=n02,id=e01,type=has_phenotype)
expand(edge_id=e01,kp=ARAX/KG1)
resultify()

The problem here is that in filter_kg step, it filtered out all the n01 nodes based on the cutoff 0.0001, then the expand should not be able to expand the edges between n01 and n02 because there is no n01 for expanding. I found that no error was reported but it just got stuck in somewhere. I think in this case it should throw out an error.

@chunyuma chunyuma added the bug label Jun 20, 2020
@amykglen
Copy link
Member

amykglen commented Jun 21, 2020

interesting case. indeed, the second expand call is taking forever because it's trying to do a general query for (phenotypic_feature)--(disease) because there are no curies to use for either of those qnodes (since no n01 nodes remain in the KG), and this would produce massive results.

the slightly difficult part about having expand throw an error here is that expand can't see what expand calls have already been made. its only way of determining which edges have already been expanded is to look at the KG and see what's in it. but that won't work in this situation since there may be no trace of the prior expansion in the KG - and so expand couldn't know to throw an error (it's just obeying instructions to expand an edge in a valid query graph).

I can think of two options here:

  1. prohibit expand from doing a general expansion like that (where neither qnode has curie(s) involved)
  2. have filter_kg throw an error if it removes ALL of the nodes/edges of a particular qnode_id/qedge_id from the KG

@amykglen amykglen changed the title BUG: if no source node was provided, something wrong happened in add_qedge(), expand() or resultify() Throw an error when filter_kg removes all nodes of a particular qnode_id Jun 22, 2020
@amykglen
Copy link
Member

amykglen commented Jun 22, 2020

I think I would vote for option 2, as that seems more precise. and I can't think of a situation when we would want execution to keep going if filter_kg removed all nodes of a particular qnode_id from the KG... (since that effectively creates a 'no paths found matching your query graph' situation). though maybe others know better?

@chunyuma
Copy link
Collaborator Author

I agree. I think it would be more reasonable to let filter_kg throw an error.

@edeutsch
Copy link
Collaborator

I agree. If there are no nodes for qnode_id, resultify() will claim 0 results.

@amykglen
Copy link
Member

cool! I will assign this to @finnagin then.

@edeutsch
Copy link
Collaborator

But it also seems to me that we don't want to let expand do an expand step where neither end has a curie? I mean it's okay if the query_graph has no curie specified, but if other expansion doesn't narrow it down, then it probably shouldn't be permitted?

e.g., a query graph of:
n01(type=protein)--[e01]--n02(type=disease)
shouldn't cause expand to expand this whole thing out, should it? I mean it could, but we wouldn't want to let it, would we?

but a query_graph of:
n00(DOID:12342)--[e00]--n01(type=protein)--[e01]--n02(type=disease)
would be okay because expansion of e00 would give curies to n01.

Side wonderment:
If there a difference between:
expand(edge_id=[e00,e01], kp=ARAX/KG1)
and
expand(edge_id=[e01,e00], kp=ARAX/KG1)
?
Does expand always do e00 first because it seems more sensible?
Or would the latter cause and all vs. all expansion only to be trimmed later?

@amykglen
Copy link
Member

yeah - I can add some code to prohibit expand from expanding an edge where neither end has a curie (whether input or derived). I can create a separate issue for that. (I can't really think of someone wanting to do such a query... maybe there's a tiny chance someone might want to once we have the ability to expand on only certain provided by's... e.g., 'give me all the (chemical_substance)-[contraindicated_for]-(disease) triples you know about from MyChem.info'?)

re: your side wonderment: yes, expand would always do e00 first, because it knows to start with an edge that has a curie specified and proceed onwards from there.

@saramsey
Copy link
Member

If ARAX_resultify.py should return zero results for the case where we have a non-empty KG but for a specific query node there are no corresponding KG nodes (and that would make sense given issue #692), I think we should (1) open a new issue requesting that behavior of Resultify and (2) write a unit test specifically to test that behavior.

@amykglen
Copy link
Member

I think the group agreed that that is the desired behavior for resultify in #774 (see this comment from David).

I can create an issue to request that from resultify if you'd like.

@saramsey
Copy link
Member

saramsey commented Jul 7, 2020

finn reports that he has implemented the necessary error-reporting code in filter, in his local code enlistment; just waiting on having the cycles to push it

finnagin added a commit that referenced this issue Jul 14, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants