Issue 73 - make axioms.owl update #93
I know very little about OBO format but here are some notes on ROBOT that I hope are helpful. ROBOT is designed for OWL and almost every ROBOT operation uses OWLAPI to load ontologies. The main exception is that OWLAPI has code for reading/writing OBO format, which is what ROBOT uses. Like me, ROBOT knows very little about OBO format. Loading NCIT into OWLAPI or Jena is never going to be as fast as scanning through an OBO format file as text. As an aside, we are starting to play with a streaming XML processor for MIREOT, and handling
The OBO STRUCTURE ERROR is thrown by the OBO format code in OWLAPI, but http://robot.obolibrary.org/errors#obo-structure-error
@jamesaoverton |
I assume this is in the context of a Makefile. Note that in the general case you can use make commands like this:
It's a bit more verbose, but not a big deal. And it is often not necessary.
No need to guess, the docs are clear: http://robot.obolibrary.org/merge
This will work with
It may be more unixesque to accept an open-ended list of files as the final argument, but I'm happy with the existing robot idioms, and introducing changes now would be disruptive.
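For anyone following along outside a Makefile, here is a minimal sketch of the repeated `--input` idiom in plain shell. The helper name `merge_cmd` and the file names are hypothetical; only `robot merge`, `--input`, and `--output` come from the ROBOT docs linked above. The function only prints the command so it can be checked before it is run, and it assumes file names contain no whitespace.

```shell
#!/bin/sh
# Hypothetical helper: print a `robot merge` command with one --input per file.
# First argument is the output file; the remaining arguments are the inputs.
# Assumes file names contain no whitespace.
merge_cmd() {
  out=$1
  shift
  cmd="robot merge"
  for f in "$@"; do
    cmd="$cmd --input $f"
  done
  printf '%s --output %s\n' "$cmd" "$out"
}

# Placeholder file names, for illustration only.
merge_cmd merged.owl a.owl b.owl c.owl
```

In a GNU Makefile recipe the same expansion can probably be written as `$(patsubst %,--input %,$^)`, which keeps the prerequisite list in one place.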
No need for a my-Makefile. Be bold! You own the Makefile, make the changes there!
Regarding the speed issues in sparql queries. @balhoff has done some investigation here, and has made changes to the GO pipeline to speed things up by avoiding multiple invocations (the speed issue is frequently to do with initial load using owlapi, followed by conversion to Jena model, as James indicates). I know you have also looked into the TDB option of ROBOT and found that helped. We've also discussed whether other triplestores, embedded or as a server, would help here. We can imagine this being orchestrated by ROBOT, as per the TDB option. We can also imagine workflows where a triplestore is loaded from multiple sources at the outset with subsequent queries happening on that. The beauty is the same SPARQL queries will work either way! |
When ROBOT gets ready to answer SPARQL queries, it first converts the ontology to a Jena |
@cmungall ROBOT calling an arbitrary SPARQL endpoint is worth discussing, but I'm not 100% convinced yet. Maybe a shared Python script would be just as good. @balhoff Yes, by default ROBOT loads the ontology with OWLAPI, then converts to a Turtle string and loads into Jena (then converts back to OWLAPI after SPARQL Update). This is slow but allows for reading any OWL format and chaining. But with the |
It may be possible to speed up the model creation slightly by avoiding the intermediate string, but adding to the dataset takes by far the most time, and it is a single method call into Jena. I'm not sure if there are any options for speeding that up. This is the slow part: https://github.com/ontodev/robot/blob/cca7343b5c17a29fc99b5e3ac81b75c9d73158a1/robot-core/src/main/java/org/obolibrary/robot/QueryOperation.java#L103 |
@cmungall |
@jamesaoverton @balhoff |
Ran the following in my makefile to merge three ontologies:
The purpose of the first three commands was to check if the robot would read and output the obo files. However, when I run the
I also tested running:
and received the same error. Any thoughts on how to address this? Going to stick with obo-cat.pl approach for now. |
@wdduncan That is weird, and I can't think of a workaround off the top of my head that doesn't involve just working with Regarding your previous comment: One call to |
Maybe
OBO only allows one |
I've removed a lot from the makefile. Please let me know what I need to add back in. @cmungall help? |
format-version: 1.2
ontology: rad/axioms

[Term]
hmm, a chemical entity should never be equivalent to an exposure to a chemical entity. We should filter these
Does that need to be addressed before this PR can be merged?
@wdduncan - I moved my-Makefile to Makefile but it seems to have syntax errors:
I updated the Makefile in master to be the correct one.
To test making the axioms.owl target, I have created the file `my-Makefile`. It contains a subset of the original `Makefile`. It is run using the `-f` flag: `make -f my-Makefile axioms.owl`
Current targets in `my-Makefile` are:
Targets that still need to be re-worked are:
I tried to use `robot` where I could but encountered difficulties.
Issue 1: memory.
Building exposure-ncit.obo had memory issues.
Using the query:
I got `robot query` to work by upping the memory to 16G:
However, this was very slow (over 5 minutes).
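For reference, one way to raise the heap when ROBOT is launched through its standard wrapper script is the `ROBOT_JAVA_ARGS` environment variable. The sketch below is an assumption about how a 16G setting could be applied, not necessarily the command used here, and the file names in the comment are placeholders.

```shell
# Assumption: robot is run via the standard wrapper script, which passes
# ROBOT_JAVA_ARGS on to the JVM.
export ROBOT_JAVA_ARGS="-Xmx16G"

# Then run the query as usual, for example (placeholder file names):
#   robot query --input input.owl --query query.rq results.tsv
```

Setting the variable in the shell (or at the top of the Makefile) avoids repeating the memory flag on every `robot` call.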
Using `obo-grep.pl` and `obo-strip.pl` was much faster:
For building exposure.obo, `robot extract` worked fine, but using `obo-grep.pl` was better for filtering the obo file:
Issue 2: obo format error
When I used `robot merge` on the obo files:
it threw the error:
So instead, I used `obo-cat.pl`:
Not sure of the best way to address this. Plus, it is nice to call `obo-cat.pl` with `$^`. Maybe a similar feature can be added to robot? I didn't test this, but maybe `--inputs` will work with `"file1 file2"`. The documentation specifies a regular expression. So maybe this would work: `--inputs "file1|file2"`.