Issue 73 - make axioms.owl update #93

wdduncan · 2020-03-28T17:55:27Z

To test making the axioms.owl target, I have created the file my-Makefile. It contains a subset of the original Makefile. It is run using the -f flag: make -f my-Makefile axioms.owl

Current targets in my-Makefile are:

exposure-xco.obo
exposure-zeco.obo
exposure-peco.obo
- In your comments, you say that this is now called pheco, but I couldn't find that ontology. Also, this has replaced eo.
exposure-ncit.obo
exposure-go.obo
exposure-exo.obo
exposure-ecto.obo
slim-exposure.obo
axioms.owl

Targets that still need to be re-worked are:

probs.tsv (planning to focus on this next)
exposure-mesh.obo
exposure-snomed.obo
exposure-mre.obo (what is mre?)
access-ncit.obo
exposure-wikidata.obo

I tried to use robot where I could but encountered difficulties.
Issue 1: memory.
Building exposure-ncit.obo had memory issues.
Using the query:

prefix xsd: <http://www.w3.org/2001/XMLSchema#> 
select ?id 
where
	{
		?id ?p ?o .
		filter ( !isblank(?id) )
		filter ( datatype(?o) = xsd:string )
		filter ( contains(?o, "xposure") )
	}

I got robot query to work by upping the memory to 16G:

export ROBOT_JAVA_ARGS=-Xmx16G && robot query \
	  -i tmp-in.obo \
	  -q xposure.sparql tmp-out.obo

However, this was very slow (over 5 minutes).

Using obo-grep.pl and obo-strip.pl was much faster:

obo-grep.pl -r ".*xposure.*" tmp-in.obo > tmp-xposure.obo # find obo stanzas containing *xposure*. 
obo-strip.pl -t id tmp-xposure.obo | grep id: | cut -f2 -d' ' | sed 's/NCIT:/http:\/\/purl\.obolibrary\.org\/obo\/NCIT_/g' > tmp-ids

## run robot filter to get any subclasses of the NCIT classes
robot filter -i tmp-xposure.obo --term-file tmp-ids --trim false --select "self children" -o tmp-out.obo

For building exposure.obo, robot extract worked fine, but using obo-grep.pl was better for filtering the obo file:

robot extract --method top -i tmp-in.obo -t GO:0050896 -o tmp-out.obo
obo-grep.pl -r 'name: response to' tmp-out.obo > tmp-go.obo

Issue 2: obo format error*
When I used robot merge on the obo files:

robot merge \
	  -i exposure-zeco.obo \
	  -i exposure-peco.obo \
	  -i exposure-exo.obo \
	  -i exposure-xco.obo \
	  -i exposure-ecto.obo \
	  -o tmp.obo

it threw the error:

OBO STRUCTURE ERROR Ontology does not conform to OBO structure rules:
multiple def tags not allowed. in frame:Frame(ENVO:00010483 id( ENVO:00010483)synonym( portion of environmental material EXACT)name( environmental material)namespace( plant_experimental_conditions_ontology)def( A portion of environmental material is a fiat object part which forms the medium or part of the medium of an environmental system.[DOI:10.1186/2041-1480-4-43 MA:ma ORCID:0000-0002-4366-3088 URL:http://ontology.buffalo.edu/smith/articles/niches.html ])def( A portion of environmental material is a fiat object which forms the medium or part of the medium of an environmental system.[DOI:10.1186/2041-1480-4-43 MA:ma ORCID:0000-0002-4366-3088 URL:http://ontology.buffalo.edu/smith/articles/niches.html ])is_a( BFO:0000040)is_a( BFO:0000024))

So instead, I used obo-cat.pl:

obo-cat.pl $^ | grep -v ^namespace: | grep -v ^property_value: | ./add-syns.pl > $@

Not sure of the best way to address this. Plus, it is nice to call obo-cat.pl with $^. Maybe a similar feature can be added to robot? I didn't test this, but maybe --inputs will work with "file1 file2". The documentation specifies a regular expression. So maybe this would work: --inputs "file1|file2".

…exposure-ontology into issue-73

jamesaoverton · 2020-03-30T12:38:27Z

I know very little about OBO format but here are some notes on ROBOT that I hope are helpful.

ROBOT is designed for OWL and almost every ROBOT operation uses OWLAPI to load ontologies. The main exception is robot query which primarily uses Apache Jena. OWLAPI is pretty heavy.

OWLAPI has code for reading/writing OBO format, which is what ROBOT uses. Like me, ROBOT knows very little about OBO format. Loading NCIT into OWLAPI or Jena is never going to be as fast as scanning through an OBO format file as text. As an aside, we are starting to play with a streaming XML processor for MIREOT, and handling ncit.owl is a use case: ontodev/robot#650

The OBO STRUCTURE ERROR is thrown by the OBO format code in OWLAPI, but robot convert has a --check flag to ignore that error:

http://robot.obolibrary.org/errors#obo-structure-error
http://robot.obolibrary.org/convert#converting-to-obo-format

wdduncan · 2020-03-31T01:37:56Z

@jamesaoverton
Thanks for the info about --check false.

cmungall · 2020-03-31T16:11:14Z

Plus, it is nice to call obo-cat.pl with $^

I assume this is in the context of a Makefile

Note that in the general case you can use make commands like this $(patsubst %,-i %,%^)

    robot COMMAND $(patsubst %,-i %,%^) OTHER_ARGS

It's a bit more verbose but not a big deal

But that is often not necessary

I didn't test this, but maybe --inputs will work with "file1 file2". The documentation specifies a regular expression. So maybe this would work: --inputs "file1|file2".

No need to guess, the docs are clear:

http://robot.obolibrary.org/merge

You can also specify merging of multiple files that match a pattern with --inputs. The argument to --inputs must be a quoted wildcard pattern. This option supports ? to match any single character, or * to match any number of characters.

This command will merge edit.owl and edit2.owl:

robot merge --inputs "edit*.owl" --output results/merged2.owl

This will work with $^ in Makefiles

It may be more unixesque to accept open ended lists of files as final argument but I'm happy with existing robot idioms, and introducing changes now would be disruptive.

cmungall

No need for a my-Makefile. Be bold! You own the Makefile, make the changes there!

cmungall · 2020-03-31T16:18:18Z

Regarding the speed issues in sparql queries. @balhoff has done some investigation here, and has made changes to the GO pipeline to speed things up by avoiding multiple invocations (the speed issue is frequently to do with initial load using owlapi, followed by conversion to Jena model, as James indicates).

I know you have also looked into the TDB option of ROBOT and found that helped. We've also discussed whether other triplestores, embedded or as a server, would help here. We can imagine this being orchestrated by ROBOT, as per the TDB option. We can also imagine workflows where a triplestore is loaded from multiple sources at the outset with subsequent queries happening on that. The beauty is the same SPARQL queries will work either way!

balhoff · 2020-03-31T17:26:39Z

When ROBOT gets ready to answer SPARQL queries, it first converts the ontology to a Jena Model, then adds the model to a Jena Dataset. On my laptop, for the GO build, the first part takes ~30 seconds, then the second step takes ~225 seconds. So it is important not to do this repeatedly if you are concerned about time.

jamesaoverton · 2020-03-31T17:51:10Z

@cmungall ROBOT calling an arbitrary SPARQL endpoint is worth discussing, but I'm not 100% convinced yet. Maybe a shared Python script would be just as good.

@balhoff Yes, by default ROBOT loads the ontology with OWLAPI, then converts to a Turtle string and loads into Jena (then converts back to OWLAPI after SPARQL Update). This is slow but allows for reading any OWL format and chaining. But with the robot query --tdb true flag we try to be smarter and load directly into Jena. If we can do it faster, let me know.

balhoff · 2020-03-31T17:56:54Z

@jamesaoverton

If we can do it faster, let me know.

It may be possible to speed up the model creation slightly by avoiding the intermediate string, but adding to the dataset takes by far the most time, and it is a single method call into Jena. I'm not sure if there are any options for speeding that up.

This is the slow part: https://github.com/ontodev/robot/blob/cca7343b5c17a29fc99b5e3ac81b75c9d73158a1/robot-core/src/main/java/org/obolibrary/robot/QueryOperation.java#L103

wdduncan · 2020-03-31T18:20:09Z

@cmungall
I will edit make file with new steps.

wdduncan · 2020-03-31T18:26:41Z

@jamesaoverton @balhoff
From the discussion I'm not sure if sure if robot query --tdb true would help with loading and querying ncit. Originally, I had hoped the ontology could be queried just referencing the purl and not having to go through a download step.
I agree that loading robot multiple times is not efficient. The largest ontologies that will need to be queried are ncit, go, and (eventually) snomed. Perhaps I could put these into tdb once and then query when the appropriate goals are triggered in the Makefile.

wdduncan · 2020-03-31T20:56:49Z

@jamesaoverton

Ran the following in my makefile to merge three ontologies:

test-robot: exposure-peco.obo  exposure-xco.obo  exposure-zeco.obo
	robot convert -i exposure-peco.obo -o exposure-peco-tmp.obo
	robot convert -i exposure-xco.obo -o exposure-xco-tmp.obo
	robot convert -i exposure-zeco.obo -o exposure-zeco-tmp.obo


	mv exposure-peco-tmp.obo exposure-peco.obo
	mv exposure-xco-tmp.obo exposure-xco.obo
	mv exposure-zeco-tmp.obo exposure-zeco.obo

	robot merge $(patsubst %,-i %, $^) -o merge-test.obo

The purpose of the first three commands was to check if the robot would read and output the obo files.
This runs fine.

However, when I run the robot merge $(patsubst %,-i %, $^) -o merge-test.obo, I receive the error:

OBO STRUCTURE ERROR Ontology does not conform to OBO structure rules:
multiple def tags not allowed. in frame:Frame(ENVO:00010483 id( ENVO:00010483)synonym( portion of environmental material EXACT)name( environmental material)namespace( plant_experimental_conditions_ontology)def( A portion of environmental material is a fiat object part which forms the medium or part of the medium of an environmental system.[DOI:10.1186/2041-1480-4-43 MA:ma ORCID:0000-0002-4366-3088 URL:http://ontology.buffalo.edu/smith/articles/niches.html ])def( A portion of environmental material is a fiat object which forms the medium or part of the medium of an environmental system.[DOI:10.1186/2041-1480-4-43 MA:ma ORCID:0000-0002-4366-3088 URL:http://ontology.buffalo.edu/smith/articles/niches.html ])is_a( BFO:0000040)is_a( BFO:0000024))

I also tested running:

robot merge -i exposure-peco.obo -i exposure-xco.obo -i exposure-zeco.obo -o merge-test.obo

and received the same error. Any thoughts on how to address this?

Going to stick with obo-cat.pl approach for now.

jamesaoverton · 2020-03-31T21:09:22Z

@wdduncan That is weird, and I can't think of a workaround off the top of my head that doesn't involve just working with .owl files. I'm really not the guy to ask about OBO format stuff.

Regarding your previous comment: One call to robot query can run multiple SPARQL queries. If you can't run all your queries at once, then yes it should be much more efficient to reuse TDB directories.

matentzn · 2020-04-01T06:51:05Z

Maybe --check false works on merge as well? Or, if not, sed the culprits from OBO files. Thats what I usually do if I need valid OBO (the error pertains to the fact that you can gave

def: "Definition 1"
def: "Definition 2"

OBO only allows one def: tag, so the other needs to be removed. Ah, and one alternative is to merge them into owl, and then use convert --check false to transform back into OBO as a seconds step.

wdduncan · 2020-04-02T22:04:41Z

I've removed a lot from the makefile. Please let me know what I need to add back in.
Also, the makefile now includes a snomed mapping, but the this info isn't being picked up by rdfmatch.

@cmungall help?

cmungall · 2020-04-02T23:00:09Z

src/mapping/axioms.obo

-format-version: 1.2
-ontology: rad/axioms
-
-[Term]


hmm, a chemical entity should never be equivalent to an exposure to a chemical entity. We should filter these

Does that need to be addressed before this PR can be merged?

cmungall · 2020-04-09T21:52:33Z

@wdduncan - I moved my-Makefile to Makefile but it seems to be have syntax errors:

$ make slim-exposure.obo
Makefile:118: *** missing separator.  Stop.

wdduncan · 2020-04-10T00:31:33Z

I update the Makefile in master to be the correct one.

wdduncan added 11 commits March 26, 2020 12:54

update makefile

8b1816b

remove axioms.owl

63ad0ff

add custom makefile for testing

6894581

update make for building exposure-go and exposure-ncit

90c2f04

update makefile

3db56e2

remove axioms.owl

0b11542

add custom makefile for testing

c3d1a55

update make for building exposure-go and exposure-ncit

7e87a82

add sparql to find 'xposure' strings

e90839f

Merge branch 'issue-73' of https://github.com/wdduncan/environmental-…

86d6db0

…exposure-ontology into issue-73

add sparql query

20d0bfb

wdduncan requested a review from cmungall March 28, 2020 17:55

wdduncan added 2 commits March 30, 2020 21:26

add shacl declarations that need to be inserted for rdfmatch to run

253c148

update my-Makefile to generate probs.tsv and run kboom

f9ed269

cmungall requested changes Mar 31, 2020

View reviewed changes

wdduncan added 2 commits April 2, 2020 17:59

add script file for building snomed.obo for snomed.obo.zip file

9b6ed0c

add snomed mapping to makefile

4c60276

cmungall approved these changes Apr 2, 2020

View reviewed changes

cmungall reviewed Apr 2, 2020

View reviewed changes

add slim-exposure.owl target to Makefile

dc39a3d

cmungall merged commit 4c7ce71 into EnvironmentOntology:master Apr 9, 2020

cmungall added a commit that referenced this pull request Apr 9, 2020

Fixed syntax error: #93 (comment)

0fd2377

wdduncan deleted the issue-73 branch April 10, 2020 00:31

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Issue 73 - make axioms.owl update #93

Issue 73 - make axioms.owl update #93

wdduncan commented Mar 28, 2020

jamesaoverton commented Mar 30, 2020

wdduncan commented Mar 31, 2020

cmungall commented Mar 31, 2020

cmungall left a comment

cmungall commented Mar 31, 2020

balhoff commented Mar 31, 2020

jamesaoverton commented Mar 31, 2020

balhoff commented Mar 31, 2020 •

edited

Loading

wdduncan commented Mar 31, 2020

wdduncan commented Mar 31, 2020

wdduncan commented Mar 31, 2020

jamesaoverton commented Mar 31, 2020 •

edited

Loading

matentzn commented Apr 1, 2020

wdduncan commented Apr 2, 2020

cmungall Apr 2, 2020

nlharris Apr 8, 2020

cmungall commented Apr 9, 2020

wdduncan commented Apr 10, 2020

Issue 73 - make axioms.owl update #93

Issue 73 - make axioms.owl update #93

Conversation

wdduncan commented Mar 28, 2020

jamesaoverton commented Mar 30, 2020

wdduncan commented Mar 31, 2020

cmungall commented Mar 31, 2020

cmungall left a comment

Choose a reason for hiding this comment

cmungall commented Mar 31, 2020

balhoff commented Mar 31, 2020

jamesaoverton commented Mar 31, 2020

balhoff commented Mar 31, 2020 • edited Loading

wdduncan commented Mar 31, 2020

wdduncan commented Mar 31, 2020

wdduncan commented Mar 31, 2020

jamesaoverton commented Mar 31, 2020 • edited Loading

matentzn commented Apr 1, 2020

wdduncan commented Apr 2, 2020

cmungall Apr 2, 2020

Choose a reason for hiding this comment

nlharris Apr 8, 2020

Choose a reason for hiding this comment

cmungall commented Apr 9, 2020

wdduncan commented Apr 10, 2020

balhoff commented Mar 31, 2020 •

edited

Loading

jamesaoverton commented Mar 31, 2020 •

edited

Loading