possible bug when using --xml option #61

Closed
goekce opened this issue Nov 30, 2020 · 6 comments

Comments

@goekce

goekce commented Nov 30, 2020

I noticed that if --xml is used after the XPath expression, then it is not applied. Is this a feature or a bug?

$ wget https://raw.githubusercontent.com/DiseaseOntology/HumanDiseaseOntology/main/src/ontology/doid.owl
$ xidel -se "//rdfs:label[text()='malignant hyperthermia']/../rdfs:subClassOf" doid.owl --xml
<?xml version="1.0" encoding="UTF-8"?>
<xml>



            
                
                
            
        
</xml>

$ xidel --xml -se "//rdfs:label[text()='malignant hyperthermia']/../rdfs:subClassOf" doid.owl
<?xml version="1.0" encoding="UTF-8"?>
<xml>
<rdfs:subClassOf xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns="http://purl.obolibrary.org/obo/doid.owl#" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:so="http://purl.obolibrary.org/obo/so#" xmlns:obo="http://purl.obolibrary.org/obo/" xmlns:owl="http://www.w3.org/2002/07/owl#" xmlns:xsd="http://www.w3.org/2001/XMLSchema#" xmlns:doid="http://purl.obolibrary.org/obo/doid#" xmlns:skos="http://www.w3.org/2004/02/skos/core#" xmlns:terms="http://purl.org/dc/terms/" xmlns:oboInOwl="http://www.geneontology.org/formats/oboInOwl#" rdf:resource="http://purl.obolibrary.org/obo/DOID_0050736"/>
<rdfs:subClassOf xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns="http://purl.obolibrary.org/obo/doid.owl#" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:so="http://purl.obolibrary.org/obo/so#" xmlns:obo="http://purl.obolibrary.org/obo/" xmlns:owl="http://www.w3.org/2002/07/owl#" xmlns:xsd="http://www.w3.org/2001/XMLSchema#" xmlns:doid="http://purl.obolibrary.org/obo/doid#" xmlns:skos="http://www.w3.org/2004/02/skos/core#" xmlns:terms="http://purl.org/dc/terms/" xmlns:oboInOwl="http://www.geneontology.org/formats/oboInOwl#" rdf:resource="http://purl.obolibrary.org/obo/DOID_66"/>
<rdfs:subClassOf xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" xmlns="http://purl.obolibrary.org/obo/doid.owl#" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:so="http://purl.obolibrary.org/obo/so#" xmlns:obo="http://purl.obolibrary.org/obo/" xmlns:owl="http://www.w3.org/2002/07/owl#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:xsd="http://www.w3.org/2001/XMLSchema#" xmlns:doid="http://purl.obolibrary.org/obo/doid#" xmlns:skos="http://www.w3.org/2004/02/skos/core#" xmlns:terms="http://purl.org/dc/terms/" xmlns:oboInOwl="http://www.geneontology.org/formats/oboInOwl#">
            <owl:Restriction>
                <owl:onProperty rdf:resource="http://purl.obolibrary.org/obo/IDO_0000664"/>
                <owl:someValuesFrom rdf:resource="http://purl.obolibrary.org/obo/GENO_0000147"/>
            </owl:Restriction>
        </rdfs:subClassOf>
</xml>
@Reino17

Reino17 commented Dec 1, 2020

$ wget https://[...]/doid.owl

If you're using xidel, then there's no reason to use wget:

xidel -s "https://[...]/doid.owl" -e '$raw' > doid.owl
#or
xidel -s "https://[...]/doid.owl" --download=.
$ xidel -se "//rdfs:label[text()='malignant hyperthermia']/../rdfs:subClassOf" doid.owl --xml
  • This use of quotes is for Windows. Since you're on Unix, please use -e '//rdfs:label[text()="malignant hyperthermia"]/../rdfs:subClassOf'.
  • Although not mentioned in xidel --help, 'readme.txt', or anywhere else, it's advised to put the extraction query after the input, especially when you specify multiple inputs (xidel -s <file> -e '[...]' <url> -e '[...]'). I also always put global options, like -s, first. So I would enter:
xidel -s --xml doid.owl -e '//rdfs:label[text()="malignant hyperthermia"]/../rdfs:subClassOf'

Benito has to determine whether this is a bug or not.
For what it's worth, with --input-format=xml --output-format=xml at the end...

xidel -s doid.owl -e '//rdfs:label[text()="malignant hyperthermia"]/../rdfs:subClassOf' --input-format=xml --output-format=xml

...xidel does return the node's content.

@benibela
Owner

benibela commented Dec 1, 2020

Generally options should be on the left side of the thing they modify.

That way you can have multiple queries with different options:

xidel --printed-node-format xml -se "//rdfs:label[text()='malignant hyperthermia']/../rdfs:subClassOf" --printed-node-format text -e "//rdfs:label[text()='malignant hyperthermia']/../@rdf:about"  doid.owl

Or multiple inputs:

xidel --input-format xml doid.owl --input-format html http://example.org -e '$url'

Some options are rearranged automatically if they are given in the wrong order. It looks like --xml is missing from that list.

For what it's worth, with --input-format=xml --output-format=xml at the end xidel does return the node's content.

Now that is weird, since --xml should be the same as those two.
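
For anyone who wants to compare the two placements without the full ontology, here is a minimal sketch. The file sample.owl and its contents are made up for illustration (a tiny stand-in for doid.owl), and the block only runs xidel if it is installed. Going by this thread, the first form should print the element markup and the second may not, but that can depend on the xidel version, so no expected output is shown.

```shell
#!/bin/sh
# Hypothetical miniature stand-in for doid.owl, written to a scratch directory.
workdir=$(mktemp -d)
cat > "$workdir/sample.owl" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="http://example.org/A">
    <rdfs:subClassOf rdf:resource="http://example.org/B"/>
    <rdfs:label>malignant hyperthermia</rdfs:label>
  </owl:Class>
</rdf:RDF>
EOF

query='//rdfs:label[text()="malignant hyperthermia"]/../rdfs:subClassOf'

if command -v xidel >/dev/null 2>&1; then
  echo '--- --xml to the left of input and query ---'
  xidel -s --xml "$workdir/sample.owl" -e "$query"
  echo '--- --xml to the right of query and input ---'
  xidel -s -e "$query" "$workdir/sample.owl" --xml
else
  echo 'xidel not found; skipping the comparison' >&2
fi
```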

@goekce
Author

goekce commented Dec 1, 2020

For what it's worth, with --input-format=xml --output-format=xml at the end...

xidel -s doid.owl -e '//rdfs:label[text()="malignant hyperthermia"]/../rdfs:subClassOf' --input-format=xml --output-format=xml

...xidel does return the node's content.

I assume "the node's content" means the XML nodes themselves rather than only their text. I can confirm this behavior.

Although not mentioned in xidel --help, 'readme.txt', or anywhere else, it's advised to put the extraction query after the input, especially when you specify multiple inputs (xidel -s <file> -e '[...]' <url> -e '[...]'). I also always put global options, like -s, first. So I would enter:

xidel -s --xml doid.owl -e '//rdfs:label[text()="malignant hyperthermia"]/../rdfs:subClassOf'

Thank you, my actual problem is solved then. I must use --xml in front of the corresponding expression.

I'll leave the issue open for the "For what it's worth" part ;)


The rest is a bit OT, but I am grateful for @Reino17's elaborate feedback, so:

$ wget https://[...]/doid.owl

If you're using xidel, then there's no reason to use wget:

xidel -s "https://[...]/doid.owl" -e '$raw' > doid.owl
#or
xidel -s "https://[...]/doid.owl" --download=.

doid.owl and other ontologies can reach about 100 MB, and xidel does not cache downloads. That is why I use wget. But I did not know about --download=, thanks!
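
A minimal sketch of that "download once, query many times" workflow, in case it helps others. fetch_once is a hypothetical helper name, not part of xidel or wget; it assumes wget is installed and treats any non-empty local file as a valid cached copy.

```shell
#!/bin/sh
# fetch_once URL DEST: download URL to DEST unless a non-empty DEST already exists.
# Hypothetical helper for illustration; not part of xidel or wget.
fetch_once() {
  url=$1
  dest=$2
  if [ -s "$dest" ]; then
    return 0    # cached copy present, skip the download
  fi
  # Download under a temporary name first, so an interrupted transfer
  # never leaves a truncated file that looks like a valid cache entry.
  wget -q -O "$dest.part" "$url" && mv "$dest.part" "$dest"
}
```

Usage would look like fetch_once "https://raw.githubusercontent.com/DiseaseOntology/HumanDiseaseOntology/main/src/ontology/doid.owl" doid.owl, followed by as many xidel queries against the local doid.owl as needed.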

$ xidel -se "//rdfs:label[text()='malignant hyperthermia']/../rdfs:subClassOf" doid.owl --xml

This use of quotes is for Windows. Since you're on Unix please use -e '//rdfs:label[text()="malignant hyperthermia"]/../rdfs:subClassOf'.

I use xidel extensively in scripts. Using your style I cannot write something like:

ARG='malignant hyperthermia'
xidel -s doid.owl -e '//rdfs:label[text()=$ARG]/..'

That is the reason why I use this style.

@Reino17

Reino17 commented Dec 1, 2020

With Windows quoting style that won't work either:

$ xidel -s doid.owl -e "//rdfs:label[text()=$ARG]/../@rdf:about"
Error:
err:XPST0003: Unknown or unexpected operator: "hyperthermia" (possibly missing comma "," or closing parentheses ")}]" )
in line 1 column 21
//rdfs:label[text()=malignant hyperthermia]/../@rdf:about
                    ^^^^^^^^^^^  error occurs around here

You'd always have to quote the external variable:

-e "//rdfs:label[text()='$ARG']/../@rdf:about"   # Windows quoting style
#or
-e '//rdfs:label[text()="'"$ARG"'"]/../@rdf:about'   # Unix quoting style; "$ARG" is double-quoted so the space in the value does not split the argument
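
To see what the shell would actually hand to xidel in each case, the query strings can be printed without running xidel at all. This is only a sketch of POSIX shell quoting; the variable names q1 to q3 and the word counters are illustrative.

```shell
#!/bin/sh
ARG='malignant hyperthermia'

# 1. Double-quoted query ("Windows style"): the shell expands $ARG in place.
q1="//rdfs:label[text()='$ARG']/../@rdf:about"

# 2. Single-quoted query ("Unix style"): close the single quotes,
#    splice in "$ARG" inside double quotes, then reopen the single quotes.
q2='//rdfs:label[text()="'"$ARG"'"]/../@rdf:about'

# 3. Fully single-quoted: nothing is expanded, the literal text $ARG is passed on.
q3='//rdfs:label[text()=$ARG]/../@rdf:about'

printf '%s\n' "$q1" "$q2" "$q3"

# Splicing $ARG in without double quotes lets the shell split on the space,
# so the query arrives as two separate arguments instead of one.
set -- '//rdfs:label[text()="'$ARG'"]/../@rdf:about'
unquoted_words=$#
set -- '//rdfs:label[text()="'"$ARG"'"]/../@rdf:about'
quoted_words=$#
printf 'unquoted splice: %s argument(s), quoted splice: %s argument(s)\n' \
  "$unquoted_words" "$quoted_words"
```

Either splicing style breaks if the value itself contains the quote character used inside the XPath string.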

@benibela
Owner

benibela commented Dec 2, 2020

I use xidel extensively in scripts. Using your style I cannot write something like:

ARG='malignant hyperthermia'
xidel -s doid.owl -e '//rdfs:label[text()=$ARG]/..'

There is a special option for that:

export ARG='malignant hyperthermia'
xidel --variable ARG -s doid.owl -e '//rdfs:label[text()=$ARG]/..'
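
Note the export in the example above: a plain ARG=... assignment stays inside the current shell, while an exported variable is passed on to child processes. The sketch below shows the difference, using a child sh as a stand-in for xidel (an assumption made so that it runs without xidel installed).

```shell
#!/bin/sh
# Stand-in for any child process: prints the ARG it sees, or "unset".
child='printf "%s" "${ARG-unset}"'

unset ARG                         # start from a clean state
ARG='malignant hyperthermia'      # plain shell variable, not exported
seen_without_export=$(sh -c "$child")

export ARG                        # now part of the environment of child processes
seen_with_export=$(sh -c "$child")

printf 'without export: %s\nwith export:    %s\n' \
  "$seen_without_export" "$seen_with_export"
```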


@benibela
Owner

For what it's worth, with --input-format=xml --output-format=xml at the end xidel does return the node's content.
xidel -s doid.owl -e '//rdfs:label[text()="malignant hyperthermia"]/../rdfs:subClassOf' --input-format=xml --output-format=xml

Now that is weird, since --xml should be the same as those two.

You had me confused there. --xml is indeed always the same as those two. You got the right output because you put doid.owl at the beginning, before the query.
