RDF issues testing 4.1.0 RC 1 #412

Closed
msalvadores opened this issue May 23, 2015 · 18 comments

@msalvadores

I have an owlapi wrapper that translates different ontology formats to OWL/RDF. I recently pulled the latest code and had to add the following to my pom.xml to avoid a class-not-found exception for RDFFormat.

<dependency>
    <groupId>net.sourceforge.owlapi</groupId>
    <artifactId>owlapi-rio</artifactId>
    <version>4.1.0-RC1</version>
</dependency>

Not sure if this is an issue or if I am supposed to add this new artifact. It is fine if that is the case. The real issue is that this new RDF library/module generates RDF that does not parse with other tools (i.e. rapper and http://www.w3.org/RDF/Validator/). The problem appears to be badly encoded XMLLiteral values.

With owlapi version 4.0.x I was getting:

<Class rdf:about="http://bioportal.bioontology.org/ontologies/msotes#class6">
        <rdfs:subClassOf rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
</Class>
<rdf:Description rdf:about="http://bioportal.bioontology.org/ontologies/msotes#class6">
<rdfs:label>rdfs label value</rdfs:label>
<rdfs:comment rdf:datatype="http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral">&lt;ncicp:ComplexDefinition&gt;&lt;ncicp:def-definition&gt;A form of cancer that begins in melanocytes (cells that make the pigment melanin). It may begin in a mole (skin melanoma), but can also begin in other pigmented tissues, such as in the eye or in the intestines.&lt;/ncicp:def-definition&gt;&lt;ncicp:def-source&gt;NCI-GLOSS&lt;/ncicp:def-source&gt;&lt;/ncicp:ComplexDefinition&gt;</rdfs:comment>
<metadata:prefixIRI rdf:datatype="http://www.w3.org/2001/XMLSchema#string">msotes:class6</metadata:prefixIRI>
 <msotes:mySynonymLabel>syn for class 6</msotes:mySynonymLabel>
</rdf:Description>

With 4.1.0 RC1 I get the following:

    <Class rdf:about="http://bioportal.bioontology.org/ontologies/msotes#class6">
        <rdfs:label>rdfs label value</rdfs:label>
        <rdfs:comment rdf:datatype="http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral"><ncicp:ComplexDefinition><ncicp:def-definition>A form of cancer that begins in melanocytes (cells that make the pigment melanin). It may begin in a mole (skin melanoma), but can also begin in other pigmented tissues, such as in the eye or in the intestines.</ncicp:def-definition><ncicp:def-source>NCI-GLOSS</ncicp:def-source></ncicp:ComplexDefinition></rdfs:comment>
        <rdfs:subClassOf rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
        <msotes:mySynonymLabel>syn for class 6</msotes:mySynonymLabel>
        <metadata:prefixIRI rdf:datatype="http://www.w3.org/2001/XMLSchema#string">msotes:class6</metadata:prefixIRI>
    </Class>

The second one (4.1.0 RC1) does not escape the XML tags in the XMLLiteral value the way 4.0.x does. Both the rapper tool and the W3C validator failed to parse the 4.1.0 RC1 output, so I imagine that it is invalid RDF.

@ignazio1977
Contributor

had to add the following to my pom.xml to avoid some RDFFormat not found class exception

This is not expected - the same settings you used for 4.0.x should bring in the same dependencies in 4.1.0. Can you share what you used for 4.0.x so I can figure out what went wrong?

@ignazio1977
Contributor

Regarding the XML literals, #333 was the bug report that originated the change.
As far as I can tell from looking at the specs,

http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/#section-XMLLiteral

the XML literal is intended to /not/ be escaped, and needs to not break the XML structure of the containing RDF/XML file. However, I'm not particularly confident of my own interpretation of the specs, so I'd welcome more eyes on the specs.

Under the assumption that the resulting XML is as specified by the specs, this would point out a bug in other tools. To keep compatibility where these tools cannot/have not yet been updated, we could introduce an XMLWriter preference to output XML literals in the old way.
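For illustration only, the "old way" of writing such a literal amounts to escaping the markup characters so the value is emitted as plain character data, as in the 4.0.x output quoted earlier. The sketch below is not the OWLAPI XMLWriter code; the class and method names are hypothetical and only the standard library is used.

```java
// Hypothetical helper, not part of OWLAPI: shows what "escaped" output means.
final class XmlLiteralEscaper {
    private XmlLiteralEscaper() {}

    /** Escapes markup characters so the literal is written as character data. */
    static String escape(String literal) {
        StringBuilder sb = new StringBuilder(literal.length() + 16);
        for (int i = 0; i < literal.length(); i++) {
            char c = literal.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints: &lt;ncicp:def-source&gt;NCI-GLOSS&lt;/ncicp:def-source&gt;
        System.out.println(escape("<ncicp:def-source>NCI-GLOSS</ncicp:def-source>"));
    }
}
```

An escaped value never introduces elements or prefixes into the surrounding RDF/XML, which is presumably why the 4.0.x output parsed in rapper and the W3C validator.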

@sesuncedu
Contributor

sesuncedu commented May 23, 2015 via email

@ignazio1977
Contributor

Nice. So in RDF 1.1 the type is no longer XMLLiteral but Literal.

Yes the problem is clearly RDF/XML. Insert rant about separation of concerns and design by committee here.

@sesuncedu
Contributor

sesuncedu commented May 23, 2015 via email

@ignazio1977
Contributor

Ok, so the example shows the XML content is not escaped - this would mean OWLAPI is behaving as expected.

The failing fragment looks like this:

<ncicp:ComplexDefinition>
    <ncicp:def-definition>A form of cancer that begins in melanocytes...</ncicp:def-definition>
    <ncicp:def-source>NCI-GLOSS</ncicp:def-source>
</ncicp:ComplexDefinition>

I can't tell if ncicp is already defined in the containing RDF/XML elements - this might make the fragment not self contained, stopping it from being moved to another ontology and serialized correctly. It might be the issue other tools are revealing - @msalvadores are you able to verify if this is the case?

@msalvadores
Author

@ignazio1977 and @sesuncedu thanks a lot for looking into this issue. Very much appreciated.

@ignazio1977 I think that is the case. I cannot access the full content right now but I remember that the error was complaining about a missing prefix, probably ncicp. I can double check on Tuesday when I am back at the office but I am almost certain that that is the case.

If the XML does not have to be escaped, what is the solution here? To add the prefix definition? And should the OWL API generate an error instead of generating invalid RDF/XML?

For our processing workflow in BioPortal it'd be better if the OWLAPI could generate an error in this case.

Thanks again.

@ignazio1977
Contributor

There would be a performance price to pay for parsing the XML to make sure it is well formed. The check could be moved to the creation of the literals rather than serialization, so that only well formed XMLLiterals can be created, although this would likely cause its own set of problems.
Overall, I don't think the performance would be too bad, unless the literals were many and large.
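A parsing check of the kind discussed here could be sketched as follows. This is a minimal illustration using the JDK's SAX parser, not the check that OWLAPI implements. The class name, the wrapper element and the namespace URI bound to ncicp are all assumptions made for the example. The fragment is wrapped in a dummy root element that declares whatever prefixes are in scope, then parsed with namespace awareness switched on, so an unbound prefix is rejected just like unbalanced tags.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Collections;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of OWLAPI.
final class XmlLiteralCheck {
    private XmlLiteralCheck() {}

    /**
     * Returns true if the fragment parses as XML content, given the prefixes
     * declared by the enclosing document (prefix -> namespace URI).
     * Assumes the namespace URIs contain no characters that need escaping.
     */
    static boolean isWellFormed(String fragment, Map<String, String> inScopePrefixes) {
        StringBuilder doc = new StringBuilder("<wrapper");
        for (Map.Entry<String, String> e : inScopePrefixes.entrySet()) {
            doc.append(" xmlns:").append(e.getKey())
               .append("=\"").append(e.getValue()).append('"');
        }
        doc.append('>').append(fragment).append("</wrapper>");

        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // required to detect unbound prefixes
        try {
            factory.newSAXParser().parse(
                    new InputSource(new StringReader(doc.toString())),
                    new DefaultHandler()); // DefaultHandler rethrows fatal errors
            return true;
        } catch (SAXException e) {
            return false; // not well formed, or a prefix is not bound
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String fragment = "<ncicp:def-source>NCI-GLOSS</ncicp:def-source>";
        // No prefixes in scope: ncicp is unbound.
        System.out.println(isWellFormed(fragment, Collections.emptyMap()));
        // Placeholder namespace URI, chosen only for this example.
        System.out.println(isWellFormed(fragment,
                Collections.singletonMap("ncicp", "http://example.org/ncicp#")));
    }
}
```

Each call parses the whole fragment once, so the cost grows with the number and size of the literals, which matches the performance concern above.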

@msalvadores
Author

@ignazio1977 this is the previous pom.xml config file

https://github.com/ncbo/owlapi_wrapper/blob/master/pom.xml

@ignazio1977
Contributor

About the pom config, I get the same issues in 4.0.2 running the verification project with only owlapi-distribution as a dependency. The issue seems to be with the managed jackson dependencies - the dependency management is not extended through the dependencies, and the wrong versions of jackson are pulled in.

With these in my pom, I can run all OWLAPI unit tests from a separate project. Dropping the dependency management bits, I get errors trying to use JSON-LD, due to incompatible jackson versions:

<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-core</artifactId>
            <version>2.3.3</version>
        </dependency>
        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-databind</artifactId>
            <version>2.3.3</version>
        </dependency>
        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-annotations</artifactId>
            <version>2.3.3</version>
        </dependency>
    </dependencies>
</dependencyManagement>

<dependencies>
    <dependency>
        <groupId>net.sourceforge.owlapi</groupId>
        <artifactId>owlapi-distribution</artifactId>
        <version>4.1.0-RC1</version>
    </dependency>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.12</version>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.mockito</groupId>
        <artifactId>mockito-all</artifactId>
        <version>1.9.5</version>
        <scope>test</scope>
    </dependency>
</dependencies>

@ignazio1977
Contributor

Would be better if the dependencies for owlapi-distribution took care of that.

@ignazio1977
Contributor

This change allows the dependency to be just:

<dependency>
    <groupId>net.sourceforge.owlapi</groupId>
    <artifactId>owlapi-distribution</artifactId>
    <version>4.1.0-RC1</version>
</dependency>

ignazio1977 added a commit that referenced this issue Jun 1, 2015
This commit adds a test for ill formed XML literals, which are expected
to fail to serialize.
The issue is that ill formed literals would create ontologies that
cannot be parsed back.
@ignazio1977
Contributor

Second commit throws an error when trying to save RDF/XML ontologies that have ill formed XML literals in them.

@ignazio1977 ignazio1977 added this to the Version 5 milestone Jun 1, 2015
@ansell
Member

ansell commented Jun 1, 2015

If you need JSONLD-Java changed to a newer version of Jackson I can do that. It is known not to work with 2.5.0 but they fixed the incompatibility in 2.5.1:

FasterXML/jackson-core#178

If there are other incompatibilities with newer versions let me know and I can try to avoid them or get them fixed in jackson.

@ignazio1977
Contributor

That would be useful - currently things work with 2.3.3 for core, databind and annotations. I can try updating the managed versions to 2.5.1 and see what happens.

Part of the issue is that databind 2.3.3 depends on annotations 2.3.0, and sesame-rio-rdfjson depends on 2.2.1 - so a third party project without management gets a variety of versions.
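When several versions are in play like this, it can help to confirm which jar a class was actually loaded from at runtime. The following is a small diagnostic sketch using only the standard library; the helper class is hypothetical, and the three Jackson class names are simply well-known entry points of the core, databind and annotations artifacts.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper, not part of OWLAPI or Jackson.
final class JarLocator {
    private JarLocator() {}

    /** Returns the location a class was loaded from, or a short marker string. */
    static String locate(String className) {
        try {
            // initialize=false: find the class without running its static initializers
            Class<?> c = Class.forName(className, false, JarLocator.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            return src == null ? "(bootstrap/JDK)" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException e) {
            return "(not on classpath)";
        }
    }

    public static void main(String[] args) {
        String[] names = {
            "com.fasterxml.jackson.core.JsonFactory",
            "com.fasterxml.jackson.databind.ObjectMapper",
            "com.fasterxml.jackson.annotation.JsonProperty"
        };
        for (String name : names) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

With a standard Maven classpath the printed jar file names normally include the artifact version, so mismatched core, databind and annotations versions show up directly.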

@ignazio1977
Contributor

Pushing up to 2.5.1 works fine.

@ansell
Member

ansell commented Jun 2, 2015

In regards to sesame-rio-rdfjson that I also manage, I have stuck with 2.2.1 due to the great forward compatibility with jackson through the version 2.x series (except for 2.5.0 that was a very small oversight), as sesame-rio-rdfjson doesn't use any of the newer features and it is simpler just to stay with a single version.

However, on the OWLAPI end, as a user of the libraries, it is a great idea to manage them to consistent versions, so it is great that it all works with 2.5.1.

ignazio1977 added a commit that referenced this issue Jun 5, 2015
ignazio1977 added a commit that referenced this issue Jun 5, 2015
This commit adds a test for ill formed XML literals, which are expected
to fail to serialize.
The issue is that ill formed literals would create ontologies that
cannot be parsed back.
@ignazio1977
Contributor

RC3 is available on Sonatype and it solves the dependency issues.
