
Codelists as rdfs:Datatype: mixture of formal ontology and thesauri, or an only way to have data interoperable #14

Open
nataschake opened this issue Jan 16, 2025 · 10 comments

Comments

@nataschake
Collaborator

nataschake commented Jan 16, 2025

@rob-metalinkage @avillar @ar-chad @VladimirAlexiev

I would like you to speculate a bit on this.

We had a discussion today, January 16th, on whether to get rid of codelists as skos:ConceptSchemes. The main reason for raising the discussion was that we mix formal ontologies and thesauri, which are different by nature.

See skos-reference:

“To understand this distinction, consider that the “knowledge” made explicit in a formal ontology is expressed as sets of axioms and facts. A thesaurus or classification scheme is of a completely different nature, and does not assert any axioms or facts. Rather, a thesaurus or classification scheme identifies and describes, through natural language and other informal means, a set of distinct ideas or meanings, which are sometimes conveniently referred to as “concepts”.” This relates to my earlier comment about making a certain ‘ontological commitment’ by including SKOS Concepts in OWL. More here: https://www.w3.org/TR/skos-reference/

Now we have:

```turtle
bldgcode:BuildingClassValue a skos:ConceptScheme ;
    rdfs:label "BuildingClassValue"@en ;
    dct:isFormatOf bldg:BuildingClassValue ;
    skos:definition "BuildingClassValue is a code list used to further classify a Building."@en .
...
bldg:BuildingClassValue a owl:Class ;
    rdfs:label "BuildingClassValue"@en ;
    rdfs:subClassOf skos:Concept ;
    skos:definition "BuildingClassValue is a code list used to further classify a Building."@en .
```

But we could instead do what ShapeChange does for enumerations:

```turtle
app:TextureType a rdfs:Datatype ;
    rdfs:label "TextureType"@en ;
    owl:equivalentClass [ a rdfs:Datatype ;
            owl:oneOf ( "specific" "typical" "unknown" ) ] ;
    skos:definition "TextureType enumerates the different texture types."@en .
```

Here app:TextureType is an enumeration: its values are known in advance.

So we might have:

```turtle
bldg:BuildingClassValue a owl:Class ;
    rdfs:label "BuildingClassValue"@en ;
    owl:equivalentClass [ a rdfs:Datatype ;
            owl:oneOf ( ) ] ;
    skos:definition "BuildingClassValue is a code list used to further classify a Building."@en .
```

Since owl:oneOf can have zero values, such a definition would be correct, and a particular application could then supply values for it.

Cons:

Code lists should use the stereotype codeList.
The name of the codelist or enumeration should include the suffix Value
The documentation field of the codeList classes in the UML application schemas shall include the -- Name --, -- Definition --, and -- Description -- information.
The natural language name of the code list (given in the -- Name -- section) should not include the term Value.
The type of code list shall be specified using the tagged value extensibility on the codeList class.
For each code list, a tagged value called vocabulary shall be specified. The value of the tagged value shall be a persistent URI identifying the values of the code list.
A code list may also be used as a super-class for a number of specific codelists whose values may be used to specify the attribute value.
Values of INSPIRE-governed code lists and enumerations shall be in lowerCamelCase notation

  • codelists cannot be defined externally as skos:ConceptScheme and skos:Concept instances, which undermines data interoperability. Example: if we define a strict codelist for BuildingClassValue, it becomes an enumeration, and adding a new value to the list means editing the Building ontology.

Pros:

  • we don't mix formal ontologies and thesauri (see skos-reference)
  • we avoid the duplication we see in Protégé, where BuildingClassValue is simultaneously an owl:Class and an individual skos:Concept
@ar-chad
Collaborator

ar-chad commented Jan 18, 2025

Maybe this will help:

opengeospatial/CityGML-3.0CM#10 (comment)

CodeList values shall be expressed as UML enumerations in the Conceptual Model within a UML class given the CodeList stereotype.

https://www.ibm.com/docs/en/dma?topic=diagrams-enumerations
In UML models, enumerations are model elements in class diagrams that represent user-defined data types.

According to the above, if a CodeList is a class and CodeListValue is a user-defined datatype, conformance to UML is not lost.

@rob-metalinkage
Collaborator

CodeLists should be empty in class models - they are placeholders for controlled vocabularies - other packages can extend them with a set of values, which must be present but may be further extended by applications implementing these vocabularies.

Some earlier models do define a core set of values, though. Enumerations should only be used when the values are immutable and not extensible and the model definition strongly depends on them, which is a very rare circumstance.

We can think of each code list as a class or super-class for the entries (instances) in some external registry, which can be extended with additional model(s) for these instances.

IMHO we should never mix CodeList instances into the ontology artefacts. That way the natural and inevitable OWL punning isn't baked into the OWL representation of the model, and we can have SKOS or OWL (or other) representations of the values in different views/artefacts.
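A minimal sketch of that separation, assuming the `bldg:` namespace used later in this thread. Both artefacts are shown in one fence for brevity, but in practice they would live in separate files. The `reg:` registry namespace and the value `reg:residential` are hypothetical illustrations, not entries from any real registry.

```turtle
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix bldg: <https://www.opengis.net/ont/citygml/building/> .
@prefix reg:  <https://example.org/registry/building-usage/> .  # hypothetical registry

# Artefact 1, the ontology: the code list is a class declared with no values.
bldg:BuildingUsageValue a owl:Class ;
    rdfs:label "BuildingUsageValue"@en .

# Artefact 2, a separate (hypothetical) registry file: the values, typed by that class.
# The same IRI can also be given a SKOS view, here or in yet another artefact.
reg:residential a bldg:BuildingUsageValue , skos:Concept ;
    rdfs:label "Residential"@en .
```

Because the value IRI differs from the class IRI, the ontology itself never needs to be punned.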

@ar-chad
Collaborator

ar-chad commented Jan 20, 2025

I agree that CodeList instances should not be baked in and that code lists should be classes. Ideally, extending those classes should be left to the particular implementation. Having a SKOS Concept for every CodeListValue, with a corresponding instance in the ontology, as is the case now, strongly suggests a particular way of implementing code lists.

@nataschake
Collaborator Author

Thanks, everyone, for the mental effort spent :)
Now each codelist is represented as:

```turtle
bldgcode:BuildingUsageValue
        rdf:type         skos:ConceptScheme;
        rdfs:label       "BuildingUsageValue"@en;
        dct:isFormatOf   <https://www.opengis.net/ont/citygml/building/BuildingUsageValue>;
        skos:definition  "BuildingUsageValue is a code list that enumerates the different uses of a Building."@en .
```

and we could consider creating:

```turtle
bldg:BuildingUsageValue a owl:Class;
        owl:equivalentClass [ a owl:Class; owl:oneOf () ];
        rdfs:label       "BuildingUsageValue"@en;
        skos:definition  "BuildingUsageValue is a code list that enumerates the different uses of a Building."@en .
```

This avoids punning: there is only one BuildingUsageValue class, in the building package, and these small ontologies are left open to be filled with values externally.
Does it make sense?

@ar-chad
Collaborator

ar-chad commented Jan 20, 2025

@nataschake my emoji reaction to your latest reply, means that this change makes a lot of sense to me! 😉

@nataschake
Collaborator Author

nataschake commented Jan 20, 2025

Asked in ShapeChange/ShapeChange#475 (comment), because it seems I cannot simply use the ShapeChange rule rule-owl-cls-codelist-external.

@VladimirAlexiev

```turtle
app:TextureType a rdfs:Datatype ;
    rdfs:label "TextureType"@en ;
    owl:equivalentClass [ a rdfs:Datatype ;
            owl:oneOf ( "specific" "typical" "unknown" ) ] ;
```

I dislike this for 2 reasons:

  • by using strings, not things, you cannot define what exactly the strings mean, nor provide translations. It's better to use resources.
  • a datatype is attached to a value like this: "specific"^^app:TextureType. But that is an atypical usage that few people would be familiar with, and repository support for custom datatypes is not great.
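To make the contrast concrete, here is a hypothetical data fragment showing both styles side by side. The property `app:textureType`, the `ex:` resources, and both namespace IRIs are illustrative assumptions.

```turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix app:  <https://example.org/app/> .    # hypothetical namespace
@prefix ex:   <https://example.org/data/> .   # hypothetical namespace

# Datatype style: the value is a literal typed with the custom datatype.
# A literal cannot be the subject of a triple, so no labels or
# translations can be attached to "specific" itself.
ex:texture1 app:textureType "specific"^^app:TextureType .

# Resource style: the value is an IRI that can carry multilingual labels.
ex:texture2 app:textureType app:TextureType_specific .
app:TextureType_specific rdfs:label "Specific texture"@en .
```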

As owl:oneOf can be with 0 values, such definition will be correct

It's not correct: it says the class has no members, and cannot have any members.
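A minimal sketch of that point, with the empty enumeration written as a blank-node class. Under the RDF-based (OWL 2 Full) semantics, an empty `owl:oneOf` list denotes the empty class, so the code list becomes equivalent to `owl:Nothing`. Note also that OWL 2 DL's `ObjectOneOf` requires at least one individual, so DL tools may reject the axiom outright. The instance `ex:someUsage` is hypothetical.

```turtle
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix bldg: <https://www.opengis.net/ont/citygml/building/> .
@prefix ex:   <https://example.org/data/> .   # hypothetical namespace

# An enumeration of zero individuals: nothing can ever be a member.
bldg:BuildingUsageValue a owl:Class ;
    owl:equivalentClass [ a owl:Class ; owl:oneOf () ] .

# In effect the same as:
#   bldg:BuildingUsageValue owl:equivalentClass owl:Nothing .

# An application that "feeds in" a value therefore contradicts the ontology;
# a reasoner applying these semantics would be expected to flag an inconsistency.
ex:someUsage a bldg:BuildingUsageValue .
```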
To define some class members, use this:

```turtle
app:TextureType a rdfs:Class.
app:TextureType_specific a app:TextureType; rdfs:label "Specific texture"@en, ...@fr, ...
```

Due to the Open World Assumption (OWA), someone can add more members later.
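A sketch of what "adding more members later" could look like. A second, independently published file adds a value without editing the file that declares `app:TextureType`. The `ext:` namespace and the value `ext:TextureType_procedural` are hypothetical, and both files are shown in one fence for brevity.

```turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix app:  <https://example.org/app/> .        # hypothetical namespace
@prefix ext:  <https://example.org/extension/> .  # hypothetical extension namespace

# File 1 (the model): declares the class and one member.
app:TextureType a rdfs:Class .
app:TextureType_specific a app:TextureType ;
    rdfs:label "Specific texture"@en .

# File 2 (published later, elsewhere): adds another member. Under the OWA
# this does not conflict with File 1, because File 1 never closed the class.
ext:TextureType_procedural a app:TextureType ;
    rdfs:label "Procedural texture"@en .
```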

I slightly prefer using ConceptSchemes instead of dedicated enumeration classes, because it simplifies the model slightly.
ERA uses the class skos:Concept plus a property annotation, era:inSkosConceptScheme.
See e.g. Interoperable-data/ERA-Ontology-3.1.0#69
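A rough sketch of this ConceptScheme style applied to the building example. The property `bldg:usage` is hypothetical. `ex:inConceptScheme` is a hypothetical stand-in for ERA's `era:inSkosConceptScheme`, whose exact IRI and usage are not reproduced here, so the linked ERA issue is the authority on the real pattern. The `bldgcode:` namespace IRI is also assumed.

```turtle
@prefix owl:      <http://www.w3.org/2002/07/owl#> .
@prefix rdfs:     <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos:     <http://www.w3.org/2004/02/skos/core#> .
@prefix bldg:     <https://www.opengis.net/ont/citygml/building/> .
@prefix bldgcode: <https://example.org/codelists/building/> .  # hypothetical
@prefix ex:       <https://example.org/vocab/> .               # hypothetical

# Stand-in for era:inSkosConceptScheme: links a property to its value scheme.
ex:inConceptScheme a owl:AnnotationProperty .

# No dedicated BuildingUsageValue class: the range is simply skos:Concept,
# and the annotation names the scheme the values should come from.
bldg:usage a owl:ObjectProperty ;
    rdfs:range skos:Concept ;
    ex:inConceptScheme bldgcode:BuildingUsageValue .

bldgcode:BuildingUsageValue a skos:ConceptScheme .
```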

@ar-chad
Collaborator

ar-chad commented Jan 20, 2025

If we want to go with resources (things), there is also a GML format for code lists that could be RDFized: https://data.ogc.org/citygml-swg/CodeList_Examples_3.0.0/

An example for BuildingFunctionValue, which is a GML Dictionary: https://data.ogc.org/citygml-swg/CodeList_Examples_3.0.0/Building.BuildingFunctionValue.xml

However, this response explicitly mentions strings for code list values: opengeospatial/CityGML-3.0CM#10 (comment)

The enumerated CodeList values shall consist of strings which are regular expressions (including explicit fixed values). Note: the use of regular expressions not fully discussed.
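If code-list values really are strings constrained by regular expressions, OWL 2 can express this with a datatype restriction using the `xsd:pattern` facet. A minimal sketch: the datatype name `app:BuildingFunctionCode` and the four-digit pattern are illustrative assumptions, not taken from the CityGML code lists.

```turtle
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix app:  <https://example.org/app/> .   # hypothetical namespace

# A string datatype whose values must match a (hypothetical) pattern.
app:BuildingFunctionCode a rdfs:Datatype ;
    owl:equivalentClass [
        a rdfs:Datatype ;
        owl:onDatatype xsd:string ;
        owl:withRestrictions ( [ xsd:pattern "[0-9]{4}" ] )
    ] .
```

This keeps the literal-based approach, so it inherits the drawbacks of strings-as-values raised earlier in the thread.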

@nataschake
Collaborator Author

The merging of the ontology file with the codelists file was declared in the script update-triples.sh.
That script was written in VCityTeam's version for UD-Graph. To keep the ontology and the codelists separate, I simply put the codelists generated at step 1 into the /CityRDF/codelists folder, and the ontology keeps the names of the codelists (which is what I could not achieve with the ShapeChange rule "rule-owl-cls-codelist-external").
The /CityRDF/codelists folder now serves as the place to keep codelists as skos:ConceptScheme according to ISO 19150-2. The values of these codelists are in https://data.ogc.org/citygml-swg/CodeList_Examples_3.0.0/
One can RDFize them as skos:Concepts (e.g. with Ontotext Refine) and use them in applications.
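For illustration, one entry of such a GML dictionary might look like this once RDFized as a skos:Concept. The `bldgcode:` namespace, the concept IRI, the notation, and the label are hypothetical placeholders, not copied from the CityGML code-list examples. The extra typing with an assumed `bldg:BuildingFunctionValue` class is optional and only links the value back to the ontology.

```turtle
@prefix rdfs:     <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos:     <http://www.w3.org/2004/02/skos/core#> .
@prefix bldg:     <https://www.opengis.net/ont/citygml/building/> .
@prefix bldgcode: <https://example.org/codelists/building/> .   # hypothetical

bldgcode:BuildingFunctionValue a skos:ConceptScheme ;
    rdfs:label "BuildingFunctionValue"@en .

# One dictionary entry as a concept; the notation and label are placeholders.
bldgcode:BuildingFunctionValue_1000 a skos:Concept , bldg:BuildingFunctionValue ;
    skos:inScheme bldgcode:BuildingFunctionValue ;
    skos:notation "1000" ;
    skos:prefLabel "Example function"@en .
```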

@ar-chad
Copy link
Collaborator

ar-chad commented Jan 23, 2025

Probably the last thing I will say about using skos:Concept as the default for code lists in CityRDF: when the ontology is assessed against these criteria, it scores slightly lower with it than without it:

https://www.semantic-web-journal.net/system/files/swj657.pdf

Adaptability Coupling: Number of external classes referenced

On the other hand, those concept schemes are meant to provide this kind of adaptability. It is still a choice.


4 participants