Assorted feedback: order, equality, reversibility and RDF #9

Open
EmmanuelOga opened this issue Nov 27, 2020 · 10 comments

@EmmanuelOga

Equality

There was an interesting thread on HN recently about type definition languages.

Two type definition languages presented these ideas:

  • Equality: two values are equal if neither is less than the other according to the total order.
  • Reversibility: serialize . deserialize = identity (a round trip returns the original unchanged)

I saw C/E covers some equivalence considerations, but I was wondering whether there's a way, perhaps by picking a subset of the spec, that order, equality and reversibility could be baked in.
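
As a rough illustration of the two properties, here is a minimal Python sketch. It is not taken from either type definition language or from C/E: the helper names (equal_by_order, round_trips, less_than) and the toy ordering are assumptions made up for this example, and JSON stands in for the serialization format.

```python
import json

def equal_by_order(a, b, less_than):
    # Equality: neither value is less than the other under the total order.
    return not less_than(a, b) and not less_than(b, a)

def round_trips(value, serialize, deserialize):
    # Reversibility: a serialize/deserialize round trip gives back the value.
    return deserialize(serialize(value)) == value

def less_than(a, b):
    # Hypothetical total order (type name first, then repr), for illustration only.
    return (type(a).__name__, repr(a)) < (type(b).__name__, repr(b))

print(equal_by_order(1, 1, less_than))
print(equal_by_order(1, 1.0, less_than))
print(round_trips({"a": [1, 2]}, json.dumps, json.loads))
print(round_trips((1, 2), json.dumps, json.loads))
```

The last call shows why a subset may be needed: a Python tuple comes back from JSON as a list, so reversibility only holds for values the format can represent exactly.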

RDF

The design is clearly attempting to cover the use cases of JSON and XML. I thought one last leg to be all-encompassing would be to cover RDF, being able to express triples in as nice a way as Turtle does. I think mapping to RDF would be doable with the current design (say, xsd:string maps to a C/E string, etc.). One thing that is not as easy to express is tagged string literals: "Hello"@en.

@kstenerud
Owner

kstenerud commented Nov 30, 2020

Hi Emmanuel, thanks for the feedback!

My intent is to keep data "value", "type", and "meaning" mostly separate in the Concise Encoding universe.

I want the "value" format (CBE and CTE) to concern itself with providing the minimum amount of information necessary to (in theory) efficiently rebuild the original data structure with correct values after transmission (while not complicating it with data compression techniques). I've left the data format deliberately vague on types because I believe that a moderately sophisticated schema should be able to handle this seldom-changing information, such that it doesn't have to be transmitted with every payload, and can also be ignored by untyped ingestors depending on your use case.

A decent schema could also specify ordering and equivalence rules, with sensible defaults for implementations where tight definitions are not so important. The equivalence section is intended more as a default starting point for schemaless designs in order to keep the confusion levels down. I intend to bake more control into the schema format so that equivalence, ordering, etc can be explicitly defined when it's important.

I've been reading up on RDF over the weekend, and I like the idea of recording relationship data, but I'm having a hard time wrapping my head around how I'd encode quads (triples would be easy). Specifically, I'd like the encoding to be able to model this sort of thing:

[Image: named-triple-use-case]
(from https://blogs.oracle.com/oraclespatial/rdf-extending-rdf-to-support-named-triples)

Quads such as [donation john donatedTo TopUniv] (optional-label, subject, predicate, object) would only require the addition of one new type in CBE, and some kind of specialized prefixed list format (such as @[label subject predicate object]) in CTE. But this Turtle-like syntax would effectively split the relationship (and some of the data) from the object it describes. Putting the data directly into the map structure (like in JSON-LD) feels a lot better. Markers and references actually make this easy and require no changes to the format:

c1
{
	definitions = [
		&TopUniv:|u http://universities.edu/SomeTopUniversity|
		&donated_to:|u http://w3c-something.org/money#donate_to|
		&father_of:|u http://w3c-something.org/family#father_of|
		&child_of:|u http://w3c-something.org/family#child_of|
		&admitted_to:|u http://w3c-something.org/organizations#admission|
	]
	people = {
		&john:John = {
			$donated_to = $TopUniv
			$father_of = $mary
		}
		&mary:Mary = {
			$admitted_to = $TopUniv
			$child_of = $john
		}
	}
}

But this only supports triples, not quads... If I could find a way to encode quads into the key-value structure of map entries, that would be ideal. Something along the lines of:

c1
{
	definitions = [
		&TopUniv:|u http://universities.edu/SomeTopUniversity|
		&DonationLaw:|u http://government.gov/laws/donations#conflict_of_interest|
		&donated_to:|u http://w3c-something.org/money#donate_to|
		&father_of:|u http://w3c-something.org/family#father_of|
		&child_of:|u http://w3c-something.org/family#child_of|
		&admitted_to:|u http://w3c-something.org/organizations#admission|
		&helped:|u http://w3c-something.org/cooperation#help|
		&violates:|u http://w3c-something.org/legal#violation|
	]
	people = {
		&john:John = {
			&donation:@[$donated_to = $TopUniv]
			$father_of = $mary
		}
		&mary:Mary = {
			&admission:@[$admitted_to = $TopUniv]
			$child_of = $john
		}
	}
	meta-relationships = [
		&conflict_of_interest:@[$donation $helped $admission]
		@[$conflict_of_interest $violates $DonationLaw]
	]
}

... except I'd want something better, because the above looks messy and hard to read. Also, the meta-relationships end up placed ad-hoc in some other section of the document, which data ingestors would need to understand how to find. This gets tricky fast...

One other thing that came from this exercise is IRIs (which I had no idea existed). I think I'll just rename the URI type to Resource, whose format can be defined by a schema, with a default format of IRI.

@EmmanuelOga
Author

Hey Karl,

I'm glad you are digging into RDF!

You're describing "reification", which isn't needed as often as one would imagine. For instance, one could have stand-alone donation and admission entities: ex:donation1 a schema:Donation . and ex:Admission1 a schema:Admission ., then draw connections as needed.

Have you heard of RDF*? It is a newish reification spec. There's an intro here, a WIP spec, and a note describing a "hacky" implementation (through a form: <urn:triple:${enc(S)}:${enc(P)}:${enc(O)}> 😄). Note that this is already implemented in triplestores like Jena and RDF4J.

One thing that makes RDF serializations a lot more readable is IRI prefixes; if you could have something like that, it would really help.

There are formats like microdata and RDFa (embedding RDF in HTML), but I think it is better to start with the data and then embed that data into a document. I'm thinking of how to do this with my website (expressed in Turtle). I have a little proof-of-concept of rendering an RDF+markup template too. Maybe concise-encoding could express this sort of thing better than Turtle 🙂 .

@kstenerud
Owner

OK, I've been thinking on-and-off about this in my spare time, and I think I've got my head wrapped around most of it now. Please tell me if I've got this right:

  1. There is the idea that we can have a universal set of uniquely referencable "concepts" that computers can use to correlate data. For example, "father of" would be canonicalized as http://blah.w3c.org/relatives#father_of or something like that.
  2. We'd need a way to map each of our internal data items to those universal concepts, such that when we transmit the data, the receiving end can know that we're talking about the same things.
  3. We'd need an understandable way of transmitting data in commonly defined structures to a foreign system (which is what Concise Encoding currently does).
  4. Any relationship is itself data, and therefore should be capable of being used as a subject or object in another relationship.

Tying the information theory to data communications theory:

  • We want to keep the transmitted data small if possible, so putting those big canonicalized references directly in the transmitted data is a bad idea.
  • Mappings from internal data to concepts are unlikely to change often, so this information shouldn't be transmitted in every payload.

Schemas are currently used to enforce structure on data so that foreign systems can safely validate and ingest it. Since schemas don't change much, it seems that the schema would be a good place to codify mappings to concepts. You'd then have two mapping levels in any information system:

  1. Schema-level mapping of universal concepts to a "canonical" name of some sort (http://blah.w3c.org/relatives#father_of -> father_of)
  2. Implementation-level mapping of canonical name to internal name (father_of -> FatherOf, or maybe something more complex because the internal structure uses a list called Children)
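
The two mapping levels could be sketched roughly like this in Python. Nothing here is part of Concise Encoding or its schema format: the dictionaries SCHEMA_CONCEPTS and INTERNAL_NAMES and the function internal_name_for are hypothetical names, used only to show the lookup chain from universal concept to internal name.

```python
# Hypothetical schema-level mapping: universal concept IRI -> canonical name.
SCHEMA_CONCEPTS = {
    "http://blah.w3c.org/relatives#father_of": "father_of",
}

# Hypothetical implementation-level mapping: canonical name -> internal name.
INTERNAL_NAMES = {
    "father_of": "FatherOf",
}

def internal_name_for(concept_iri):
    # Level 1: the schema resolves the universal concept to a canonical name.
    canonical = SCHEMA_CONCEPTS[concept_iri]
    # Level 2: the implementation resolves the canonical name to its own name.
    return INTERNAL_NAMES[canonical]

print(internal_name_for("http://blah.w3c.org/relatives#father_of"))
```

Only the canonical names would need to appear in transmitted data; both tables live outside the payload.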

The schema would also set out the limitations of what relationships can be used where (e.g. male people can have father_of relationships, but female people cannot). Data classes and such...

Then the transmitted data would only contain a reference to the schema, which the receiver would consult if it needs to derive meaning from the data's relationships, or wants to validate types/structure/format/whatever. In Concise Encoding, this could be done using a metadata map containing a pointer to the schema.

As a side note, it seems that relationships can't be restricted to map-like structures where the map itself is the implied subject (like in JSON-LD). An information system could hold canonical relationship information about data it does not itself control (for example, [http://people.org/everyone#john_smith http://blah.w3c.org/relatives#father_of http://people.org/everyone#mary_smith]). So we absolutely need a "relationship" type in Concise Encoding.

Also, since Concise Encoding is merely the transmission format, it doesn't need to concern itself with addressability of the relationships contained in the data; that's what the universal resources (IRIs) are for. For internally referencing such data, the existing marker/reference types are sufficient.

Am I understanding this correctly? Is there anything I've missed?

@kstenerud
Owner

OK I think I've got it...

My above comments should handle semantic content within the bounds of data defined by a schema, but it doesn't deal with referencing semantic data. CE would need some changes in order to make things less cumbersome.

Prefixes & Concatenation

Unaided, semantic references become a madness of endless repetition of the same base IRIs with slightly different endings, thus IRI prefixes in Turtle etc. Prefixes are special definition operations that don't represent actual data, but rather references to IRI partials that will be used elsewhere in the document:

@prefix local: <https://myself.org/> .
@prefix pub: <https://blah.w3c.org/publishing#> .

(Data here)

CE could accomplish something similar using metadata maps and markers:

(
	prefixes = [
		&local:|u https://myself.org/|
		&pub:|u https://blah.w3c.org/publishing#|
	]
){
	// Data here
}

To produce this effect, I've made a metadata map containing a list of marked resources. The name "prefixes" is purely arbitrary and could be anything. In fact, the entire structure of the metadata map is arbitrary, and won't affect how the actual data is processed in this case since it's just being used to store the definitions that will be referenced elsewhere in the real data section. Since the metadata is "outside" of the data, we now in effect have reference definitions for "local" and "pub".

To use these definitions, CE requires a new type to represent the concatenation of a resource and a string. An actual "concatenation" operator type would probably work best here, with the restriction that it can only concatenate a string onto a resource. This complicates parsing a little bit by requiring a lookahead, but overall I don't think it's too terrible.

For CBE, I can simply add a new type code for "concatenate". For CTE, I'll need to modify the markup a little bit. The least disruptive, most recognizable approach would be to use : as the concatenate operator, like so: |u https://somewhere.com/|:blahblah (which is https://somewhere.com/blahblah). Then, using the marker/reference functionality, we can do things like $myurl:blahblah. Since : is only otherwise used in markers and as part of the time format, overloading it for this purpose poses no risk of parsing ambiguity.
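
To make the intended behaviour concrete, here is a minimal Python sketch of how a decoder might expand a reference followed by the concatenation operator. This is not the CTE grammar or any real decoder: the function name resolve, the markers dictionary, and the treatment of tokens as plain strings are all assumptions for illustration.

```python
def resolve(token, markers):
    """Expand '$marker:suffix' by concatenating the suffix onto the marked resource."""
    if not token.startswith("$"):
        raise ValueError("expected a reference starting with '$'")
    # Split on the first ':' only; anything after it is the string to append.
    marker, _, suffix = token[1:].partition(":")
    return markers[marker] + suffix

# Hypothetical marker table, as it might look after reading the marked resources.
markers = {"myurl": "https://somewhere.com/"}

print(resolve("$myurl:blahblah", markers))
print(resolve("$myurl", markers))
```

A reference without a suffix simply resolves to the marked resource itself.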

Putting it all together:

In Turtle:

@prefix local: <https://myself.org/> .
@prefix pub: <https://blah.w3c.org/publishing#> .

[] pub:author local:me .

In Concise Encoding:

(
	prefixes = [
		&local:|u https://myself.org/|
		&pub:|u https://blah.w3c.org/publishing#|
	]
){
	$pub:author = $local:me
}

Relationship data with ad-hoc subjects

Maps are basically sets of relationships where the map itself is the implied subject for each relationship (denoted by key-value pairs). For ad-hoc relationship data where this is not the case, CE would need another type to represent the [subject predicate object] container. I could overload @ for this purpose in CTE since it's only used for named values and UUIDs currently. To address data of this type locally, the marker/reference syntax works fine (&myrelation:@[somesubject somepredicate someobject]). To address globally, just expose the relationship object as an IRI. In fact, since marker IDs must be unique to the document, they would serve nicely as the identifier after the # in an IRI for those wishing to expose a resource using CE as the data payload medium (https://mydata.org/relationships#myrelation).

I think this covers everything?

@kstenerud
Owner

kstenerud commented Dec 5, 2020

To test these ideas, I've converted the examples from https://www.w3.org/TR/turtle to Concise Encoding. The following modifications to CE seem to suffice:

  • URI has been renamed to Resource, and has the string-array type r. Default resource type is now IRI.
  • New concatenation operator which concatenates a string to a resource. The CTE concatenation format is (resource):(string).
  • New relationship type, with a CTE format of @[subject predicate object].
    • Subject can be any type that represents an entity: a resource, a map, a list, a relationship, or null for a blank node. Actually maybe an empty map would be better for the blank node...
    • Predicate can be a resource.
    • Object can be any type.

Example 1:

Turtle:

@base <http://example.org/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix rel: <http://www.perceive.net/schemas/relationship/> .

<#green-goblin>
    rel:enemyOf <#spiderman> ;
    a foaf:Person ;    # in the context of the Marvel universe
    foaf:name "Green Goblin" .

<#spiderman>
    rel:enemyOf <#green-goblin> ;
    a foaf:Person ;
    foaf:name "Spiderman", "Человек-паук"@ru .

Concise Encoding:

There's no @base command, so I've used the prefix ex. There's also no a operator, but it's just syntactic sugar for http://www.w3.org/1999/02/22-rdf-syntax-ns#type anyway.

c1 (
    rdf-prefixes = [
        &ex:|r http://example.org/#|
        &rdf:|r http://www.w3.org/1999/02/22-rdf-syntax-ns#|
        &rdfs:|r http://www.w3.org/2000/01/rdf-schema#|
        &foaf:|r http://xmlns.com/foaf/0.1/|
        &rel:|r http://www.perceive.net/schemas/relationship/|
    ]
){
    $ex:green-goblin = {
        $rel:enemyOf = $ex:spiderman
        $rdf:type = $foaf:Person //  in the context of the Marvel universe
        $foaf:name = "Green Goblin"
    }
    $ex:spiderman = {
        $rel:enemyOf = $ex:green-goblin
        $rdf:type = $foaf:Person
        $foaf:name = "Spiderman"
    }
}

Example 2:

Turtle:

<http://example.org/#spiderman> <http://www.perceive.net/schemas/relationship/enemyOf> <http://example.org/#green-goblin> .

Concise Encoding:

c1 [
    @[|r http://example.org/#spiderman| |r http://www.perceive.net/schemas/relationship/enemyOf| |r http://example.org/#green-goblin|]
]

Example 3:

Turtle:

<http://example.org/#spiderman> <http://www.perceive.net/schemas/relationship/enemyOf> <http://example.org/#green-goblin> ;
                <http://xmlns.com/foaf/0.1/name> "Spiderman" .

Concise Encoding:

c1 {
    |r http://example.org/#spiderman| = {
        |r http://www.perceive.net/schemas/relationship/enemyOf| = |r http://example.org/#green-goblin|
        |r http://xmlns.com/foaf/0.1/name| = Spiderman
    }
}

Example 4:

Turtle:

<http://example.org/#spiderman> <http://www.perceive.net/schemas/relationship/enemyOf> <http://example.org/#green-goblin> .
<http://example.org/#spiderman> <http://xmlns.com/foaf/0.1/name> "Spiderman" .

Concise Encoding:

c1 [
    @[|r http://example.org/#spiderman| |r http://www.perceive.net/schemas/relationship/enemyOf| |r http://example.org/#green-goblin|]
    @[|r http://example.org/#spiderman| |r http://xmlns.com/foaf/0.1/name| Spiderman]
]

Example 5:

Turtle:

<http://example.org/#spiderman> <http://xmlns.com/foaf/0.1/name> "Spiderman", "Человек-паук"@ru .

Concise Encoding:

The @"language" structure in Turtle feels like a cheat that taints the orthogonality of the language, because you're really modeling [spiderman-name inEnglish "Spiderman"] and [spiderman-name inRussian "Человек-паук"]. I'm using a map here to model this.

c1 (
    rdf-prefixes = [
        &lang:|r https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=|
    ]
)[
    @[|r http://example.org/#spiderman| |r http://xmlns.com/foaf/0.1/name| {
        $lang:en=Spiderman
        $lang:ru=Человек-паук
    }]
]

Example 7:

Turtle:

@prefix somePrefix: <http://www.perceive.net/schemas/relationship/> .
<http://example.org/#green-goblin> somePrefix:enemyOf <http://example.org/#spiderman> .

Concise Encoding:

c1 (
    rdf-prefixes = [
        &somePrefix:|r http://www.perceive.net/schemas/relationship/|
    ]
)[
    @[|r http://example.org/#green-goblin| $somePrefix:enemyOf |r http://example.org/#spiderman|]
]

Example 12:

Turtle:

@prefix : <http://example.org/elements#> .
<http://en.wikipedia.org/wiki/Helium>
    :atomicNumber 2 ;               # xsd:integer
    :atomicMass 4.002602 ;          # xsd:decimal
    :specificGravity 1.663E-4 .     # xsd:double

Concise Encoding:

c1 (
    rdf-prefixes = [
        &elem:|r http://example.org/elements#|
    ]
){
    |r http://en.wikipedia.org/wiki/Helium| = {
        $elem:atomicNumber = 2
        $elem:atomicMass = 4.002602
        $elem:specificGravity = 1.663e-4
    }
}

Example 14:

Turtle:

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
_:alice foaf:knows _:bob .
_:bob foaf:knows _:alice .

Concise Encoding:

This one is a little cumbersome. I'm not really sure what the best way to encode this data is. You could do something like this:

c1 (
    rdf-prefixes = [
        &foaf:|r http://xmlns.com/foaf/0.1/|
    ]
){
    nodes = [
        &alice:{}
        &bob:{}
    ]
    relationships = [
        @[$alice $foaf:knows $bob]
        @[$bob $foaf:knows $alice]
    ]
}

Or perhaps to keep to only the specific data, put the blank nodes in with the metadata?

c1 (
    rdf-prefixes = [
        &foaf:|r http://xmlns.com/foaf/0.1/|
    ]
    rdf-blank-nodes = [
        &alice:{}
        &bob:{}
    ]
)[
    @[$alice $foaf:knows $bob]
    @[$bob $foaf:knows $alice]
]

Example 15:

Turtle:

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
# Someone knows someone else, who has the name "Bob".
[] foaf:knows [ foaf:name "Bob" ] .

Concise Encoding:

An anonymous blank node could be represented by @null:

c1 (
    rdf-prefixes = [
        &foaf:|r http://xmlns.com/foaf/0.1/|
    ]
)[
    // Someone knows someone else, who has the name "Bob".
    @[@null $foaf:knows {$foaf:name = Bob}]
]

Or just use map notation:

c1 (
    rdf-refs = [
        &foaf:|r http://xmlns.com/foaf/0.1/|
    ]
)[
    // Someone knows someone else, who has the name "Bob".
    {
        $foaf:knows = {
            $foaf:name = Bob
        }
    }
]

Example 16:

Turtle:

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
[ foaf:name "Alice" ] foaf:knows [
    foaf:name "Bob" ;
    foaf:knows [
        foaf:name "Eve" ] ;
    foaf:mbox <bob@example.com> ] .

Concise Encoding:

c1 (
    rdf-prefixes = [
        &foaf:|r http://xmlns.com/foaf/0.1/|
    ]
){
    $foaf:name = "Alice"
    $foaf:knows = {
        $foaf:name = "Bob"
        $foaf:knows = {
            $foaf:name = "Eve"
        }
        $foaf:mbox = |r mailto:bob@example.com|
    }
}

Example 17:

Turtle:

_:a <http://xmlns.com/foaf/0.1/name> "Alice" .
_:a <http://xmlns.com/foaf/0.1/knows> _:b .
_:b <http://xmlns.com/foaf/0.1/name> "Bob" .
_:b <http://xmlns.com/foaf/0.1/knows> _:c .
_:c <http://xmlns.com/foaf/0.1/name> "Eve" .
_:b <http://xmlns.com/foaf/0.1/mbox> <bob@example.com> .

Concise Encoding:

I'm putting the blank nodes in the metadata again to keep the focus on the relationship data.

c1 (
    rdf-prefixes = [
        &foaf:|r http://xmlns.com/foaf/0.1/|
    ]
    rdf-blank-nodes = [
        &a:{}
        &b:{}
        &c:{}
    ]
)[
    @[$a $foaf:name "Alice"]
    @[$a $foaf:knows $b]
    @[$b $foaf:name "Bob"]
    @[$b $foaf:knows $c]
    @[$c $foaf:name "Eve"]
    @[$b $foaf:mbox |r mailto:bob@example.com|]
]

@kstenerud
Owner

kstenerud commented Dec 6, 2020

Tagged string literals still bother me... They feel like arbitrary constructs that aren't extensible or composable in any way.

Here is the standard example in Turtle:

<http://example.org/#spiderman> <http://xmlns.com/foaf/0.1/name> "Spiderman", "Человек-паук"@ru .

This data represents the statement Spiderman has the name "Spiderman", and the Russian name "Человек-паук".

Let's first fix this so that the language tagging is consistent:

<http://example.org/#spiderman> <http://xmlns.com/foaf/0.1/name> "Spiderman"@en, "Человек-паук"@ru .

Spiderman has the name "Spiderman" (English), and "Человек-паук" (Russian).

This is actually two statements with the same subject and predicate:

<http://example.org/#spiderman> <http://xmlns.com/foaf/0.1/name> "Spiderman"@en .
<http://example.org/#spiderman> <http://xmlns.com/foaf/0.1/name> "Человек-паук"@ru .

Spiderman has the name "Spiderman" (English).
Spiderman has the name "Человек-паук" (Russian).

But this breaks the subject-predicate-object model (there are four pieces of information per statement here). What you actually have is a relationship to an object with multiple properties:

:spiderman-name <http://languages.org/en> "Spiderman" .
:spiderman-name <http://languages.org/ru> "Человек-паук" .
<http://example.org/#spiderman> <http://xmlns.com/foaf/0.1/name> :spiderman-name .

Spiderman has a name which in English is "Spiderman", and in Russian is "Человек-паук".
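
The rewrite above can be sketched mechanically in Python. This is only an illustration of the transformation, not an RDF library call: the function name expand_tagged_literals is hypothetical, triples are plain tuples of strings, and http://languages.org/ is the same made-up language namespace used in the Turtle above.

```python
def expand_tagged_literals(subject, predicate, literals, name_node, lang_base):
    """Turn language-tagged literals into plain triples about an intermediate node."""
    # One triple per language: (name node, language predicate, text).
    triples = [(name_node, lang_base + lang, text) for text, lang in literals]
    # One triple linking the original subject to the intermediate node.
    triples.append((subject, predicate, name_node))
    return triples

triples = expand_tagged_literals(
    "http://example.org/#spiderman",
    "http://xmlns.com/foaf/0.1/name",
    [("Spiderman", "en"), ("Человек-паук", "ru")],
    ":spiderman-name",
    "http://languages.org/",
)
for triple in triples:
    print(triple)
```

Every resulting statement has exactly three parts, and the language becomes an ordinary predicate with its own IRI.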

Tagged literals feel like a convenience when language tagging, but they obscure the actual relationship graph. They also completely lack any provenance for the language codes (they just magically mean something), which means that this tagging scheme cannot be extended to anything else (due to the implicit knowledge that must be added), and any additions to this implicit "language code" knowledge would require a re-issuance of the spec or risk incompatibilities between implementations.

@EmmanuelOga
Author

Karl: I still need to sit down and read your updates attentively, but here are a few points about string literals.

Since this was discussed recently in Clojure's RDF Chat, maybe it will be useful to provide pointers:

Quoting @quoll:

"foo" and "foo"@en are different literals. In fact, for RDF 1.0, there were 3 distinct types of string:

  • "foo" was a Simple Literal
  • "foo"^^xml:string was a Typed Literal
  • "foo"@en was a simple literal with a Language Tag

All 3 were distinct, and I can’t tell you the grief this caused. It was a relief when RDF 1.1 was introduced and gave all simple literals (that didn’t have a language tag) a datatype of xml:string. Those with a tag are now rdf:langString [...]

As for languages… the idea of tagging is to provide semantics for a group of letters. The simple literal "chat" is just a sequence of 4 unicode characters. However, "chat"@en has a semantic that means a conversation, and "chat"@fr has a semantic that means a male cat. These semantics were considered important to capture

I had the same feelings you are expressing about redundancy, but put in that light I think it makes sense.

In any case, note that any value is susceptible to becoming "stringly typed", since you can specify any user-defined type in a string literal:

<http://ex.com/subject> <http://ex.com/predicate> "1607253502"^^random:definition-of/unix-timestamp .

I'm not well versed in the details of how these things work, but I think those user-defined types follow the conventions defined by the XSD schema language... although in practice it seems a typed string value can be just about anything and the type just any URL you want ... a deserializer would then read that Turtle tag and "hydrate" the string to whatever makes sense in the environment of the programming language:

<http://ex.com/subject> <http://ex.com/predicate> "c1 { ... }"^^<https://concise-encoding/...> .

😛

@kstenerud
Owner

kstenerud commented Dec 6, 2020

Yeah sorry :P I'm coming from zero experience or knowledge here, so this is mostly my journey into the wild world of knowledge systems and semantic data, while at the same time trying not to hobble the Concise Encoding implementation ;-)

I can see why they chose to add string tags, but I dunno... I'm not really convinced that the trade-off was worth it. It feels too much like magic for a system that's supposed to rely on formalized descriptions and allow no assumptive knowledge to creep in.

The example "chat" was a complete surprise to me. I didn't realize that it's an indicator of a complete semantic meaning. I'd just assumed that it was a generic "This text is in language X" marker ("chat" trn:isLanguage lang:fr .), not ("chat" tax:isA animal:cat .). How do they deal with homonyms? For example, "chat"@en could mean a conversation, or it could be a kind of bird. Then there's the issue of words taking on or morphing meaning over time (for example "gay" taking on the additional meaning of "homosexual"). It seems like they're falling into the very semantic trap they were trying to fix...

@kstenerud
Owner

kstenerud commented Dec 7, 2020

WIP description:

Relationship

A relationship is a container-like structure for making statements about resources in the form of subject-predicate-object triples (like in RDF). Relationships form edges between nodes (resources or values) to build a semantic graph. Local resources are anonymous by default, but can be made addressable by marking them.

A relationship is composed of the following three components (in order):

  • Subject, which must be a resource
  • Predicate, which must be a resource pointer that represents a semantic predicate
  • Object, which can be a resource or a value.

Maps as Relationships

Maps can also be used to represent relationships because they are natural relationship structures (where the map itself is the subject, the key is the predicate, and the value is the object). In Concise Encoding, the key-value pairs of a map are only considered relationships if their types match the requirements for the predicate and object of a relationship.

Using maps to represent relationships can make the document more concise and the graph structure easier to follow, but the relationships expressed as key-value pairs cannot be made addressable (and thus cannot be used as resources). This is generally not a problem because few relationships actually need to be used as resources in real-world applications.

Resource

A resource is one of:

Examples:

At their most basic, relationships are simply 3-component statements containing a subject, a predicate, and an object:

c1 [
    @[|r https://springfield.gov/people#homer_simpson| |r https://example.org/wife| |r https://springfield.gov/people#marge_simpson|]
    @[|r https://springfield.gov/people#homer_simpson| |r https://example.org/employer| |r https://springfield.gov/employers/nuclear_power_plant|]
]

Using the full URI is tedious, but we can use markers and the concatenation operator to make things more manageable. In the following example, the marked resource pointers are placed in a list (arbitrarily named "rdf") in a top-level metadata map so that they themselves don't constitute data, but can still be referenced from the data.

c1 (
    rdf = [
        &people:|r https://springfield.gov/people#|
        &employers:|r https://springfield.gov/employers/|
        &e:|r https://example.org/|
    ]
)[
    @[$people:homer_simpson $e:wife $people:marge_simpson]
    @[$people:homer_simpson $e:employer $employers:nuclear_power_plant]
]

We can also use map syntax to model most relationships, which often makes the graph more clear to a human reader:

c1 (
    rdf = [
        &people:|r https://springfield.gov/people#|
        &employers:|r https://springfield.gov/employers/|
        &e:|r https://example.org/|
    ]
){
    $people:homer_simpson = {
        $e:wife = $people:marge_simpson
        $e:employer = $employers:nuclear_power_plant
    }
}
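
To illustrate why the map form carries the same information as the statement form, here is a minimal Python sketch that flattens a map-style document into triples. It is not part of Concise Encoding or any implementation of it: the function name map_to_triples is hypothetical, and resources are written as plain prefixed strings rather than resolved IRIs.

```python
def map_to_triples(document):
    """Flatten {subject: {predicate: object}} into (subject, predicate, object) triples."""
    triples = []
    for subject, properties in document.items():
        # The map keyed by the subject is the implied subject of each pair.
        for predicate, obj in properties.items():
            triples.append((subject, predicate, obj))
    return triples

document = {
    "people:homer_simpson": {
        "e:wife": "people:marge_simpson",
        "e:employer": "employers:nuclear_power_plant",
    }
}

for triple in map_to_triples(document):
    print(triple)
```

The output lists the same two statements as the explicit @[subject predicate object] version, which is why the map form is only a notational convenience.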

With map syntax, relationships can't be marked. When relationship marking is needed, they must be written using standard relationship statements:

c1 (
    rdf = [
        &people:|r https://springfield.gov/people#|
        &employers:|r https://springfield.gov/employers/|
        &e:|r https://example.org/|
    ]
){
    $people:homer_simpson = {
        $e:wife = $people:marge_simpson
        $e:prev_employer = $employers:nuclear_power_plant
        $e:regrets = [
            $firing
            $forgotten_birthday
        ]
    }
    rdf-statements = [
        &marge_birthday:@[$people:marge_simpson $e:birthday 1956-10-01]
        &forgotten_birthday:@[$people:homer_simpson $e:forgot $marge_birthday]
        &firing:@[$people:montgomery_burns $e:fired $people:homer_simpson]
        @[[$firing $forgotten_birthday] $e:contribute $e:marital_strife]
    ]
}

Note: If the previous document were published at https://mysite.org/data.cte, all markers would be accessible using fragments:

  • https://mysite.org/data.cte#marge_birthday
  • https://mysite.org/data.cte#forgotten_birthday
  • https://mysite.org/data.cte#firing

Technically, these would also be accessible, although they would only resolve to resource pointers:

  • https://mysite.org/data.cte#people
  • https://mysite.org/data.cte#employers
  • https://mysite.org/data.cte#e
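
Deriving those addresses is a simple string operation, sketched here in Python. The function name fragment_iris is hypothetical, and the uniqueness check only mirrors the rule that marker IDs must be unique within a document; a real publisher would presumably get that guarantee from the decoder.

```python
def fragment_iris(document_url, marker_ids):
    """Build one fragment IRI per marker ID for a published document."""
    # Marker IDs must be unique within a document, so duplicates are an error.
    if len(set(marker_ids)) != len(marker_ids):
        raise ValueError("marker IDs must be unique within a document")
    return [f"{document_url}#{marker_id}" for marker_id in marker_ids]

iris = fragment_iris(
    "https://mysite.org/data.cte",
    ["marge_birthday", "forgotten_birthday", "firing"],
)
for iri in iris:
    print(iri)
```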

@kstenerud
Owner

OK, I've read through all of the semantic web and RDF literature on w3.org and I think I've got the important bits now.

Quick description: https://concise-encoding.org/index.html#relationships

Long description: https://github.com/kstenerud/concise-encoding/blob/master/ce-structure.md#relationship

No matter how many ways I look at it, string tags just feel like a mistake. Language is a property (a predicate in fact), and should be recorded as relationship data, not directly into the literal itself.
