Grammar updates for triple terms and occurrences. #51
My interpretation (BNF only) of @afs's proposed changes for triple terms and triple occurrences. No change to parser rules, thus far. Raw BNF in Files view, rendered via GitHack.
The nomenclature and wording in the Quoted Triples section will still require quite a bit of revision. Conceptually, we need to know how to talk about triple descriptors in relation to other triples in a graph, and how quoted triples/triple tokens/triple occurrences relate to triple descriptors and what they mean. Most of this needs to go in Concepts, but needs to be echoed in Turtle and other concrete syntaxes. Also, we may discourage the direct use of triple descriptors, favoring annotations and quoted triples/whatever. The main point of this draft, so far, is to get the grammar and basic usage consistent with discussions.
I have not finished my review of the Parsing section, but here are already a number of comments.
3518294
to
e271513
Compare
spec/turtle.bnf
Outdated
reifier ::= '<<' ((iri | BlankNode) '|' )? subject predicate object '>>'
tripleTerm ::= '<<(' subject predicate ttObject ')>>'
ttObject ::= iri | BlankNode | literal | tripleTerm
annotation ::= '{|' ( (iri | BlankNode) '|' )? predicateObjectList '|}'
This comment was marked as outdated.
I think |
@gkellogg Could you elaborate on that. Maybe I missed something but why => (two characters) is better than | (one character)? |
Can you point out where that text is in the document? I don't see it anywhere. That said, other places in this and other documents use "zero or more", so I'd be fine with that. |
@niklasl pointed out that using Whatever we do for |
See also w3c/rdf-star-wg#116. One important aspect (IMO) is that a prefix or wrapping notation is valuable for reading these, to avoid reading the name (which may be a long hash, as in Wikidata qualifiers) as a predicate (in annotations) or the subject (in the We also need to be careful with whatever is added so it doesn't block any other future designs that we can foresee, or steps on other syntaxes unnecessarily. In Notation 3 I've done some more evaluation (all the examples from the UCR plus a gamut of Wikidata data), and actually found that wrapping the name in (That's the Ruby/Rust lambda-style; also figuring in some musical and mathematical notations (|abs|), etc. I put examples in a gist.) I've also suggested some more radical changes; but admittedly some of those caters less for what is likely the more common case (one- or many-to-one; names added for reference; annotation data still better to keep with the value). The naming-only form (with any SPARQL-compatible syntax) would still work well if you need to "tag" a bunch of triples with the name of a many-to-many reifier. |
@niklasl totally agree, => is used in N3. IMHO it's a bad choice. |
How about |
That's pretty much what this version of the grammar does if we allow whitespace in the
To make it a token, we'll need to define some terminals:
I also have a version which defines a |
I think that's still ambiguous though, unless using a preceding whitespace is to be significant? Otherwise, this: I've tried the suggestion to wrap the name, like:
which parses the examples I linked above. A simplified sample:
<Q34851> :nominatedFor <Q103618> {| |<Q103618#6698506f>| a :Nomination ;
        :forWork <Q582281> ;
        :date "1958-02-18"^^xsd:date |} ,
    <Q103618> {| |<Q103618#6698cb58>| a :Nomination ;
        :forWork <Q713979> ;
        :date "1959-02-23"^^xsd:date |} .
This also helps spotting the identifier in named "quoted" triples (using more real wikidata to illustrate the problem):
Since otherwise you'd have to scan (as a human reader) beyond the id to know if it's the subject of a triple, or a name followed by a triple. (I've mixed up name and subject in these even when editing my own "toy" examples.) |
@niklasl - could you expand on that point please? In SPARQL 1.1 Isn't it, going left-to-right, The SPARQL grammar target is LL(1) and LALR(1) which covers the mostly available choices for many programming languages. |
I like where this is going. (The SPARQL grammar indeed requires much more care!) Triple reference The
:s :p :o {| :q :z |} .
:s :p :o ~(:r) .
:s :p :o ~(:r, _:q) .
<< :s :p :o ~ :r >> .
Agreed. At one level, it is a shame to change what's been written about, but at the same time, it hasn't been universally adopted.
I think that's good to allow. I can do that for SPARQL once the style is agreed and I can trim down the universal grammar. The different choices start to interact when Writing the multi-occurrences does start to mix up with the couple some of the alternative styles Tentative direction:
|
Yes - while it's not been the style up to now, the uniformity is appealing.
Firstly, if we have this, we can have I don't think that
These more complex cases may be better as a declaration-pattern:
that is, |
I'm not sure that this will be a common enough pattern to create special syntax for it, as it's fairly easy (and arguably clearer) to create separate statements for each reifier. Also, serializing a graph containing duplicate reifiers with some overlapping annotations would be pretty challenging. I'd say we start with the single reifier grammar and re-consider adding |
Agreed.
Makes sense.
I agree (and find that combination harder to read too).
Yes, I think I'd readily accept that. It's akin to blank nodes, where the embedded |
Sure; there are pros and cons here (repeating only the object probably isn't too bad). It might be somewhat important though, so let's keep it open for more feedback. For example, cases derived from Wikidata may map cleanly to multiple reifiers per triple; here's a sketch using the
Some serialization considerations. Given a "random" triple stream (here as pseudo-ntriples-with-pnames):
:s :p :o .
:r1 rdf:reifies <<( :s :p :o )>> .
:r2 rdf:type :Note .
:r1 rdf:type :Note .
:r2 rdf:reifies <<( :s :p :o )>> .
:r2 rdf:reifies <<( :s :p :q )>> .
:s :p :q .
A process with some memory of seen triples but no buffering nor indexing can still stream out "best effort" Turtle line-wise, making it more "well-formed":
:s :p :o ~ :r1 .
:r2 a :Note .
:r1 a :Note .
:s :p :o ~ :r2 .
<< :s :p :q ~ :r2 >> .
:s :p :q .
Whereas a pretty-printer with access to the entire graph could do:
:s :p :o ~( :r1 :r2 ) ,
    :q ~ :r2 .
:r1 a :Note .
:r2 a :Note .
It would, for each triple, serialize all AFAICS, only if such markers are neither reifiers of any other triple, nor the subject of any other triple, can they be serialized using the blank |
Am I reading Gregg's multiple annotations proposal correctly here and this can be done with:
? |
A question for clarification:
Is this case 1, which I prefer, and was my initial reading The second annotation block has a generated reification id and would be the same as writing:
and making
two separate blank nodes?
or is it case 2 At some point, we have to say "don't rely on gnarly expressions to do what you want - write them clearly" and provide a justifiable reading. Case 1 style would be explaining
as shorthand for
|
That's my interpretation, and what I think makes sense.
+1
To me, that doesn't make sense.
+1 |
There is a bit of ambiguity still in the proposed grammar.
This way :s :p :o ~r {||} {| :p :o |} .
# expands to
:s :p :o .
:r rdf:reifies <<(:s :p :o)>> .
_:b1 rdf:reifies <<(:s :p :o)>> .
_:b1 :p :o . Alternatively, the ambiguity can be resolved in parser logic and use an alternative grammar:
If a parser parses a reifier and subsequently parses the annotation block it would assign the previously parsed reifier to that annotation block, but the BNF itself is ambiguous which is concerning. |
Is there a need to both name and describe the reifier in place? With bnodes it's either id or description block, so it would follow the general Turtle design to either id or describe an anonymous reifier here too.
We need the ability to name a description block with either an IRI or a blank node. If not provided, the name (reifier) is automatically generated. Because the grammar allows both the description block and the reifier to be optional we have a conflict. Based on discussion, it seems that there is a need to both name and describe or just describe a description block. |
The BNF is fine - what has to be defined is the translation from the syntax tree to triples output (section 7). This is LL(1) for the multiple annotation case via the
(In SPARQL, These parse rules do not try to associate the reifier with the annotation block. It is not showing as ambiguous because the sequence The meaning, the translation to triples, would have a state variable for the reifier id which is initially unset, then set by Writing
(I think there was a missing is a problem for multiple reifiers/annotation blocks.
|
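For readers following along, here is a minimal Python sketch of the translation-with-a-state-variable idea described above. It is only an illustration, not the spec's processing rules: the function name `expand`, the tuple encoding of parsed items, the `_:bN` labels, and the choice to reset the state after each annotation block are all assumptions made for this sketch (the reset matches the `{||}` expansion example earlier in the thread).

```python
from itertools import count

RDF_REIFIES = "rdf:reifies"


def expand(triple, items):
    """Expand one asserted triple followed by reifiers and annotation blocks.

    `items` is a list of ("reifier", id_or_None) or
    ("annotation", [(predicate, object), ...]) entries, in source order.
    This encoding is hypothetical; a real parser would use its own AST.
    """
    fresh = (f"_:b{n}" for n in count(1))  # labels restart per call (simplification)
    out = [triple]
    current = None  # state variable: reifier waiting for a block; initially unset
    for kind, value in items:
        if kind == "reifier":
            # '~' with or without an explicit identifier
            current = value if value is not None else next(fresh)
            out.append((current, RDF_REIFIES, triple))
        elif kind == "annotation":
            if current is None:
                # no preceding reifier: allocate a fresh blank node
                current = next(fresh)
                out.append((current, RDF_REIFIES, triple))
            for predicate, obj in value:
                out.append((current, predicate, obj))
            current = None  # assumption: a block consumes the pending reifier
        else:
            raise ValueError(f"unknown item kind: {kind!r}")
    return out


if __name__ == "__main__":
    # :s :p :o ~ :r {||} {| :p :o |} .
    t = (":s", ":p", ":o")
    for row in expand(t, [("reifier", ":r"),
                          ("annotation", []),
                          ("annotation", [(":p", ":o")])]):
        print(row)
```

With this reading the grammar itself never has to pair a reifier with a block; the pairing lives entirely in the translation step.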
To move forward I suggest moving this PR out of draft so as to merge it to get everything else into the doc even if the grammar isn't final. Create a follow-up issue, or issues, for specific points in the grammar. |
* Fix some references to non-existent term definitions.
* Spec updates (with placeholders) for reified triples and annotations.
* Update grammar for annotations and triple terms using `~` prefix for reifier.
* Remove extraneous statement on curSubject and curTriple.
* Note on old vs new reification.
* Fix duplicate example identifier.
* Remove reference to "asserted triple", and fix reference to "annotation" production.
* Update to use "reification" and "rdf:reifies" instead of "triple occurrence" and "rdf:nameOf".
* Fix object of tripleOccurence to be `object`, not `ttObject`.
* Update description and processing instructions for triple terms, triple occurrences, and annotations.
* Grammar updates for triple terms and occurrences.
Co-authored-by: Ted Thibodeau Jr <[email protected]>
Co-authored-by: Pierre-Antoine Champin <[email protected]>
42c86c3
to
9d9c7a4
Compare
Squashed and force-pushed to rebase to main. |
Agreed to merge in WG meeting on Aug 01
GitHub suddenly told me that this was merged... an hour ago. sigh
I'm thinking it will be faster/easier for you to put these into a new PR than for me, but I can do it if it's a burden.
|
||
<pre id="ex-quoted-triple" | ||
<a href="#grammar-production-ttObject"><code>ttObject</code></a>, | ||
optionally follwed by a <a href="#grammar-production-reifier"><code>reifier</code></a> |
I'm making up this "reifier delineator" (other wording may be better), but something like it must be called out as being here, between the ttObject and the IRIREF/BlankNode.
optionally follwed by a <a href="#grammar-production-reifier"><code>reifier</code></a> | |
optionally followed by a reifier delineator (tilde, `~`) | |
with a <a href="#grammar-production-reifier"><code>reifier</code></a> |
If the optional reifier is not present, a fresh RDF blank node is allocated, | ||
as with `<< :subject :predicate :object >>`.</p> |
Can the reifier delineator (~) be present without a following IRIREF/BlankNode?
Not allowed by the grammar.
There is no technical problem with ~ used as Ted suggests. The use case of wanting to allocate the reifier id once, when the triple is asserted, for common use later has come up before.
It is useful to allocate the id at that point so that there is one reifier agreed between later updates.
:s :p :o ~ .
INSERT { ?e :added ?t } WHERE { :s :p :o ~?e . BIND(NOW() as ?t) }
This comment is in the context of reifiedTriple, so I don't see a case where << :s :p :o ~ >> does any more than << :s :p :o >>, and it is certainly not allowed by the grammar (at least the grammar that is included now).
In the annotation use case, :s :p :o ~ could, indeed, allocate a new blank node, although one which can't be referenced for creating more triples. It could be equivalent to the following:
_:bn rdf:reifies <<( :s :p :o )>> .
Where the _:bn node is freshly allocated. Still not allowed in the existing grammar, and would require something like the following:
annotation ::= (('~' (iri | BlankNode)?) | '{|' predicateObjectList '|}')*
IMO, better to have a single reifier rule for both reified triples and annotations.
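To make the shape of that candidate `annotation` rule concrete, here is a rough Python recognizer over an already-tokenized input. It is a sketch under assumptions: the name `parse_annotations`, the pre-split token list, and the output tuples are hypothetical, the contents of a block are kept as raw tokens rather than parsed as a `predicateObjectList`, and nested annotation blocks are not handled.

```python
def parse_annotations(tokens):
    """Recognise (('~' (iri | BlankNode)?) | '{|' predicateObjectList '|}')*.

    Returns a list of ("reifier", id_or_None) and ("annotation", [tokens...])
    items in source order. Hypothetical helper, not part of any parser.
    """
    items = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "~":
            i += 1
            # The identifier is optional: a following '~', '{|' or end of
            # input means the bare '~' form, i.e. a reifier with no name.
            if i < len(tokens) and tokens[i] not in ("~", "{|", "|}"):
                items.append(("reifier", tokens[i]))
                i += 1
            else:
                items.append(("reifier", None))
        elif tok == "{|":
            end = tokens.index("|}", i)  # ValueError if the block is unterminated
            items.append(("annotation", tokens[i + 1:end]))
            i = end + 1
        else:
            raise ValueError(f"unexpected token {tok!r}")
    return items


if __name__ == "__main__":
    # ~ :r1 ~ {| :a :b |}
    print(parse_annotations(["~", ":r1", "~", "{|", ":a", ":b", "|}"]))
```

One token of lookahead after `~` is enough to decide whether an identifier follows, which is the property the bare `~` form relies on.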
In Turtle, yes, <<:s :p :o >> and <<:s :p :o ~>> amount to the same thing.
The :s :p :o ~ form also asserts the triple, and that preparation for later use is the more motivating case. It is more regular to allow it in <<>> as well.
better to have a single reifier rule for both reified triples and annotations.
Agreed.
like `<< :subject1 :predicate1 << :subject2 :predicate2 :object2 >> ~:IRIREF1 >>` | ||
or `<< :subject4 :predicate4 << :subject3 :predicate3 :object3 ~:IRIREF3 >> >>`.</p> |
Why
instead of
? And why are they sometimes doubled?
They should not be doubled, but obviously, the purpose of the
is to keep the triple elements together.
I think if the line is longer than the viewpane, it should be allowed to (and fairly clear that it did) wrap. Forcing no-wrap seems likely to cause more confusion than it saves.
which provides a convenient shortcut. | ||
An annotation can be used to both assert a triple and have that triple be the | ||
An annotation can be used to both assert a triple, |
An annotation can be used to both assert a triple, | |
An annotation can be used to simultaneously assert a triple, |
of the <a href="#grammar-production-predicateObjectList"><code>predicateObjectList</code></a> | ||
contained within the annotation delimeters. | ||
If explicitly identified, the same reifier can then be used as the | ||
<a data-cite="RDF12-CONCEPTS#dfn-object">object</a> of additional |
<a data-cite="RDF12-CONCEPTS#dfn-object">object</a> of additional | |
<a data-cite="RDF12-CONCEPTS#dfn-subject">subject</a> or | |
<a data-cite="RDF12-CONCEPTS#dfn-object">object</a> of additional |
</p> | ||
|
||
<p class="note">The annotation syntax is a syntactic short cut in Turtle, | ||
<p class="note">The annotation syntax is a syntactic shortcut in Turtle, | ||
and the RDF Abstract Syntax [[RDF11-CONCEPTS]] does not | ||
distinguished how the triples were written.</p> |
distinguished how the triples were written.</p> | |
distinguish how the triples were written.</p> |
@@ -1533,6 +1609,8 @@ <h3>Parser State</h3> | |||
|
|||
<p>Parsing Turtle requires a state of six items:</p> | |||
|
|||
<p class="ednote">Describe parser state for tracking reifier to associated with an annotation block.</p> |
<p class="ednote">Describe parser state for tracking reifier to associated with an annotation block.</p> | |
<p class="ednote">Describe parser state for tracking reifier to be associated with an annotation block.</p> |
The term constructed from this production | ||
is composed of an identifier from either the <a href="#grammar-production-iri"><code>iri</code></a> | ||
or <a href="#grammar-production-BlankNode"><code>BlankNode</code></a> productions, | ||
if present, otherwise from a fresh RDF <a data-cite="RDF12-CONCEPTS#dfn-blank-node">blank node</a>. |
The term constructed from this production | |
is composed of an identifier from either the <a href="#grammar-production-iri"><code>iri</code></a> | |
or <a href="#grammar-production-BlankNode"><code>BlankNode</code></a> productions, | |
if present, otherwise from a fresh RDF <a data-cite="RDF12-CONCEPTS#dfn-blank-node">blank node</a>. | |
The term constructed from this production | |
is composed of an optional identifier from either the <a href="#grammar-production-iri"><code>iri</code></a> | |
or the <a href="#grammar-production-BlankNode"><code>BlankNode</code></a> productions; | |
otherwise, from a fresh RDF <a data-cite="RDF12-CONCEPTS#dfn-blank-node">blank node</a>. |
is composed of an identifier from either the <a href="#grammar-production-iri"><code>iri</code></a> | ||
or <a href="#grammar-production-BlankNode"><code>BlankNode</code></a> productions, | ||
if present, otherwise from a fresh RDF <a data-cite="RDF12-CONCEPTS#dfn-blank-node">blank node</a>. |
is composed of an identifier from either the <a href="#grammar-production-iri"><code>iri</code></a> | |
or <a href="#grammar-production-BlankNode"><code>BlankNode</code></a> productions, | |
if present, otherwise from a fresh RDF <a data-cite="RDF12-CONCEPTS#dfn-blank-node">blank node</a>. | |
is composed of an optional identifier from either the <a href="#grammar-production-iri"><code>iri</code></a> | |
or the <a href="#grammar-production-BlankNode"><code>BlankNode</code></a> productions; | |
otherwise, from a fresh RDF <a data-cite="RDF12-CONCEPTS#dfn-blank-node">blank node</a>. |
Sure, I can incorporate this into a followup PR. |
This was discussed during the rdf-star meeting on 26 September 2024.
Transcript: syntax for reifiers
<doerthe> I have to leave, sorry
ora: I think the main point of contention is whether this is prefix or postfix
gkellogg_: that, and tilde versus pipe or other characters.
ora: AndyS, you make a point about ease of parsing
AndyS: not just that. The pipe is already used in SPARQL, although there are ways around that.
tl: I made a few proposals, including the use of pipe everywhere, and replacing the curly brackets in the annotation syntax.
ora: you are saying this is a usability issue.
tl: yes, it is the interface, it is important to get this right.
niklasl: I agree, affordances are important, that's why the pipe is tricky because of its use in SPARQL.
pchampin: I agree that this is turning into a broad discussion we can't do in a short amount of time.
niklasl: agreed, long prefix makes things hard to read
<ora> STRAWPOLL: Postfix?
<ora> +1
<gkellogg_> +1
<pchampin> +1
<tl> +1
<niklasl> +1
<pfps> 0
<Dominik_T> 0
<gtw> +1
<TallTed> +1
<AndyS> +1
<eBremer> +1
<ktk> +1
<ktk> Tpt: are you around?
<Tpt> I am back
ora: there is still the question of which character we choose
ora: There are arguments against |
ora: There will be reifiers without annotation blocks and annotation blocks without reifiers
ora: if you see an annotation block after a reifier, it is related to this reifier so there is some memory needed
<tl> my 5cents on syntax: https://lists.w3.org/Archives/Public/public-rdf-star-wg/2024Sep/0073.html
AndyS: it's easier than doing RDF list
gtw: Have we a concise summary of the various syntaxes?
<AndyS> https://github.com/w3c/rdf-turtle/blob/main/spec/turtle.bnf
<pchampin> << :s :p :o ~ :r >>.
<tl> Souri asked for that
<niklasl> I tried to have a bunch of variants appear "naturally" in https://niklasl.github.io/rdf-docs/presentations/RDF-reifiers-1/ Slide 19 uses that form.
tl: I would like to point to this syntax proposal but I thought we would do syntax later: https://lists.w3.org/Archives/Public/public-rdf-star-wg/2024Sep/0073.html
<pchampin> :s :p :o ~ :r1 ~ :r2 {| :a :b |}.
gkellogg: you can insert more than 1 annotation or reifier
<pchampin> :s :p :o ~ :r1 ~ :r2 {| :a :b |} ~ :r3.
gkellogg: in any order
pchampin: if there is no reifier before annotations, the reifier is a blank node
AndyS: what I find odd is that the annotation block has to have at least one predicate object inside
AndyS: it makes generating this kind of syntax from a program more complicated
<niklasl> That empty annotation blocks weren't allowed did trip me up in my introductory slide (8) for annotation sugar.
ora: Both Turtle and SPARQL use predicateObjectList+
<niklasl> So +1 from me for allowing it. Makes it easier to save hand-edited, unfinished turtle...
<tl> from my proposal: { :s1 :p :o . :s2 :p :o | :r1 } [| :a :b |] .
<AndyS> :s :p :o ~ :r1 ~ :r2 {| :a :b |} {| :c :d |}
<niklasl> <s> :p <o> ~ <r1> {| a :Named |} . <s> :p <o> ~ <r1> ~ {| a :NotNamed |} .
AndyS: Are you suggesting we have an empty annotation block to "cancel" the preceding reifier?
<niklasl> See above line. :)
gkellogg: you can do "~ {|" to get a blank node
<tl> from my proposal: <| :s :p :o | :r |> :a :b .
tl: We should keep {} for group of statements, not annotations
tl: If we change the abstract reified triple to <<| we use pipes everywhere
tl: That way the pipe would be everywhere we use RDF-*
gkellogg: I am afraid it collides with N3 where they use | for object paths
gkellogg: the triple object can be a path, and I believe it can include "|"
gkellogg: This would be against a bare pipe
<Dominik_T> gkellogg can you provide a link or an example where in N3 pipe can be used?
pchampin: I would like to come back to the previous topic, my personal opinion is that ~ without identifier is a bit strange. I would argue it's not necessarily required, we can still write ~ []
gkellogg: A [] now means bnode property list
gkellogg: If we allow empty annotation blocks, it's also a way to avoid the empty ~
<gtw> I believe per the current Turtle draft spec, [] would be valid per the reifier rule: `reifier::='~' (iri | BlankNode)?` (via BlankNode)
AndyS: I think it's a bit confusing because it would be the only place where you can have [] but not [ propertyObjectList ]
ora: If we confuse users it's not going to lead to anything good
ora: We have this thing with multiple reifiers and annotations. Is it really relevant?
ora: I don't want people to start to write things and get it wrong
<niklasl> Pro/con: <s> :p <o> ~ [ :date "2024" ] . # Pro: Regularity, same syntax for bnodes. Con: may be odd in combination with the naming-and-describing pairing mechanism.
ora: Syntax discussions are often more difficult than semantic discussions
<niklasl> +1 for syntax being more difficult (also: "there is only syntax")
ora: It would be nice if we can break this up into a series of decisions
ora: would be nice if somebody took the trouble to figure out which decisions we have to make, we would have examples of the variants
pchampin: if we keep "<<" we need to keep it consistent with what people expect from the CG
tl: << has been used also for asserted things
tl: what part of history do we refer to when we talk about user assumptions?
<pchampin> q.
pchampin: To be clear I said "if we keep" the <<, getting rid of it altogether is a way to solve the problem
gkellogg: It would be nice to make a decision, everything depends on it
ora: it's unfortunate that the syntax PR has been open for such a long time with not enough attention
ora: People often take tiny interest in syntax, way less than is warranted
ora: I am open to suggestions how to do this
AndyS: we should take this offline
ora: agree we do this offline, in a way we ended up in a place I did not want to end, fighting over these things
ora: I suggest chairs will pick this up and will go from there
<pfps> which PR?
<pchampin> w3c/rdf-turtle#51
pchampin: In the interest of splitting into multiple decisions, I think we can bundle the brackets for triple term, unasserted triples and annotations
I hope this is still valid, and it is good to know.
I beg to differ: we may have been working for years on this, but we're still not in the situation where we have to cater for an installed base. We can still do what we want, and we should strive for a design that is coherent and compelling. Updating our examples or getting confused in discussions by examples from different periods is a minor problem compared to users of the finished spec having to deal with the side effects of some tactical decisions forever. |
quotedTriple (and related) to tripleTerm (Note, this could perhaps be "Reified Triple Term", as "tripleTerm" and "reifier" have subtly different meanings).
annotation to allow an identifier. (Note, this change makes the grammar no longer context free).
triple term being defined in RDF Concepts.