Why quoted triples, when we already have named graphs? #46
As long as named graphs can't be nested in the syntax, they will always be used mainly for one primary purpose. What that purpose is, is a very application-specific question. It need not always be the same purpose, but managing provenance is a natural default in an integration-focused project like the semantic web.
I'm sure an NGS design could be done (NGS = "Named Graph Singleton"). But as already noted, named graphs are already in use for e.g. data management, a non-semantic usage pattern; indeed the concepts behind Named Graphs existed before SPARQL. Some systems have the default graph as the union of the named graphs. One use case for quoted triples is referring to triples as triples: the "syntactic case". "Triple was added to database", as used in bitemporal modelling, means "as it was recorded", which is different from when a fact was originally asserted. The challenge that I see is what the reasonably minimal building block is. Encoding abstractions with triples (cf. RDF Lists, Reification) has practical usage problems.
The point is nesting. Then different levels can have different purposes. The outermost named graph could still be used for data management.
Defined in Named Graphs, Carroll et al. 2005, with a very definitive semantics, which was largely ignored in practice. My take on that history is: if a semantics doesn't match the predominant intuition and needs of users, it won't survive. RDF-star semantics as defined in the CG report will face the same fate.
You will always find some system that has some default that doesn't match a certain proposal. We have to be pragmatic. A pragmatic approach to named graph semantics IMO would be: formalize the "data management" semantics as the default - names address graphs, graphs are just ordinary chunks of RDF data - and add a facility to specify other semantics on demand. Plain and simple IMHO.
Define reasonably. The quoted triple as defined by the CG report is indeed minimal. The downside, however, is that the overwhelming majority of use cases will have to add another triple to refer to an occurrence, and two more triples to implement the TEP. All that for the sake of a very specific use case and an unwillingness to tackle the type/instance problem (which will not go away, ever). This is a design that is minimal in what it achieves, but maximal in the trouble it causes. Reasonable? Hardly. I can only repeat that I find the graph literal datatype as proposed by Antoine Z. and before by Ivan H. much more convincing as a way to satisfy the need for syntactically faithful representations of triples: it has clear and intuitive semantics that even a novice user will immediately understand, it represents a minimal extension to RDF, and it perfectly captures the meaning of blank nodes if the graph captures the whole CBD. What more could you want?
@rat10 (and others): Please avoid the temptation to use every acronym/abbreviation available to you without linking to the meaning you intend. I stumble over many of these, even when I have known their (typical) meaning for years, significantly but not only because they are not always used for that meaning. TEP? Probably, but not certainly, intended to mean Transparency Enabling Property. CBD? Probably, but not certainly, intended to mean Concise Bounded Description, which has very limited validation as a W3C Member Submission ("This document is not the product of a chartered W3C group, but is published as potential input to the W3C Process."). Even the recent citation of the Named Graphs paper was challenging to run down. I think this link gets what was intended. I note that there is no official W3C Standard or Draft for Named Graphs, though there does exist a W3C Editor's Draft.
"Named Graph" was introduced in SPARQL in the definitions section
It is in RDF Concepts where it was generalized: https://www.w3.org/TR/rdf-concepts/#dfn-named-graphs
@TallTed This is a discussion board. I don't need to treat entries here with the same diligence as a mail to the list. TEP and CBD have already been discussed in meetings and on the mailing list, and the Named Graphs paper is easy to find with the information I gave. The TEP is the only bridge the proposed semantics provides to implement referentially transparent embedded triples, so the acronym should be familiar to anyone with a passing interest. The concept of CBDs has been around for more than 20 years now, and I even write it out half of the time I use it. I think that if you are a member of the RDF 1.2 WG you should feel compelled to look these acronyms up if you don't know them already, and you should be able to disambiguate the correct reference from any other Google result on "TEP" or "CBD" (if there are any). That much I think can be expected from participants in this discussion.
Actually, I'd say it's the other way around. A mailing list, even if it does get archived, is an ongoing conversation which the subscribers are following; if you don't understand, you can ask. An issue tracker is far more something that will see renewed interest much later, when a poster need not be around anymore. In particular, due diligence before posting an issue is that one at least scans old issues to check if one's own issue is a duplicate of an old issue. That's hard to do if the comments are too obscure.
The published public draft of RDF 1.2 Concepts refers to this tracker for feedback. It would be wrong to assume it is only visited by WG members.
@lars-hellstrom Okay, I'll take that into account in future posts. Still, my feeling is that too many of these formal demands stifle discussions rather than enable them. And is the mailing list no longer considered part of the conversation, something I can rely on readers knowing about? I recently wrote a really exhaustive review of the TEP after Olaf asked me to. Do I have to link to that every time I mention the TEP in this issue tracker?
To come back to the discussion: As far as I know there are multiple semantics for named graphs, all of which are implemented. So, if we want to bind RDF-star to named graphs, we would need to fix their semantics. Given the discussions we already have here and the fact that the multiple semantics are captured in a WG report (that is: there was a discussion), I am not as optimistic as @rat10 that we would come to an agreement. I'd like to be wrong here :) I think we first need to fully understand the use cases and decide based on these where (not) to go.
@afs wrote:
Do you mean "implementing quoted triples in terms of named graph singletons"? (It feels like 'singleton' should be qualifying 'named graph' rather than the other way around, but whatever.) Effectively that would mean that in Turtle-star, the quoting That violates the spec in that quoted triples are no longer distinct from blank nodes. (I'm not sure whether thinking of them as being drawn from a separate namespace makes sense, especially not when considering dataset isomorphism.) A naive implementation also violates the spec in that multiple quotations of the same triple might generate new blank nodes representing the same quoted triple, but I suppose you could get around that by keeping a table of named graph singletons and always picking the first match from that table. (What if someone explicitly created a named graph singleton, though: would that be appropriated to represent the quoted triple?) Finally, this would allow for creating data structures that are indistinguishable from cyclic quotations:
So it seems such a design has some issues to nail down. But the point of my original question was: What do you (the WG) say to an implementor (not me) who feels quoted triples ought to work in that way? Why shouldn't it be correct?
Is this the data structures versus structured data distinction? (And it's probably called something else in the CS literature, sigh.) Mainstream programming languages don't bother to support composite values, because there are tools (such as cons cells) for building data structures into which you can store as many atomic values as you wish, by allocating dynamic memory as needed. This is not quite the same as proper composite values (a difference that becomes apparent when you work with computer algebra), but the opinion that seems to be dominant among language designers is that it should be enough. Just as on a heap you can build data structures nested to whatever depth you want, named graphs with blank node names can in principle represent nesting to whatever depth one wants. The Turtle serialisation may not show that nesting, but you could regard But does the WG want to do it that way, or do you want it to not be done that way?
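A minimal Python sketch of the "table of named graph singletons" idea from the comment above, under loose assumptions: terms are plain strings, graph names are blank-node labels of the hypothetical form `_:gN`, and `SingletonTable` and `has_cycle` are illustrative names, not part of any RDF library. It shows how always reusing the first match keeps repeated quotations of one triple on one blank node, and how a dataset can still contain a graph whose only triple names the graph itself, which a reader would have to treat as a cyclic quotation.

```python
from itertools import count

# Terms are plain strings: IRIs such as "<ex:s>", blank nodes such as "_:g1".
Triple = tuple[str, str, str]


class SingletonTable:
    """Hypothetical helper: maps each quoted triple to the blank node that
    names a one-triple named graph holding it. Repeated quotations of the
    same triple reuse the first graph instead of minting a new blank node."""

    def __init__(self) -> None:
        self._by_triple: dict[Triple, str] = {}
        self.graphs: dict[str, set[Triple]] = {}  # graph name -> its triples
        self._ids = count(1)

    def quote(self, triple: Triple) -> str:
        if triple in self._by_triple:  # first match wins
            return self._by_triple[triple]
        name = f"_:g{next(self._ids)}"
        self._by_triple[triple] = name
        self.graphs[name] = {triple}
        return name


def has_cycle(graphs: dict[str, set[Triple]]) -> bool:
    """True if some graph (transitively) mentions its own name as a term,
    i.e. the data would read back as a cyclic quotation."""

    def reachable(start: str) -> set[str]:
        seen: set[str] = set()
        stack = [start]
        while stack:
            for triple in graphs.get(stack.pop(), ()):
                for term in triple:
                    if term in graphs and term not in seen:
                        seen.add(term)
                        stack.append(term)
        return seen

    return any(name in reachable(name) for name in graphs)


table = SingletonTable()
first = table.quote(("<ex:s>", "<ex:p>", "<ex:o>"))
again = table.quote(("<ex:s>", "<ex:p>", "<ex:o>"))
print(first == again, len(table.graphs))  # True 1

# Nothing stops a dataset from containing a graph whose only triple names the graph itself:
table.graphs["_:g9"] = {("_:g9", "<ex:p>", "<ex:o>")}
print(has_cycle(table.graphs))  # True
```

Note that the sketch does nothing about the other objection raised: in the output, the quoted triple is just a blank node like any other.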
I was alluding to the syntactic issue. For one, replacing <<...>> with nested {...} makes for a much saner syntax. It also makes obsolete the question of what the difference between << :s :p :o >> and { :s :p :o } might be; there shouldn't be one IMO. It also solves the question of quoted graphs, i.e. "why don't we have << :s :p :o. :x :y :z>>?", because we would have {:s :p :o. :x :y :z} just as well. This makes navigating and querying much more straightforward, as there would be no need to e.g. check for the provenance of a statement in two places: as an annotation on a quoted triple, or on the named graph containing it. Blank nodes would be local to graphs, as in Pat Hayes' BLogic proposal (see this PDF for an introduction), and could that way avoid some of the contortions in the CG semantics. [EDIT] The reasons why we are discussing RDF-star today, and not Notation3 or other approaches to Named Graphs, are IMHO largely social and "political". The discussions in the RDF 1.1 WG must have been intense, judging from the mailing list archive, and it seems that nobody wants to get burned again with that topic. But OTOH, a few years ago very few people would have thought that a proposal like RDF-star, which requires some major tweaking in the installed base (specs, serializations and code), would garner such wide support. IMO that support is foremost an expression of a desire to get a sound and concise meta-modelling mechanism, and is not very specific to the RDF-star proposal itself. When I ask people why they like RDF-star, or hope for it, they mostly are not aware of its pitfalls and consequences. They just hope for some rubber-stamped solution that they can use for what they actually care about.
In trying to understand the possible relationships between quoted triples and named graphs (as I elaborated upon here), I think the notion of "graph literals", denoting themselves, helps. It appears to me that the reason for not defining semantics of named graphs has been about the relation between a graph and its name, not about "what a graph is". The graph is already defined as a set of triples. A named graph could be just this (in Notation 3, where formulae are "literals which are graphs themselves"):

    <g1> rdfg:nameOf { <s> :p "o" . } .

(Note: In an imagined future where TriG 1.X acquired such a syntax it would be nigh ambiguous with named graphs, so perhaps some "quotation" marker, such as This also makes it possible to define subproperties of rdfg:nameOf, e.g. Such "graphs themselves" would be opaque until explicitly linked together, with "transparency enabling" properties between named graphs. This is exactly how named graphs in a dataset are often managed in practice, just with explicit semantics. But what would such graph literals mean in the default graph of an implicit or explicit dataset, or "within" a named graph? This is where opacity is an issue. Are graph literals "visible" there? Named graphs are not, as I understand it. So I would define them as not visible, until "enabled". Within a graph that "enabling" could be done with Another question is whether a quoted triple is such a singleton graph (one statement), or is still contained by it. As I mentioned in the referenced email, RDF 1.1 semantics state:
That, plus a notion of graph literals, gives us a yes: they are identical. This sets it even further apart from a reified statement though, which could, IIUC, be defined as unique, given OWL, with There are practical questions in play too. Given that named graphs are what is currently managed as "units of description" in quad stores, it would be necessary to make some room for these self-denoting graphs. Given that room is already being required for quoted triples, I think it would be beneficial to consider this larger question of "nested graphs". Technically, a "special URI" made by uniquely hashing its canonicalized N-Triples representation (e.g. like Storing such graphs as actual specially-named graphs in separate "documents" or "contexts" could work in existing implementations as is (with some code wrangling, cf. how we handle bnodes, especially RDF lists). But it may be untenable for lots of small graphs (see e.g. this performance assessment). This is also a lot of overhead for the case of annotated triples (both asserted and quoted), which may be another practical reason for keeping single triples within the same "graph storage unit". But I think that's an implementation detail. (At the National Library of Sweden we store RDF as just JSON-LD, so we basically side-step all of that, at a cost. This of course has a lot of bearing upon my view of RDF data as "raw", with all entailment happening upon that.) Some more motivation to think along these lines is that quoted triples quickly blow up syntactically when you need to talk about, i.e. quote, facts like:

    <bob> :knows [ a :Person ; :name "Mary" ] .

or:

    <abc> :hasOrderedParts (<a> <b> <c>) .

And we still need separate graphs to capture things like:

    <ng1> { <mary> a :Person ; :name "Mary" . }
    <ng2> { <mary> :name "Mary" . }
    <ng2> rdfg:subGraphOf <ng1> .

With all this said, I must stress that we have a need for quoted triples, often also asserted, i.e. annotated triples. But as we use named graphs for all provenance management (including data from other sources), I strongly believe these ought to be explicitly co-defined in RDF 1.2, and not just be "different" in an undefined way. See e.g. json-ld/json-ld-star#45 for more thoughts and needs. (Also, see this wikidata experiment for related use of annotations for provenance.)
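To make the "special URI" idea concrete, here is a minimal Python sketch, assuming ground graphs only (no blank nodes), terms already written as N-Triples strings, and a made-up `urn:example:graph:sha256:` prefix; none of these names come from a spec. Sorting the N-Triples lines gives an order-independent serialization, and hashing it gives one name per set of triples, so a quoted triple and the singleton graph containing it end up with the same name, matching the "identified with the singleton set" reading. Graphs with blank nodes would need a real canonicalization algorithm, such as W3C RDF Dataset Canonicalization (RDFC-1.0), instead of plain sorting.

```python
import hashlib

# A triple of N-Triples term strings, e.g. ('<http://ex.org/s>', '<http://ex.org/p>', '"o"').
Triple = tuple[str, str, str]


def canonical_ntriples(graph: frozenset[Triple]) -> str:
    """Serialize a ground graph as sorted N-Triples lines.

    Sorting makes the result independent of input order; it is NOT a
    general canonicalization, so blank nodes are rejected."""
    for triple in graph:
        if any(term.startswith("_:") for term in triple):
            raise ValueError("blank nodes need a real canonicalization algorithm")
    return "".join(f"{s} {p} {o} .\n" for s, p, o in sorted(graph))


def graph_uri(graph: frozenset[Triple]) -> str:
    """Mint a name for a graph from the SHA-256 of its canonical form
    (the urn:example: prefix is a placeholder, not a registered scheme)."""
    digest = hashlib.sha256(canonical_ntriples(graph).encode("utf-8")).hexdigest()
    return f"urn:example:graph:sha256:{digest}"


def quoted_triple_uri(triple: Triple) -> str:
    """Treat a quoted triple as the singleton graph containing it."""
    return graph_uri(frozenset({triple}))


EX = "http://example.org/"
name_triple = (f"<{EX}mary>", f"<{EX}name>", '"Mary"')
type_triple = (f"<{EX}mary>", "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>", f"<{EX}Person>")

print(canonical_ntriples(frozenset({type_triple, name_triple})))
print(quoted_triple_uri(name_triple) == graph_uri(frozenset({name_triple})))  # True
```

With this scheme a graph like `<ng1>` and its sub-graph `<ng2>` get different names, while restating the same facts in another order does not change either name.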
A post that IMO didn't get the attention it deserves. However, to answer the main question: many people just don't want to get "there" again - "there" being named graphs and all the stressful disagreement around them. A truce was called in RDF 1.1 that left many people unsatisfied, but is a truce nonetheless. IMO this is a social issue rather than a technical one. A technical solution that just adds to what we have now, without disturbing anything that exists, is perfectly possible. But there is a very outspoken reluctance to go there. Just check recent WG minutes, 15.6.23 and 22.6.23. The RDF*/star CG didn't engage in discussing the issue, instead taking Olaf's dictum that the two approaches are "orthogonal" as the last word on it.
Basically: yes. The graph name "identifies" a graph. It doesn't necessarily denote it, but may mean something else. That is in line with the muddled way identification works on the semantic web in general (some background). It just gets more apparent here because the thing addressed is not a thing in the real world or a resource on the web (e.g. an HTML document) but another piece of RDF data.
This I can't agree with. IMO an RDF 1.1 named graph, lacking any further specification, can only be assumed to be referentially transparent, just like any other set of triples, because that is the way RDF is defined. Can you explain how you come to the conclusion that in practice they are managed to be referentially opaque?
I'm pondering a design in which RDF 1.1 named graphs are left as they are (of course a vocabulary to describe their intended or application-specific semantics should be added), but nested graphs - using curly brackets as well - would be defined with a clear (and configurable) semantics.
From what I heard they are rather implemented as triples with an identifier and a special marker. Maybe that "special URI" is the identifier? Well, implementation detail...
My proposal to the WG would be to say that RDF 1.1 named graphs are meant to implement application-specific semantics. RDF currently provides no means to explicitly describe their meaning, but such a facility could (and IMO should) easily be added. Nonetheless, there are things that RDF is too weak to do and that can only be achieved through out-of-band means. That's what RDF 1.1 named graphs are for, and what they should remain in use for (in the absence of any other grouping mechanism they also are, and will be, used as an optimization technique for purposes that RDF could handle, albeit only on a singleton level; that is one of the downsides of the proposed singleton <<...>> approach and the reason why I would favor nested graphs). Any solution (quoted triples, nested graphs, a singleton property-based triple identification mechanism or what have you) should have clearly defined default semantics (asserted, referentially transparent occurrences IMO) and a syntactically concise mechanism to define other semantics. And any solution has to be mappable to an n-ary relation in an unambiguous way ("flattened", in your words) to guarantee compatibility with RDF 1.1.
@rat10 wrote:
I agree it's nicer, but wouldn't braces for quoting get into trouble with SPARQL's use of braces for grouping? I'm thinking specifically about group patterns. Even if the context of a group pattern can be proved sufficiently distinct from the context of an RDF term that there is no confusion, one would still have to parse a lot of text before being able to determine which it is, if both use braces.
That PDF rather seems to suggest the blank nodes are local to graph surfaces, where a graph surface may contain multiple graphs, so there is no change from the current state of affairs in that respect. (I've come across a scheme for signatures of graphs that would put the blank node naming the graph to sign inside the signature graph, but also the blank node naming that signature graph inside the signature graph, so merely nesting graphs would not suffice for that scheme. Then again, I don't know if it was a good scheme.)
@niklasl wrote:
I think this is a red herring. That 'is identified' has the smell of something an author writes because they don't want to burden the presentation with extra formalia (or themselves with having to write out those formalia), not something that logically means anything. The classical example is to identify the letter
That assessment is annoyingly void of details on how they actually store the quoted triples. For the other approaches they refer to "Reifying RDF: What Works Well With Wikidata?", whose authors give plenty of details, but for quoted triples there is just "our product does this today". Also, the dataset appears not to have any nesting of quoting, which could be skewing the assessment.
You're thinking the asserted and the quoted forms of a triple should share storage? Oh well, implementation details, I suppose. One interesting example mentioned in that Reifying RDF paper is the presidencies of Grover Cleveland. In Wikidata there were (at least at the time) two edges asserting Grover Cleveland was president of the United States, corresponding to his two non-consecutive terms as president, and these could be distinguished by virtue of having different start time and end time annotations. In RDF, an edge is just a triple, no matter what assertions might have been made about that triple. This corresponds neatly to the RDF-star spec that
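A small Python illustration of why the two Cleveland presidencies collapse in RDF, and what occurrence nodes buy back. Assumptions: terms are abbreviated strings, `ex:positionHeld`, `ex:start`, `ex:end` and `ex:occurrenceOf` are hypothetical property names (loosely modelled on Wikidata's "position held"), and a Python `set` stands in for an RDF graph. Annotating the triple itself merges both terms' dates into one bag, while annotating two distinct occurrences of the same triple keeps each start paired with its end.

```python
# Hypothetical vocabulary; only the set-of-triples behaviour matters here.
TRIPLE = ("ex:GroverCleveland", "ex:positionHeld", "ex:PresidentOfTheUS")
TERMS = [("1885", "1889"), ("1893", "1897")]  # his two non-consecutive terms

# An RDF graph is a set of triples, so asserting the edge twice yields one triple.
graph = {TRIPLE, TRIPLE}


def annotate_triple(terms):
    """Attach start/end dates to the quoted triple itself: one node per triple."""
    notes = {"ex:start": set(), "ex:end": set()}
    for start, end in terms:
        notes["ex:start"].add(start)
        notes["ex:end"].add(end)
    return {TRIPLE: notes}


def annotate_occurrences(terms):
    """Attach dates to distinct occurrence nodes that all point at the same triple."""
    return {
        f"_:occ{i}": {"ex:occurrenceOf": TRIPLE, "ex:start": start, "ex:end": end}
        for i, (start, end) in enumerate(terms, start=1)
    }


print(len(graph))  # 1
merged = annotate_triple(TERMS)[TRIPLE]
print(sorted(merged["ex:start"]), sorted(merged["ex:end"]))  # which start goes with which end?
for node, props in annotate_occurrences(TERMS).items():
    print(node, props["ex:start"], props["ex:end"])
```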
@rat10 wrote:
I can understand the reluctance, and the desire to do things step by step. I don't read too much reluctance in those minutes though (albeit signs thereof), I mostly see the trouble stemming from the known issue that named graphs aren't enough. They are occurrences of graph literals, or graph terms, but the latter, which do appear to almost(?) equate with quoted triples, are beyond the charter of the WG. It is this limitation that is somewhat concerning, but the WG can still address questions thereof (and appears to want to). It appears to me that the old "graph literals", now named graph terms in the latest Notation 3 CG draft, are about the same kind of quoting as quoted triples. Perhaps it is an issue of triples versus triple sets? (The Axiom of regularity, stating that "no set is an element of itself", may provide the key difference; since as @lars-hellstrom just pointed out, the phrasing "A triple is identified with the singleton set containing it" in RDF 1.1 Concepts may indeed be a red herring.) Given that the members of these groups overlap, and following the minutes, I have high hope that this can be clarified further. I hope issues such as this provide a sample of perspectives to be taken into account, but these are indeed both technical, social and cognitive issues, all at once. (Standardization requires diligence to eliminate unnecessary differences, whilst being pragmatic enough to yield working implementations and adoption, all the while balancing predictable long-term consequences.)
I have hopes that Notation 3 is aiming to define precisely this notion of named graphs. I believe that we can strive towards convergence in thinking, design and implementation here. RDF 1.2 won't standardize Notation 3, but it might set the stage for clarification and interoperability with named graphs, quoted triple "constituents" and graph terms as extensions of those. It is not ideal, but they need not diverge.
Two named graphs in a dataset can contain contradictions, and differences can be preserved that have perhaps in its default graph been asserted as This is supported by wording in the RDF 1.1 W3C Working Group Note: On Semantics of RDF Datasets (quoting that document):
(As a note this has no official bearing, but I hope that it states intent for further convergence. I think that the RDF-star WG can follow this intent, if only for quoted triples, to avoid creating differences that become future obstacles when standardizing semantics for named graphs or graph terms. Even if nothing normative can be defined, a Note with advice can help a lot, which I also believe there is will to make.) Also, the ongoing Notation 3 CG work currently states:
(I am not sure what "occurrence" means here as I am quite sure that a graph term ought to "denote itself"...) It also states:
(followed by a rendering of the classical Superman problem). To elaborate on that, I would say that (Also, we might say that the Superman problem lacks an essential Back to quoting and replying to @rat10:
This would need to be aligned with TriG, and with SPARQL. It might be hard, but I believe there is a desire to do so, given e.g. this thread on the semantic web mailing list. I think the core question is whether RDF-star introduces a concept divergent from graph terms, or whether they can converge. A semantic difference for quoting may confuse and complicate practices, whereas one of granularity or grouping, which is more about data ergonomics (cf. RDF lists in concrete syntax vs. cons cell form), may be more palatable. Is the notion of "quotation" the same, with the differences lying in granularity? In use cases combining data sources with augmentation (through e.g. editorial work, inference or ML) using quotation or annotation on one or more subsets of facts, this is rather crucial. (I also wonder if there really is any semantic "nesting" going on, beyond relations between terms (though a relation may be named
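For readers less familiar with the "RDF lists in concrete syntax vs. cons cell form" comparison, here is a minimal Python sketch of how a Turtle collection such as `(<a> <b> <c>)` expands into `rdf:first`/`rdf:rest` cons cells ending in `rdf:nil`. Terms are plain strings and the `_:bN` blank-node labels come from a hypothetical counter; the point is only to show why quoting the single line `<abc> :hasOrderedParts (<a> <b> <c>) .` with one-triple quotation would mean quoting seven triples.

```python
from itertools import count

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FIRST, REST, NIL = f"<{RDF}first>", f"<{RDF}rest>", f"<{RDF}nil>"

_ids = count(1)


def fresh_bnode() -> str:
    """Hypothetical blank-node generator: _:b1, _:b2, ..."""
    return f"_:b{next(_ids)}"


def expand_collection(items: list[str]) -> tuple[str, list[tuple[str, str, str]]]:
    """Expand a Turtle collection into cons cells.

    Returns the node that stands in for the collection, plus the
    rdf:first / rdf:rest triples that encode it."""
    if not items:
        return NIL, []  # the empty collection () is just rdf:nil
    cells = [fresh_bnode() for _ in items]
    triples = []
    for i, (cell, item) in enumerate(zip(cells, items)):
        triples.append((cell, FIRST, item))
        triples.append((cell, REST, cells[i + 1] if i + 1 < len(cells) else NIL))
    return cells[0], triples


head, cons = expand_collection(["<a>", "<b>", "<c>"])
fact = ("<abc>", ":hasOrderedParts", head)
print(len([fact, *cons]))  # 7 triples behind one line of Turtle
```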
Well, they cannot be asserted triples in a graph, so they need to be "disconnected" but still both referable and conditionally interpreted (so we can find all quoted facts about e.g. To summarize: my current interpretation and hope is that the definition of graph terms, currently part of the ongoing work on Notation 3, might shed light on the question of the possible semantics of named graphs, and that graph terms have a much closer relationship to quoted triples. The challenge, I believe, is to "pave the path" for graph terms by defining quoted triples first, rather than in conjunction.
@niklasl wrote:
Standardizing RDF-star quoted triples now, Notation3 formulae next, without a coherent vision of how the pieces fit together into a greater whole, may very well lead to more confusion and divergence in modelling idioms. "Step by step" doesn't cut it when such a basic feature as meta-modelling is concerned. RDF semantic extensions are free to do what they want, and indeed encouraged to explore new areas. I would not mind too much if RDF-star became such an extension (although I still find it dangerously disconnected from reality and prone to misuse in practice in more than one way). I would welcome Notation3 as a semantic extension to RDF, as it seems to be a really well-thought-out and rounded concept (however, Notation3 doesn't fit the bill either when it comes to e.g. Property Graph compatibility). But integrating one or both into the core of RDF without having a concept of how they should interact with each other, or of how they could provide solutions to the pressing issues of statement qualification, without even making a discussion of such topics a requirement: that's just unwarranted hope for the best.
In the Community Group I tried to discuss the whole problem space, including e.g. named graphs, statement qualification, etc., to which RDF-star claims to provide a solution, but the argument against such initiative was that "we are only working on RDF-star here, we are not trying to solve all the problems of RDF". Then the WG was chartered in a way very much reflecting the view of the CG report, but nonetheless is aiming to become "RDF 1.2" (or even "RDF 2.0" per one of the very same editors that blocked all wider ranging effort in the CG). And now "the narrow charter prevents us...". This is a circular argument, and therefore should be rejected as useless. It also doesn't hold: the WG is free to come to the conclusion that its charter is too narrow to produce a useful result and demand an extension, or dissolve without producing a spec. The WG is now responsible for what it produces, nobody else.
IMO this is angels dancing on the head of a pin. An RDF graph is a set of RDF triples. A set is not required to contain multiple triples. It can contain only one triple just as well, or even be empty. We also know what a triple is. So, what more is there to know?
Hope alone is not gonna cut it. Some people claim that annotating a triple is completely different from annotating a graph. Some people try to justify not discussing named graphs in the context of the work on RDF-star with that argument. Some people claim that resorting to the safety of mathematical abstractions is a "prudent" approach, as if practice would care about such reserve. Some people claim that named graphs can never and will never have an agreed-upon semantics, as if there was no way to formalize and manage such an out-of-band arrangement. IMO these are all just lame excuses not to be bothered with the complexities of knowledge representation in the real world. Notation3 nested graphs come with a very specific semantics that doesn't help with modelling complex facts either, but is optimized for reasoning: powerful and interesting indeed, but not what was asked for in the W3C workshop on improving compatibility between RDF and LPG in Berlin 2019. Souri Das has proposed a syntax to the WG (called RDFn), and singleton properties provide a semantics; combined, the two would meet the request from the Berlin workshop. The WG hasn't taken them up and I can find no justification for that decision anywhere. Instead, the pseudo-simple approach of quoted triples is pursued, favoring very specific demands from, again, out-of-band issues like versioning of triples, as if we didn't already have the named graph mechanism for such purposes. That doesn't give me hope. [...]
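A rough Python sketch of the singleton-property idea mentioned above, for readers who have not met it: every statement gets its own freshly minted property, which is linked back to the generic property and carries the annotations. The `SINGLETON_OF` IRI, the `_N` naming scheme and the `ex:` terms are placeholders assumed for this sketch rather than the vocabulary of any particular proposal, and nothing here reflects the RDFn syntax.

```python
from itertools import count

# Placeholder IRI standing in for a "singleton property of" relation (assumption).
SINGLETON_OF = "<urn:example:singletonPropertyOf>"

Triple = tuple[str, str, str]


def singleton_form(triple: Triple, annotations: list[tuple[str, str]], ids) -> list[Triple]:
    """Rewrite one annotated statement in singleton-property style."""
    s, p, o = triple
    sp = f"{p}_{next(ids)}"  # hypothetical naming scheme for the fresh property
    out = [(s, sp, o), (sp, SINGLETON_OF, p)]
    out += [(sp, key, value) for key, value in annotations]
    return out


def generic_triples(triples: list[Triple]) -> set[Triple]:
    """Recover the plain s-p-o statements by following SINGLETON_OF links."""
    generic = {s: o for s, p, o in triples if p == SINGLETON_OF}
    return {(s, generic[p], o) for s, p, o in triples if p in generic}


ids = count(1)
stmt = ("ex:bob", "ex:knows", "ex:mary")
data = singleton_form(stmt, [("ex:since", "2010")], ids)
data += singleton_form(stmt, [("ex:since", "2015"), ("ex:source", "ex:survey")], ids)

for t in data:
    print(*t)
print(generic_triples(data))  # {('ex:bob', 'ex:knows', 'ex:mary')}
```

Unlike a quoted triple, each statement here is a distinct occurrence by construction, so the two `ex:since` values stay apart while the plain triple remains recoverable.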
Notation3 has a very specific interpretation of nested graphs and I don't see how that helps with much more basic needs like qualification of statements, grouping statements (as a very basic KR activity) etc. IIUC the semantics of RDF-star quoted triples per the CG report and of Notation3 formulae are very similar, modulo the treatment of blank nodes in RDF-star (which again is motivated mainly by the need to overcome the limitation of RDF-star to single triples). In my interpretation, the semantics grafted onto RDF-star by the CG reflects the Notation3 people seeing this as a chance to introduce formulae into core RDF through the backdoor of RDF-star, despite the obvious problems (like what to do with blank nodes, or mimicking formulae as lists of quoted triples), because they don't believe in sane W3C processes either. No, I don't share your hopes. I see tactical maneuvering that reflects a W3C unable to provide vision and guidance and to organize support. All I can see of the W3C process seems rather dysfunctional to me, not to mention that it is disturbingly opaque. The whole situation also seems to reflect an unwillingness to face some fundamental problems of RDF: the issues that arise with meta-modelling and with application-specific intuitions clashing with the integration-focused set semantics of RDF. This is of course no easy terrain, but it is an illusion to think it can be avoided by restricting oneself to "safe" areas of mathematical abstractions. It's easy to get lost in rabbit holes, but trying to ignore them has consequences too.
Okay, I missed the "between" part. But IIUC there is still an important difference between such graphs internally - where IMO they can by default only be understood as referentially transparent - and the CG semantics for quoted triples which makes each term denote, but only in the specific syntactic form provided. [...]
Can you give or point to an example? I think you mean a tree-like nesting, where curly brackets stand in for blank nodes. What I'm talking about is different: whole (sets of) triples nested as vertices within triples (and those within other graphs). That can be flattened to N-Triples, but it admittedly is ugly because very heavy on blank nodes AFAICT. [...]
The RDF 1.1 WG Note on dataset semantics gives an example of how SPARQL service descriptions can be used to describe the semantics of named graphs in a dataset, as default semantics and/or specifically per named graph. Add to that an IRI referring to a quotation semantics of your liking and you're all set, aren't you? SPARQL / RDF 1.1 named graphs are designed to facilitate out-of-band means, to do things with RDF that the spec doesn't specify how to do, because e.g. they are outside the realm of the mathematical abstractions that RDF operates on. I've come to the conclusion that this is a sensible arrangement. We should stop lamenting that "named graphs have no semantics", and instead embrace the fact that we need another instrument, this time with semantics, to do all the things that are within the scope of RDF but presently can resort to no other modelling primitive than RDF standard reification and n-ary relations: nested graphs as a way to ease modelling, inside RDF 1.1 named graphs. If well designed, they can also provide hooks to configure semantics, asserted-ness, instantiation etc., meeting all the advanced needs that people currently try to stuff into only one syntactic instrument.
But who would still need the quoted triples if we had nested graphs with the same semantics? Or should they have different semantics? Then how can we sensibly decide on the semantics of quoted triples without knowing what the semantics of nested graphs will be, and without being sure that they will indeed come? This needs a comprehensive architecture, not some narrowly chartered 2-years-let's-just-do-it-WG. Of course this also needs engagement and leadership. I have to confess I know nobody among the well-established members of this community who would be willing to take on such an endeavor, but maybe if its necessity becomes apparent...
@rat10 wrote:
Could you explain what you mean by non-tree-like nesting? Having multiple triples in a nested graph is no stranger syntactically than a variadic function in a mathematical formula, and certainly fits within the tree-like paradigm. (There are cases where you have to go beyond tree-like structure, which is something I've worked on research-wise, but I would be surprised if that arose in an RDF context.)
With quads, you get one blank node per graph literal: thus some blank nodes, but far fewer per atomic term than you get for the RDF encoding of lists. If you insist on triples (assigning a separate identifier/bnode to each distinct triple in the dataset, then separately collecting these identifiers into graphs), the count goes up, but probably not by that much. With
for what flattens to quads (in order graph–subject–predicate–object) as
(Since |
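A minimal Python sketch of the quad flattening described above, under assumptions of my own: a graph is modelled as a `frozenset` of `(s, p, o)` tuples, any term may itself be such a set (a nested graph literal), the outer graph is named by the placeholder IRI `urn:example:outer`, and blank-node labels `_:g0`, `_:g1` and so on are generated by the sketch. It is not taken from any existing tool; it only illustrates that each distinct graph literal costs one blank node.

```python
from itertools import count


def flatten_to_quads(graph, graph_name="urn:example:outer"):
    """Flatten a nested graph into (graph, subject, predicate, object) quads.

    A graph is a frozenset of (s, p, o) triples; any term may itself be a
    frozenset (a nested graph literal). Each distinct graph literal gets exactly
    one fresh blank node, which names the quads holding its content.
    """
    fresh = count()
    names = {}  # graph literal -> blank node naming it
    quads = []

    def term(t):
        if isinstance(t, frozenset):  # nested graph literal
            if t not in names:
                names[t] = f"_:g{next(fresh)}"
                emit(t, names[t])  # its content goes into its own graph
            return names[t]
        return t  # IRI or literal: unchanged

    def emit(g, name):
        for s, p, o in g:
            quads.append((name, term(s), term(p), term(o)))

    emit(graph, graph_name)
    return quads


if __name__ == "__main__":
    claim = frozenset({(":alice", ":knows", ":bob")})
    data = frozenset({(":carol", ":says", claim), (claim, ":source", ":wiki")})
    for quad in flatten_to_quads(data):
        print(quad)
```

Because `claim` is the same set both times it occurs, it gets a single blank node and its content is emitted once, giving three quads in total.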
Sorry, I was distracted by the reference to JSON-LD, which I don't know well enough. I meant to refer to flattening to triples. More specifically, IMO it is crucial to define how annotated nested graphs can be mapped/flattened to mere triples, because for those we know what they mean. AFAICT that is possible, but it would, for example, involve a lot of blank nodes and a special property, possibly a subproperty of rdf:value, like rdfx:primaryTopicOf. And "tree-like nesting" is indeed not a useful term for what I wanted to refer to, which is the way nested n-ary relations with intermediate blank nodes are expressed in Turtle.
One thing at a time ;-) I admit I haven't put much thought into how things are queried. But a) it's complicated enough without that, and b) I guess some formatting (indentation and line breaks) should cover a lot of terrain. If not, then, well, maybe we have to invent a new kind of braces :-/
You got me there. My point was meant to be that with BLogic one can define a boundary within which blank nodes mean the same thing (there called a surface), and that boundary can contain multiple triples and even nested triples. But I admit that I don't really understand why it is so hard for RDF-star to get the semantics of blank nodes right, so I'll leave it at that.
@rat10 wrote:
IMO it is crucial to define how annotated nested graphs can be mapped/flattened to mere triples, because for those we know what they mean. And AFAICT that is possible, but would for example involve a lot of blank nodes and a special property - possibly a subproperty of rdf:value - like rdfx:primaryTopicOf.
So on the one hand you insist on regarding the relation between quoted triple and its parts as an N-ary relation — which in a way it necessarily is, but how explicit[*] that relation has to be in the formalism is another matter — and on the other hand you also insist that this relation gets encoded in terms of triples. How is that different from reinventing RDF 1.0 reification (with ~~rdf:Statement~~ as the sought special property)?
(Edit: I got confused about the terminology. "property"=verb, so I suppose the _properties_ of the scheme would be ``rdf:subject``, ``rdf:predicate``, and ``rdf:object``.)
[*] To elaborate on the explicitness scale: In relational databases we know relations are tables — those would be explicit. You can also have relations between entities that are not explicit, but can be constructed when needed as joins of tables. Third, and perhaps most to the point here, there are built-in relations such as =, <, and NOT NULL which are too primitive to be formalised as tables. The relation between quoted triple and its parts may well fall into this last category.
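For comparison, here is standard RDF reification flattened into plain triples, as a Python sketch. `rdf:type rdf:Statement`, `rdf:subject`, `rdf:predicate` and `rdf:object` are the real RDF vocabulary; the membership property `urn:example:member` used to collect reified statements into a set, and the `_:set` / `_:stN` labels, are hypothetical placeholders chosen only for illustration.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
MEMBER = "urn:example:member"  # hypothetical membership property


def reify(triple, node):
    """Standard RDF reification: describe `triple` through `node` with four
    triples. The reified triple itself is not asserted."""
    s, p, o = triple
    return [
        (node, RDF + "type", RDF + "Statement"),
        (node, RDF + "subject", s),
        (node, RDF + "predicate", p),
        (node, RDF + "object", o),
    ]


def reify_graph(triples, member=MEMBER):
    """Encode a set of triples using plain triples only: one blank node per
    reified triple, plus one blank node for the set, linked by `member`."""
    group = "_:set"
    out = []
    for i, t in enumerate(sorted(triples)):
        node = f"_:st{i}"
        out.extend(reify(t, node))
        out.append((group, member, node))
    return group, out
```

For a set of n triples this yields 5n triples and n + 1 blank nodes, whereas a quad encoding needs n quads and one blank node per graph literal. None of the reified triples is itself asserted.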
I 'insist' that the solution should do what it is expected to do, that it should be easy to use and should put the mainstream needs of non-logicians first.
You are probably aware that the CG report defines an unstar mapping that maps quoted triples to n-ary relations. Still, the result of that mapping is not RDF standard reification: those n-ary relations describe different things. The same would be true for a mapping of nested graphs to n-ary relations: they would describe yet another thing. The ways in which those things differ may often seem arcane, but they have very practical consequences: RDF standard reification describes a statement instance without asserting it. RDF-star per the CG report describes a statement type without asserting it (and is syntactically constrained). Singleton properties instantiate subtypes of statements and assert them (and the supertype statement can be entailed). Consequently, singleton properties can assert qualified relations, but RDF standard reification and RDF-star can't. IMO, if singleton properties had chosen a syntactic extension that puts the (familiar) supertype first but keeps the annotations/qualifications "nearby", they might have succeeded (despite the lack of a grouping mechanism).

What does this have to do with named graphs? Probably nothing. Whether the syntax is optimized for triples or for graphs is an orthogonal issue. IMO the syntax should be optimized for graphs (nested within RDF named graphs, as those are application-specific devices without a semantics), because the distinction between annotations on one triple and on multiple triples is arbitrary, and optimizing the proposed solution for single triples will only encourage people to use RDF named graphs for annotations on multiple triples, regardless of the lack of sound semantics and of the resulting cacophony of modelling styles. But RDF named graphs are probably best left to issues that can only be handled out-of-band, that are specific to an application, that don't need to be shared, or that are in other ways out of scope of RDF.
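The claim that singleton properties assert the qualified statement while the generic one "can be entailed" amounts to a single inference rule. A Python sketch, assuming the rule is `(s, p_i, o)` plus `(p_i, singletonPropertyOf, p)` entails `(s, p, o)`, as described above; the IRI `urn:example:singletonPropertyOf` and the `:knows#1` naming are illustrative stand-ins, not the proposal's exact vocabulary.

```python
SP_OF = "urn:example:singletonPropertyOf"  # illustrative stand-in IRI


def entail_generic(triples):
    """Apply one rule to a set of triples and return only the new triples:
    (s, p_i, o) and (p_i, SP_OF, p) entail (s, p, o)."""
    # Map each singleton property to its generic property.
    generic_of = {s: o for (s, p, o) in triples if p == SP_OF}
    derived = {(s, generic_of[p], o) for (s, p, o) in triples if p in generic_of}
    return derived - set(triples)


if __name__ == "__main__":
    data = {
        (":alice", ":knows#1", ":bob"),     # qualified statement, asserted
        (":knows#1", SP_OF, ":knows"),      # links it to the generic property
        (":knows#1", ":since", "2020"),     # qualification "nearby"
    }
    print(entail_generic(data))  # {(':alice', ':knows', ':bob')}
```

The qualification hangs off `:knows#1`, which is itself used as a predicate in an asserted triple; that is the difference from reification, where the described statement is never asserted.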
This is not meant to suggest that RDF shouldn't have quoted triples, but rather to point out that this is likely an objection some people will raise, and the specification should have some answer to it. Concepts seems the most natural document in the collection for that answer.
I'm quite the beginner, so I don't know what the answer might be. Quoted triples definitely seem an obvious way of making claims about other claims, but when I suggested an encoding in terms of quoted triples to people more experienced with semantic web things, I got the reply "nah, you do that with named graphs instead". It took me a while to grok that (much literature still paints named graphs mainly as a tool for keeping track of remote data sources), but of course a single-edge named graph is a way of speaking about that edge, and in a quadstore it is more feasible to search all graphs for a triple than it is to search for a blank node reifying that triple, so named graphs overcome problems with classic reification. Then what is the point of quoted triples as a completely different mechanism?
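A Python sketch of the single-edge named graph pattern over an in-memory list of quads. Assumptions: graph names are minted as hypothetical `urn:example:gN` IRIs, `urn:example:default` stands in for the default graph, and annotations about a graph are stated in that default graph. `graphs_containing` is the quad pattern with the graph position left open; a real quadstore would answer it from an index rather than the linear scan used here.

```python
from itertools import count

DEFAULT = "urn:example:default"  # hypothetical stand-in for the default graph


def annotate(quads, triple, annotations, fresh):
    """Put `triple` into its own (singleton) named graph and state the
    annotations about that graph's name in the default graph."""
    g = f"urn:example:g{next(fresh)}"
    quads.append((g, *triple))
    quads.extend((DEFAULT, g, ap, ao) for ap, ao in annotations)
    return g


def graphs_containing(quads, triple):
    """Quad pattern (?g, s, p, o): find every named graph holding `triple`."""
    return {g for (g, *spo) in quads if tuple(spo) == triple and g != DEFAULT}


def annotations_of(quads, triple):
    """Everything stated in the default graph about graphs holding `triple`."""
    names = graphs_containing(quads, triple)
    return {(s, p, o) for (graph, s, p, o) in quads if graph == DEFAULT and s in names}
```

Annotating the same triple twice yields two singleton graphs, and one lookup finds both, with no need to match a four-triple reification pattern on blank nodes.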
Perhaps the important difference lies in having just one term that is the quoted triple (s,p,o), instead of any number of size-1 graphs which each contain exactly that triple (at least, that is one thing I cannot easily see how to simulate), but then that should probably be spelt out. This is the kind of requirement one throws in when aiming to introduce a quoting operation, but it is not entirely clear what application would benefit from it.
The counterargument against quoted triples is simplicity, both conceptual and of implementation: quadstores are nicely flat data structures, whereas quoting brings the complexities of nesting. Of course, representing inherently nested information in a flat store only means that the nesting happens at a higher level of the representation, but since this could make it Someone Else's Problem, it would probably have its proponents. Why not regard `<< >>` and `{| |}` merely as syntactic shorthands for constructing a named graph? (It doesn't meet the spec, but why is the spec that way?)
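One way to read this suggestion is as a desugaring into quads. A Python sketch under hypothetical assumptions: `<< s p o >>` becomes a named graph holding just that triple, annotations are stated about the graph name in a stand-in default graph `urn:example:default`, and `{| |}` additionally asserts the triple in the default graph (as the annotation syntax does). It also makes the point about "just one term" concrete: a quoted triple is a single term fixed by (s, p, o), whereas fresh graph names give one name per occurrence unless the name is derived from the triple itself, as the hypothetical `content_name` does.

```python
import hashlib

DEFAULT = "urn:example:default"  # hypothetical stand-in for the default graph


def quoted(s, p, o):
    """A quoted triple as a term: equal whenever (s, p, o) are equal."""
    return ("<<", s, p, o, ">>")


def content_name(s, p, o):
    """Hypothetical: derive the graph name from the triple itself, so every
    desugaring of the same triple lands in the same graph."""
    digest = hashlib.sha256(repr((s, p, o)).encode("utf-8")).hexdigest()[:16]
    return f"urn:example:triple:{digest}"


def desugar(s, p, o, annotations, graph_name, asserted=False):
    """Read `<< s p o >> ap ao` (asserted=False) or `s p o {| ap ao |}`
    (asserted=True) as quads: (s p o) goes into `graph_name`, and each
    annotation is stated about that name in the default graph."""
    quads = [(graph_name, s, p, o)]
    quads += [(DEFAULT, graph_name, ap, ao) for ap, ao in annotations]
    if asserted:  # the annotation syntax also asserts the triple itself
        quads.append((DEFAULT, s, p, o))
    return quads
```

With freshly minted names, two annotations on the "same" quoted triple end up on two different graphs; with `content_name` they share one graph, which is closer to the single-term behaviour of quoted triples.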