Updates rdf:JSON value space. #66
Conversation
Differences between I-JSON and JSON:
I-JSON does not allow any UTF-16 or UTF-32 encoding.
These are not I-JSON but are JSON:
{ "a": 1, "a": 2 }
"\uDEAD"
This might be I-JSON, because I-JSON only says that numbers with too much precision for IEEE floating point double SHOULD NOT be used:
3.141592653589793238462643383279
This might be I-JSON even if the SHOULD above is a MUST, because it is unclear whether this has too much precision:
1.2417634328206376
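To make the cases above concrete, here is a small sketch of how one common parser (Python's standard `json` module, chosen only for illustration) treats them. Other parsers may behave differently, which is the interoperability concern being raised.

```python
import json
import math

# Duplicate member names: allowed by the RFC 8259 grammar, excluded by I-JSON.
# Python's parser silently keeps the last value.
dup = json.loads('{ "a": 1, "a": 2 }')

# Unpaired surrogate escape: allowed by the grammar, excluded by I-JSON.
# Python's parser accepts it and yields a lone surrogate code point.
lone = json.loads('"\\uDEAD"')

# More digits than a double can hold: the extra digits are rounded away.
pi_long = json.loads("3.141592653589793238462643383279")

print(dup)                         # {'a': 2}
print(len(lone), hex(ord(lone)))   # 1 0xdead
print(pi_long == math.pi)          # True
```

None of these inputs raises an error in this parser, so the differences are invisible unless they are checked for explicitly.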
For reference: RFC7493: The I-JSON Message Format. Also note the RFC7493 Errata.
As parsers don't necessarily agree on the cases outside I-JSON, it makes sense to me to stick to that. If we go beyond it, we need to define the outcome of those cases and we are expecting a custom written JSON system which is too much to expect. |
A big problem with this PR is that it also updates the lexical space. There is an official grammar for JSON and the lexical space in this PR does not correspond with the grammar. |
This was discussed on the JSON-LD CG call today. The general consensus seems to be that using I-JSON as the value space makes sense, as it promotes interoperability, which is after all the primary purpose of these specifications. Changes to the value space are acceptable if the net consequences are the same (meaning SPARQL results sort order, effectively). Regarding the potential for duplicate keys, RFC8259 says they SHOULD be unique. I believe this is as strong as they could go given the history of the format. If they are not unique, it leads to unpredictable behavior. This also suggests that I-JSON serves the purposes of interoperability. |
Proposal: The lexical space is anything JSON-ish. The value space is the standard (appropriate RFC) that defines the web-community-agreed subset, plus an extra value which is "undef". "undef" is like SQL null or IEEE NaN. It does not value-equal itself.
Proposal: The lexical space is JSON. |
spec/index.html
Outdated
Two values are considered equal
if they are the same <a data-cite="RFC8259#section-7">string</a>,
<a data-cite="RFC8259#section-6">number</a>, or
literal value;
So "\u0020" and " " are different.
I believe parsing strings with escape sequences resolves those escapes (as it does in RDF). There is a canonical form when serialized via JCS.
If you want a data structure that is some transformation of the result of parsing a JSON string, then you have to say what this data structure is and what the transformation is. For JSON strings, this is going to involve things like handling surrogate pairs (and unpaired surrogates), or maybe just using UTF-16 code units, as well as handling escape sequences.
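As an illustration of the escape and surrogate questions in this thread, the sketch below shows what Python's standard `json` parser does. This is one parser's behavior, not something the PR text defines.

```python
import json

escaped = json.loads('"\\u0020"')   # JSON text: "\u0020"
literal = json.loads('" "')         # JSON text: " "
same_value = escaped == literal     # both parse to a single U+0020

# A surrogate pair written as two escapes becomes one code point in Python,
# but would be two elements if strings were sequences of UTF-16 code units.
pair = json.loads('"\\ud83d\\ude00"')
code_points = len(pair)
utf16_units = len(pair.encode("utf-16-le")) // 2

print(same_value)                 # True
print(code_points, utf16_units)   # 1 2
```

Whether the value space counts code points or UTF-16 code units changes string length and ordering, so the choice has to be stated explicitly.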
spec/index.html
Outdated
with the constraints of the value space described above.</li>
</ol>
<dd>maps every element of the lexical space to the result of parsing it into a
<a data-cite="RFC4627#section-2.1">JSON value</a>.
So the value for "1.0"^^rdf:JSON is different from the value for "1.00"^^rdf:JSON. This does not work.
From my interpretation, they are different terms having the same value.
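A minimal sketch of that reading, assuming JSON numbers are mapped to IEEE doubles (as Python's `json` module does for numbers with a fraction part): the two lexical forms differ as strings but parse to the same value.

```python
import json

lexical_a = "1.0"
lexical_b = "1.00"

value_a = json.loads(lexical_a)
value_b = json.loads(lexical_b)

lexically_equal = lexical_a == lexical_b   # different terms
value_equal = value_a == value_b           # same double

print(lexically_equal, value_equal)   # False True
```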
Not so. There is a missing jump from parsed JSON texts to some sort of data structure. JSON strings, numbers, objects, and arrays are strings, or maybe parse trees.
I'm not against the value space of rdf:JSON being recursively defined as the union of TRUE, FALSE, NULL, finite sequences of Unicode UTF-16 code units, the finite numeric part of IEEE double floating point numbers, finite sequences of value space elements, and finite partial maps from finite sequences of Unicode UTF-16 code units to value space elements, but that is not what is being said here.
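The recursive definition described in this comment can be sketched as a membership check. The function name `in_value_space` is hypothetical, and Python `str` (a sequence of code points) stands in for sequences of UTF-16 code units, which is an approximation.

```python
import math
from typing import Any


def in_value_space(v: Any) -> bool:
    """Return True if v fits the recursive value-space sketch."""
    # TRUE, FALSE, NULL
    if v is None or isinstance(v, bool):
        return True
    # Strings (assumption: Python str approximates UTF-16 code unit sequences)
    if isinstance(v, str):
        return True
    # Only the finite part of IEEE double
    if isinstance(v, float):
        return math.isfinite(v)
    # Finite sequences of value-space elements
    if isinstance(v, list):
        return all(in_value_space(x) for x in v)
    # Finite partial maps from strings to value-space elements
    if isinstance(v, dict):
        return all(isinstance(k, str) and in_value_space(x) for k, x in v.items())
    return False


print(in_value_space([1.0, {"a": None, "b": [True, "x"]}]))  # True
print(in_value_space(float("inf")))                           # False
print(in_value_space({1: 2.0}))                               # False
```

The check recurses into lists and maps, which is the part the reviewer says the PR text leaves unstated. It does not guard against self-containing lists, which would recurse without end.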
What's "parse" for multiple fields? For long numbers? Where is this implemented? |
The problem with this is interoperability and what Similarly, JSON Object keys need to be unique, for all practical purposes, as the underlying Hash/Map data structures they parse to typically do not support multiple map entries with the same key. For RDF purposes, literals are restricted as described in RDF string. While we could define a separate space for JSON strings, this also does not serve a practical purpose. Trying to stick to the letter of the JSON grammar does not serve a practical purpose, is not what developers actually use JSON for, and cannot be adequately tested for real systems. I-JSON (RFC7493) was created to describe a profile of JSON closer to what is actually used; this is particularly important, as RFC8259 (JSON) effectively cannot be updated. Furthermore, as discussed in the meeting, JSON-LD has effectively used JCS (RFC8785) as the serialization format for My feeling is that the
JSON parsing. This is the same as specified in this PR. "The lexical-to-value mapping |
RFC 4627 is formally obsoleted by RFC 7158, 7159 and now RFC 8259. RFC 8259 leaves it to the implementation. https://datatracker.ietf.org/doc/html/rfc8259#section-4 and other sections. |
From RFC 8259:
Later on:
And there are also definitions for string, number, and object. |
object - "The names within an object SHOULD be unique." gives ambiguity as recognized by "When the names within an object are not unique, the behavior of software that receives such an object is unpredictable." object - "An object structure is represented" - is the representation unique? By the text, dropping all the pairs is legal!
Is the "value" the recognized parser rule? So This is why there are other RFCs - the exact formalization of JSON syntax in 8259 does not give a unique lexical to parsed form mapping.
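One way to remove the ambiguity around non-unique names is to reject them at parse time. The sketch below uses the `object_pairs_hook` parameter of Python's `json.loads`; the helper names `reject_duplicates` and `parse_strict` are hypothetical, and this is only one possible policy.

```python
import json


def reject_duplicates(pairs):
    """Build a dict from (name, value) pairs, failing on a repeated name."""
    seen = {}
    for name, value in pairs:
        if name in seen:
            raise ValueError(f"duplicate member name: {name!r}")
        seen[name] = value
    return seen


def parse_strict(text):
    # object_pairs_hook receives every object's members in document order
    return json.loads(text, object_pairs_hook=reject_duplicates)


print(parse_strict('{"a": 1, "b": 2}'))   # {'a': 1, 'b': 2}

try:
    parse_strict('{"a": 1, "a": 2}')
except ValueError as err:
    print(err)                            # duplicate member name: 'a'
```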
@afs Welcome to the world of JSON. |
https://www.w3.org/TR/xmlschema11-2/#f-doubleCanmap is a definition of a canonical lexical space for XML schema double. It might fill the bill if it is acceptable for JSON people. |
A proposed update includes defining the Lexical Space as "the set of all RDF strings that conform to the JSON Grammar as described in Section 2 JSON Grammar of [[RFC8259]]". Adding a note that "[[[RFC8259]]] [[RFC8259]] allows strings to include surrogate code points which are not allowed in RDF strings, thus the lexical representation of JSON literals excludes those including surrogate code points." The current PR also requires that the lexical representation conform to I-JSON, which could obviate the need for that note. To address the grammar/value considerations of using the JSON definitions of value primitives in the current PR, we should probably again defer to INFRA for the definition of Also, note that aligning the value space with I-JSON (required for JCS) means that some values in the lexical space may not have values (e.g., JSON including surrogate code points), may lose precision, or may result in the same value from two different lexical representations (e.g, We also need to define how two JSON literals are compared, which would be by comparing their canonical representations as strings by performing the lexical-to-value mapping and canonicalizing the value as described using JCS. I believe this definition satisfies the needs for JSON-LD. If the lexical space is not limited to I-JSON there can be undefined behavior in the lexical-to-value mapping for lexical values which are not I-JSON.
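The comparison described here (parse, canonicalize, compare strings) can be sketched as follows. This is only an approximation of JCS and not a conforming implementation: real JCS serializes numbers per ECMA-262 and sorts keys by UTF-16 code units, whereas this sketch relies on Python's own float formatting and code point ordering. The function names are hypothetical.

```python
import json


def canonical(text: str) -> str:
    """Parse a JSON text and re-serialize it in a fixed form (JCS-like sketch)."""
    # parse_int=float treats every number as a double, so "1" and "1.0" agree
    value = json.loads(text, parse_int=float)
    return json.dumps(value, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False)


def json_literals_equal(a: str, b: str) -> bool:
    return canonical(a) == canonical(b)


print(json_literals_equal('{ "a": 1, "b": 2}', '{"b":2, "a":1}'))  # True
print(json_literals_equal('1.0', '1.00'))                          # True
print(json_literals_equal('"\\u0020"', '" "'))                     # True
print(json_literals_equal('[1, 2]', '[2, 1]'))                     # False
```

Member order and insignificant lexical differences disappear, while array order is preserved, which matches the intent stated in the comment.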
JCS should be sufficient for us to define a canonical lexical representation. JCS defers to ECMA-262 for serialization of numbers. If there is a mismatch between the value space for XSD double and ECMA-262 we may need to add some clarification along the lines of |
Somehow I have forgotten how to put in a suggested change to the PR. Here is what I think needs to be added at the end of the section. The lexical-to-value mapping MUST produce the same results as Because the value space for rdf:JSON includes positive and negative infinity both the |
There is still the problem with the use of ordered maps. Saying that they are to be treated as unordered is not sufficient as that is ambiguous between using an ordered map but not being able to access the order and using (the undefined) unordered map. Either some clean definition of a map must be found or a definition for unordered map must be included or some wording that does not need a separate definition must be used. |
We already say that a JSON Number is mapped to an xsd:double using a method consistent with
JCS already has such a requirement
If there's a way to add wording to make use of the existing (ordered) map definition, I'd be inclined to do that. Alternatively, we could define our own |
Because this is a difference from other JSON-related documents.
The point is precisely because JCS has this requirement thus it can't be used to serialize rdf:JSON values. |
My point is that we need to be unambiguous and the current wording is not. I don't see how ordered map can be used because it has different values for { "a": 1, "b": 2} and {"b":2, "a":1}. A SPARQL implementer would be justified in having the values compare as not equal even though there was no way of extracting the ordering. I suppose that one could write something effectively saying that these two different values must compare as equal (and identical and whatever else might distinguish between them), but at this point it is much better (and safer) just to start with the correct datatype.
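The distinction can be seen in Python, where `collections.OrderedDict` plays the role of an ordered map and a plain `dict` compares as an unordered one. This is an analogy to the INFRA definitions, not an implementation of them.

```python
from collections import OrderedDict

ordered_1 = OrderedDict([("a", 1), ("b", 2)])
ordered_2 = OrderedDict([("b", 2), ("a", 1)])

plain_1 = {"a": 1, "b": 2}
plain_2 = {"b": 2, "a": 1}

# Equality between two OrderedDicts is order-sensitive
ordered_equal = ordered_1 == ordered_2

# Equality between plain dicts ignores insertion order
plain_equal = plain_1 == plain_2

print(ordered_equal, plain_equal)   # False True
```

An implementer who picks the ordered type gets the "not equal" result by default, which is the risk being described.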
I believe that the appropriate thing to do is, in fact, to terminate with an error. In any case, as we don't get into canonicalization, I don't think it's appropriate for us to place restrictions on other specifications. The best we can do is to have a note that explains that due to such considerations, not all values can be serialized back to JSON.
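As a sketch of the "terminate with an error" behavior, Python's `json.dumps` can be told to refuse values that have no JSON representation via `allow_nan=False`. The helper name `serialize_double` is hypothetical.

```python
import json


def serialize_double(value: float) -> str:
    """Serialize a double to JSON, raising ValueError for INF, -INF and NaN."""
    return json.dumps(value, allow_nan=False)


print(serialize_double(1.5))   # 1.5

for bad in (float("inf"), float("-inf"), float("nan")):
    try:
        serialize_double(bad)
    except ValueError:
        print("cannot serialize", bad)
```

Without `allow_nan=False`, Python emits `Infinity` or `NaN`, which are not valid JSON, so the strict setting is the one that matches the comment.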
Well, we are explicit in how to determine if two JSON values are equivalent, which considers the unordered nature of maps. We could add a note that highlights these cases and how they are equivalent, according to our definition. I believe that our definition of map equality normatively makes this unambiguous, but an informative note may help drive this point home. If there's other specific language to clarify our intention please suggest something. I think that defining our own definition for unordered map is probably not a good idea, as who would go looking to RDF Concepts for a definition of an unordered map? Better to qualify what we mean by saying that we treat the INFRA map as unordered. |
I just noticed that the value space definition is completely messed up as it does not specify the target space of maps nor values of arrays. This needs to be fixed. |
If we say it's an unordered map then it's an unordered map. |
The definition points to the following text 5.2. Maps An ordered map, or sometimes just "map", is a specification type consisting of a finite ordered sequence of tuples, each consisting of a key and a value, with no key appearing twice. Each such tuple is called an entry. So a map has an order. |
You're going to need to provide some more context for this. It's not clear to me what is missing from the L2V description on how |
A map to what? Integers only? Emoticons? Rocks? People? The value space is recursive and needs a recursive definition. |
It seems properly recursively defined to me:
How can that be made more clear? Implicit in the object mapping is that the member name is mapped to a |
The problem is not in the lexical-to-value mapping but is instead in the value space definition that needs to say what the maps are to (and from, as well) and what elements of lists are. It would be perfectly consistent with the above to have maps be from integers to integers and lists to only have integers as values. |
Perhaps if the value space were defined as follows:
Co-authored-by: Ted Thibodeau Jr <[email protected]>
The JSON-LD WG today resolved to adopt this definition for the |
That's concerning as the current definition in this PR is still inadequate. The value space contains elements like the list that is its only member. There is nothing wrong with this, but I don't believe that such values are part of JSON and they certainly cause problems in serialization. If these values are not wanted, a datatypes expert should be consulted to provide the best wording for excluding them. The value space still contains ordered maps (with insignificant order). It would be much better to start out with a map datatype that is unordered instead of going through all the contortions to try to eliminate all vestiges of the order. The lexical-to-value mapping does not specify the result of the map for Objects, Arrays, Strings, or Literal Names. For example, the literal name true could be mapped to any of true, false, or nil. For another example, there is no description of how JSON character escapes are handled. The JSON string "\u1234" could be mapped to the value "A" and still be compliant with the lexical-to-value mapping. There may be other problems that are obscured by these.
Approved just so that the PR can be merged. This is not an endorsement of the current state of the PR.
spec/index.html
Outdated
<a>strings</a>,
numbers (<a data-cite="XMLSCHEMA11-2#double"><strong>xsd:double</strong></a>),
<a data-cite="INFRA#ordered-map">maps (unordered)</a>,
<a data-cite="INFRA#list">lists</a>, and
What are the members of lists?
This updates the rdf:JSON value space as discussed in #65 to be the result of parsing the lexical form into a JSON value (array, object, string, number, true, false, or null). It does not try to suggest what any of these values mean outside of the JSON definitions themselves.
Fixes #14, and fixes #65.