Value space of rdf:JSON datatype #65
As a general principle, I'm in favour of linking to original definitions where possible, rather than incorporating material or normatively referencing derived works, which may diverge because they serve a specific or different purpose. For JSON, I think RFC 8259 is the original definitive place (it would mean "map" -> "object"), certainly for "string", because a JSON string is not an RDF string or xsd:string. RFC 8259 is the current STD-90. |
But just what are the JSON values, particularly number? |
What are the requirements? https://www.w3.org/TR/json-ld11/#terms-imported-from-other-specifications where number goes to but is that what the value space for a JSON fragment is for? If it is "JSON processors treat them the same" then https://www.rfc-editor.org/rfc/rfc8259.html#section-6 |
First, we need to decide if we want to go for this decomposed notion of a JSON value for the value space. I'm fine with sourcing RFC8259, which would get out of the problem of having to go to ECMAScript for numbers. JSON-LD (and INFRA) tend to use the term "map" rather than "object", as "object" is overly general. We can use the term "map" while still referencing the "Object" section in RFC8259. Regarding strings, certainly the strings referenced as JSON values (or within a JSON serialization) reference "strings" from RFC8259, and may include their own escape sequences. Of course, the other alternative is to not go with the decomposed notion of a JSON value as the value space, in which case we're dealing exclusively with RDF strings containing a JSON serialization. Note that the existing value space uses JCS/RFC8785 for the canonical form of JSON, which has similar requirements for character representation as our own, and requires implementations to terminate if a "lone surrogate" is found. |
do the two styles agree on what matches for numbers? (I think JCS does because it (in effect) goes through binary) |
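A minimal Python sketch of what "going through binary" could mean for number matching, assuming numbers are converted to IEEE 754 doubles (as Python's `float` does). The helper name `same_double` is hypothetical, and `float()` accepts some inputs that are not JSON numbers, so this is an illustration rather than a validator.

```python
def same_double(a: str, b: str) -> bool:
    """Compare two number lexical forms after conversion to IEEE 754 doubles."""
    return float(a) == float(b)


# Different lexical forms can denote the same double.
print(same_double("1", "1.0"))
print(same_double("1e0", "10e-1"))

# Integers beyond 2**53 can collapse onto a neighbouring double.
print(same_double("9007199254740993", "9007199254740992"))
```

Under this assumption, two lexical forms "match" exactly when they round to the same double, which is a coarser relation than comparing unbounded decimal values.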
JCS has the decided advantage of only processing a subset of JSON. Unless rdf:JSON is limited to that subset depending on JCS may not be possible. |
I-JSON: RFC 7493 |
JCS was used by JSON-LD to create the RDF serialization of a JSON value in the Object to RDF Conversion algorithm, so it never did allow for surrogates, although JCS was not finalized at that time, so the definition of canonical lexical form may not strictly define that restriction. Any strict update to the rdf:JSON definition within JSON-LD would use JCS directly, and further limit code points, similar to how we've done in RDF Concepts, and disallow surrogates explicitly. Obviously, this is what I-JSON did. |
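As a rough illustration of the surrogate restriction, the following Python sketch walks a parsed JSON value and reports whether any string or member name contains an unpaired surrogate. The function name `has_lone_surrogate` is hypothetical, and the sketch relies on the behaviour of Python's standard `json` module, which combines escaped surrogate pairs into one code point and leaves unpaired escapes as surrogate code points.

```python
import json


def has_lone_surrogate(value) -> bool:
    """Return True if any string or member name in a parsed JSON value
    contains a code point in the surrogate range U+D800..U+DFFF."""
    if isinstance(value, str):
        return any(0xD800 <= ord(ch) <= 0xDFFF for ch in value)
    if isinstance(value, list):
        return any(has_lone_surrogate(item) for item in value)
    if isinstance(value, dict):
        return any(
            has_lone_surrogate(name) or has_lone_surrogate(member)
            for name, member in value.items()
        )
    return False


paired = json.loads(r'"\ud83d\ude00"')  # escaped pair decodes to U+1F600
lone = json.loads(r'["ok", "\ud800"]')  # unpaired escape stays a surrogate

print(has_lone_surrogate(paired))
print(has_lone_surrogate(lone))
```

A processor following the stricter reading would reject the second input rather than produce a literal from it.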
I-JSON has a lot more restrictions than just nice strings. Is rdf:JSON supposed to have these other restrictions too? If so, these other restrictions need to be stated explicitly. The nice strings restriction needs to be either stated or true. I think that it is not true currently. |
"A lot"? It is those things that make for accurate consistent parsing. |
Number restrictions to IEEE floating point double. Ok, so not a lot in absolute terms. But a large part of the JSON syntax is affected. |
It is the areas where there is no common, stable, implemented values. |
JSON doesn't allow duplicate keys (member names), either, although it is not typically an error condition; the last key wins. Limitations of I-JSON (and JCS) on string and number representation should not be a problem, as they're effectively already in place in JSON-LD due to the tacit correspondence to JCS. |
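For illustration, Python's standard `json` module happens to show the "last key wins" behaviour by default, and a stricter reading can be approximated with its `object_pairs_hook` parameter. The hook name `reject_duplicates` is hypothetical; this is a sketch of one possible policy, not a statement of what any specification requires.

```python
import json


def reject_duplicates(pairs):
    """object_pairs_hook that raises on a repeated member name."""
    seen = {}
    for name, value in pairs:
        if name in seen:
            raise ValueError(f"duplicate member name: {name!r}")
        seen[name] = value
    return seen


text = '{"a": 1, "a": 2}'

# Default behaviour: the later member silently replaces the earlier one.
last_wins = json.loads(text)
print(last_wins)

# Strict behaviour: the same text is rejected.
try:
    json.loads(text, object_pairs_hook=reject_duplicates)
except ValueError as err:
    print("rejected:", err)
```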
For JSON numbers, I suggest xsd:decimal. |
Why have something that has different interpretations across different JSON implementations? I-JSON/JCS reflects where JSON is standardised, de-facto and de-jure. |
The question is whether rdf:JSON is going to be the JSON that "does not attempt to impose ECMAScript’s internal data representations on other programming languages" and thus has objects containing "zero or more name/value pairs", strings as "sequence[s] of zero or more Unicode characters", and numbers as potentially unbounded decimal values, or the JSON that has objects as ECMAScript objects with all "properties of an object [...] uniquely identified using property keys", strings as "ordered sequences of zero or more 16-bit unsigned integer values", and numbers as a "double-precision 64-bit format IEEE 754-2019 values". If rdf:JSON is going to be the former, then all references should be to json.org and RFCs, JSON values should not be tied to ECMAScript, and string ordering should be by Unicode code point; if rdf:JSON is going to be the latter, then all references should be to the ECMAScript 2024 Language Specification or whatever document currently defines ECMAScript, and JSON values and string ordering can be by UTF-16 code unit. |
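The two string orderings differ only when characters outside the Basic Multilingual Plane are involved. A small Python sketch, with the helper name `utf16_key` chosen here for illustration: Python compares `str` values by code point, and comparing big-endian UTF-16 bytes is equivalent to comparing 16-bit code units.

```python
def utf16_key(s: str) -> bytes:
    """Sort key equivalent to comparing UTF-16 code units
    (big-endian bytes preserve the order of each 16-bit unit)."""
    return s.encode("utf-16-be")


strings = ["\uff5e", "\U0001f600"]  # U+FF5E and U+1F600

# Code point order: U+FF5E sorts before U+1F600.
by_code_point = sorted(strings)

# Code unit order: U+1F600 is encoded as D83D DE00, which sorts before FF5E.
by_code_unit = sorted(strings, key=utf16_key)

print([hex(ord(s)) for s in by_code_point])
print([hex(ord(s)) for s in by_code_unit])
```

Any definition of member ordering therefore has to say which of the two comparisons it means.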
Agree. But referencing json.org, which can change at any time, is not a good idea. |
json.org has a link at the top to ECMA-404 (the link is broken (!! given the number) but ECMA-404 exists)
The warning on the ECMA-404 download page is worth noting. |
Suggest a PR that does the following:
I don't think we need to get into the relationship between JSON strings and RDF strings, or exactly what a JSON number is, other than as defined in RFC8259. Note that the lexical space is an RDF string, as any lexical value must be. |
I prefer a value space that is not tied to ECMAscript and a lexical order that is not tied to UTF-16. I suggest the following, which handles all JSON texts: Value space: The value space of rdf:JSON is recursively defined as the union of
Ordering: Objects are less than arrays, which are less than numbers, which are less than strings, which are less than false, which is less than null, which is less than true. Canonical form: The canonical form of an object is { followed by the canonical form of its members in order from lesser to greater separated by , followed by }. The canonical form of a number is its xsd:decimal canonical form. The canonical form of a string is " followed by the string with " replaced by \", \ replaced by \\, The canonical form of false is the string false, the canonical form of null is the string null, the canonical form of true is the string true. |
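The cross-kind part of the ordering proposed above can be sketched in Python as a rank function over parsed JSON values. The name `kind_rank` and the numeric ranks are hypothetical; ordering within a kind (for example, between two objects) is not covered by this sketch.

```python
def kind_rank(value) -> int:
    """Rank a parsed JSON value by kind:
    object < array < number < string < false < null < true."""
    if isinstance(value, dict):
        return 0
    if isinstance(value, list):
        return 1
    # Booleans are tested before numbers because bool subclasses int in Python.
    if value is False:
        return 4
    if value is None:
        return 5
    if value is True:
        return 6
    if isinstance(value, (int, float)):
        return 2
    if isinstance(value, str):
        return 3
    raise TypeError(f"not a JSON value: {value!r}")


mixed = [True, None, False, "s", 1, [], {}]
ordered = sorted(mixed, key=kind_rank)
print(ordered)
```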
PR #66 does not currently reference either spec directly (indirectly through RFC8259 and JCS).
Note that xsd:decimal is neither adequate to represent all JSON numbers nor consistent with JSON-LD. If defined in terms of XSD types, it should stick with what JSON-LD does and use either xsd:integer or xsd:double depending on the existence of a fractional part.
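A minimal Python sketch of choosing between xsd:integer and xsd:double based on the presence of a fractional part. The function name `xsd_datatype_for` is hypothetical, and the sketch works on values parsed by Python's `json` module; the JSON-LD algorithms include further conditions that it does not attempt to reproduce.

```python
import json

XSD = "http://www.w3.org/2001/XMLSchema#"


def xsd_datatype_for(number) -> str:
    """Pick xsd:integer for numbers without a fractional part, else xsd:double."""
    # bool is excluded explicitly because it subclasses int in Python.
    if isinstance(number, bool) or not isinstance(number, (int, float)):
        raise TypeError("expected a JSON number")
    if isinstance(number, int) or number.is_integer():
        return XSD + "integer"
    return XSD + "double"


for text in ["42", "1.0", "1.5"]:
    print(text, "->", xsd_datatype_for(json.loads(text)))
```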
Ordering should be consistent with ordering the JCS representation. This implies that:
This really needs to be JCS due to wide implementation in JSON-LD processors already. |
Updates for #62 delved into updating the definition of the value space of the rdf:JSON datatype to use more primitive concepts from INFRA (arrays, maps, strings, booleans, and null) as well as number from ECMAScript. The existing value space is based on the JCS representation of the JSON literal value. The proposed update could look like the following:
Two JSON values are considered equal if they are the same string, number, boolean, or null; if they are both arrays with entries which are pairwise equal; or if they are both maps with equal map entries.
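The equality rule above can be sketched as a recursive Python function over values parsed by the standard `json` module, where maps are `dict`, arrays are `list`, and null is `None`. The name `json_equal` is hypothetical, and treating `1` and `1.0` as the same number is an assumption of this sketch.

```python
def json_equal(a, b) -> bool:
    """Recursive equality over parsed JSON values."""
    # null equals only null.
    if a is None or b is None:
        return a is None and b is None
    # Booleans equal only booleans (Python would otherwise treat True == 1).
    if isinstance(a, bool) or isinstance(b, bool):
        return isinstance(a, bool) and isinstance(b, bool) and a == b
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return a == b
    if isinstance(a, str) and isinstance(b, str):
        return a == b
    # Arrays: same length and pairwise equal entries.
    if isinstance(a, list) and isinstance(b, list):
        return len(a) == len(b) and all(json_equal(x, y) for x, y in zip(a, b))
    # Maps: same keys and equal values, regardless of member order.
    if isinstance(a, dict) and isinstance(b, dict):
        return a.keys() == b.keys() and all(json_equal(a[k], b[k]) for k in a)
    return False


print(json_equal({"a": [1, None]}, {"a": [1, None]}))
print(json_equal([1, 2], [2, 1]))
print(json_equal(1, True))
```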