Updates rdf:JSON value space. #66

gkellogg · 2023-09-26T20:02:02Z

This updates the rdf:JSON value space as discussed in #65 to be the result of parsing the lexical form into a JSON value (array, object, string, number, true, false, or null). It does not try to suggest what any of these values mean outside of the JSON definitions themselves.

Fixes #14, and fixes #65.


pfps · 2023-09-28T17:11:04Z

Differences between I-JSON and JSON:

I-JSON does not allow any UTF-16 or UTF-32 encoding

These are not I-JSON but are JSON

{ "a": 1, "a": 2 }

"\uDEAD"

This might be I-JSON because I-JSON only says that numbers with too much precision for IEEE floating point double SHOULD NOT be used

3.141592653589793238462643383279

This might be I-JSON even if the SHOULD above were a MUST, because it is unclear whether this has too much precision

1.2417634328206376
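
To make these cases concrete, here is a minimal sketch of how one lenient parser, Python's standard `json` module, treats them. The choice of parser is an assumption made purely for illustration; other parsers may behave differently, which is exactly the interoperability concern.

```python
import json

# Duplicate member names: Python keeps the last pair; other parsers may differ.
dup = json.loads('{ "a": 1, "a": 2 }')

# Unpaired surrogate escape: accepted here, but the result cannot be encoded as UTF-8.
lone = json.loads('"\\uDEAD"')
try:
    lone.encode("utf-8")
    lone_is_encodable = True
except UnicodeEncodeError:
    lone_is_encodable = False

# Excess precision: the extra digits are silently rounded to the nearest double.
pi_long = json.loads("3.141592653589793238462643383279")

print(dup, repr(lone), lone_is_encodable, pi_long)
```

None of these inputs is rejected, yet each one loses or changes information on the way in.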

TallTed · 2023-09-28T17:12:50Z

For reference: RFC7493: The I-JSON Message Format

Also note RFC7493 Errata


afs · 2023-10-03T19:07:45Z

As parsers don't necessarily agree on the cases outside I-JSON, it makes sense to me to stick to that.

If we go beyond it, we need to define the outcome of those cases and we are expecting a custom written JSON system which is too much to expect.

pfps · 2023-10-03T20:53:54Z

A big problem with this PR is that it also updates the lexical space. There is an official grammar for JSON and the lexical space in this PR does not correspond with the grammar.

gkellogg · 2023-10-04T19:39:07Z

This was discussed on the JSON-LD CG call today. The general consensus seems to be that using I-JSON as the value space makes sense, as it promotes interoperability, which is after all the primary purpose of these specifications.

Changes to the value space are acceptable if the net consequences are the same (meaning SPARQL results sort order, effectively).

Regarding the potential for duplicate keys, RFC8259 says they SHOULD be unique. I believe this is as strong as they could go given the history of the format. If they are not unique, it leads to unpredictable behavior. This also suggests that I-JSON serves the purposes of interoperability.

afs · 2023-11-02T17:34:37Z

Proposal: The lexical space is anything, JSON-ish.

The value space is the standard (appropriate RFC) that defines the web-community agreed subset + an extra value which is "undef".

"undef" is like SQL null or IEEE NaN. It does not value-equal itself.
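
As a rough illustration of the "does not value-equal itself" behaviour, the following sketch compares a hypothetical `Undef` value with IEEE NaN. The class name and design are assumptions for this example only and are not taken from any specification.

```python
class Undef:
    """Hypothetical extra value: never equal to anything, including itself."""

    def __eq__(self, other):
        return False

    def __ne__(self, other):
        return True

    # Defining __eq__ removes the default hash, so restore identity hashing.
    __hash__ = object.__hash__


UNDEF = Undef()
nan = float("nan")

# Both comparisons print False: neither value equals itself.
print(UNDEF == UNDEF, nan == nan)
```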

pfps · 2023-11-02T18:26:04Z

Proposal: The lexical space is JSON.
The value space is JSON parse trees. The lexical-to-value mapping is parsing.


pfps · 2023-11-02T18:31:47Z

spec/index.html

+        Two values are considered equal
+        if they are the same <a data-cite="RFC8259#section-7">string</a>,
+        <a data-cite="RFC8259#section-6">number</a>, or
+        literal value;


So "\u0020" and " " are different.

I believe parsing strings with escape sequences resolves those escapes (as it does in RDF). There is a canonical form when serialized via JCS.

If you want a data structure that is some transformation of the result of parsing a JSON string then you have to say what this data structure is and what the transformation is. This is going to involve, for JSON strings, things like handling surrogate pairs (and unpaired surrogates), or maybe just using UTF 16 code units, as well as handling escape sequences.
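
A small sketch of the escape and surrogate handling being discussed, using Python's `json` module as a stand-in parser (an assumption; the PR does not name an implementation). The helper `utf16_code_units` is hypothetical.

```python
import json

# An escaped space and a literal space parse to the same string value.
escaped = json.loads('"\\u0020"')
literal = json.loads('" "')

# A surrogate pair written as two \u escapes parses to one code point (U+1F600).
pair = json.loads('"\\uD83D\\uDE00"')


def utf16_code_units(s: str) -> int:
    """Count UTF-16 code units; 'surrogatepass' lets unpaired surrogates through."""
    return len(s.encode("utf-16-le", "surrogatepass")) // 2


print(escaped == literal, len(pair), utf16_code_units(pair))
```

Whether a string is measured in code points or in UTF-16 code units changes its length, so the value space has to state which one it means.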


pfps · 2023-11-02T18:33:27Z

spec/index.html

-            with the constraints of the value space described above.</li>
-        </ol>
+      <dd>maps every element of the lexical space to the result of parsing it into a
+        <a data-cite="RFC4627#section-2.1">JSON value</a>.


So the value for "1.0"^^rdf:JSON is different from the value for "1.00"^^rdf:JSON. This does not work.

From my interpretation, they are different terms having the same value.

Not so. There is a missing jump from parsed JSON texts to some sort of data structure. JSON strings, numbers, objects, and arrays are strings, or maybe parse trees.

I'm not against the value space of rdf:JSON being recursively defined as the union of TRUE, FALSE, NULL, finite sequences of Unicode UTF 16 code units, the finite numeric part of IEEE double floating point numbers, finite sequences of value space elements, and finite partial maps from finite sequences of Unicode UTF 16 code units to value space elements but that is not what is being said here.
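
The two readings of "1.0" and "1.00" can be contrasted with a short sketch. It uses Python's `json.loads`; passing `str` as the `parse_float` hook is only an illustrative way to keep each number's source text, approximating a parse-tree view.

```python
import json

# Value-oriented reading: both lexical forms denote the same double.
same_value = json.loads("1.0") == json.loads("1.00")

# Parse-tree-like reading: keep each number's source text instead of its value.
tree_a = json.loads("1.0", parse_float=str)
tree_b = json.loads("1.00", parse_float=str)

print(same_value, tree_a, tree_b, tree_a == tree_b)
```

Under the first reading the two literals share a value; under the second they do not.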

afs · 2023-11-02T18:42:52Z

Proposal: The lexical space is JSON. The value space is JSON parse trees. The lexical-to-value mapping is parsing.

What's "parse" for multiple fields? For long numbers? Where is this implemented?

gkellogg · 2023-11-02T20:48:50Z

Proposal: The lexical space is JSON. The value space is JSON parse trees. The lexical-to-value mapping is parsing.

The problem with this is interoperability and what rdf:JSON is actually used for. JSON is a format used by developers derived from JavaScript (JavaScript Object Notation). The grammar for it is underspecified, but for various reasons can't really be changed. The way JSON is actually used is to parse into native structures (hashes, arrays, strings, numbers, booleans, and null) represented in whatever native system is being used by a developer. This practically means that numbers are limited to their [IEEE754](https://datatracker.ietf.org/doc/html/rfc8259#ref-IEEE754) representations (basically xsd:double without NaN or +-Inf). The syntax allows numbers outside of this range to be represented, but this is of no practical use and is not interoperable.

Similarly, JSON Object keys need to be unique, for all practical purposes, as the underlying Hash/Map data structures they parse to typically do not support multiple map entries with the same key.

For RDF purposes, literals are restricted as described in RDF string. While we could define a separate space for JSON strings, this also does not serve a practical purpose.

Trying to stick to the letter of the JSON grammar does not serve a practical purpose, is not what developers actually use JSON for, and cannot be adequately tested for real systems.

I-JSON (RFC7493) was created to describe a profile of JSON closer to what is actually used; particularly important, as RFC8259 (JSON) effectively cannot be updated.

Furthermore, as discussed in the meeting, JSON-LD has effectively used JCS (RFC8785) as the serialization format for rdf:JSON literals. Within JSON-LD, JSON value objects are expressed using any legal JSON, but when creating an RDF Literal effectively depend on JCS (or the equivalent) for creating the serialized form. It so happens that this was also used as the value space for practical purposes. A value space based on INFRA representations was considered in previous commits, and the main problem that emerges is establishing a sort order.

My feeling is that the rdf:JSON lexical space should effectively include the limitations described in I-JSON (and further used by JCS). While we could define the literal as straight JSON, and maintain the value space as some form of abstract parse tree, this would not fulfill the original purpose of having a JSON datatype.
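
As a sketch of what enforcing one of the I-JSON restrictions might look like, the following rejects objects with repeated member names. `json.loads` and `object_pairs_hook` are standard Python; the helper names `reject_duplicates` and `loads_unique` are hypothetical, and this covers only the unique-key rule, not the rest of RFC 7493.

```python
import json


def reject_duplicates(pairs):
    """object_pairs_hook that raises on a repeated member name."""
    obj = {}
    for key, value in pairs:
        if key in obj:
            raise ValueError(f"duplicate member name: {key!r}")
        obj[key] = value
    return obj


def loads_unique(text: str):
    # The hook is applied to every object, including nested ones.
    return json.loads(text, object_pairs_hook=reject_duplicates)


print(loads_unique('{"a": 1, "b": {"c": 2}}'))
try:
    loads_unique('{"a": 1, "a": 2}')
except ValueError as err:
    print("rejected:", err)
```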

pfps · 2023-11-02T21:02:03Z

Proposal: The lexical space is JSON. The value space is JSON parse trees. The lexical-to-value mapping is parsing.

What's "parse" for multiple fields? For long numbers? Where is this implemented?

JSON parsing. This is the same as specified in this PR.

"The lexical-to-value mapping
maps every element of the lexical space to the result of parsing it into a JSON value."

afs · 2023-11-02T22:13:22Z

parsing it into a [JSON value](https://www.rfc-editor.org/rfc/rfc4627#section-2.1).

RFC 4627 is formally obsoleted by RFC 7158, 7159 and now RFC 8259.
They don't define the parsing output JSON value.

RFC 8259 leaves it to the implementation; see https://datatracker.ietf.org/doc/html/rfc8259#section-4 and other sections.

pfps · 2023-11-02T23:01:37Z

From RFC 8259:

A JSON value MUST be an object, array, number, or string, or one of the following three literal names:
  false
  null
  true

Later on:

array = begin-array [ value *( value-separator value ) ] end-array

And there are also definitions for string, number, and object.

afs · 2023-11-02T23:34:17Z

object - "The names within an object SHOULD be unique." gives ambiguity as recognized by "When the names within an object are not unique, the behavior of software that receives such an object is unpredictable."

object - "An object structure is represented" - is the representation unique? By the text, dropping all the pairs is legal!

When the names within an object are not unique, the behavior of software that receives such an object is unpredictable.

number = [ minus ] int [ frac ] [ exp ]

is the "value" the recognized parser rule? So +1 is not equal to 1.
Or is it the value of the number? If so, where does it say that?

This is why there are other RFCs - the exact formalization of JSON syntax in 8259 does not give a unique lexical to parsed form mapping.
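
The gap between the number grammar and number values can be sketched as follows. The regular expression is a hand transcription of the RFC 8259 `number` rule, and the mapping to values is simply whatever Python's `json.loads` does, which is an assumption rather than anything the RFC mandates.

```python
import json
import re

# RFC 8259: number = [ minus ] int [ frac ] [ exp ]
NUMBER = re.compile(r"-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?")


def is_json_number(text: str) -> bool:
    """True when the whole text matches the RFC 8259 number rule."""
    return NUMBER.fullmatch(text) is not None


# Four distinct derivations of the grammar rule...
forms = ["1", "1.0", "1e0", "10E-1"]
# ...that this particular parser maps to a single numeric value.
values = {json.loads(f) for f in forms}

print([is_json_number(f) for f in forms], is_json_number("+1"), is_json_number("01"), values)
```

In this transcription a leading `+` and a leading zero are not admitted by the grammar at all, while several distinct matching texts collapse to one value only because the parser chooses to interpret them numerically.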

pfps · 2023-11-03T02:07:56Z

@afs Welcome to the world of JSON.

pfps · 2024-03-21T16:54:42Z

https://www.w3.org/TR/xmlschema11-2/#f-doubleCanmap is a definition of a canonical lexical space for XML schema double. It might fill the bill if it is acceptable for JSON people.

gkellogg · 2024-03-25T19:54:50Z

A proposed update includes defining the Lexical Space as "the set of all RDF strings that conform to the JSON Grammar as described in Section 2 JSON Grammar of [[RFC8259]]". Adding a note that "[[[RFC8259]]] [[RFC8259]] allows strings to include surrogate code points which are not allowed in RDF strings, thus the lexical representation of JSON literals excludes those including surrogate code points." The current PR also requires that the lexical representation also conform to I-JSON, which could obviate the need for that note.

To address the grammar/value considerations of using the JSON definitions of value primitives in the current PR, we should probably again defer to INFRA for the definition of arrays (lists), maps, null, and boolean true and false. JSON Strings are RDF Strings, and number values are defined by referencing the value space for XSD doubles and XSD integers (derived from decimal). We will need to define how to compare each primitive, deferring to XPath functions for string, number, and boolean.

Also, note that aligning the value space with I-JSON (required for JCS) means that some values in the lexical space may not have values (e.g., JSON including surrogate code points), may lose precision, or may result in the same value from two different lexical representations (e.g., 1 vs 01). If the lexical space continues to require that literal values conform with I-JSON, we're left with potential loss of precision and merger of values, which is common to other numeric datatypes.

We also need to define how two JSON literals are compared, which would be by comparing their canonical representations as strings by performing the lexical-to-value mapping and canonicalizing the value as described using JCS.

I believe this definition satisfies the needs for JSON-LD. If the lexical space is not limited to I-JSON there can be undefined behavior in the lexical-to-value mapping for lexical values which are not I-JSON.
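
Comparison by canonical form can be sketched roughly as below. This is only an approximation of JCS using Python's `json.dumps`, not an RFC 8785 implementation; the function names are hypothetical.

```python
import json


def approx_canonical(text: str) -> str:
    """Rough stand-in for JCS: sorted keys, no insignificant whitespace.

    Not RFC 8785: numbers are formatted by Python rather than per ECMA-262
    (so 1 and 1.0 serialize differently here), and keys are sorted by code
    point rather than by UTF-16 code unit.
    """
    value = json.loads(text)
    return json.dumps(value, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False, allow_nan=False)


def json_literals_equal(a: str, b: str) -> bool:
    """Compare two lexical forms by their approximate canonical strings."""
    return approx_canonical(a) == approx_canonical(b)


print(approx_canonical('{"b": 2, "a": "x y"}'))
print(json_literals_equal('{ "a": 1, "b": [true, null] }', '{"b":[true,null],"a":1}'))
```

Member order and whitespace disappear in the canonical string, while array order remains significant.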

gkellogg · 2024-03-25T20:12:50Z

https://www.w3.org/TR/xmlschema11-2/#f-doubleCanmap is a definition of a canonical lexical space for XML schema double. It might fill the bill if it is acceptable for JSON people.

JCS should be sufficient for us to define a canonical lexical representation. JCS defers to ECMA-262 for serialization of numbers. If there is a mismatch between the value space for XSD double and ECMA-262 we may need to add some clarification along the lines of doubleCanmap.

pfps · 2024-06-19T12:49:54Z

Somehow I have forgotten how to put in a suggested change to the PR. Here is what I think needs to be added at the end of the section.

The lexical-to-value mapping MUST produce the same results as
doubleLexicalMap, including rounding,
so that both 9007199254740991.5 and 9007199254740992.5 are mapped to 9,007,199,254,740,992.
Although they are discouraged in JSON texts,
numbers with precision greater than that that can be represented,
such as 3.141592653589793238462643383279, and numbers with magnitude
greater than that that can be represented, such as 1E400, MUST be accepted
although implementations MAY emit warnings for them.
The latter MUST result in positive infinity.

Because the value space for rdf:JSON includes positive and negative infinity both the
JSON Canonicalization Scheme and
ECMAscript's JSON.stringify
MUST NOT be used directly to serialize rdf:JSON values as the former will terminate with an error and the
latter produces an inappropriate value.
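
The numeric examples in the proposed text can be checked with a short sketch. It assumes that Python's string-to-float conversion behaves like doubleLexicalMap for these particular inputs, and it uses `json.dumps(..., allow_nan=False)` as a loose analogue of a serializer that must fail on infinity; neither is part of the proposed wording.

```python
import json
import math

# Both literals round to 2**53 = 9007199254740992 as IEEE 754 doubles.
a = float("9007199254740991.5")
b = float("9007199254740992.5")

# Python's json module converts numbers with float(), so 1E400 overflows to infinity.
big = json.loads("1E400")

# Serializing infinity in strict mode fails, comparable to the error JCS requires.
try:
    json.dumps(big, allow_nan=False)
    strict_dump_ok = True
except ValueError:
    strict_dump_ok = False

print(a == b == 2.0**53, math.isinf(big), strict_dump_ok)
```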

pfps · 2024-06-19T12:52:32Z

There is still the problem with the use of ordered maps. Saying that they are to be treated as unordered is not sufficient as that is ambiguous between using an ordered map but not being able to access the order and using (the undefined) unordered map. Either some clean definition of a map must be found or a definition for unordered map must be included or some wording that does not need a separate definition must be used.

gkellogg · 2024-06-19T17:14:22Z

The lexical-to-value mapping MUST produce the same results as doubleLexicalMap, including rounding, so that both 9007199254740991.5 and 9007199254740992.5 are mapped to 9,007,199,254,740,992. Although they are discouraged in JSON texts, numbers with precision greater than that that can be represented, such as 3.141592653589793238462643383279, and numbers with magnitude greater than that that can be represented, such as 1E400, MUST be accepted although implementations MAY emit warnings for them. The latter MUST result in positive infinity.

We already say that a JSON Number is mapped to an xsd:double using a method consistent with doubleLexicalMap. Why do you feel we need to repeat this, including with specific examples?

Because the value space for rdf:JSON includes positive and negative infinity both the JSON Canonicalization Scheme and ECMAscript's JSON.stringify MUST NOT be used directly to serialize rdf:JSON values as the former will terminate with an error and the latter produces an inappropriate value.

JCS already has such a requirement

Note: Since Not a Number (NaN) and Infinity are not permitted in JSON, occurrences of NaN or Infinity MUST cause a compliant JCS implementation to terminate with an appropriate error.

gkellogg · 2024-06-19T17:19:47Z

There is still the problem with the use of ordered maps. Saying that they are to be treated as unordered is not sufficient as that is ambiguous between using an ordered map but not being able to access the order and using (the undefined) unordered map. Either some clean definition of a map must be found or a definition for unordered map must be included or some wording that does not need a separate definition must be used.

If there's a way to add wording to make use of the existing (ordered) map definition, I'd be inclined to do that. Alternatively, we could define our own unordered map (#rdf-unordered-map) to be a set of map entries such that no two entries share the same key, but I don't think that's as practically useful as using the INFRA map definition.

pfps · 2024-06-19T17:35:55Z

The lexical-to-value mapping MUST produce the same results as doubleLexicalMap, including rounding, so that both 9007199254740991.5 and 9007199254740992.5 are mapped to 9,007,199,254,740,992. Although they are discouraged in JSON texts, numbers with precision greater than that that can be represented, such as 3.141592653589793238462643383279, and numbers with magnitude greater than that that can be represented, such as 1E400, MUST be accepted although implementations MAY emit warnings for them. The latter MUST result in positive infinity.

We already say that a JSON Number is mapped to an xsd:double using a method consistent with doubleLexicalMap. Why do you feel we need to repeat this, including with specific examples?

Because this is a difference from other JSON-related documents.

Because the value space for rdf:JSON includes positive and negative infinity both the JSON Canonicalization Scheme and ECMAscript's JSON.stringify MUST NOT be used directly to serialize rdf:JSON values as the former will terminate with an error and the latter produces an inappropriate value.

JCS already has such a requirement

Note: Since Not a Number (NaN) and Infinity are not permitted in JSON, occurrences of NaN or Infinity MUST cause a compliant JCS implementation to terminate with an appropriate error.

The point is precisely that, because JCS has this requirement, it can't be used to serialize rdf:JSON values.

pfps · 2024-06-19T17:41:35Z

There is still the problem with the use of ordered maps. Saying that they are to be treated as unordered is not sufficient as that is ambiguous between using an ordered map but not being able to access the order and using (the undefined) unordered map. Either some clean definition of a map must be found or a definition for unordered map must be included or some wording that does not need a separate definition must be used.

If there's a way to add wording to make use of the existing (ordered) map definition, I'd be inclined to do that. Alternatively, we could define our own unordered map (#rdf-unordered-map) to be a set of map entries such that no two entries share the same key, but I don't think that's as practically useful as using the INFRA map definition.

My point is that we need to be unambiguous and the current wording is not. I don't see how ordered map can be used because it has different values for { "a": 1, "b": 2} and {"b":2, "a":1}. A SPARQL implementer would be justified to have the values compare as not equal even though there was no way of extracting the ordering. I suppose that one could write something effectively saying that these two different values must compare as equal (and identical and whatever else might distinguish between them) but at this point it is much better (and safer) just to start with the correct datatype.
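
The distinction can be sketched with the two objects from the comment. Here a list of (key, value) entries stands in for an INFRA-style ordered map and a Python `dict` comparison stands in for an unordered map; both stand-ins are assumptions made for illustration.

```python
import json

left = json.loads('{ "a": 1, "b": 2}')
right = json.loads('{"b":2, "a":1}')

# Viewed as ordered maps (sequences of entries), the two values differ.
ordered_equal = list(left.items()) == list(right.items())

# Viewed as unordered maps (sets of entries with unique keys), they are the same.
unordered_equal = left == right

print(ordered_equal, unordered_equal)
```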

gkellogg · 2024-06-19T22:25:51Z

JCS already has such a requirement

Note: Since Not a Number (NaN) and Infinity are not permitted in JSON, occurrences of NaN or Infinity MUST cause a compliant JCS implementation to terminate with an appropriate error.

The point is precisely that, because JCS has this requirement, it can't be used to serialize rdf:JSON values.

I believe that the appropriate thing to do is, in fact, to terminate with an error. In any case, as we don't get into canonicalization, I don't think it's appropriate for us to place restrictions on other specifications. The best we can do is to have a note that explains that due to such considerations, not all values can be serialized back to JSON.

If there's a way to add wording to make use of the existing (ordered) map definition, I'd be inclined to do that. Alternatively, we could define our own unordered map (#rdf-unordered-map) to be a set of map entries such that no two entries share the same key, but I don't think that's as practically useful as using the INFRA map definition.

My point is that we need to be unambiguous and the current wording is not. I don't see how ordered map can be used because it has different values for { "a": 1, "b": 2} and {"b":2, "a":1}. A SPARQL implementer would be justified to have the values compare as not equal even though there was no way of extracting the ordering. I suppose that one could write something effectively saying that these two different values must compare as equal (and identical and whatever else might distinguish between them) but at this point it is much better (and safer) just to start with the correct datatype.

Well, we are explicit in how to determine if two JSON values are equivalent, which considers the unordered nature of maps. We could add a note that highlights these cases and how they are equivalent, according to our definition. I believe that our definition of map equality normatively makes this unambiguous, but an informative note may help drive this point home. If there's other specific language to clarify our intention please suggest something. I think that defining our own definition for unordered map is probably not a good idea, as who would go looking to RDF Concepts for a definition of an unordered map? Better to qualify what we mean by saying that we treat the INFRA map as unordered.


pfps · 2024-06-20T01:10:19Z

I just noticed that the value space definition is completely messed up as it does not specify the target space of maps nor values of arrays. This needs to be fixed.

afs · 2024-06-20T08:54:47Z

A SPARQL implementer would be justified to have the values compare as not equal

If we say it's an unordered map then it's an unordered map.

pfps · 2024-06-20T18:25:24Z

A SPARQL implementer would be justified to have the values compare as not equal

If we say it's an unordered map then it's an unordered map.

The definition points to the following text

5.2. Maps

An ordered map, or sometimes just "map", is a specification type consisting of a finite ordered sequence of tuples, each consisting of a key and a value, with no key appearing twice. Each such tuple is called an entry.

So a map has an order.

gkellogg · 2024-06-20T19:09:40Z

I just noticed that the value space definition is completely messed up as it does not specify the target space of maps nor values of arrays. This needs to be fixed.

You're going to need to provide some more context for this. It's not clear to me what is missing from the L2V description on how objects and arrays are mapped to lists and maps. Is there some specific requirement you can point to? Specifically what needs to be added to the L2V entries for map and array to make them acceptable?

pfps · 2024-06-20T21:38:49Z

I just noticed that the value space definition is completely messed up as it does not specify the target space of maps nor values of arrays. This needs to be fixed.

You're going to need to provide some more context for this. It's not clear to me what is missing from the L2V description on how objects and arrays are mapped to lists and maps. Is there some specific requirement you can point to? Specifically what needs to be added to the L2V entries for map and array to make them acceptable?

A map to what? Integers only? Emoticons? Rocks? People?
A list of what? ...

The value space is recursive and needs a recursive definition.

gkellogg · 2024-06-20T23:45:22Z

A map to what? Integers only? Emoticons? Rocks? People? A list of what? ...

The value space is recursive and needs a recursive definition.

It seems properly recursively defined to me:

A JSON Object is mapped to a map by transforming each object member into a map entry with the key taken from the member name and value taken by performing this mapping to the member value. Map entries are treated as being unordered.

A JSON Array is mapped to a list by performing this mapping on each array value.

How can that be made more clear? Implicit in the object mapping is that the member name is mapped to a string which then forms the key of map entry. Is the phrase "by performing this mapping" not clear that it recursively invokes the L2V mapping on that value?

pfps · 2024-06-21T09:28:04Z

A map to what? Integers only? Emoticons? Rocks? People? A list of what? ...
The value space is recursive and needs a recursive definition.

It seems properly recursively defined to me:

A JSON Object is mapped to a map by transforming each object member into a map entry with the key taken from the member name and value taken by performing this mapping to the member value. Map entries are treated as being unordered.

A JSON Array is mapped to a list by performing this mapping on each array value.

How can that be made more clear? Implicit in the object mapping is that the member name is mapped to a string which then forms the key of map entry. Is the phrase "by performing this mapping" not clear that it recursively invokes the L2V mapping on that value?

The problem is not in the lexical-to-value mapping but is instead in the value space definition that needs to say what the maps are to (and from, as well) and what elements of lists are. It would be perfectly consistent with the above to have maps be from integers to integers and lists to only have integers as values.

gkellogg · 2024-06-21T14:54:42Z

The problem is not in the lexical-to-value mapping but is instead in the value space definition that needs to say what the maps are to (and from, as well) and what elements of lists are. It would be perfectly consistent with the above to have maps be from integers to integers and lists to only have integers as values.

Perhaps if the value space were defined as follows:

The value space
is the set of strings, numbers (xsd:double), maps (mapping strings to values in the value space where the order of map entries is not significant), lists (of values in the value space), and literal values (true, false, and null)
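
One way to read this wording is as a recursive membership test. The sketch below assumes Python values as stand-ins (dict for map, list for list, int or float for numbers) and a hypothetical function name; it is not part of the proposed text and only terminates on finite, acyclic structures.

```python
def in_value_space(v) -> bool:
    """Hypothetical membership test for the value space as worded above."""
    # Literal values (true, false, null) and strings.
    if v is None or isinstance(v, (bool, str)):
        return True
    # Numbers; Python ints are accepted as stand-ins for doubles.
    if isinstance(v, (int, float)):
        return True
    # Lists of values in the value space.
    if isinstance(v, list):
        return all(in_value_space(item) for item in v)
    # Maps from strings to values in the value space; entry order is ignored.
    if isinstance(v, dict):
        return all(isinstance(k, str) and in_value_space(x) for k, x in v.items())
    return False


print(in_value_space({"a": [1.5, True, None, {"b": "c"}]}))
print(in_value_space({1: 2}))
```

The second call fails because the key is not a string, which is the kind of constraint the earlier wording left unstated.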

spec/index.html

Co-authored-by: Ted Thibodeau Jr <[email protected]>

gkellogg · 2024-06-26T21:18:43Z

The JSON-LD WG today resolved to adopt this definition for the rdf:JSON datatype.

pfps · 2024-06-27T15:48:47Z

That's concerning as the current definition in this PR is still inadequate.

The value space contains elements like the list that is its only member. There is nothing wrong with this, but I don't believe that such values are part of JSON and they certainly cause problems in serialization. If these values are not wanted a datatypes expert should be consulted to provide the best wording for excluding them.

The value space still contains ordered maps (with insignificant order). It would be much better to start out with a map datatype that is unordered instead of going through all the contortions to try to eliminate all vestiges of the order.

The lexical-to-value mapping does not specify the result of the map for Objects, Arrays, Strings, or Literal Names. For example, the literal name true could be mapped to any of true, false, or nil. For another example, there is no description of how JSON character escapes are handled. The JSON string "\u1234" could be mapped to the value "A" and still be compliant with the lexical-to-value mapping.

There may be other problems that are obscured by these.
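
Two of these points can be illustrated with a brief sketch, again using Python's `json` module as an assumed stand-in: a list that is its own only member exists as an in-memory value but cannot be serialized, and a conventional parser maps the escape "\u1234" to the code point U+1234 rather than to "A".

```python
import json

# A list that is its own only member: constructible in memory,
# but never the result of parsing a finite JSON text.
loop = []
loop.append(loop)

try:
    json.dumps(loop)
    serializable = True
except ValueError:  # the encoder detects the circular reference
    serializable = False

# The usual reading of the escape, which the mapping would need to state.
escaped = json.loads('"\\u1234"')

print(loop[0] is loop, serializable, escaped == chr(0x1234))
```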

pfps

Approved just so that the PR can be merged. This is not an endorsement of the current state of the PR.

pfps · 2024-06-20T01:04:56Z

spec/index.html

+        <a>strings</a>,
+        numbers (<a data-cite="XMLSCHEMA11-2#double"><strong>xsd:double</strong></a>),
+        <a data-cite="INFRA#ordered-map">maps (unordered)</a>,
+        <a data-cite="INFRA#list">lists</a>, and


What are the members of lists?


gkellogg added the spec:substantive label Sep 26, 2023

gkellogg requested review from afs, domel, hartig and pfps September 26, 2023 20:02

pfps reviewed Sep 27, 2023


gkellogg mentioned this pull request Sep 27, 2023

Value space of rdf:JSON datatype #65

Closed

TallTed suggested changes Sep 28, 2023


pfps reviewed Nov 2, 2023
