Drop the requirement to support ill-typed literals with recognized datatype IRIs #60

Open
wouterbeek opened this issue Aug 30, 2023 · 5 comments
Labels
needs discussion Proposed for discussion in an upcoming meeting spec:enhancement Issue or proposed change to enhance the spec without changing the normative content substantively

@wouterbeek

Observation

RDF 1.1 requires that implementations support ill-typed literals, including ill-typed literals with recognized datatype IRIs.

Ill-typed literals with recognized datatype IRIs do not have any known use cases. They are semantically inconsistent, do not denote anything, have no value, and any triple that contains them is false in every interpretation.

Notice that there is nothing wrong with requiring implementations to support ill-typed literals with unrecognized datatype IRIs. For example, it is good that RDF implementations are required to support literals like [1] that have a datatype IRI that is not broadly recognized.

[1] '### Header'^^<https://example.com/markdown>

However, it is unclear why implementations are allowed to support, let alone are required to support, ill-typed literals with recognized datatype IRIs.

Example

Suppose a triple store recognizes the RDF datatype IRIs, the XSD datatype IRIs, and the GeoSPARQL datatype IRIs. Upon data ingest, such a triple store can immediately detect that [2] and [3] are ill-typed literals with a recognized datatype IRI.

[2] 'Yes'^^xsd:boolean
[3] 'The sea is everything. It covers seven tenths of the terrestrial globe.'^^xsd:boolean

The RDF 1.1 standard forbids triple stores from throwing an error upon encountering data that contains [2] or [3], even though this may be the preferred data quality approach for many users.
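As a rough illustration of the ingest-time check described in this example, here is a minimal Python sketch. It assumes a store that recognizes only `xsd:boolean` and `xsd:integer`, and it uses simplified lexical-space tests (an exact match against `true`, `false`, `1`, `0` for booleans; an optional sign followed by digits for integers) that ignore XSD whitespace processing. The names `RECOGNIZED` and `is_ill_typed_recognized` are hypothetical.

```python
import re

XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical table of recognized datatype IRIs, each mapped to a simplified
# lexical-space test (XSD whitespace processing is deliberately ignored).
RECOGNIZED = {
    XSD + "boolean": lambda lex: lex in {"true", "false", "1", "0"},
    XSD + "integer": lambda lex: re.fullmatch(r"[+-]?[0-9]+", lex) is not None,
}

def is_ill_typed_recognized(lexical: str, datatype: str) -> bool:
    """True only if the datatype IRI is recognized AND the lexical form is
    outside its lexical space. Unrecognized datatype IRIs are never flagged."""
    check = RECOGNIZED.get(datatype)
    return check is not None and not check(lexical)

print(is_ill_typed_recognized("Yes", XSD + "boolean"))    # literal [2]
print(is_ill_typed_recognized("true", XSD + "boolean"))   # well-typed
print(is_ill_typed_recognized("### Header", "https://example.com/markdown"))  # literal [1]
```

Literal [1] is never flagged because its datatype IRI is not in the recognized table, which mirrors the distinction the issue draws between recognized and unrecognized datatype IRIs.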

Suggestion

In RDF 1.2, let's weaken the RDF 1.1 phrase "Implementations MUST accept ill-typed literals" to:

  1. "Implementations MUST support ill-typed literals with unrecognized datatype IRIs."
  2. "Implementations MAY support ill-typed literals with recognized datatype IRIs."

Implementations MUST support the RDF datatype IRIs, and MAY support any other datatype IRIs that they believe important enough for their users. The notion "recognized datatype IRI" is used as defined in RDF 1.1 Semantics.
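The two-part rule suggested here can be sketched in Python as a loader with a configurable set of recognized datatype IRIs and an opt-in strict mode. This is a hypothetical illustration, not text from any specification: `load`, `IllTypedLiteralError`, and the `strict` flag are invented names, only literal-object triples are modeled, and the `xsd:boolean` check is simplified.

```python
from typing import Callable, Dict, List, Tuple

XSD = "http://www.w3.org/2001/XMLSchema#"

Literal = Tuple[str, str]          # (lexical form, datatype IRI)
Triple = Tuple[str, str, Literal]  # only literal-object triples, for brevity

class IllTypedLiteralError(ValueError):
    """Hypothetical error a strict implementation might raise."""

def load(triples: List[Triple],
         recognized: Dict[str, Callable[[str], bool]],
         strict: bool) -> List[Triple]:
    """Sketch of the suggested rule:
    - literals with unrecognized datatype IRIs are always accepted (MUST);
    - ill-typed literals with recognized datatype IRIs are accepted unless
      the implementation opts into strict mode (MAY)."""
    graph: List[Triple] = []
    for s, p, (lexical, datatype) in triples:
        check = recognized.get(datatype)
        if check is not None and not check(lexical) and strict:
            raise IllTypedLiteralError(f'"{lexical}"^^<{datatype}>')
        graph.append((s, p, (lexical, datatype)))
    return graph

data = [("ex:s", "ex:p", ("Yes", XSD + "boolean"))]  # literal [2]

# Implementation A does not recognize xsd:boolean: the triple is accepted.
print(len(load(data, recognized={}, strict=True)))

# Implementation B recognizes xsd:boolean and is strict: ingest fails.
booleans = {XSD + "boolean": lambda lex: lex in {"true", "false", "1", "0"}}
try:
    load(data, recognized=booleans, strict=True)
except IllTypedLiteralError as err:
    print("rejected:", err)
```

Running both configurations on the same data shows one implementation accepting literal [2] and another rejecting it, which is the situation described under Ramifications.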

Ramifications

The proposed change makes it possible for RDF 1.2 data to be accepted in one implementation, but not in another implementation. For example, it is possible to upload data that contains literals [2] and [3] into an implementation that does not recognize the xsd:boolean datatype IRI. But it is not possible to upload the same data into an implementation that does recognize the xsd:boolean datatype IRI.

This differentiation is a good thing, because it allows stricter implementations to be created, rather than requiring all implementations to support the exact same ill-typed nonsense data.

Notice that RDF 1.1 Semantics already allows implementations to differ from one another in their support for more/fewer recognized datatype IRIs. Implementations that differ in their recognized datatype IRIs already differ in their behavior in RDF 1.1.

@wouterbeek wouterbeek changed the title "Allow implementations that only support s" to "Drop the requirement to support ill-typed literals with recognized datatype IRIs" Aug 30, 2023
@afs
Contributor

afs commented Aug 31, 2023

The current text is a bit strange.

Implementations MUST accept ill-typed literals and produce RDF graphs from them.

I don't think that the "MUST" can be meaningful if the literals are outside RDF Semantics. In RDF Concepts, the text can be dropped, or replaced with non-defining descriptive/advice text (after the numbered list), and refer to RDF Semantics.

Implementations MAY produce warnings when encountering ill-typed literals.

Any system can issue warnings for anything regardless of this text, so it can be dropped or turned into advice text that encourages doing so.

For RDF Concepts, can we just say:
"Implementations SHOULD accept ill-typed literals"

which allows variation when there's justification.

("Support" is stronger than "accept". "Accept" is about RDF terms (correct syntax). I would read "support" as being about acting, e.g. on the values, cf. D-entailment.)

@afs afs added the discuss-f2f Proposed for discussion during the next face-to-face meeting label Sep 6, 2023
@ktk ktk removed the discuss-f2f Proposed for discussion during the next face-to-face meeting label Oct 3, 2023
@pchampin
Contributor

Implementations MUST accept ill-typed literals and produce RDF graphs from them.

I don't think that the "MUST" can be meaningful if the literals are outside RDF-semantics. In RDF concepts, the text can be dropped, or replaced with non-defining descriptive/advice text (after the numbered list), and refer to RDF Semantics.

+1

Actually, I consider this bit of RDF Concepts to contradict RDF Semantics §7.2, which says:

RDF processors MAY treat an unsatisfiable graph as signaling an error condition, but this is not required.

and in fact some implementations already do :)

This makes a strong case for replacing this MUST with a MAY in RDF-syntax, IMO.

@gkellogg gkellogg added the spec:enhancement Issue or proposed change to enhance the spec without changing the normative content substantively label Jan 30, 2024
@afs
Contributor

afs commented Jan 30, 2024

MAY is weak IMO.

It would be nice to encourage the behavior of passing through syntactically correct data with "SHOULD accept ill-typed literals".

@csarven
Member

csarven commented Jan 30, 2024

This can be expressed as an advisory in the specification as a Note or within the Considerations section providing additional context for implementations to evaluate advantages and pitfalls.

@pfps pfps added the needs discussion Proposed for discussion in an upcoming meeting label Jun 27, 2024
@pchampin
Contributor

pchampin commented Nov 4, 2024

This was discussed during the rdf-star meeting on 31 October 2024.

Drop the requirement to support ill-typed literals with recognized datatype IRIs

pfps: I agree with what Andy says in the issue
… the wording should change from MUST to SHOULD

AndyS: it depends what "support" really means here
… I don't think an ill-typed literal making the whole graph invalid is very useful

AZ: I also want to ask what is meant by "support". If you have a system that does not recognize a datatype IRI
… if you want to move that to another triplestore, you might lose something.
… I'm not sure what support means. It should pass as syntactically correct.
… By the semantics of ill-typed literals, since RDF 1.1, if you have an ill-typed literal in a graph, it makes the graph inconsistent, unsatisfiable.

<AndyS> RDF concepts -- "The list of datatypes supported by an implementation is determined by its recognized datatype IRIs." seems to be the nearest to defining "support".

AZ: If you say this kind of graph may not be supported, what about other kinds of inconsistencies? Should any such graph not be supported?
… I'm not sure if I agree with this proposal.

pfps: one option would be to tweak the wording

<pfps> One option is that implementations MUST accept input documents with ill-typed literals and SHOULD include the resultant triple in the RDF graph.

gkellogg: it makes no sense to talk about an ill-typed literal for non-recognised datatypes
… it all depends on what "support" means

<pfps> That is - parsing MUST NOT stop at an ill-typed literal but the system MAY choose to not include the triple in the resultant graph.

gkellogg: I think the idea is to be able to only retain well-typed literals

<pfps> I would add that if an implementation drops the triple then it MUST produce a warning.

gkellogg: it would be reasonable for RDF systems to not deal with ill-typed literals

TallTed: the current text is "MUST accept", not "MUST support"
… "accept" means it can evolve
… triplestores should be able to take any literals
… but then it may deal with the literal adequately for some processes
… you can do almost anything with RDF and unless there is a strong argument against that, we should keep it like this

ktk: how are different implementations dealing with this?

AndyS: in SPARQL, there are cases when you need to assign a value, so it does not work with ill-typed literals, but that's a SPARQL process
… there could be wording to make this a little more flexible with "MAY"
… it's difficult to make it a "MUST"

<pfps> agreed that it is difficult to require a warning

james: We are very accepting (in our implementation) and it has been very useful
… I think it should be a "MUST" for reasons of interoperability

<AndyS> "SHOULD accept" -- MUST for warnings is a bit strange. We don't have a "warning" mechanism in the specs.

james: but it's personal opinion

Souri: when we find an ill-typed literal, we separate it
… we continue even if we find error and they get reported
… we do not accept it in that form, so for us, a MUST would not work

AndyS: choosing which datatypes to handle is something you do when you use the data
… at loading time, you may not have decided

TallTed: I'm concerned to hear that some implementations are not conformant
… It's blocking evolution, because new datatypes may be supported in the future
… The reasoning I see is that the proposal is done because there are implementations that are not conformant

<niklasl> +1 for evolution (with the caveat that I prefer opt-in "drop unrecognized" modes to avoid sending inexplicable data onward).

Souri: if we have an xsd:integer with "abc" lexical form, we don't accept it, but if you have ex:mytype, we don't do anything
… we report the problem and users can decide what to do with this problem

<Zakim> pfps, you wanted to say that implementations that reject unrecognized datatypes are broken but ones that do not fully accept known ill-typed literals are not so bad

james: we do two kinds of things: one works on the values to make operations efficient, and one just takes any literal transparently
… in the past, we did not do anything with time, then it evolved to handle it appropriately

ktk: what do we do with this issue? We don't really have a conclusion

<TallTed> "MUST accept" is current text

Souri: in Oracle, we don't want to have, e.g., 31st February, so we reject it
… we do not hide it, we report it
… I would not like to have "MUST accept"

TallTed: not accepting data is bad but you can handle the ill-typed literals after they are loaded
… in the future, there could be a change that makes a lexical form acceptable

tl: I like the idea that there are several phases, 1st you parse and put in store, then other processes
… then the user can be informed of problematic literals
… you would get an error if you use a reasoner on the data

AndyS: I find the use cases of rejecting or not rejecting both reasonable
… the problem is when an entire graph is rejected

<AndyS> My pref is change "MUST accept" to "SHOULD accept". All the described handling cases seem reasonable for their different cases.

Souri: we do not reject the entire graph, just the triples with ill-typed literals

<Dominik_T> +1 for SHOULD

Souri: the earlier the problems can be pointed out the better
… customers are also happy with this

<ktk> Strawpoll: "Implementations MUST accept ill-typed literals" gets changed to "Implementations SHOULD accept ill-typed literals"

<Dominik_T> +1

<gtw> +1

<ktk> +1

<pfps> +1

<AndyS> +1

<Souri> +1

<gkellogg> +1

<AZ> -0.3141592

<enrico> 0

<TallTed> -0.5

<james> 0

<niklasl> +0.5 (I might prefer some "SHOULD by default, MUST if asked to accept"...)

TallTed: if we make this change, we have to be really clear how errors are dealt with

AndyS: I don't think we should go into how errors and warnings are handled

<TallTed> An ill typed literal is not a syntax error.

<TallTed> An ill typed literal conforms to syntax.

<Souri> +1 to AndyS

AndyS: there's a historical example (??) where specs described what to do with errors; it took up a lot of space and was eventually dropped

niklasl: I have experienced cases of systems rejecting things that I would have liked to be accepted, because things evolved
… although I'm sympathetic to the arguments (thus my +0.5 vote)
… it could be something that users can opt-in or out

ktk: there could be a note that explains what pitfalls occur and how to deal with them
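The per-triple handling discussed in the meeting (pfps's option that parsing MUST NOT stop at an ill-typed literal while the system MAY leave the triple out with a warning, and Souri's description of setting aside and reporting only the offending triples) can be sketched in Python. This is a hypothetical illustration, not any participant's implementation: `load_lenient` is an invented name, warnings use Python's standard `warnings` module rather than any spec-defined mechanism, and the `xsd:integer` check is simplified.

```python
import re
import warnings
from typing import Callable, Dict, List, Tuple

XSD = "http://www.w3.org/2001/XMLSchema#"
Triple = Tuple[str, str, Tuple[str, str]]  # (subject, predicate, (lexical, datatype))

def load_lenient(triples: List[Triple],
                 recognized: Dict[str, Callable[[str], bool]]
                 ) -> Tuple[List[Triple], List[Triple]]:
    """Never stops at an ill-typed literal. Triples whose literal is ill-typed
    for a recognized datatype are set aside and reported with a warning;
    everything else, including unrecognized datatypes, goes into the graph."""
    graph: List[Triple] = []
    set_aside: List[Triple] = []
    for triple in triples:
        lexical, datatype = triple[2]
        check = recognized.get(datatype)
        if check is not None and not check(lexical):
            warnings.warn(f'ill-typed literal "{lexical}"^^<{datatype}>; triple set aside')
            set_aside.append(triple)
        else:
            graph.append(triple)
    return graph, set_aside

# Simplified lexical-space test for xsd:integer (whitespace handling ignored).
integers = {XSD + "integer": lambda lex: re.fullmatch(r"[+-]?[0-9]+", lex) is not None}
data = [
    ("ex:a", "ex:p", ("abc", XSD + "integer")),  # ill-typed, recognized datatype
    ("ex:b", "ex:p", ("abc", "ex:mytype")),      # unrecognized datatype: kept
    ("ex:c", "ex:p", ("42", XSD + "integer")),   # well-typed: kept
]

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    graph, set_aside = load_lenient(data, integers)

print(len(graph), len(set_aside), len(caught))
```

Returning the set-aside triples instead of discarding them leaves the decision to the user, in line with Souri's point that problems are reported and users can decide what to do.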



7 participants