Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarify use of SAMPLE #165

Open
Tpt opened this issue Oct 30, 2024 · 2 comments
Open

Clarify use of SAMPLE #165

Tpt opened this issue Oct 30, 2024 · 2 comments
Labels
ErratumRaised Errata management: proposed erratum

Comments

@Tpt
Copy link
Contributor

Tpt commented Oct 30, 2024

Paragraph 11.4 Aggregate Projection Restrictions states that only expressions consisting of aggregates and constants may be projected, with one exception, then gives an example of a disallowed expression with Note that it would not be legal to project.

However, just after it states that Other expressions, not using GROUP BY variables, or aggregates may have non-deterministic values projected from their groups using the SAMPLE aggregate.. Is seems contradictory to me.
Moreover, the algorithm on section 18.2.4.1 Grouping and Aggregation suggest this automated insertion of SAMPLE, leading to think this insertion of SAMPLE is mandatory (line Replace V with Sample(V)).

Example of affected query:

SELECT ?s WHERE {
  VALUES (?s ?o) { (0 0) (1 0) }
} GROUP BY ?o

Jena, Blazegraph and Oxigraph all return an error for this query whereas Virtuoso returns results using SAMPLE.

I see multiple way forward:

  1. State the behavior (error or SAMPLE) is implementation defined
  2. Mandate error behavior
  3. Mandate sample behavior

A proposal: take the "implementation defined" way to keep backward compatibility, but strongly nudge in favor of returning an error:

  • in the algorithm, replace Replace V with Sample(V) by Raise an error or Replace V with Sample(V)
  • rephrase Other expressions, not using GROUP BY variables, or aggregates may have non-deterministic values projected from their groups using the SAMPLE aggregate. into Note that some implementation do not raise an error when not using GROUP BY variables, or aggregates in exception but project non-deterministic values from their groups using the SAMPLE aggregate.
@Tpt Tpt added the ErratumRaised Errata management: proposed erratum label Oct 30, 2024
@afs
Copy link
Contributor

afs commented Oct 30, 2024

this automated insertion of SAMPLE

It is not meant to imply automatic insertion of SAMPLE, it's suggesting that SAMPLE is used explicitly. (IIRC SQL is different). A wording change here would help. Maybe "by using the SAMPLE aggregate."

section 18.2.4.1 Grouping and Aggregation says:

For each (X AS Var) in SELECT, each HAVING(X), and each ORDER BY X in Q
For each unaggregated variable V in X

(X being an expression, including just a variable) only V in X, not any variable.

What do does the test suite have?

@Tpt
Copy link
Contributor Author

Tpt commented Oct 30, 2024

It is not meant to imply automatic insertion of SAMPLE, it's suggesting that SAMPLE is used explicitly. (IIRC SQL is different). A wording change here would help. Maybe "by using the SAMPLE aggregate."

Thank you! This wording would make things way clearer.

What do does the test suite have?

agg08, agg09, agg10 and agg11 tests check that the SPARQL parser fails if a non-aggregated variable is used.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ErratumRaised Errata management: proposed erratum
Projects
None yet
Development

No branches or pull requests

2 participants