Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue #4532: Secondary Window Orderings #4536

Closed
wants to merge 1 commit into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 2 additions & 1 deletion docs/sql/aggregates.md
Original file line number Diff line number Diff line change
Expand Up @@ -28,11 +28,12 @@ SELECT LIST(region ORDER BY amount DESC) FROM sales;
Aggregates are functions that *combine* multiple rows into a single value. Aggregates are different from scalar functions and window functions because they change the cardinality of the result. As such, aggregates can only be used in the `SELECT` and `HAVING` clauses of a SQL query.

When the `DISTINCT` clause is provided, only distinct values are considered in the computation of the aggregate. This is typically used in combination with the `COUNT` aggregate to get the number of distinct elements; but it can be used together with any aggregate function in the system.
There are some aggregates that are insensitive to duplicate values (e.g., `min` and `max`) and for them this clause is parsed and ignored.

When the `ORDER BY` clause is provided, the values being aggregated are sorted before applying the function.
Usually this is not important, but there are some order-sensitive aggregates that can have indeterminate results
(e.g., `first`, `last`, `list` and `string_agg`). These can be made deterministic by ordering the arguments.
For order-insensitive aggregates, this clause is parsed and applied, which is inefficient, but still produces the same result.
For order-insensitive aggregates, this clause is parsed and ignored.

## General Aggregate Functions

Expand Down
45 changes: 45 additions & 0 deletions docs/sql/window_functions.md
Original file line number Diff line number Diff line change
Expand Up @@ -50,6 +50,50 @@ The table below shows the available general window functions.

All [aggregate functions](aggregates) can be used in a windowing context.

## DISTINCT Arguments

All aggregate window functions support using a `DISTINCT` clause for the arguments. When the `DISTINCT` clause is
provided, only distinct values are considered in the computation of the aggregate. This is typically used in combination
with the `COUNT` aggregate to get the number of distinct elements; but it can be used together with any aggregate
function in the system. There are some aggregates that are insensitive to duplicate values (e.g., `min`, `max`) and for
them this clause is parsed and ignored.

```sql
-- Count the number of distinct users at a given point in time
SELECT COUNT(DISTINCT name) OVER (ORDER BY time) FROM sales;
-- Concatenate those distinct users into a list
SELECT LIST(DISTINCT name) OVER (ORDER BY time) FROM sales;
```

### ORDER BY Arguments

All aggregate window functions support using an `ORDER BY` clause that is different from the window ordering. When the
`ORDER BY` clause is provided, the values being aggregated are sorted before applying the function. Usually this is not
important, but there are some order-sensitive aggregates that can have indeterminate results (e.g., `mode`, `list` and
`string_agg`). These can be made deterministic by ordering the arguments. For order-insensitive aggregates, this clause
is parsed and ignored.

```sql
-- Compute the modal value up to each time, breaking ties in favour of the most recent value.
SELECT MODE(value ORDER BY time DESC) OVER (ORDER BY time) FROM sales;
```

The SQL standard does not provide for using `ORDER BY` with general-purpose window functions, but we have extended all
of these functions (except `dense_rank`) to accept this syntax and use framing to restrict the range that the secondary
ordering applies to.

```sql
-- Compare each athlete's time in an event with the best time to date
SELECT event, date, athlete, time
first_value(time ORDER BY time DESC) OVER w AS record_time,
first_value(athlete ORDER BY time DESC) OVER w AS record_athlete,
FROM meet_results
WINDOW w AS (PARTITION BY event ORDER BY datetime)
ORDER BY ALL
```

Note that there is no comma separating the arguments from the `ORDER BY` clause.

## Ignoring NULLs

The following functions support the `IGNORE NULLS` specification:
Expand All @@ -63,6 +107,7 @@ The following functions support the `IGNORE NULLS` specification:
| `nth_value(expr any, nth integer)` | Skips `NULL` values when counting. | `nth_value(column, 2 IGNORE NULLS)` |

Note that there is no comma separating the arguments from the `IGNORE NULLS` specification.
Also note that the `IGNORE NULLS` specification must follow any `ORDER BY` argument specification.

The inverse of `IGNORE NULLS` is `RESPECT NULLS`, which is the default for all functions.

Expand Down