Skip to content

Commit

Permalink
Issue duckdb#4532: Secondary Window Orderings
Browse files Browse the repository at this point in the history
* Document new `ORDER BY` argument support for window functions
* Update aggregate documentation for `DISTINCT` and `ORDER BY` simplification

fixes: duckdb#4532
  • Loading branch information
Richard Wesley committed Jan 9, 2025
1 parent 5a828f1 commit 9d56e27
Show file tree
Hide file tree
Showing 2 changed files with 47 additions and 1 deletion.
3 changes: 2 additions & 1 deletion docs/sql/aggregates.md
Original file line number Diff line number Diff line change
Expand Up @@ -28,11 +28,12 @@ SELECT LIST(region ORDER BY amount DESC) FROM sales;
Aggregates are functions that *combine* multiple rows into a single value. Aggregates are different from scalar functions and window functions because they change the cardinality of the result. As such, aggregates can only be used in the `SELECT` and `HAVING` clauses of a SQL query.

When the `DISTINCT` clause is provided, only distinct values are considered in the computation of the aggregate. This is typically used in combination with the `COUNT` aggregate to get the number of distinct elements; but it can be used together with any aggregate function in the system.
There are some aggregates that are insensitive to duplicate values (e.g., `min` and `max`) and for them this clause is parsed and ignored.

When the `ORDER BY` clause is provided, the values being aggregated are sorted before applying the function.
Usually this is not important, but there are some order-sensitive aggregates that can have indeterminate results
(e.g., `first`, `last`, `list` and `string_agg`). These can be made deterministic by ordering the arguments.
For order-insensitive aggregates, this clause is parsed and applied, which is inefficient, but still produces the same result.
For order-insensitive aggregates, this clause is parsed and ignored.

## General Aggregate Functions

Expand Down
45 changes: 45 additions & 0 deletions docs/sql/window_functions.md
Original file line number Diff line number Diff line change
Expand Up @@ -50,6 +50,50 @@ The table below shows the available general window functions.

All [aggregate functions](aggregates) can be used in a windowing context.

## DISTINCT Arguments

All aggregate window functions support using a `DISTINCT` clause for the arguments. When the `DISTINCT` clause is
provided, only distinct values are considered in the computation of the aggregate. This is typically used in combination
with the `COUNT` aggregate to get the number of distinct elements; but it can be used together with any aggregate
function in the system. There are some aggregates that are insensitive to duplicate values (e.g., `min`, `max`) and for
them this clause is parsed and ignored.

```sql
-- Count the number of distinct users at a given point in time
SELECT COUNT(DISTINCT name) OVER (ORDER BY time) FROM sales;
-- Concatenate those distinct users into a list
SELECT LIST(DISTINCT name) OVER (ORDER BY time) FROM sales;
```

### ORDER BY Arguments

All aggregate window functions support using an `ORDER BY` clause that is different from the window ordering. When the
`ORDER BY` clause is provided, the values being aggregated are sorted before applying the function. Usually this is not
important, but there are some order-sensitive aggregates that can have indeterminate results (e.g., `mode`, `list` and
`string_agg`). These can be made deterministic by ordering the arguments. For order-insensitive aggregates, this clause
is parsed and ignored.

```sql
-- Compute the modal value up to each time, breaking ties in favour of the most recent value.
SELECT MODE(value ORDER BY time DESC) OVER (ORDER BY time) FROM sales;
```

The SQL standard does not provide for using `ORDER BY` with general-purpose window functions, but we have extended all
of these functions (except `dense_rank`) to accept this syntax and use framing to restrict the range that the secondary
ordering applies to.

```sql
-- Compare each athlete's time in an event with the best time to date
SELECT event, date, athlete, time
first_value(time ORDER BY time DESC) OVER w AS record_time,
first_value(athlete ORDER BY time DESC) OVER w AS record_athlete,
FROM meet_results
WINDOW w AS (PARTITION BY event ORDER BY datetime)
ORDER BY ALL
```

Note that there is no comma separating the arguments from the `ORDER BY` clause.

## Ignoring NULLs

The following functions support the `IGNORE NULLS` specification:
Expand All @@ -63,6 +107,7 @@ The following functions support the `IGNORE NULLS` specification:
| `nth_value(expr any, nth integer)` | Skips `NULL` values when counting. | `nth_value(column, 2 IGNORE NULLS)` |

Note that there is no comma separating the arguments from the `IGNORE NULLS` specification.
Also note that the `IGNORE NULLS` specification must follow any `ORDER BY` argument specification.

The inverse of `IGNORE NULLS` is `RESPECT NULLS`, which is the default for all functions.

Expand Down

0 comments on commit 9d56e27

Please sign in to comment.