Merge pull request #3815 from szarnyasg/duckdb-tricks-2
DuckDB tricks pt2 blog post
szarnyasg authored Oct 11, 2024
2 parents 7cbfae5 + 5abdb6d commit dfcd59c
Showing 6 changed files with 359 additions and 1 deletion.
3 changes: 2 additions & 1 deletion _posts/2024-08-19-duckdb-tricks-part-1.md
@@ -2,7 +2,8 @@
layout: post
title: "DuckDB Tricks – Part 1"
author: "Gabor Szarnyas"
thumb: "/images/blog/thumbs/240819.svg"
thumb: "/images/blog/thumbs/duckdb-tricks-1.svg"
image: "/images/blog/thumbs/duckdb-tricks-1.png"
excerpt: "We use a simple example data set to present a few tricks that are useful when using DuckDB."
---

273 changes: 273 additions & 0 deletions _posts/2024-10-11-duckdb-tricks-part-2.md
@@ -0,0 +1,273 @@
---
layout: post
title: "DuckDB Tricks – Part 2"
author: "Gabor Szarnyas"
thumb: "/images/blog/thumbs/duckdb-tricks-2.svg"
image: "/images/blog/thumbs/duckdb-tricks-2.png"
excerpt: "We continue our “DuckDB tricks” series, focusing on queries that clean, transform and summarize data."
---

## Overview

This post is the latest installment of the [DuckDB Tricks series]({% post_url 2024-08-19-duckdb-tricks-part-1 %}), where we show you nifty SQL tricks in DuckDB.
Here's a summary of what we're going to cover:

| Operation | SQL instructions |
|-----------|---------|
| [Fixing timestamps in CSV files](#fixing-timestamps-in-csv-files) | `regexp_replace` and `strptime` |
| [Filling in missing values](#filling-in-missing-values) | `CROSS JOIN`, `LEFT JOIN` and `coalesce` |
| [Repeated data transformation steps](#repeated-data-transformation-steps) | `CREATE OR REPLACE TABLE t AS … FROM t …` |
| [Computing checksums for columns](#computing-checksums-for-columns) | `bit_xor(md5_number(COLUMNS(*)::VARCHAR))` |
| [Creating a macro for the checksum query](#creating-a-macro-for-the-checksum-query) | `CREATE MACRO checksum(tbl) AS TABLE …` |

## Dataset

For our example dataset, we'll use `schedule.csv`, a hand-written CSV file that encodes a conference schedule. The schedule contains the timeslots, the locations and the events scheduled.

```csv
timeslot,location,event
2024-10-10 9am,room Mallard,Keynote
2024-10-10 10.30am,room Mallard,Customer stories
2024-10-10 10.30am,room Fusca,Deep dive 1
2024-10-10 12.30pm,main hall,Lunch
2024-10-10 2pm,room Fusca,Deep dive 2
```

## Fixing Timestamps in CSV Files

As is common in real-world use cases, the input CSV is messy: it contains irregular timestamps such as `2024-10-10 9am`.
Therefore, if we load the `schedule.csv` file using DuckDB's CSV reader, the CSV sniffer will detect the first column as a `VARCHAR` field:

```sql
CREATE TABLE schedule_raw AS
    SELECT * FROM 'https://duckdb.org/data/schedule.csv';

SELECT * FROM schedule_raw;
```

```text
┌────────────────────┬──────────────┬──────────────────┐
│      timeslot      │   location   │      event       │
│      varchar       │   varchar    │     varchar      │
├────────────────────┼──────────────┼──────────────────┤
│ 2024-10-10 9am     │ room Mallard │ Keynote          │
│ 2024-10-10 10.30am │ room Mallard │ Customer stories │
│ 2024-10-10 10.30am │ room Fusca   │ Deep dive 1      │
│ 2024-10-10 12.30pm │ main hall    │ Lunch            │
│ 2024-10-10 2pm     │ room Fusca   │ Deep dive 2      │
└────────────────────┴──────────────┴──────────────────┘
```

Ideally, we would like the `timeslot` column to have the type `TIMESTAMP` so we can treat it as a timestamp in later queries. To achieve this, we can take the table we just loaded and fix the problematic entries with a regular expression-based search and replace operation, which unifies the format to `hours.minutes` followed by `am` or `pm`. Then, we convert the strings to timestamps using [`strptime`]({% link docs/sql/functions/dateformat.md %}#strptime-examples), with the `%p` format specifier capturing the `am`/`pm` part of the string.

```sql
CREATE TABLE schedule_cleaned AS
    SELECT
        timeslot
            .regexp_replace(' (\d+)(am|pm)$', ' \1.00\2')
            .strptime('%Y-%m-%d %H.%M%p') AS timeslot,
        location,
        event
    FROM schedule_raw;
```

Note that we use the [dot operator for function chaining]({% link docs/sql/functions/overview.md %}#function-chaining-via-the-dot-operator) to improve readability. For example, `regexp_replace(string, pattern, replacement)` is formulated as `string.regexp_replace(pattern, replacement)`. The result is the following table:

```text
┌─────────────────────┬──────────────┬──────────────────┐
│      timeslot       │   location   │      event       │
│      timestamp      │   varchar    │     varchar      │
├─────────────────────┼──────────────┼──────────────────┤
│ 2024-10-10 09:00:00 │ room Mallard │ Keynote          │
│ 2024-10-10 10:30:00 │ room Mallard │ Customer stories │
│ 2024-10-10 10:30:00 │ room Fusca   │ Deep dive 1      │
│ 2024-10-10 12:30:00 │ main hall    │ Lunch            │
│ 2024-10-10 14:00:00 │ room Fusca   │ Deep dive 2      │
└─────────────────────┴──────────────┴──────────────────┘
```
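
To see what each step of the cleaning expression does, it can help to run it on a couple of literal values first. The following is a minimal sketch: the inline `VALUES` list and the names `t`, `ts_text`, `normalized` and `parsed` are made up for illustration. The first value is rewritten to `2024-10-10 9.00am`, while the second already contains minutes and is left unchanged by the regular expression. The second query assumes that `try_strptime`, which returns `NULL` instead of raising an error when a value does not match the format, is available in your DuckDB version, and uses it to list rows of `schedule_raw` that would still fail to parse.

```sql
-- Show the intermediate string and the parsed timestamp side by side.
SELECT
    ts_text,
    regexp_replace(ts_text, ' (\d+)(am|pm)$', ' \1.00\2') AS normalized,
    strptime(
        regexp_replace(ts_text, ' (\d+)(am|pm)$', ' \1.00\2'),
        '%Y-%m-%d %H.%M%p'
    ) AS parsed
FROM (VALUES ('2024-10-10 9am'), ('2024-10-10 12.30pm')) AS t(ts_text);

-- List the rows whose timeslot cannot be parsed even after normalization.
SELECT timeslot
FROM schedule_raw
WHERE try_strptime(
    regexp_replace(timeslot, ' (\d+)(am|pm)$', ' \1.00\2'),
    '%Y-%m-%d %H.%M%p'
) IS NULL;
```

For the example file, the second query should return no rows, since every timeslot is covered by the two formats handled above.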

## Filling in Missing Values

Next, we would like to derive a schedule that includes the full picture: *every timeslot* for *every location* should have its own row in the table. For the timeslot-location combinations where no event is specified, we would like to explicitly add a string that says `<empty>`.

To achieve this, we first create a table `timeslot_location_combinations` containing all possible combinations using a `CROSS JOIN`. Then, we join the original table to the combinations using a `LEFT JOIN`. Finally, we replace `NULL` values with the `<empty>` string using the [`coalesce` function]({% link docs/sql/functions/utility.md %}#coalesceexpr-).

> The `CROSS JOIN` clause is equivalent to simply listing the tables in the `FROM` clause without specifying join conditions. By explicitly spelling out `CROSS JOIN`, we communicate that we intend to compute a Cartesian product – which is an expensive operation on large tables and should be avoided in most use cases.

```sql
CREATE TABLE timeslot_location_combinations AS
    SELECT timeslot, location
    FROM (SELECT DISTINCT timeslot FROM schedule_cleaned)
    CROSS JOIN (SELECT DISTINCT location FROM schedule_cleaned);

CREATE TABLE schedule_filled AS
    SELECT timeslot, location, coalesce(event, '<empty>') AS event
    FROM timeslot_location_combinations
    LEFT JOIN schedule_cleaned
        USING (timeslot, location)
    ORDER BY ALL;

SELECT * FROM schedule_filled;
```

```text
┌─────────────────────┬──────────────┬──────────────────┐
│      timeslot       │   location   │      event       │
│      timestamp      │   varchar    │     varchar      │
├─────────────────────┼──────────────┼──────────────────┤
│ 2024-10-10 09:00:00 │ main hall    │ <empty>          │
│ 2024-10-10 09:00:00 │ room Fusca   │ <empty>          │
│ 2024-10-10 09:00:00 │ room Mallard │ Keynote          │
│ 2024-10-10 10:30:00 │ main hall    │ <empty>          │
│ 2024-10-10 10:30:00 │ room Fusca   │ Deep dive 1      │
│ 2024-10-10 10:30:00 │ room Mallard │ Customer stories │
│ 2024-10-10 12:30:00 │ main hall    │ Lunch            │
│ 2024-10-10 12:30:00 │ room Fusca   │ <empty>          │
│ 2024-10-10 12:30:00 │ room Mallard │ <empty>          │
│ 2024-10-10 14:00:00 │ main hall    │ <empty>          │
│ 2024-10-10 14:00:00 │ room Fusca   │ Deep dive 2      │
│ 2024-10-10 14:00:00 │ room Mallard │ <empty>          │
├─────────────────────┴──────────────┴──────────────────┤
│ 12 rows                                     3 columns │
└───────────────────────────────────────────────────────┘
```
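
As a quick sanity check, the filled table should contain one row per timeslot-location pair, that is, the number of distinct timeslots multiplied by the number of distinct locations. A minimal sketch, assuming the `schedule_cleaned` and `schedule_filled` tables created above (the output column names are made up for illustration):

```sql
-- Compare the size of the filled table with the size of the Cartesian product.
SELECT
    (SELECT count(DISTINCT timeslot) FROM schedule_cleaned) AS num_timeslots,
    (SELECT count(DISTINCT location) FROM schedule_cleaned) AS num_locations,
    (SELECT count(*) FROM schedule_filled) AS num_filled_rows;
```

For the example data, this gives 4 timeslots, 3 locations and 12 rows. Note that the equality only holds when each pair has at most one event: if two events shared a timeslot and a location, the `LEFT JOIN` would return both of them.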

We can also put everything together in a single query using a [`WITH` clause]({% link docs/sql/query_syntax/with.md %}):

```sql
WITH timeslot_location_combinations AS (
    SELECT timeslot, location
    FROM (SELECT DISTINCT timeslot FROM schedule_cleaned)
    CROSS JOIN (SELECT DISTINCT location FROM schedule_cleaned)
)
SELECT timeslot, location, coalesce(event, '<empty>') AS event
FROM timeslot_location_combinations
LEFT JOIN schedule_cleaned
    USING (timeslot, location)
ORDER BY ALL;
```

## Repeated Data Transformation Steps

Data cleaning and transformation usually happen as a sequence of steps that shape the data into a form best suited to later analysis.
These transformations are often done by defining newer and newer tables using [`CREATE TABLE … AS SELECT` statements]({% link docs/sql/statements/create_table.md %}#create-table--as-select-ctas).

For example, in the sections above, we created `schedule_raw`, `schedule_cleaned`, and `schedule_filled`. If, for some reason, we want to skip the cleaning steps for the timestamps, we have to reformulate the query computing `schedule_filled` to use `schedule_raw` instead of `schedule_cleaned`. This can be tedious and error-prone, and it results in a lot of unused temporary data – data that may accidentally get picked up by queries that we forgot to update!

In interactive analysis, it's often better to use the same table name by running [`CREATE OR REPLACE` statements]({% link docs/sql/statements/create_table.md %}#create-or-replace):

```sql
CREATE OR REPLACE TABLE ⟨table_name⟩ AS
    …
    FROM ⟨table_name⟩
    …;
```

Using this trick, we can run our analysis as follows:

```sql
-- Step 1: load the raw CSV file.
CREATE OR REPLACE TABLE schedule AS
    SELECT * FROM 'https://duckdb.org/data/schedule.csv';

-- Step 2: clean the timestamps.
CREATE OR REPLACE TABLE schedule AS
    SELECT
        timeslot
            .regexp_replace(' (\d+)(am|pm)$', ' \1.00\2')
            .strptime('%Y-%m-%d %H.%M%p') AS timeslot,
        location,
        event
    FROM schedule;

-- Step 3: fill in the missing timeslot-location combinations.
CREATE OR REPLACE TABLE schedule AS
    WITH timeslot_location_combinations AS (
        SELECT timeslot, location
        FROM (SELECT DISTINCT timeslot FROM schedule)
        CROSS JOIN (SELECT DISTINCT location FROM schedule)
    )
    SELECT timeslot, location, coalesce(event, '<empty>') AS event
    FROM timeslot_location_combinations
    LEFT JOIN schedule
        USING (timeslot, location)
    ORDER BY ALL;

SELECT * FROM schedule;
```

Using this approach, we can skip any step and continue the analysis without adjusting the next one.

What's more, our script can now be re-run from the beginning without explicitly deleting any tables: the `CREATE OR REPLACE` statements will automatically replace any existing tables.
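
The pattern can also be tried in isolation on a toy table. This sketch uses a hypothetical table `counter` with a single column `x`, neither of which is part of the schedule example:

```sql
-- Step 1: create (or reset) the table.
CREATE OR REPLACE TABLE counter AS
    SELECT 1 AS x;

-- Step 2: replace the table with a transformed version of itself.
CREATE OR REPLACE TABLE counter AS
    SELECT x + 1 AS x
    FROM counter;

SELECT x FROM counter;
```

Running the script from the top always ends with `x = 2`. Re-running only step 2, however, applies the transformation to the already transformed table, so individual steps are not necessarily idempotent. When in doubt, re-running from the first statement is the safer option.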

## Computing Checksums for Columns

It's often beneficial to compute a checksum for each column in a table, e.g., to see whether a column's content has changed between two operations.
We can compute a checksum for the `schedule` table as follows:

```sql
SELECT bit_xor(md5_number(COLUMNS(*)::VARCHAR))
FROM schedule;
```

What's going on here?
We first list columns ([`COLUMNS(*)`]({% link docs/sql/expressions/star.md %}#columns-expression)) and cast all of them to `VARCHAR` values.
Then, we compute the numeric MD5 hashes with the [`md5_number` function]({% link docs/sql/functions/utility.md %}#md5_numberstring) and aggregate them using the [`bit_xor` aggregate function]({% link docs/sql/functions/aggregates.md %}#bit_xorarg).
This produces a single `HUGEINT` (`INT128`) value per column that can be used to compare the content of tables.

If we run this query after each of the three steps in the script above, we get the following results:

```text
┌──────────────────────────────────────────┬────────────────────────────────────────┬─────────────────────────────────────────┐
│                 timeslot                 │                location                │                  event                  │
│                  int128                  │                 int128                 │                 int128                  │
├──────────────────────────────────────────┼────────────────────────────────────────┼─────────────────────────────────────────┤
│ -134063647976146309049043791223896883700 │ 85181227364560750048971459330392988815 │ -65014404565339851967879683214612768044 │
└──────────────────────────────────────────┴────────────────────────────────────────┴─────────────────────────────────────────┘
```

```text
┌────────────────────────────────────────┬────────────────────────────────────────┬─────────────────────────────────────────┐
│                timeslot                │                location                │                  event                  │
│                 int128                 │                 int128                 │                 int128                  │
├────────────────────────────────────────┼────────────────────────────────────────┼─────────────────────────────────────────┤
│ 62901011016747318977469778517845645961 │ 85181227364560750048971459330392988815 │ -65014404565339851967879683214612768044 │
└────────────────────────────────────────┴────────────────────────────────────────┴─────────────────────────────────────────┘
```

```text
┌──────────────────────────────────────────┬──────────┬──────────────────────────────────────────┐
│                 timeslot                 │ location │                  event                   │
│                  int128                  │  int128  │                  int128                  │
├──────────────────────────────────────────┼──────────┼──────────────────────────────────────────┤
│ -162418013182718436871288818115274808663 │        0 │ -135609337521255080720676586176293337793 │
└──────────────────────────────────────────┴──────────┴──────────────────────────────────────────┘
```
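
The `0` in the `location` column of the last result is worth a closer look. XOR does not depend on the order of its inputs, which makes the checksum insensitive to row order, but it also means that identical values cancel out in pairs. After the filling step, every location occurs four times, so all of the hashes cancel. A minimal sketch on made-up values (the `VALUES` list and the names `t` and `v` are for illustration only):

```sql
-- Each value occurs an even number of times, so the hashes cancel out.
SELECT bit_xor(md5_number(v)) AS checksum
FROM (VALUES ('a'), ('a'), ('b'), ('b')) AS t(v);
```

This query returns `0`, regardless of which values are duplicated. Such checksums are therefore best treated as a quick change detector rather than as proof that two columns are equal. Comparing `count(*)` alongside the checksum can catch some, but not all, of these cases.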

## Creating a Macro for the Checksum Query

We can turn the checksum query into a [table macro]({% link docs/sql/statements/create_macro.md %}#table-macros) with the new [`query_table` function]({% post_url 2024-09-09-announcing-duckdb-110 %}#query-and-query_table-functions):

```sql
CREATE MACRO checksum(table_name) AS TABLE
    SELECT bit_xor(md5_number(COLUMNS(*)::VARCHAR))
    FROM query_table(table_name);
```

This way, we can simply invoke it on the `schedule` table as follows (also leveraging DuckDB's [`FROM`-first syntax]({% link docs/sql/query_syntax/from.md %})):

```sql
FROM checksum('schedule');
```

```text
┌──────────────────────────────────────────┬────────────────────────────────────────┬─────────────────────────────────────────┐
│                 timeslot                 │                location                │                  event                  │
│                  int128                  │                 int128                 │                 int128                  │
├──────────────────────────────────────────┼────────────────────────────────────────┼─────────────────────────────────────────┤
│ -134063647976146309049043791223896883700 │ 85181227364560750048971459330392988815 │ -65014404565339851967879683214612768044 │
└──────────────────────────────────────────┴────────────────────────────────────────┴─────────────────────────────────────────┘
```
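
Because the macro takes the table name as a parameter, it can also be used to compare tables side by side. A sketch, assuming the `schedule_raw` and `schedule_cleaned` tables from the earlier sections still exist and that your DuckDB version supports `query_table`; the `step` label column is added for illustration:

```sql
-- One checksum row per table, labeled with the step that produced it.
SELECT 'raw' AS step, * FROM checksum('schedule_raw')
UNION ALL
SELECT 'cleaned' AS step, * FROM checksum('schedule_cleaned');
```

Columns whose checksums match between the two rows were most likely left untouched by the step. For the example data, only the `timeslot` checksum should differ.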

## Closing Thoughts

That's it for today!
We'll be back soon with more DuckDB tricks and case studies.
In the meantime, if you have a trick that you would like to share, please send it to the DuckDB team on our social media sites, or submit it to the [DuckDB Snippets site](https://duckdbsnippets.com/) (maintained by our friends at MotherDuck).
Binary file added images/blog/thumbs/duckdb-tricks-1.png