diff --git a/pdf/ICDE2024-kuiper-boncz-muehleisen-out-of-core.pdf b/pdf/ICDE2024-kuiper-boncz-muehleisen-out-of-core.pdf
new file mode 100644
index 00000000000..354cdeddcc9
Binary files /dev/null and b/pdf/ICDE2024-kuiper-boncz-muehleisen-out-of-core.pdf differ
diff --git a/why_duckdb.md b/why_duckdb.md
index 93852aff22c..6b8f66daad9 100644
--- a/why_duckdb.md
+++ b/why_duckdb.md
@@ -29,7 +29,7 @@ DuckDB is deeply integrated into Python and R for efficient interactive data ana
 
 DuckDB is designed to support **analytical query workloads**, also known as [online analytical processing (OLAP)](https://en.wikipedia.org/wiki/Online_analytical_processing). These workloads are characterized by complex, relatively long-running queries that process significant portions of the stored dataset, for example aggregations over entire tables or joins between several large tables. Changes to the data are expected to be rather large-scale as well, with several rows being appended, or large portions of tables being changed or added at the same time.
 
-To efficiently support this workload, it is critical to reduce the amount of CPU cycles that are expended per individual value. The state of the art in data management to achieve this are either [vectorized or just-in-time query execution engines](https://www.vldb.org/pvldb/vol11/p2209-kersten.pdf). DuckDB contains a **columnar-vectorized query execution engine**, where queries are still interpreted, but a large batch of values (a "vector") are processed in one operation. This greatly reduces overhead present in traditional systems such as PostgreSQL, MySQL or SQLite which process each row sequentially. Vectorized query execution leads to far better performance in OLAP queries.
+To efficiently support this workload, it is critical to reduce the amount of CPU cycles that are expended per individual value. The state of the art in data management to achieve this are either [vectorized or just-in-time query execution engines](https://www.vldb.org/pvldb/vol11/p2209-kersten.pdf). DuckDB uses a **columnar-vectorized query execution engine**, where queries are still interpreted, but a large batch of values (a "vector") are processed in one operation. This greatly reduces overhead present in traditional systems such as PostgreSQL, MySQL or SQLite which process each row sequentially. Vectorized query execution leads to far better performance in OLAP queries.
 
 ## Extensible
 
@@ -47,6 +47,8 @@ In addition to this test suite, we run various tests that stress DuckDB under he
 
 ## Peer-Reviewed Papers and Thesis Works
 
+* [Runtime-Extensible Parsers](https://www.cidrdb.org/cidr2025/papers.html) (CIDR 2025)
+* [Robust External Hash Aggregation in the Solid State Age]({% link pdf/ICDE2024-kuiper-boncz-muehleisen-out-of-core.pdf %}) (ICDE 2024)
 * [These Rows Are Made for Sorting and That's Just What We'll Do]({% link pdf/ICDE2023-kuiper-muehleisen-sorting.pdf %}) (ICDE 2023)
 * [Join Order Optimization with (Almost) No Statistics](https://blobs.duckdb.org/papers/tom-ebergen-msc-thesis-join-order-optimization-with-almost-no-statistics.pdf) (Master thesis, 2022)
 * [DuckDB-Wasm: Fast Analytical Processing for the Web]({% link pdf/VLDB2022-kohn-duckdb-wasm.pdf %}) (VLDB 2022 Demo)