Nfiann-qs-for-dbtcore-duckdb #5783
base: current
Changes from 167 commits
@@ -0,0 +1,226 @@
---
title: Quickstart for dbt Core using DuckDB
id: duckdb
description: "Learn how to use dbt Core with DuckDB."
hoverSnippet: "Learn how to use dbt Core with DuckDB."
platform: 'dbt-core'
icon: 'duckdb-seeklogo'
level: 'Beginner'
hide_table_of_contents: true
tags: ['dbt Core','Quickstart']
---

<div style={{maxWidth: '900px'}}>

## Introduction

In this quickstart guide, you'll learn how to use dbt Core with DuckDB and get set up quickly. [DuckDB](https://duckdb.org/) is an open-source, in-process database management system designed for analytical workloads. It provides fast, easy access to large datasets without running a separate database server, which makes it well-suited for data analytics tasks.

This guide will demonstrate how to:

- Create a virtual development environment using a template provided by dbt Labs.
  - This sets up a fully functional dbt environment with an operational and executable project. The codespace automatically connects to the DuckDB database and loads a year's worth of data from our fictional Jaffle Shop café, which sells food and beverages in several US cities.
  - For additional information, refer to the [README](https://github.com/gwenwindflower/octocatalog) for the Jaffle Shop template. It includes instructions on how to do this, along with animated GIFs.
- Run any dbt command from the environment’s terminal.
- Generate a larger dataset for the Jaffle Shop café (for example, five years of data instead of just one).

You can learn more through high-quality [dbt Learn courses and workshops](https://learn.getdbt.com).

### Related content

- [DuckDB setup](/docs/core/connect-data-platform/duckdb-setup)
- [Create a GitHub repository](/guides/manual-install?step=2)
- [Build your first models](/guides/manual-install?step=3)
- [Test and document your project](/guides/manual-install?step=4)
- [Schedule a job](/guides/manual-install?step=5)

## Prerequisites

- When using DuckDB with dbt Core, you'll need to use the dbt command-line interface (CLI). Currently, DuckDB is not supported in dbt Cloud.
- You know the basics of using a terminal. In particular, you should understand `cd`, `ls`, and `pwd` so you can navigate your computer's directory structure easily.
- You have a [GitHub account](https://github.com/join).

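
The Local steps below call `git` and Python from the terminal. As an optional pre-flight check, here is a minimal sketch of a hypothetical `check_tools` function (not part of dbt or the Jaffle Shop repository) for macOS, Linux, or Git Bash. It reports which commands are missing from your `PATH` and, with no arguments, assumes you want to check `git` and `python3`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report required command-line tools that are missing
# from PATH. Defaults to git and python3, which the Local steps use.
check_tools() {
  local missing=0 tool
  # Fall back to the default tool list when no arguments are given.
  [ "$#" -gt 0 ] || set -- git python3
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found: $tool"
    else
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  # Non-zero exit status if anything was missing.
  return "$missing"
}
```

On Windows (Git Bash), where the steps use `python` rather than `python3`, you could run `check_tools git python` instead.
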
## Set up DuckDB for dbt Core

This section walks you through setting up DuckDB for dbt Core, either locally (Mac and Windows) or in a web browser.

The [repository](https://github.com/dbt-labs/jaffle_shop_duckdb) includes a `requirements.txt` file that installs dbt Core, DuckDB, and all other necessary dependencies. You can check this file to see what will be installed on your machine. It sits at the top level of the dbt project directory, alongside other key files like `dbt_project.yml`:

```text
my_dbt_project/
├── dbt_project.yml
├── models/
│   └── my_model.sql
├── tests/
│   └── my_test.sql
└── requirements.txt
```

For more information on setting up DuckDB, refer to [DuckDB setup](/docs/core/connect-data-platform/duckdb-setup).
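
To preview what `requirements.txt` will pull in before you install anything, here is a minimal sketch of a hypothetical `list_requirements` function (not part of the repository) for macOS, Linux, or Git Bash. It prints package names while skipping comments, blank lines, extras, and version specifiers. It assumes a plain requirements file and does not follow pip options such as `-r` or `-e`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of jaffle_shop_duckdb): print the package
# names listed in a requirements file, without comments or version pins.
list_requirements() {
  local file="${1:-requirements.txt}"
  if [ ! -f "$file" ]; then
    echo "error: $file not found" >&2
    return 1
  fi
  # Strip comments and surrounding whitespace, drop blank lines, then cut
  # each entry at the first extras bracket, version operator, or marker.
  sed -e 's/#.*//' -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' "$file" \
    | grep -v '^$' \
    | sed -E 's/[[<>=!~; ].*//'
}
```

From the repository root, `list_requirements requirements.txt` prints one package name per line.
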

<Tabs>

<TabItem value="local" label="Local">

1. First, [clone](https://git-scm.com/docs/git-clone/en) the Jaffle Shop git repository by running the following command in your terminal:

Review comment (resolved by nataliefiann): this link returns a 404

```bash
# Clones the project into a new jaffle_shop_duckdb directory
git clone https://github.com/dbt-labs/jaffle_shop_duckdb.git
```

2. Change into the `jaffle_shop_duckdb` directory that `git clone` created:

```shell
cd jaffle_shop_duckdb
```

3. Install dbt Core and DuckDB in a virtual environment.

<Expandable alt_header="Example for Mac" >

```shell
# Create and activate a virtual environment named venv
python3 -m venv venv
source venv/bin/activate
# Upgrade pip, then install dbt Core, DuckDB, and the other dependencies
python3 -m pip install --upgrade pip
python3 -m pip install -r requirements.txt
# Re-run activate so the shell refreshes its command lookup and finds dbt
source venv/bin/activate
```
</Expandable>

<Expandable alt_header="Example for Windows" >

```shell
python -m venv venv
venv\Scripts\activate.bat
python -m pip install --upgrade pip
python -m pip install -r requirements.txt
venv\Scripts\activate.bat
```

</Expandable>

<Expandable alt_header="Example for Windows PowerShell" >

```shell
python -m venv venv
venv\Scripts\Activate.ps1
python -m pip install --upgrade pip
python -m pip install -r requirements.txt
venv\Scripts\Activate.ps1
```
</Expandable>

4. Ensure your profile is set up correctly by running the following commands from the command line:

- [dbt compile](https://docs.getdbt.com/reference/commands/compile) — generates executable SQL from your project source files
- [dbt run](https://docs.getdbt.com/reference/commands/run) — compiles and runs your project
- [dbt test](https://docs.getdbt.com/reference/commands/test) — compiles and tests your project
- [dbt build](https://docs.getdbt.com/reference/commands/build) — compiles, runs, and tests your project
- [dbt docs generate](/reference/commands/cmd-docs#dbt-docs-generate) — generates your project's documentation.
- [dbt docs serve](/reference/commands/cmd-docs#dbt-docs-serve) — starts a webserver on port 8080 to serve your documentation locally and opens the documentation site in your default browser.

For complete details, refer to the [dbt command reference](/reference/dbt-commands).
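
If you want to repeat the checks from steps 3 and 4 in one go, here is a minimal sketch of a hypothetical `verify_local_setup` function for macOS, Linux, or Git Bash. Run it from the `jaffle_shop_duckdb` directory. It assumes you want to run `dbt debug` (a standard dbt command, not listed above, that validates your `profiles.yml` and database connection) followed by `dbt build`, and that you stop at the first failure. Swap in other commands from the list as needed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sanity-check the local dbt + DuckDB setup.
# Assumes it is run from the jaffle_shop_duckdb directory.
verify_local_setup() {
  # The venv activate scripts export VIRTUAL_ENV; warn if it is unset.
  if [ -z "${VIRTUAL_ENV:-}" ]; then
    echo "warning: no virtual environment is active" >&2
  fi

  # Make sure a dbt command is reachable before running anything.
  if ! command -v dbt >/dev/null 2>&1; then
    echo "error: dbt not found; install requirements.txt first" >&2
    return 1
  fi

  # dbt debug checks the profile and connection; dbt build then runs
  # seeds, models, snapshots, and tests. Stop at the first failure.
  local sub
  for sub in debug build; do
    echo "==> dbt ${sub}"
    if ! dbt "${sub}"; then
      echo "error: dbt ${sub} failed" >&2
      return 1
    fi
  done
}
```

After sourcing the file, run `verify_local_setup`. A non-zero exit status means one of the steps failed, and the last `==>` line shows which one.
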

Review comment: it might be beneficial for both tabs (local and web browser) to add a screenshot or code snippet of what successful output looks like.
Review comment: e.g. a shorter version of this (its too long)

:::note

These steps will fail if you run this project against your own data warehouse instead of this DuckDB demo. You'll need to reconfigure the project files for your warehouse first, which is especially important if you're using a community-contributed adapter.

:::

### Troubleshoot

<Expandable alt_header="Could not set lock on file error" >

```text
IO Error: Could not set lock on file "jaffle_shop.duckdb": Resource temporarily unavailable
```

This is a known issue in DuckDB: only one process at a time can open a database file for writing, so this error appears while another session still holds the lock. Try disconnecting from any sessions that are locking the database. If you are using DBeaver, this means shutting down DBeaver entirely (disconnecting doesn't always work).

As a last resort, deleting the database file will get you back in action (_but_ you will lose all your data).
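
A gentler variant of that last resort is to move the file aside so you can restore it later. Before doing so, `lsof jaffle_shop.duckdb` on macOS or Linux can show which process is holding the lock. The hypothetical `reset_duckdb_file` function below (macOS, Linux, or Git Bash) renames the database file, plus a write-ahead log such as `jaffle_shop.duckdb.wal` if one exists, with a timestamp suffix. Afterward, rerunning `dbt build` should recreate the database from the project.

```shell
#!/usr/bin/env bash
# Hypothetical helper: move a locked/corrupted DuckDB file aside instead of
# deleting it, so it can be restored later if needed.
reset_duckdb_file() {
  local db="${1:-jaffle_shop.duckdb}"
  local stamp
  stamp="$(date +%Y%m%d%H%M%S)"

  if [ ! -f "$db" ]; then
    echo "nothing to reset: $db does not exist" >&2
    return 1
  fi

  # DuckDB may keep recent changes in a sibling write-ahead log (<db>.wal);
  # move it along with the database so the two stay together.
  if [ -f "${db}.wal" ]; then
    mv "${db}.wal" "${db}.wal.bak.${stamp}"
  fi
  mv "$db" "${db}.bak.${stamp}"
  echo "moved $db to ${db}.bak.${stamp}"
}
```
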

</Expandable>

</TabItem>

<TabItem value="web" label="Web browser">

1. Go to the `jaffle-shop-template` [repository](https://github.com/dbt-labs/jaffle-shop-template) after you log in to your GitHub account.

Review comment (resolved by nataliefiann): is this the right template we want to share? it looks like its archived? should this be used instead?
Reply: if the link is incorrect, then do the steps in the 'web browser' tab need changing?
Reply: hey @nataliefiann , i noticed you changed the link but i would recommend triple checking this with the sme

1. Click **Use this template** at the top of the page and choose **Create new repository**.
1. Click **Create repository from template** when you’re done setting the options for your new repository.

Review comment: are the instructions here updated? it says 'codespace' environment but this is for duckdb?
Reply: Thanks for this Mirna, I'll shoot a message across to Anders about this

1. Click **Code** (at the top of the new repository’s page). Under the **Codespaces** tab, choose **Create codespace on main**. Depending on your computer's settings, this opens the codespace development environment either in a new browser tab running VS Code or in a new VS Code window.
1. Wait for the codespace to finish building, which is done when the `postCreateCommand` command completes. This can take several minutes:

<Lightbox src="/img/codespace-quickstart/postCreateCommand.png" title="Wait for postCreateCommand to complete" />

When this command completes, you can start using the codespace development environment. The terminal the command ran in closes, and you get a prompt in a new terminal.

1. At the terminal's prompt, you can execute any dbt command you want. For example:

```shell
/workspaces/test (main) $ dbt build
```

You can also use [duckcli](https://github.com/dbcli/duckcli) to write SQL against the warehouse from the command line, or build reports in the [Evidence](https://evidence.dev/) project provided in the `reports` directory.

For complete information, refer to the [dbt command reference](https://docs.getdbt.com/reference/dbt-commands). Common commands are:

- [dbt compile](https://docs.getdbt.com/reference/commands/compile) — generates executable SQL from your project source files
- [dbt run](https://docs.getdbt.com/reference/commands/run) — compiles and runs your project
- [dbt test](https://docs.getdbt.com/reference/commands/test) — compiles and tests your project
- [dbt build](https://docs.getdbt.com/reference/commands/build) — compiles, runs, and tests your project

</TabItem>

</Tabs>

## Generate a larger data set

If you'd like to work with a larger selection of Jaffle Shop data, you can generate an arbitrary number of years of fictitious data from within your codespace.

1. Install the Python package called [jafgen](https://pypi.org/project/jafgen/). At the terminal's prompt, run:

```shell
/workspaces/test (main) $ python -m pip install jafgen
```

1. When installation is done, run:

```shell
/workspaces/test (main) $ jafgen --years NUMBER_OF_YEARS
```

Replace `NUMBER_OF_YEARS` with the number of years you want to simulate. This command generates the CSV files and stores them in the `jaffle-data` folder, where dbt picks them up automatically through the `sources.yml` file and the [dbt-duckdb](/docs/core/connect-data-platform/duckdb-setup) adapter.

As you increase the number of years, generating the data takes exponentially more time because the Jaffle Shop stores grow in size and number. For a good balance of data size and build time, dbt Labs suggests a maximum of 6 years.
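
If you script data generation, it can help to validate the year count first. Here is a minimal sketch of a hypothetical `generate_jaffle_data` wrapper (not part of jafgen) that only calls the `jafgen --years` command shown above once the argument is a positive whole number. It prints a warning, without stopping, when you ask for more than the suggested 6 years.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the jafgen command used in this guide.
generate_jaffle_data() {
  local years="${1:-}"

  # Accept only a non-empty string of digits.
  case "$years" in
    '' | *[!0-9]*)
      echo "usage: generate_jaffle_data NUMBER_OF_YEARS" >&2
      return 2
      ;;
  esac

  if [ "$years" -lt 1 ]; then
    echo "error: NUMBER_OF_YEARS must be at least 1" >&2
    return 2
  fi
  # Above the suggested maximum, warn but still generate the data.
  if [ "$years" -gt 6 ]; then
    echo "warning: more than 6 years can take a long time to generate" >&2
  fi

  jafgen --years "$years"
}
```

For example, `generate_jaffle_data 5` runs `jafgen --years 5`.
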

</div>