Using Materializations with Dremio

Seeds

For an overview of what seeds are, refer to Seeds in the dbt documentation.

Where to Store Them

Object storage
Scratch storage

Configuration	Description
`object_storage_source`	The name of the filesystem in which to store seeds. The dbt alias is `datalake`. The default value is `$scratch`. In Dremio Software, the `$scratch` location points by default to the `/scratch` directory under Dremio's configured distributed cache location (`paths.dist` property in dremio.conf). There are no security protocols or permissions associated with this location -- it is readable and writable by all users. If you are using Dremio Cloud, `$scratch` always points to the `/scratch` directory. You can find more information about this directory here. Non-default values correspond to the name of a source in the Object Storage section of the Datasets page in Dremio:
`object_storage_path`	The path in the filesystem in which to store seeds. The dbt alias is `root_path`. The default is the root level of the filesystem, represented as the value `no_schema`. Nested folders in the path are separated with periods. Non-default values correspond to the path in this location in the Datasets page in Dremio:

Naming Seeds

By default, seeds are given the same name as the file used to create them. Use the file configuration to give a seed a non-default name.

Materializations

Tables

A Dremio table (also known as a physical dataset) is a table materialization that is either created directly from a source or is created from a seed.

Where to Create Them

Object storage
Scratch storage

Configuration	Description
`object_storage_source`	The name of the filesystem in which to create tables. The dbt alias is `datalake`. The default value is `$scratch`. In Dremio Software, the `$scratch` location points by default to the `/scratch` directory under Dremio's configured distributed cache location (`paths.dist` property in dremio.conf). There are no security protocols or permissions associated with this location -- it is readable and writable by all users. If you are using Dremio Cloud, `$scratch` always points to the `/scratch` directory. You can find more information about this directory here. Non-default values correspond to the name of a source in the Object Storage section of the Datasets page in Dremio:
`object_storage_path`	The path in the filesystem in which to create tables. The dbt alias is `root_path`. The default is the root level of the filesystem, represented as the value `no_schema`. Nested folders in the path are separated with periods. Non-default values correspond to the path in this location in the Datasets page in Dremio:

Naming Tables

By default, tables are given the same name as the model used to create them. Use the file configuration to give a table a non-default name. For example, a model named my_basic_table.sql creates by default a table named my_basic_table. With the file configuration, you could create instead a table named my_differently-named-basic-table.

(Optional) Formatting Tables

The format configuration is an optional configuration that determines the format of a persisted table.

For table materializations created from sources, you can specify the format configuration in a project file or a property file. For table materializations created from seeds, you can specify the format configuration in a project file, a property file or a config block. See Model Confgurations in the dbt documentation.

Possible values for both types of table materializations:

iceberg: Formats persisted tables as Apache Iceberg tables.
json: Formats persisted tables as JSON files.
parquet: Formats persisted tables as Parquet files.
text: Formats persisted tables as CSV files.

Additional possible values for table materializations created directly from sources:

delta: Formats source tables as Delta Lake tables.
excel: Formats source tables as Microsoft Excel files.

Additional Settings for the `excel` Value

If using the excel format, there are additional settings; however, these do not show in PDS but rather only applied when running the VDS generated from the twin strategy.

Setting	Description
`extract_header`	Extract the column names from the first line of the file.
`xls`	If the file is in the xls format instead of the xlsx format. Note: This option only works for xls files.
`has_merged_cells`	Expands cells that have been merged in the Excel sheet.

Additional Settings for the `text` Value

If using the text format, there are additional settings; however, these do not show in PDS but rather only applied when running the VDS generated from the twin strategy.

Settings	Description
`field_delimiter`	Field Delimiter Character
`line_delimiter`	Line Delimiter Character
`quote`	Character used to quote. e.g. if the field delimiter character is , and the quote character is ", using "hello, there" would ensure hello and there would not be separated.
`comment`	Character used for comments.
`skip_first_line`	Skip the first line of the file.
`extract_header`	Extract the column names from the first line of the file.
`trim_header`	Trims column names to a currently unknown number of characters.
`auto_generated_column_names`	Create automatic columns if no column names are specified.

(Optional) Partitioning Tables

The partitioning configuration partitions a table materialization.

Option	Value	Default	Comment
`partition_by`	Names of columns, separated by commas	N/A	Mutually exclusive with `distribute_by`
`partition_method`	`striped`, `hash`, `roundrobin`	`striped`	Valid only when used with `partition_by`
`distribute_by`	Names of columns, separated by commas	N/A	Mutually exclusive with `partition_by`
`localsort_by`	Names of columns, separated by commas	N/A	Used for sorting within each partition

Example of Using `partition_by`

{{ config(
    materialized='table', 
    partition_by='cd_credit_rating',
    localsort_by='cd_marital_status',
    partition_method='striped'
) }}

Example of Using `partition_method`

{{ config(
    materialized='table', 
    partition_by='cd_credit_rating',
    localsort_by='cd_marital_status',
    partition_method='roundrobin'
) }}

Example of Using `distribute_by`

{{ config(
    materialized='table', 
    distribute_by=[ 'cd_credit_rating', 'cd_marital_status' ],
    localsort_by='cd_marital_status'
) }}

Examples of `localsort_by`

{{ config(
    materialized='table', 
    partition_by='cd_credit_rating',
    localsort_by='cd_marital_status',
    partition_method='hash'
) }}

{{ config(
    materialized='table', 
    distribute_by=[ 'cd_credit_rating', 'cd_marital_status' ],
    localsort_by='cd_marital_status'
) }}

Incremental Materializations

For an overview of what incremental models are, see Incremental Models in the dbt documentation.

Where to Create Them

Object storage
Scratch storage

Configuration	Description
`object_storage_source`	The name of the filesystem in which to create tables from incremental models. The dbt alias is `datalake`. The default value is `$scratch`. In Dremio Software, the `$scratch` location points by default to the `/scratch` directory under Dremio's configured distributed cache location (`paths.dist` property in dremio.conf). There are no security protocols or permissions associated with this location -- it is readable and writable by all users. If you are using Dremio Cloud, `$scratch` always points to the `/scratch` directory. You can find more information about this directory here. Non-default values correspond to the name of a source in the Object Storage section of the Datasets page in Dremio:
`object_storage_path`	The path in the filesystem in which to create tables from incremental models. The dbt alias is `root_path`. The default is the root level of the filesystem, represented as the value `no_schema`. Nested folders in the path are separated with periods. Non-default values correspond to the path in this location in the Datasets page in Dremio:

Naming Incremental Materializations

By default, incremental materializations are given the same name as the model used to create them. Use the file configuration to give an incremental materialization a non-default name. For example, a model named my_basic_incremental.sql creates by default an incremental materialization named my_basic_incremental. With the file configuration, you could create instead an incremental materialization named my_differently-named-basic-incremental.

Specifying How to Increment

The optional incremental_strategy config tells the incremental model how to insert data into tables that are already built from the model.

append: Insert new records without updating or overwriting any existing data. This is the option that is set by default.
merge: Insert new records and update existing records that match the unique ID provided.

Specifying What to Do When a Schema Changes

The on_schema_change configuration specifies how to update a table built from an incremental model when one or more columns are added to or removed from a schema.

The possible values are:

sync_all_columns
append_new_columns
fail
ignore

(Optional) Formatting Incremental Materializations

The format configuration is an optional configuration that determines the format of an incremental materialization. You can specify the format configuration in a project file, a property file or a config block. See Model Confgurations in the dbt documentation.

iceberg: Formats an incremental materialization as an Apache Iceberg table. This is the only option and is set by default.

(Optional) Partitioning Incremental Materializations

The partitioning configuration partitions an incremental materialization.

Option	Value	Default	Comment
`partition_by`	Names of columns, separated by commas	N/A	Mutually exclusive with `distribute_by`
`partition_method`	`striped`, `hash`, `roundrobin`	`striped`	Valid only when used with `partition_by`
`distribute_by`	Names of columns, separated by commas	N/A	Mutually exclusive with `partition_by`
`localsort_by`	Names of columns, separated by commas	N/A	Used for sorting within each partition

Example of Using `partition_by`

{{ config(
    materialized='table', 
    partition_by='cd_credit_rating',
    localsort_by='cd_marital_status',
    partition_method='striped'
) }}

Example of Using `partition_method`

{{ config(
    materialized='table', 
    partition_by='cd_credit_rating',
    localsort_by='cd_marital_status',
    partition_method='roundrobin'
) }}

Example of Using `distribute_by`

{{ config(
    materialized='table', 
    distribute_by=[ 'cd_credit_rating', 'cd_marital_status' ],
    localsort_by='cd_marital_status'
) }}

Examples of `localsort_by`

{{ config(
    materialized='table', 
    partition_by='cd_credit_rating',
    localsort_by='cd_marital_status',
    partition_method='hash'
) }}

{{ config(
    materialized='table', 
    distribute_by=[ 'cd_credit_rating', 'cd_marital_status' ],
    localsort_by='cd_marital_status'
) }}

Views

A Dremio view (also known as a virtual dataset) is a view materialization that is based on a Dremio table.

Where to Create Them

In Dremio spaces.

Configuration	Description
`dremio_space`	The value of the Dremio space in which to create views. The dbt alias is `database`. The default value is `@<username>`, where `<username>` is the username in Dremio. Non-default values correspond to the name in this location in the Spaces section of the Datasets page in Dremio:
`dremio_space_folder`	The folder in the Dremio space in which to create views. The dbt alias is `schema`. The default is the root level of the space, and is represented by the value `no_schema`. Non-default values correspond to the path in this location in the Datasets page in Dremio: Nested folders are separated with periods. Do not include the root `no_schema` in the paths of views.

Paths in space are recursive, like filesystem folders : dbt.internal."my folder's name". Dots are not allowed in folder names. The root level of a space is represented as no_schema. To materialize a model contained in the folder my_dbt_project/models/examples at the root folder of the space mySpace, you would set the examples configuration in my_dbt_project/dbt_project.yml like this:

models:
  my_dbt_project:
    examples:
      +dremio_space: mySpace
      +space_folder: no_schema

Note that you can also use the equivalent dbt terminology i.e. +database: mySpace, +schema: no_schema

Folders that do not exist are created if the user running a model has the appropriate permission. However, the space must already exist in Dremio before the model is run.

Do not include the root folder no_schema in the paths of views.

Dremio accepts almost any string character in the objects' names. Therefore, the adapter double-quotes each part of the space.folder.name path of a view. If you specify nested folders in the path of a view, each folder's name is double-quoted. Example: "space"."folder"."sub-folder"."sub-sub-folder"."identifier"

Naming Views

By default, views are given the same name as the model used to create them. Use the alias configuration to give a view a non-default name. For example, a model named my_basic_view.sql creates by default a view named my_basic_view. With the alias configuration, you could create instead a view named my_differently-named-basic-view.

Reflections

A reflection is a materialization that corresponds to a reflection in Dremio.

Where to Create Them

You do not specify where to create reflections. Reflections are automatically created in Dremio's reflection store.

Configuring Reflections

To take reflections into account when it comes to metadata, add the following to bottom of your dbt_project.yml file:

vars:
      dremio:reflections_metadata_enabled: true

Please keep in mind that this requires an admin user

The table of configuration options below can be set under vars in your dbt_project.yml file. They can also be set in the reflection model’s sql file. In these two examples, my_anchor is a placeholder for the name of the dataset that the reflection is built from.

Example 1

{{ config(materialized='reflection',
          name='this is the name',
          reflection_type='aggregate', 
          dimensions=['Datetime0'],
          measures=['Num3', 'Int2'], 
          computations=['COUNT','COUNT'])}}
-- depends_on: {{ ref('my_anchor') }}

Example 2

{{ config(materialized='reflection', 
reflection_type='aggregate', 
dimensions=['Datetime0'], 
date_dimensions=['Datetime0'], 
measures=['Num3', 'Int2'], 
computations=['MIN,SUM','COUNT'], 
arrow_cache=true) }}

-- depends_on: {{ ref('my_anchor') }}

The model definition will not contain a SELECT statement, but a simple :

 -- depends_on: {{ ref('my_anchor') }}

Configuration	Reflection type	Value	Default
`name`	Both	Name of the reflection	Model alias if set, else model name
`reflection_type`	Both	`raw`, `aggregate`	raw
`display`	`raw`	Names of columns to include in a reflection, separated by commas	All columns
`dimensions`	`aggregate`	Names of columns to use as dimension columns, separated by commas	All non-decimal, float, and double columns
`date_dimensions`	`aggregate`	Names of the columns to use with a `DATE` granularity	None
`distribute_by`	`aggregate`	Names of the columns to use as Distribution Fields	None
`measures`	`aggregate`	Names of columns to use as measure columns, separated by commas	All decimal, float, and double columns
`computations`	`aggregate`	The list of computations to use. The supported computations are SUM, MIN, MAX, COUNT, and APPROX_COUNT_DISTINCT. This list maps directly to the list of measures.	SUM, COUNT for each numeric measure, COUNT for others

(Optional) Partitioning Reflections

The partitioning configuration partitions a reflection.

Option	Value	Default	Comment
`partition_by`	Names of columns, separated by commas	N/A
`partition_transform`	The list of transformations to use. The supported transformations are YEAR, MONTH, DAY, HOUR, BUCKET(n) and TRUNCATE(n). This list maps directly to the list of partitions.	Original/Identity
`partition_method`	`striped`, `consolidated`	`striped`
`localsort_by`	Names of columns, separated by commas	N/A	Used for sorting within each partition

Example of `partition_by`

{{ config(
    materialized='table', 
    partition_by='cd_credit_rating',
    localsort_by='cd_marital_status',
    partition_method='striped'
) }}

{{ config(
    materialized='reflection', 
    reflection_type='raw', 
    partition_by='cd_credit_rating'
) }}

Example of `partition_method` for Reflections

{{ config(
    materialized='reflection', 
    reflection_type='raw', 
    partition_by='cd_credit_rating',
    partition_method='consolidated'
) }}

(Optional) Twin-Strategy Configuration

Dremio tables can be created only in object storage and Dremio views can be created only in Dremio spaces. Therefore, the adapter uses a twin strategy to determine how to handle creating a view with the same name as an existing table and vice versa.

Configuration	Applicable Materializations	Possible Values	Default
twin_strategy	every materialization but reflection	allow, prevent, clone	clone

The `allow` Strategy

Allows materializations of different types to have the same names. For example, if the materialized view "mySpace"."myFolder"."customers" exists in a space, it is possible to materialize the table "mySource"."myFolder"."customers" in object storage. And vice-versa.

The `prevent` Strategy

Prevents materializations of different types from having the same names. For example, if the materialized view "mySpace"."myFolder"."customers" exists in a space, then materializaing the table "mySource"."myFolder"."customers" in object storage drops "mySpace"."myFolder"."customers". And vice-versa.

The `clone` Strategy

If you materialize a table that has the same name as an existing view, the definition of the view is changed to a select * on the table. For example, if the materialized view "mySpace"."myFolder"."customers" exists in a space, and you materialize the table "mySource"."myFolder"."customers" in object storage, the definition of the view changes to select * from {{ "mySource"."myFolder"."customers" }}.
If you materialize a view that has the same name as an existing table, the view is defined as a select * on the table. For example, if the materialized table "mySource"."myFolder"."customers" exists in object storage, and you materialize the view "mySpace"."myFolder"."customers", the view is defined as select * from {{ "mySource"."myFolder"."customers" }}.

Example of Setting the Twin Strategy in a Model's config Block

{{ config(materialized='table', twin_strategy='prevent') }}

Example of Setting the Twin Strategy in a Project's YAML File

models:
 test_proj:
  # Config indicated by + and applies to all files under models/example/
  example:
   +materialized: view
   +twin_strategy: prevent

Querying External Sources

Dremio can send native SQL queries to external sources set up in Dremio as database sources. Such queries are known as "External Queries".

External queries can be set in models with the following materializations:

view
table
incremental

Prerequisite Steps

Ensure that the external_query configuration option is set to true in the config block in the model. The SQL being ran must not use ref() or source(), instead only [schema.]table paths.
Set which Database Source the external query uses, at least one -- depends_on : {{ source("<source_name>", "<table_name>") }} must be added to the model's SQL.
Create a schema.yml file that specifies the source and source tables.

Example Model

{{ config(
    external_query="true",
    sql_header="/*External Query*/"
) }}

SELECT *
FROM public.countries

-- depends_on : {{ source( "external_source", "external_table") }}

Example `schema.yml` file

version: 2

sources:
    - name: external_source
      database: "postgres"
      tables:
          - name: external_table

If you were to use these two examples, running the model would cause dbt to run this query in Dremio:

create or replace view "@dremio"."select_external_query" as (
    select *
      from table("postgres".external_query('

SELECT *
FROM public.countries

-- depends_on : "postgres"."external_source"."external_table"'))
  )

Using Multiple Environments

A same dremio installation could handle several data environments (dev, prod, etc.). Here is how the profile will look with two different object storage paths:

multiple_targets:
  outputs:
    dev:
      dremio_space: '@user'
      dremio_space_folder: no_schema
      object_storage_path: firstpath
      object_storage_source: myobjectstorage
      password: 
      port: 9047
      software_host: my-test-host
      threads: 1
      type: dremio
      use_ssl: false
      user: 
    prod:
      dremio_space: '@user'
      dremio_space_folder: no_schema
      object_storage_path: secondpath
      object_storage_source: myobjectstorage
      password: 
      port: 9047
      software_host: my-test-host
      threads: 1
      type: dremio
      use_ssl: false
      user: 
  target: dev

Pass in --target prod in the dbt run command to choose the prod target, since dev is the default in the profile above:

 `dbt run --target prod`

Using Materializations with Dremio

Seeds

Where to Store Them

Naming Seeds

Materializations

Tables

Where to Create Them

Naming Tables

(Optional) Formatting Tables

Additional Settings for the excel Value

Additional Settings for the text Value

(Optional) Partitioning Tables

Example of Using partition_by

Example of Using partition_method

Example of Using distribute_by

Examples of localsort_by

Incremental Materializations

Where to Create Them

Naming Incremental Materializations

Specifying How to Increment

Specifying What to Do When a Schema Changes

(Optional) Formatting Incremental Materializations

(Optional) Partitioning Incremental Materializations

Example of Using partition_by

Example of Using partition_method

Example of Using distribute_by

Examples of localsort_by

Views

Where to Create Them

Naming Views

Reflections

Where to Create Them

Configuring Reflections

(Optional) Partitioning Reflections

Example of partition_by

Example of partition_method for Reflections

(Optional) Twin-Strategy Configuration

The allow Strategy

The prevent Strategy

The clone Strategy

Example of Setting the Twin Strategy in a Model's config Block

Example of Setting the Twin Strategy in a Project's YAML File

Querying External Sources

Prerequisite Steps

Example Model

Example schema.yml file

Using Multiple Environments

Clone this wiki locally

Additional Settings for the `excel` Value

Additional Settings for the `text` Value

Example of Using `partition_by`

Example of Using `partition_method`

Example of Using `distribute_by`

Examples of `localsort_by`

Example of Using `partition_by`

Example of Using `partition_method`

Example of Using `distribute_by`

Examples of `localsort_by`

Example of `partition_by`

Example of `partition_method` for Reflections

The `allow` Strategy

The `prevent` Strategy

The `clone` Strategy

Example `schema.yml` file