fix(tests): make re_assert available (#2)
**Features**
/

**Fixes**
- fix `re_assert` argument in `generic_assertions()` test.
  The `assertions()` macro now supports a private `_node` argument for non-default model parsing.
- add a basic test example.
- models are not enabled by default.
- doc: finalize first `README.md` version.

**Breaking Changes**
/

Issue #2
axel_thevenot authored and AxelThevenot committed Jan 22, 2024
1 parent ec144f5 commit 34d92d6
Showing 10 changed files with 199 additions and 81 deletions.
75 changes: 65 additions & 10 deletions README.md
@@ -28,9 +28,10 @@ Granular row-by-row error detection identifies and flags specific rows that fail

Easy-to-use macros `assertions()` and `assertions_filter()` empower users to customize without barriers data quality checks within the model YAML definition, adapting to specific data validation needs.

🚀 **(coming soon) An Easy Shift from your Actual Workflows**
🚀 **An Easy Shift from your Actual Workflows**

A generic test `xxx()` to perform dbt tests as usual, testing the package easily without compromising your current workflows.
A generic test `generic_assertions()` to perform dbt tests as usual, testing the package easily without compromising your current workflows.
**You can easily try the package with this generic test without having to rebuild your table.**


## Content
@@ -46,7 +47,7 @@ A generic test `xxx()` to perform dbt tests as usual, testing the package easily
- [assertions](#assertions)
- [assertions\_filter](#assertions_filter)
- [Tests](#tests)
- [what\_ever\_the\_test\_name](#what_ever_the_test_name)
- [generic\_assertions](#generic_assertions)
- [Model definition](#model-definition)
- [Yaml general definition](#yaml-general-definition)
- [Custom assertions](#custom-assertions)
@@ -61,7 +62,7 @@ A generic test `xxx()` to perform dbt tests as usual, testing the package easily

## Install

`dbt-assertions` currently supports `dbt 1.2.x` or higher.
`dbt-assertions` currently supports `dbt 1.7.x` or higher.


Check [dbt github package](https://hub.getdbt.com/calogica/dbt_expectations/latest/) for the latest installation instructions, or [read the docs](https://docs.getdbt.com/docs/package-management) for more information on installing packages.
@@ -71,15 +72,14 @@ Include in `packages.yml`
```yaml
packages:
- git: https://github.com/AxelThevenot/dbt-assertions.git
revision: 0.1.0a
revision: 0.1.1
# <see https://github.com/AxelThevenot/dbt-assertions/releases/latest> for the latest version tag
```

This package supports:

* BigQuery
* default (not tested on other databases, do not hesitate to contribute! ❤️
)
* default (not tested on other databases, do not hesitate to contribute! ❤️)

For latest release, see [https://github.com/AxelThevenot/dbt-assertions/releases](https://github.com/AxelThevenot/dbt-assertions/releases)

@@ -146,7 +146,7 @@ FROM final
`assertions_filter()` macro generates an expression to filter rows based on errors generated with the [`assertions()`](#assertions) macro.

**Arguments:**
- **from_column (optional[str])**: column to read the failed assertions from.
- **from_column (optional[str]):** column to read the failed assertions from.
- **whitelist (optional[list[str]]):** A list of error IDs to whitelist.
If provided, only rows with no error, ignoring whitelist error IDs, will be included.
- **blacklist (optional[list[str]]):** A list of error IDs to blacklist.
@@ -174,10 +174,60 @@ FROM {{ ref('my_model') }}
WHERE {{ dbt_assertions.assertions_filter(whitelist=['assertions_id']) }}
```
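
The `blacklist` argument is used the same way. A minimal sketch, reusing the placeholder model `my_model` and the placeholder error ID `assertions_id` from the example above; `from_column='errors'` is spelled out only for illustration and assumes the assertions column is named `errors`:

```sql
-- Sketch: exclude every row that carries the `assertions_id` error.
-- `my_model` and `assertions_id` are placeholders, not real objects.
SELECT
    *
FROM {{ ref('my_model') }}
WHERE {{ dbt_assertions.assertions_filter(from_column='errors', blacklist=['assertions_id']) }}
```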


### Tests

#### [what_ever_the_test_name](tests/generic/what_ever_the_test_name.sql)
#### [generic_assertions](tests/generic/generic_assertions.sql)

Generates a test to get rows based on errors.

By default, it returns the rows without any error.
You can change this default behaviour by specifying a whitelist or a blacklist (not both).

You must define the assertions for the model beforehand. [More on YAML definition for assertions](#yaml-general-definition).

**Arguments:**
- **from_column (optional[str]):** column to read the failed assertions from.
- **whitelist (optional[list[str]]):** A list of error IDs to whitelist.
If provided, only rows with no error, ignoring whitelist error IDs, will be included.
- **blacklist (optional[list[str]]):** A list of error IDs to blacklist.
If provided, rows with at least one of these error IDs will be excluded.
- **re_assert (optional[bool]):** Set to `true` if your assertions column does not exist yet in your table.

Configure the generic test in schema.yml with:

```yml
model:
  name: my_model
  tests:
    - dbt_assertions.generic_assertions:
        [from_column: <column_name>]
        [whitelist: <list(str_to_filter)>]
        [blacklist: <list(str_to_filter)>]
        [re_assert: true | false]

  columns:
    ...
```
`[]` marks optional parts. Every argument is optional; the example that follows shows how they are used.

In the [basic test example](./models/examples/basic_test_example/) you can create your test as follows, then run `dbt test`.

```yml
models:
  - name: basic_test_example_d_site
    tests:
      - dbt_assertions.generic_assertions:
          from_column: errors
          blacklist:
            - site_id_is_not_null
          # `re_assert: true` to use only if your assertion's column
          # is not computed and saved in your table.
          re_assert: true

    columns:
      ...
```
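
If the model already computes and stores its assertions column, `re_assert` should not be needed. A minimal sketch of such a model, assuming the default `errors` column name and that the assertions are declared under that column in the model's YAML; the CTE `final` and its single row are placeholders for illustration:

```sql
WITH
    final AS (
        -- Placeholder source row; replace with your own query.
        SELECT 1 AS site_id, 'FRA' AS site_trigram
    )
SELECT
    *,
    -- Per the macro's docstring, this expands to an ARRAY<STRING> expression
    -- named after `from_column` (`errors` by default).
    {{ dbt_assertions.assertions() }}
FROM final
```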
### Model definition
@@ -406,8 +456,13 @@ FROM {{ ref('my_model') }}

## Contribution

If you want to contribute, please open a Pull Request or an Issue on this repo.
Feel free to reach me on [LinkedIn](https://www.linkedin.com/in/axel-thevenot/).

## Acknowledgments

Special thanks to @vvaneeclo for their help!

## Contact

If you have any questions, please open a new Issue or feel free to reach out on [LinkedIn](https://www.linkedin.com/in/axel-thevenot/).
9 changes: 6 additions & 3 deletions macros/assertions.sql
@@ -1,4 +1,4 @@
{%- macro assertions(from_column='errors') %}
{%- macro assertions(from_column='errors', _node=none) %}
{#-
Generates row-level assertions based on the schema model YAML for error tracking.

@@ -8,6 +8,8 @@

Args:
from_column (optional[str]): column to read the assertions from.
_node (dict): any other node to read the columns from.
This argument is reserved to `dbt-assertions`'s developers.
Returns:
str: An ARRAY<STRING> SELECT expression containing error ID for rows that violate assertions.
@@ -71,13 +73,14 @@
- `null_as_error` (default to false) the trigger error rule if the expression is evaluated to NULL.
- The resulting array is named with `from_column` and can be used in subsequent transformations.
#}
{{- adapter.dispatch('assertions', 'dbt_assertions') (from_column) }}
{{- adapter.dispatch('assertions', 'dbt_assertions') (from_column, _node) }}
{%- endmacro %}
{%- macro default__assertions(from_column) %}
{%- macro default__assertions(from_column, _node) %}
{#- Parses the assertions if exists. #}
{%- set model = model if _node is none else _node %}
{%- set columns = model.columns if ('columns' in model) else {} %}
{%- set assertions_column = columns[from_column] if (from_column in columns) else {} %}
{%- set assertions = assertions_column.assertions if ('assertions' in assertions_column) else {} %}
50 changes: 0 additions & 50 deletions models/examples/basic_example/README.md
@@ -76,53 +76,3 @@ WHERE {{ dbt_assertions.assertions_filter(blacklist=['site_id_is_not_null']) }}


![basic_example_d_site](../../../img/basic_example_downstream_model.png)

### Combine assertions & generic tests

#### Example usage

Suppose we are working with the `d_site` table - you want to use generic tests.

Configure the generic test in schema.yml with:

```yml
model:
name: my_model
tests:
- dbt_assertions.generic_assertions:
from_column: <column_name>
whitelist: [list(str_to_filter)]
blacklist: [list(str_to_filter)]
re_assert: true | false

columns:
...
```
Note: 'generic_assertions()' is the name given to the test in 'tests/generic' - you can change its name if needed.
For instance, the blacklist argument will filter all rows containing at least a "key_1_not_null" error:
```yml
model:
name: my_model
tests:
- dbt_assertions.generic_assertions:
from_column:
whitelist:
blacklist: ["key_1_not_null"]
re_assert:

columns:
...
assertions:
key_1_not_null:
description: "key_1 is not null."
expression: "key_1 IS NOT NULL"

key_2_not_null:
description: "key_2 is not null."
expression: "key_2 IS NOT NULL"
```
You can also use the whitelist & from_columns arguments, or use the function without arguments (and thus filtering).
2 changes: 1 addition & 1 deletion models/examples/basic_example/basic_example_d_site.sql
@@ -1,5 +1,5 @@
{{
config(alias='d_site', materialized='table', enable=false)
config(alias='d_site', materialized='table', enabled=false)
}}

WITH
6 changes: 1 addition & 5 deletions models/examples/basic_example/basic_example_d_site.yml
@@ -2,10 +2,6 @@ version: 2

models:
- name: basic_example_d_site
tests:
- dbt_assertions.generic_assertions:
blacklist: ["site_id_is_not_null"]

columns:
- name: site_id
- name: country_trigram
@@ -20,4 +16,4 @@ models:
description: 'Site trigram must contain 3 upper digits'
expression: |
LENGTH(site_trigram) = 3
AND site_trigram = UPPER(site_trigram)
AND site_trigram = UPPER(site_trigram)
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
{{
config(alias='downstream_model', materialized='table', enable=false)
config(alias='downstream_model', materialized='table', enabled=false)
}}

SELECT
40 changes: 40 additions & 0 deletions models/examples/basic_test_example/README.md
@@ -0,0 +1,40 @@
### Combine assertions & generic tests

#### Example usage

Suppose we are working with the `d_site` table and want to use generic tests.

For instance, the `blacklist` argument here filters all rows containing at least a `site_id_is_not_null` error:

```yml
version: 2

models:
  - name: basic_test_example_d_site
    tests:
      - dbt_assertions.generic_assertions:
          from_column: errors
          blacklist:
            - site_id_is_not_null
          # `re_assert: true` to use only if your assertion's column
          # is not computed and saved in your table.
          re_assert: true

    columns:
      - name: site_id
      - name: country_trigram
      - name: open_date
      - name: errors
        assertions:
          site_id_is_not_null:
            description: 'Site ID is not null.'
            expression: site_id IS NOT NULL

          site_trigram_format:
            description: 'Site trigram must contain 3 upper digits'
            expression: |
              LENGTH(site_trigram) = 3
              AND site_trigram = UPPER(site_trigram)
```
You can also use the `whitelist` and `from_column` arguments, or use the test without arguments (and thus filter each row based on every assertion).
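
To see which rows of the sample data carry the blacklisted error, the assertions can be approximated by hand. The query below is a sketch in plain BigQuery SQL, not the SQL that `dbt-assertions` compiles: the `asserted` CTE and the `ARRAY_CONCAT`/`IF` construction are assumptions made for illustration. It uses the sample rows from `basic_test_example_d_site.sql` and the two assertion expressions from the YAML definition.

```sql
-- Hand-written approximation for illustration only;
-- NOT the output of the `assertions()` macro.
WITH
    final AS (
        SELECT 1 AS site_id, 'FRA' AS site_trigram, DATE('2023-01-01') AS open_date
        UNION ALL
        SELECT 2, 'France', DATE('2023-01-01')
        UNION ALL
        SELECT NULL, 'Belgium', DATE('2023-01-01')
    ),
    asserted AS (
        SELECT
            *,
            -- Collect the ID of every assertion whose expression is not satisfied.
            ARRAY_CONCAT(
                IF(site_id IS NOT NULL, ARRAY<STRING>[], ['site_id_is_not_null']),
                IF(
                    LENGTH(site_trigram) = 3 AND site_trigram = UPPER(site_trigram),
                    ARRAY<STRING>[],
                    ['site_trigram_format']
                )
            ) AS errors
        FROM final
    )
-- Keep only the rows that carry the blacklisted error ID.
SELECT *
FROM asserted
WHERE 'site_id_is_not_null' IN UNNEST(errors)
```

Only the row with a `NULL` `site_id` (`'Belgium'`) comes back. The `'France'` row fails `site_trigram_format` only, so the blacklist does not select it.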
15 changes: 15 additions & 0 deletions models/examples/basic_test_example/basic_test_example_d_site.sql
@@ -0,0 +1,15 @@
{{
    config(alias='d_site_test', materialized='table', enabled=false)
}}

WITH
    final AS (
        SELECT 1 AS site_id, 'FRA' AS site_trigram, DATE('2023-01-01') AS open_date
        UNION ALL
        SELECT 2 AS site_id, 'France' AS site_trigram, DATE('2023-01-01') AS open_date
        UNION ALL
        SELECT NULL AS site_id, 'Belgium' AS site_trigram, DATE('2023-01-01') AS open_date
    )
SELECT
    *,
FROM `final`
28 changes: 28 additions & 0 deletions models/examples/basic_test_example/basic_test_example_d_site.yml
@@ -0,0 +1,28 @@
version: 2

models:
  - name: basic_test_example_d_site
    tests:
      - dbt_assertions.generic_assertions:
          from_column: errors
          blacklist:
            - site_id_is_not_null
          # `re_assert: true` to use only if your assertion's column
          # is not computed and saved in your table.
          re_assert: true

    columns:
      - name: site_id
      - name: country_trigram
      - name: open_date
      - name: errors
        assertions:
          site_id_is_not_null:
            description: 'Site ID is not null.'
            expression: site_id IS NOT NULL

          site_trigram_format:
            description: 'Site trigram must contain 3 upper digits'
            expression: |
              LENGTH(site_trigram) = 3
              AND site_trigram = UPPER(site_trigram)