Teghan/0.3.0 (#9)

tnightengale · web-flow · commit aaa7f7ccaec5 · 2021-05-03T23:14:08.000-04:00
* fix ephemeral issue, fix models errors, add gitignore

* revert fetch_configured_models

* rm validate required tests

* add regex test filters

* update README

* update README, check data test naming

* grammar on README
diff --git a/README.md b/README.md
@@ -7,8 +7,8 @@ This dbt package contains macros to assert test and documentation coverage from
 ## Table of Contents
   - [Install](#install)
   - [Configurations](#configurations)
-    - [**Required Tests**](#required-tests)
-    - [**Required Docs**](#required-docs)
+    - [Required Tests](#required-tests)
+    - [Required Docs](#required-docs)
   - [Usage](#usage)
     - [required_tests (source)](#required_tests-source)
     - [required_docs (source)](#required_docs-source)
@@ -22,7 +22,7 @@ Include in `packages.yml`:
 ```yaml
 packages:
   - package: tnightengale/dbt_meta_testing
-    version: 0.2.1
+    version: 0.3.0
 ```
 For latest release, see
 https://github.com/tnightengale/dbt-meta-testing/releases.
@@ -33,44 +33,59 @@ This package features two meta configs that can be applied to a dbt project:
 [here](https://docs.getdbt.com/reference/model-configs) to learn more about
 model configurations in dbt.
 
-### **Required Tests**
+### Required Tests
 To require test coverage, define the `+required_tests` configuration on a model
 path in `dbt_project.yml`:
 ```yaml
 # dbt_project.yml
 ...
 models:
-    project:
-        staging:
-            +required_tests: {"unique": 1, "not_null": 1}
-        marts:
-            +required_tests: {"unique": 1}
+  project:
+    +required_docs: true
+    marts:
+      +required_tests: {"unique.*|not_null": 1}
+      model_2:
+        +required_tests:
+          "mocker.*|unique": 1
+          "mock_schema_test": 1
+          ".*data_test": 1 
 ```
 
-The `+required_tests` config must be either a `dict` or `None`. All the regular
+The `+required_tests` config must be `None` or a `dict` with `str` keys and `int`
+values. YAML dictionaries are accepted.
+
+All the regular
 dbt configuration hierarchy rules apply. For example, individual model configs
 will override configs from the `dbt_project.yml`:
 ```sql
--- /models/marts/core/your_model.sql
-{{
-    config(required_tests=None)
-}}
+-- /models/marts/core/your_model.sql
+
+-- This overrides the config in dbt_project.yml, and this model will not require tests
+{{ config(required_tests=None) }}
 
 SELECT
 ...
 ```
-The provided dictionary can contain any column schema test as a key, followed by
-the minimum number of occurances which must be included on the model. In the
-example above, every model in the `models/marts/` path must include at least one
-`unique` test.
-
-Custom column-level schema tests are supported. However, in order to appear in
-the `graph` context variable (which this package parses), they must be applied
-to at least one model in the project prior to compilation. 
-
-Model-level schema tests are currently _not supported_. For example the
-following model-level `dbt_utils.equal_rowcount` test _cannot_ currently be
-asserted via the configuration:
+> **_New in Version 0.3.0_**
+
+The keys of the config are evaluated against both data and schema tests
+(including any custom tests) using the
+[re.match](https://docs.python.org/3/library/re.html#re.match) function.
+
+Therefore, any test restriction that can be expressed as a regex can be
+evaluated.
+
+For example, in the `dbt_project.yml` above, the path configuration on the `marts` model path
+requires each model in that path to have at least one test whose name _starts
+with_ `unique` **or** _starts with_ `not_null` (`re.match` anchors only at the start of the name, so append `$` to require an exact match).
+
+Schema tests are matched against their common names (e.g. `not_null`,
+`accepted_values`).
+
+Data tests are matched against their node name (e.g. `mocker_data_test` for `tests/mocker_data_test.sql`).
+
+Custom schema tests are matched against their name, without the `test_` prefix, e.g. `mock_schema_test`:
+
 ```yaml   
 # models/schema.yml
 ...
@@ -88,7 +103,8 @@ asserted via the configuration:
                 - mock_schema_test
 ```
 
-Models that do not meet their configured test minimums will be listed in the
+Models that do not meet their configured test minimums, because they either lack
+the tests or are not documented, will be listed in the
 error when validated via a `run-operation`:
 ```
 usr@home dbt-meta-testing $ dbt run-operation required_tests
@@ -104,7 +120,7 @@ Encountered an error while running operation: Compilation Error in macro require
 usr@home dbt-meta-testing $ 
 ```
 
-### **Required Docs**
+### Required Docs
 To require documentation coverage, define the `+required_docs` configuration on
 a model path in `dbt_project.yml`:
 ```yaml
@@ -114,7 +130,9 @@ models:
     project:
         +required_docs: true
 ```
-The `+required_docs` config must be a `bool`. It also **does not check ephemeral
+The `+required_docs` config must be a `bool`. 
+
+It also **does not check ephemeral
 models**. This is because it cannot leverage `adapter.get_columns_in_relation()`
 macro on ephemeral models, which it uses to fetch columns from the data
 warehouse and detect columns without documentation. 
diff --git a/integration_tests/dbt_project.yml b/integration_tests/dbt_project.yml
@@ -22,6 +22,11 @@ models:
       +required_tests: true
     marts:
       +required_tests: {"unique": 1}
+      model_2:
+        +required_tests: 
+          "mocker.*|unique": 1
+          "mock_schema_test": 1
+          ".*data_test": 1 
 
 vars:
   running_intergration_tests: true
diff --git a/integration_tests/tests/mocker_data_test.sql b/integration_tests/tests/mocker_data_test.sql
@@ -0,0 +1,6 @@
+
+
+select
+    *
+from {{ ref("model_2") }} 
+where new != 'a'
diff --git a/macros/utils/errors/error_invalid_config_missing_test.sql b/macros/utils/errors/error_invalid_config_missing_test.sql
diff --git a/macros/utils/required_tests/evaluate_required_tests.sql b/macros/utils/required_tests/evaluate_required_tests.sql
@@ -7,61 +7,32 @@
     {# /*
     Evaluate if each model meets +required_tests minimum.
     */ #}
+    
     {% set tests_per_model = dbt_meta_testing.tests_per_model() %}
     {% set test_errors = [] %}
 
-
-
-    {{ dbt_meta_testing.logger("models_to_evaluate: " ~ models_to_evaluate | map(attribute="name") | list) }}
     {% for model in models_to_evaluate %}
+        {% for test_key in model.config.required_tests.keys() %}
 
-        -- If required_tests is dictionary
-        {% if model.config.required_tests is mapping %}
-        {{ dbt_meta_testing.logger(model.name ~ " if reached") }}
-
-            {% for test_key in model.config.required_tests.keys() %}
-
-                -- If the model has less tests than required by the config
-                {% set full_model = model.unique_id %}
-                
-                {{ dbt_meta_testing.logger('tests per model: ' ~ tests_per_model) }}
-
-                -- models that are not declared in properties files will not have keys in tests_per_model
-                {% set provided_test_count = tests_per_model.get(full_model, {}).get(test_key, []) | length %}
-                {{ dbt_meta_testing.logger('provided_test_count: ' ~ provided_test_count) }}
-
-                {% set required_test_count = model.config.required_tests[test_key] %}
-
-                {{ dbt_meta_testing.logger(
-                    "test_key_loop | test_key: " ~ test_key ~ 
-                    " model: " ~ model.name ~
-                    " provided_test_count: " ~ provided_test_count ~
-                    " required_test_count: " ~ required_test_count) }}
-
-                {% if provided_test_count < required_test_count %} 
+            {% set provided_test_list = tests_per_model[model.unique_id] %}
 
-                    {% do test_errors.append((model.name, test_key, provided_test_count, required_test_count)) %}
-
-                {% endif %}
+            {% set required_test_count = model.config.required_tests[test_key] %}
+            {% set matching_test_count = dbt_meta_testing.get_regex_match_count(provided_test_list, test_key) %}
             
-            {% endfor %}
-        
-        {% endif %}
-
+            {% if matching_test_count < required_test_count %} 
+                {% do test_errors.append((model.name, test_key, matching_test_count, required_test_count)) %}
+            {% endif %}
+            
+        {% endfor %}
     {% endfor %}
 
 
     {% if test_errors | length > 0 %}
-
         {% set result = dbt_meta_testing.error_required_tests(test_errors) %}
-
     {% else %}
-
         {% set result = none %}
-
     {% endif %}
 
-
     {{ return(result) }}
 
 {% endmacro %}
diff --git a/macros/utils/required_tests/get_regex_match_count.sql b/macros/utils/required_tests/get_regex_match_count.sql
@@ -0,0 +1,16 @@
+{% macro get_regex_match_count(list_of_strings, regex_to_check) %}
+	{{ return(adapter.dispatch("get_regex_match_count", packages=dbt_meta_testing._get_meta_test_namespaces())(list_of_strings, regex_to_check))}}
+{% endmacro %}
+
+{% macro default__get_regex_match_count(list_of_strings, regex_to_check) %}
+
+    {# Return count of strings in list_of_strings that match regex_to_check #}
+    {% set matches = [] %}
+    {% for string in list_of_strings %}
+        {% set match = modules.re.match(regex_to_check, string) %}
+        {% if match %}{% do matches.append(match) %}{% endif %}
+    {% endfor %}
+
+    {% do return(matches | length) %}
+
+{% endmacro %}
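
The matching behaviour of `default__get_regex_match_count` can be approximated in plain Python, since the macro delegates to `re.match` through dbt's `modules.re`. The sketch below is not part of the package; the function mirrors the macro's name for readability, and the test names in `tests` are illustrative assumptions only. It shows that `re.match` anchors at the start of the string but not at the end, so a key such as `not_null` also counts names that merely begin with it.

```python
import re


def get_regex_match_count(list_of_strings, regex_to_check):
    """Count the strings for which re.match finds a match at position 0."""
    return sum(1 for s in list_of_strings if re.match(regex_to_check, s))


# Hypothetical test names attached to a single model (illustrative only).
tests = [
    "unique",
    "unique_combination_of_columns",
    "not_null",
    "not_null_where",
    "mocker_data_test",
]

# Prefix semantics: both "not_null" and "not_null_where" are counted.
prefix_count = get_regex_match_count(tests, "unique.*|not_null")

# Appending "$" restricts the key to an exact name.
exact_count = get_regex_match_count(tests, "not_null$")

# A leading ".*" lets the key match a suffix such as "data_test".
suffix_count = get_regex_match_count(tests, ".*data_test")
```

If exact matching is the intent for a given key, anchoring it with `$` in `dbt_project.yml` is one way to get it under these semantics.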
diff --git a/macros/utils/required_tests/tests_per_model.sql b/macros/utils/required_tests/tests_per_model.sql
@@ -9,52 +9,23 @@
     Construct a dict of all models and their schema tests in the current project.
     */ #}
 
-    {% set tests_per_model = {} %}
-    {% set all_tests = dbt_meta_testing.fetch_configured_models("enabled", resource_type="test") %}
-
-    -- currently only parsing schema tests
-    {% set schema_tests = all_tests | selectattr("test_metadata", "defined") | list %}
-
-    {% for test_node in schema_tests %}
-
-        {{ dbt_meta_testing.logger('loop ' ~ loop.index ~ ' test_node ' ~ test_node["test_metadata"]["name"]) }}
-
-        {% for dependent_model in test_node.depends_on.nodes %}
-            {% if dependent_model.startswith('model.') %}
-
-                -- if the model has been encountered before
-                {% if dependent_model in tests_per_model.keys() %}
-
-                    -- If the test on this model has been encountered before
-                    {% if test_node["test_metadata"]["name"] in tests_per_model[dependent_model].keys() %}
-                        {% do tests_per_model[dependent_model][test_node["test_metadata"]["name"]].append(test_node.unique_id) %}
-                    {% else %} -- Add this test to the list of encountered tests
-                        {% do tests_per_model[dependent_model].setdefault(test_node["test_metadata"]["name"], [test_node.unique_id]) %}
-                    {% endif %}
-
-                {% else %}
-
-                    {% do tests_per_model.setdefault(dependent_model, {test_node["test_metadata"]["name"]: [test_node.unique_id]}) %}
-
-                {% endif %}
-
+    {% set enabled_model_names = dbt_meta_testing.fetch_configured_models("enabled", resource_type="model") | map(attribute="unique_id") | list %}
+    {% set enabled_test_nodes = dbt_meta_testing.fetch_configured_models("enabled", resource_type="test") %}
+    
+    -- Create `result` dict with all enabled models unique_id's as keys and empty lists as values
+    {% set result = {} %}
+    {% for id in enabled_model_names %}{% do result.update({id: []}) %}{% endfor %}
+    
+    {% for test_node in enabled_test_nodes %}
+        {% for dependent_node in test_node.depends_on.nodes %}
+            {% if dependent_node.startswith('model.') %}
+                -- Use common names for schema tests, (e.g. "unique") under the "test_metadata" key
+                {% set test_identifier = test_node.get("test_metadata",{}).get("name") or test_node["name"] %}
+                {% do result[dependent_node].append(test_identifier) %}
             {% endif %}
-
         {% endfor %}
-
     {% endfor %}
 
-     -- Create dict of empty dict with model unique_id as key to ensure all models are included
-    {% set model_unique_ids = dbt_meta_testing.fetch_configured_models("enabled", resource_type="model") | 
-            map(attribute="unique_id") | list %}
-    {% set result = {} %}
-    {% for id in model_unique_ids %}
-        {% do result.update({id: {}}) %}
-    {% endfor %}
-
-    -- overwrite empty dicts if they have tests
-    {% do result.update(tests_per_model) %}
-
     {% do return(result) %}
 
 {% endmacro %}
diff --git a/macros/utils/required_tests/validate_required_tests.sql b/macros/utils/required_tests/validate_required_tests.sql
@@ -65,18 +65,6 @@
 
     {% endfor %}
 
-
-    -- Validate all configured tests are defined 
-    {% for required_test in unique_required_tests %}
-
-        {% if required_test not in unique_defined_tests %}
-
-            {{ return(dbt_meta_testing.error_invalid_config_missing_test(required_test)) }}
-
-        {% endif %}
-
-    {% endfor %}
-
     {{ return(none) }}
 
 {% endmacro %}