2022-12-30

dbt test

Introduction

dbt provides the ability to test models. Tests are used to detect the inclusion of low-quality data and to maintain data quality.

There are two ways to define tests

  • Singular tests
  • Generic tests

After defining a test, run the command dbt test to execute the test.

Singular tests

Singular tests are to write test code as a .sql file in the directory specified in test-paths in dbt_project.yml (/tests by default).

The test code should contain SELECT statements that search for rows that should not exist.

For example, if you have a model called fct_payments and you want to test if amount is a positive value, you would write the following SQL.

tests/assert_amount_is_positive.sql
-- Refunds have a negative amount, so the total amount should always be >= 0.
-- Therefore return records where this isn't true to make the test fail
select
  order_id,
  sum(amount)
from {{ ref('fct_payments')}}
group by 1
having not(amount >= 0)

The dbt test command executes the test code, and if a row is retrieved, the test fails. If the test fails, it will look like this.

$ dbt test
14:19:07  Running with dbt=1.0.4
14:19:07  Found 1 model, 1 test, 0 snapshots, 0 analyses, 165 macros, 0 operations, 0 seed files, 2 sources, 0 exposures, 0 metrics
14:19:07
14:19:07  Concurrency: 1 threads (target='dev')
14:19:07
14:19:07  1 of 1 START test assert_amount_is_positive........................ [RUN]
14:19:07  1 of 1 FAIL 1 assert_amount_is_positive............................ [FAIL 1 in 0.05s]
14:19:07
14:19:07  Finished running 1 test in 0.23s.
14:19:07
14:19:07  Completed with 1 error and 0 warnings:
14:19:07
14:19:07  Failure in test assert_fct_payments (tests/assert_amount_is_positive.sql)
14:19:07    Got 1 result, configured to fail if != 0
14:19:07
14:19:07    compiled SQL at target/compiled/my_dbt_proj/tests/assert_amount_is_positive.sql
14:19:07
14:19:07  Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1

A successful test will result in the following.

$ dbt test

14:21:33  Running with dbt=1.0.4
14:21:33  Found 1 model, 1 test, 0 snapshots, 0 analyses, 165 macros, 0 operations, 0 seed files, 2 sources, 0 exposures, 0 metrics
14:21:33
14:21:33  Concurrency: 1 threads (target='dev')
14:21:33
14:21:33  1 of 1 START test assert_amount_is_positive....................... [RUN]
14:21:33  1 of 1 PASS assert_amount_is_positive............................. [PASS in 0.05s]
14:21:33
14:21:33  Finished running 1 test in 0.23s.
14:21:33
14:21:33  Completed successfully
14:21:33
14:21:33  Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1

Generic Tests

Generic tests use the four standard schema tests provided by dbt.

  • unique
    whether the column is unique or not.
  • not_null
    whether the column has no null values
  • accepted_values
    Whether the column has the specified values or not.
  • relationships
    Whether referential integrity is satisfied.

For example, in a .yml file as follows.

models/schema.yml
version: 2

models:
  - name: customers
    columns:
      - name: customer_id
        tests:
          - unique
          - not_null

  - name: stg_customers
    columns:
      - name: customer_id
        tests:
          - unique
          - not_null

  - name: stg_orders
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
      - name: status
        tests:
          - accepted_values:
              values: ['placed', 'shipped', 'completed', 'return_pending', 'returned']
      - name: customer_id
        tests:
          - not_null
          - relationships:
              to: ref('stg_customers')
              field: customer_id

Running dbt test will run a test of the contents of the above yml file.

The schema test can also be extended by yourself by using external packages such as dbt-utils.

Custom Generic tests

Custom Generic tests can be created if the standard Generic tests lack functionality.

A custom Generic test is a SQL file with a test block in the tests/generic or macros/ directory.

tests/generic/test_is_even.sql
{% test is_even(model, column_name) %}

with validation as (
    select
        {{ column_name }} as even_field
    from {{ model }}
),
validation_errors as (
    select even_field
    from validation
    where (even_field % 2) = 1
)
select *
from validation_errors

{% endtest %}

Create a .yml file on the same level as the data model file and put the test names in the tests section as follows.

models/schema.yml
version: 2

models:
  - name: users
    columns:
      - name: favorite_number
        tests:
          - is_even

References

https://docs.getdbt.com/docs/build/tests

Ryusei Kakujo

researchgatelinkedingithub

Focusing on data science for mobility

Bench Press 100kg!