2022-12-30

Materializations

What is materialization

Materialization is the definition of data when creating a dbt model. dbt allows you to choose between the following four types of materialization:

  • View
  • Table
  • Ephemeral
  • Incremental

When choosing a materialization, it is important to consider downstream effects such as query execution time and macro functions.

View

The View model is rebuilt via a CREATE statement at each run.In dbt, if you do not specify materialization, it defaults to View. The View model has the following characteristics:

  • Fast model building because there is no data movement.
  • Views with large transformations or views stacked on top of other views can be slow to query.

Table

The Table model is rebuilt at each run via CREATE TABLE as statement.

  • Generally faster to query than a View
  • Cost and runtime issues arise if large amounts of data are re-populated each time.

Ephemeral

The Ephemeral model is treated as a CTE (common table expression) and interpolates to dependent models. In other words, Ephemeral model is not generated as a data model by itself, but is recognized as a reference from other data models.

Incremental

The Incremental model is generated as a Table for the first time and incrementally updated thereafter without rebuilding; the Incremental model requires the logic to determine the difference from the previous run and the is_incremental() macro to be set.

Configure materialization

dbt models are materialized as View by default. The model is materialized with the materialized configuration parameter as follows.

dbt_project.yml
# The following dbt_project.yml configures a project that looks like this:
# .
# └── models
#     ├── csvs
#     │   ├── employees.sql
#     │   └── goals.sql
#     └── events
#         ├── stg_event_log.sql
#         └── stg_event_sessions.sql

name: my_project
version: 1.0.0
config-version: 2

models:
  my_project:
    events:
      # materialize all models in models/events as tables
      +materialized: table
    csvs:
      # this is redundant, and does not need to be set
      +materialized: view

You can also set up materialization directly in the model's SQL file.

hoge.sql
{{ config(
    materialized='table',
  )
}}

SELECT *
FROM ...

For Incremental

Since the Incremental model is an incremental update, it is necessary to incorporate logic to determine the difference. In the following example, the logic is to incrementally update only records with dates newer than the latest date in the created_at column of the current data model.

hoge.sql
{{
    config(
        materialized='incremental'
    )
}}

SELECT
  xxx,
  yyy,
  zzz,
  created_at
FROM `my_table`

{% if is_incremental() %}
  where created_at > (select max(created_at)) from {{ this }})
{% endif %}

On the first run, is_incremental() is false and the query is executed without a WHERE clause; on the second and subsequent runs, is_incremental() is true and incremental updates are performed.

References

https://docs.getdbt.com/docs/build/materializations
https://docs.getdbt.com/terms/table
https://docs.getdbt.com/terms/view

Ryusei Kakujo

researchgatelinkedingithub

Focusing on data science for mobility

Bench Press 100kg!