What is materialization
Materialization is the definition of data when creating a dbt model. dbt allows you to choose between the following four types of materialization:
- View
- Table
- Ephemeral
- Incremental
When choosing a materialization, it is important to consider downstream effects such as query execution time and macro functions.
View
The View model is rebuilt via a CREATE
statement at each run.In dbt, if you do not specify materialization, it defaults to View. The View model has the following characteristics:
- Fast model building because there is no data movement.
- Views with large transformations or views stacked on top of other views can be slow to query.
Table
The Table model is rebuilt at each run via CREATE TABLE as
statement.
- Generally faster to query than a View
- Cost and runtime issues arise if large amounts of data are re-populated each time.
Ephemeral
The Ephemeral model is treated as a CTE (common table expression) and interpolates to dependent models. In other words, Ephemeral model is not generated as a data model by itself, but is recognized as a reference from other data models.
Incremental
The Incremental model is generated as a Table for the first time and incrementally updated thereafter without rebuilding; the Incremental model requires the logic to determine the difference from the previous run and the is_incremental()
macro to be set.
Configure materialization
dbt models are materialized as View by default. The model is materialized with the materialized
configuration parameter as follows.
# The following dbt_project.yml configures a project that looks like this:
# .
# └── models
# ├── csvs
# │ ├── employees.sql
# │ └── goals.sql
# └── events
# ├── stg_event_log.sql
# └── stg_event_sessions.sql
name: my_project
version: 1.0.0
config-version: 2
models:
my_project:
events:
# materialize all models in models/events as tables
+materialized: table
csvs:
# this is redundant, and does not need to be set
+materialized: view
You can also set up materialization directly in the model's SQL file.
{{ config(
materialized='table',
)
}}
SELECT *
FROM ...
For Incremental
Since the Incremental model is an incremental update, it is necessary to incorporate logic to determine the difference. In the following example, the logic is to incrementally update only records with dates newer than the latest date in the created_at
column of the current data model.
{{
config(
materialized='incremental'
)
}}
SELECT
xxx,
yyy,
zzz,
created_at
FROM `my_table`
{% if is_incremental() %}
where created_at > (select max(created_at)) from {{ this }})
{% endif %}
On the first run, is_incremental()
is false and the query is executed without a WHERE
clause; on the second and subsequent runs, is_incremental()
is true and incremental updates are performed.
References