2022-06-06

Transform in Meltano

Meltano Transform

In Meltano, the T (Transform) step of ELT is handled by dbt. This article shows how to run transformations with Meltano.

Installing dbt

To list the dbt transformer plugins that Meltano can install, use the following command:

bash
$ meltano discover transformers

dbt-bigquery
dbt-postgres
dbt-redshift
dbt-snowflake
dbt

Although you can install the generic dbt plugin, the adapter-specific plugins such as dbt-snowflake are recommended. This demonstration uses dbt-bigquery to run the transform. Install it with the following command:

bash
$ meltano add transformer dbt-bigquery

List the settings available for dbt-bigquery and their current values:

bash
$ meltano config dbt-bigquery list

Alternatively, you can refer to the official documentation:

https://hub.meltano.com/transformers/dbt-bigquery

Configure the settings for dbt-bigquery, including project, dataset, auth_method, and keyfile, using the following commands. (You can also directly edit the meltano.yml file.)

bash
$ meltano config dbt-bigquery set project "YOUR_PROJECT_ID"
$ meltano config dbt-bigquery set dataset "YOUR_DATASET"
$ meltano config dbt-bigquery set auth_method "service-account"
$ meltano config dbt-bigquery set keyfile "PATH/TO/YOUR/KEYFILE"

Your meltano.yml file should then look like this:

meltano.yml
.
.
.
environments:
- name: dev
  config:
    plugins:
      transformers:
      - name: dbt-bigquery
        config:
          project: YOUR_PROJECT_ID
          dataset: YOUR_DATASET
          auth_method: service-account
          keyfile: PATH/TO/YOUR/KEYFILE

Coding dbt Processing

The transform/ directory in the Meltano project root contains the following files:

bash
$ ls transform

dbt_project.yml
profile
models

Write your SQL models in the transform/models/ directory and, if needed, edit transform/dbt_project.yml.
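To make the step above concrete, here is a minimal sketch of adding one model from the shell. The file name commit_counts.sql, the table name commits, and the column author_id are hypothetical placeholders, not names taken from this project; replace them with whatever your loader actually writes to BigQuery.

```shell
# Run from the Meltano project root (assumption: dbt models live in transform/models/).
mkdir -p transform/models/example

# Quoting the heredoc delimiter ('SQL') stops the shell from interpreting
# the backticks and curly braces inside the model.
cat > transform/models/example/commit_counts.sql <<'SQL'
-- Hypothetical model: the table and column names below are placeholders.
{{ config(materialized='table') }}

select
    author_id,
    count(*) as commit_count
from `YOUR_PROJECT_ID.YOUR_DATASET.commits`
group by author_id
SQL
```

dbt names a model after its file, so this one could then be selected with `--select commit_counts`.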

Running dbt

You can run dbt in two ways:

  • Execute the run command as part of the ELT pipeline, such as meltano run tap-github target-bigquery dbt-bigquery:run.
  • Execute the invoke command as a standalone operation, like meltano invoke dbt-bigquery run.

Additionally, you can create custom commands by editing the meltano.yml file as follows:

meltano.yml
.
.
.
plugins:
  transformers:
    - name: dbt-bigquery
      commands:
        my_models:
          args: run --select +YOUR_MODEL_NAME
          description: Run dbt, selecting model `YOUR_MODEL_NAME` and all upstream models.

The custom command can then be used with either run or invoke:

bash
$ meltano run tap-github target-bigquery dbt-bigquery:my_models
$ meltano invoke dbt-bigquery:my_models

Referencing a dbt Project from an External Repository

Instead of writing dbt code directly in the transform/ directory, you can keep the Meltano project and the dbt project separate by creating a packages.yml file in the transform/ directory. With the following configuration, you can reference a dbt project from an external repository:

transform/packages.yml
packages:
  - git: https://github.com/your_repo/your-dbt-project.git
    revision: 1.0.0
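
dbt reads package declarations from a file named packages.yml and downloads them with its deps subcommand. The sketch below wraps that step in a small helper; the function name install_dbt_packages is hypothetical, and it assumes you call it from the Meltano project root with the dbt-bigquery transformer installed.

```shell
# Hypothetical helper: install the dbt packages declared for this Meltano project.
# Assumes it is called from the Meltano project root.
install_dbt_packages() {
  # dbt looks for a file named packages.yml in the dbt project directory.
  if [ ! -f transform/packages.yml ]; then
    echo "transform/packages.yml not found" >&2
    return 1
  fi
  # Arguments after the plugin name are passed through to dbt,
  # so this runs `dbt deps`.
  meltano invoke dbt-bigquery deps
}
```

A typical use would be to fetch the packages first and then run the models: `install_dbt_packages && meltano invoke dbt-bigquery run`.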

References

https://docs.meltano.com/getting-started
https://docs.meltano.com/guide/transformation

Ryusei Kakujo
