2022-06-05

Meltano

What is Meltano

Meltano is an open-source tool for building ELT pipelines on top of Singer taps and targets. Meltano itself handles the EL (Extract and Load) steps, while the T (Transform) step is delegated to dbt.

Installation of Meltano

Local Installation

  1. Set up a virtual environment.
bash
$ python -m venv venv
$ source venv/bin/activate
$ pip install --upgrade pip
  2. Install Meltano.
bash
$ pip install meltano
  3. Create a project.
bash
$ meltano init my_project

A my_project directory will be created, and you can find the following files inside:

my_project/
   |-- .meltano
   |-- meltano.yml
   |-- README.md
   |-- requirements.txt
   |-- output/.gitignore
   |-- .gitignore
   |-- extract/.gitkeep
   |-- load/.gitkeep
   |-- transform/.gitkeep
   |-- analyze/.gitkeep
   |-- notebook/.gitkeep
   |-- orchestrate/.gitkeep

Using Docker Containers

  1. Pull the Docker image.
bash
$ docker pull meltano/meltano
  2. Create a project.
bash
$ mkdir meltano-docker && cd meltano-docker
$ docker run -v "$(pwd)":/projects \
             -w /projects \
             meltano/meltano init my_project

After creating the project, move into the my_project directory and run the following command. The Meltano UI will then be available at http://localhost:5000.

bash
$ docker run -v "$(pwd)":/project \
             -w /project \
             -p 5000:5000 \
             meltano/meltano

Upgrading Meltano

To upgrade to the latest version, run the following command:

bash
$ meltano upgrade

Running a Project

The meltano.yml file looks like this:

meltano.yml
version: 1
default_environment: dev
project_id: <UUID>
environments:
- name: dev
- name: staging
- name: prod

You can configure the project either by running meltano commands, which update meltano.yml for you, or by editing the meltano.yml file directly.

Environment Configuration

You can retrieve a list of environments with the following command. By default, the dev environment is active.

bash
$ meltano environment list

2022-05-15T23:31:35.498763Z [info     ] Environment 'dev' is active
dev
staging
prod

To switch the active environment, set the MELTANO_ENVIRONMENT environment variable:

bash
$ export MELTANO_ENVIRONMENT=prod
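
Instead of exporting the variable for the whole shell session, you can also pick the environment per command with Meltano's global `--environment` option. The following is a minimal sketch; `meltano_in` is a hypothetical helper name introduced here for illustration and is not part of Meltano.

```shell
# Hypothetical helper: run any meltano subcommand in a given environment
# without changing MELTANO_ENVIRONMENT for the rest of the shell session.
meltano_in() {
  env_name="$1"
  shift
  meltano --environment="$env_name" "$@"
}
```

For example, `meltano_in staging environment list` would run the list command with staging selected, while the session default stays unchanged.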

To add a new environment, execute the following command.

bash
$ meltano environment add <NEW ENVIRONMENT NAME>

Adding Extractors

"Extractors" are plugins that retrieve data from data sources. You can check the list of data sources available for extraction with the following command:

bash
$ meltano discover extractors

To add an Extractor plugin, use the meltano add command. In this case, we will add tap-slack.

bash
$ meltano add extractor tap-slack

If the data source is not present in the Extractor list, check if there is a Singer tap for the data source. If not, you can add a custom plugin.

https://hub.meltano.com/singer
https://docs.meltano.com/concepts/plugins#custom-plugins

You can confirm that tap-slack has been added to meltano.yml:

meltano.yml
plugins:
  extractors:
    - name: tap-slack
      variant: mashey
      pip_url: git+https://github.com/Mashey/tap-slack.git

You can use the following command to learn how to use the added Extractor:

bash
$ meltano invoke tap-slack --help

Configuring the Extractor

To execute the Extractor, you need to configure its settings. Use the following command to see the available configuration options:

bash
$ meltano config tap-slack list

Alternatively, you can check the configuration options from the following link:

https://hub.meltano.com/extractors/tap-slack

Configure the settings:

bash
$ meltano config tap-slack set channels '[YOUR CHANNEL ID]'
$ meltano config tap-slack set token <YOUR API TOKEN>
$ meltano config tap-slack set start_date 2022-05-02T00:00:00Z
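
Settings can also be supplied through environment variables, which is handy for secrets such as the API token. The naming rule assumed here is Meltano's documented convention of joining the plugin name and the setting name, upper-cased and with dashes replaced by underscores. The sketch below only illustrates that rule for simple, non-nested settings; `meltano_env_var` is a hypothetical helper, not a Meltano command.

```shell
# Hypothetical helper (not part of Meltano): print the environment variable
# name for a plugin setting, assuming the <PLUGIN>_<SETTING> convention.
meltano_env_var() {
  printf '%s_%s\n' "$1" "$2" | tr '[:lower:]' '[:upper:]' | tr '-' '_'
}

meltano_env_var tap-slack token        # prints TAP_SLACK_TOKEN
meltano_env_var tap-slack start_date   # prints TAP_SLACK_START_DATE
```

Under that assumption, exporting TAP_SLACK_TOKEN in the shell before running Meltano would take the place of the `meltano config tap-slack set token` command above.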

Selecting Data to Extract

Once the configuration is complete, select the data you want to extract. Use the following command to see the list of available data to extract:

bash
$ meltano select tap-slack --list --all

Choose the entities and attributes to extract. In this case, we will select every attribute of the "channels" entity:

bash
$ meltano select tap-slack channels "*"

Confirm the selected data:

bash
$ meltano select tap-slack --list

Adding Loaders

"Loaders" are plugins used for the "L" (Load) part of ELT. Use the following command to check the list of available Loaders:

bash
$ meltano discover loaders

In this case, we will load the extracted data from Slack into BigQuery. Run the following command to add the BigQuery Loader:

https://hub.meltano.com/loaders/target-bigquery

bash
$ meltano add loader target-bigquery

If the destination is not present in the Loader list, check if there is a Singer target for the destination. If not, you can add a custom plugin.

https://www.singer.io/#targets
https://docs.meltano.com/concepts/plugins#custom-plugins

Configuring the Loader

Set credentials_path to the full path of the client secrets file for a service account that has access to BigQuery, for example ~/mymeltano/my_project/client_secrets.json.

https://github.com/adswerve/target-bigquery#step-1-activate-the-google-bigquery-api

bash
$ meltano config target-bigquery set credentials_path <FULL/PATH/TO/CREDENTIAL FILE>

Configure other settings:

bash
$ meltano config target-bigquery set location asia-northeast1
$ meltano config target-bigquery set project_id <PROJECT ID>

The meltano.yml file will look like this:

meltano.yml
version: 1
default_environment: dev
project_id: 6e1e3687-a9eb-4c67-8be1-493b37cf4f5d
plugins:
  extractors:
  - name: tap-slack
    variant: mashey
    pip_url: git+https://github.com/Mashey/tap-slack.git
  loaders:
  - name: target-bigquery
    variant: adswerve
    pip_url: git+https://github.com/adswerve/target-bigquery.git@0.11.3
environments:
- name: dev
  config:
    plugins:
      extractors:
      - name: tap-slack
        config:
          channels: '[<CHANNEL ID>]'
          start_date: '2022-05-02T00:00:00Z'
        select:
        - channels.*
      loaders:
      - name: target-bigquery
        config:
          credentials_path: <FULL/PATH/TO/CREDENTIAL FILE>
          location: asia-northeast1
          project_id: <Project ID>
- name: staging
- name: prod

Running the EL Pipeline

The following command runs the EL pipeline, extracting data from Slack and loading it into BigQuery:

bash
$ meltano run tap-slack target-bigquery
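
When the pipeline is started from a scheduler such as cron, it helps to surface failures explicitly. The following is a minimal sketch that assumes only that `meltano run` exits with a non-zero status when a step fails; `run_slack_pipeline` is a hypothetical helper name, not something Meltano provides.

```shell
# Hypothetical wrapper: run the pipeline and report the outcome,
# passing meltano's exit code back to the caller on failure.
run_slack_pipeline() {
  if meltano run tap-slack target-bigquery; then
    echo "pipeline finished successfully"
  else
    status=$?
    echo "pipeline failed with exit code $status" >&2
    return "$status"
  fi
}
```

Calling `run_slack_pipeline` from a scheduled script lets the scheduler see the same exit code that Meltano returned.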

References

https://docs.meltano.com/getting-started

Ryusei Kakujo
