What is Meltano
Meltano is an open-source tool for building ELT pipelines on top of Singer taps and targets. It handles the EL (Extract and Load) steps itself and delegates the T (Transform) step to dbt.
Installation of Meltano
Local Installation
- Set up a virtual environment.
$ python -m venv venv
$ source venv/bin/activate
$ pip install --upgrade pip
- Install Meltano.
$ pip install meltano
- Create a project.
$ meltano init my_project
A my_project directory will be created, containing the following files:
my_project/
|-- .meltano
|-- meltano.yml
|-- README.md
|-- requirements.txt
|-- output/.gitignore
|-- .gitignore
|-- extract/.gitkeep
|-- load/.gitkeep
|-- transform/.gitkeep
|-- analyze/.gitkeep
|-- notebook/.gitkeep
|-- orchestrate/.gitkeep
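As a quick sanity check after meltano init, the sketch below reports which entries from the listing above are missing from a directory. The helper name missing_project_files and the subset of entries it checks are assumptions made for illustration; this is not something Meltano provides.

```python
from pathlib import Path

# Assumption: a subset of the entries shown in the listing above.
EXPECTED = ("meltano.yml", "extract", "load", "transform")


def missing_project_files(project_dir):
    """Return the expected entries that are absent from project_dir."""
    root = Path(project_dir)
    return [name for name in EXPECTED if not (root / name).exists()]


# An empty list means every checked entry was found.
print(missing_project_files("my_project"))
```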
Using Docker Containers
- Pull the Docker image.
$ docker pull meltano/meltano
- Create a project.
$ mkdir meltano-docker && cd meltano-docker
$ docker run -v "$(pwd)":/projects \
-w /projects \
meltano/meltano init my_project
Once you run the following command, you can access the Meltano UI at http://localhost:5000.
$ docker run -v "$(pwd)":/project \
-w /project \
-p 5000:5000 \
meltano/meltano
Upgrading Meltano
To upgrade to the latest version, run the following command:
$ meltano upgrade
Running a Project
The meltano.yml file looks like this:
version: 1
default_environment: dev
project_id: <UUID>
environments:
- name: dev
- name: staging
- name: prod
You can configure the pipeline either by running Meltano commands or by editing the meltano.yml file directly.
Environment Configuration
You can retrieve a list of environments with the following command. By default, the dev environment is active.
$ meltano environment list
2022-05-15T23:31:35.498763Z [info ] Environment 'dev' is active
dev
staging
prod
To switch the active environment, set the MELTANO_ENVIRONMENT environment variable:
$ export MELTANO_ENVIRONMENT=prod
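To make the lookup order concrete, here is a minimal sketch of how the active environment is chosen as we understand it: an explicit --environment option wins, then MELTANO_ENVIRONMENT, then default_environment from meltano.yml. The function active_environment is a hypothetical helper for illustration, not Meltano's own code.

```python
import os


def active_environment(default_environment, cli_option=None, environ=None):
    """Pick the environment name: CLI option, then MELTANO_ENVIRONMENT, then default."""
    environ = os.environ if environ is None else environ
    return cli_option or environ.get("MELTANO_ENVIRONMENT") or default_environment


# With the export above in effect this resolves to "prod"; otherwise to "dev".
print(active_environment("dev"))
```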
To add a new environment, execute the following command.
$ meltano environment add <NEW ENVIRONMENT NAME>
Adding Extractors
"Extractors" are plugins that retrieve data from data sources. You can check the list of data sources available for extraction with the following command:
$ meltano discover extractors
To add an Extractor plugin, use the meltano add command. In this case, we will add tap-slack.
$ meltano add extractor tap-slack
If the data source is not present in the Extractor list, check if there is a Singer tap for the data source. If not, you can add a custom plugin.
We can confirm that tap-slack has been added to meltano.yml.
plugins:
  extractors:
  - name: tap-slack
    variant: mashey
    pip_url: git+https://github.com/Mashey/tap-slack.git
You can use the following command to learn how to use the added Extractor:
$ meltano invoke tap-slack --help
Configuring Extractor
To execute the Extractor, you need to configure its settings. Use the following command to see the available configuration options:
$ meltano config tap-slack list
Alternatively, the configuration options are listed in the plugin's documentation.
Configure the settings:
$ meltano config tap-slack set channels '[YOUR CHANNEL ID]'
$ meltano config tap-slack set token <YOUR API TOKEN>
$ meltano config tap-slack set start_date 2022-05-02T00:00:00Z
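Settings such as the token can also be supplied through environment variables instead of meltano config set. The sketch below derives the variable names under the assumption that Meltano's convention is the plugin name and setting name joined by an underscore, upper-cased, with dashes replaced by underscores. The helpers setting_env_var and iso_utc are hypothetical names used only here.

```python
from datetime import datetime, timezone


def setting_env_var(plugin, setting):
    """Derive the environment variable name for a plugin setting, e.g. TAP_SLACK_TOKEN."""
    return f"{plugin}_{setting}".replace("-", "_").replace(".", "_").upper()


def iso_utc(moment):
    """Format an aware datetime as an ISO 8601 UTC timestamp like the start_date above."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# The same token and start_date settings expressed as environment variables.
env = {
    setting_env_var("tap-slack", "token"): "<YOUR API TOKEN>",
    setting_env_var("tap-slack", "start_date"): iso_utc(
        datetime(2022, 5, 2, tzinfo=timezone.utc)
    ),
}
for name, value in env.items():
    print(f"{name}={value}")
```

Keeping the token in the environment rather than in meltano.yml avoids committing it to version control.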
Selecting Data to Extract
Once the configuration is complete, select the data you want to extract. Use the following command to see the list of available data to extract:
$ meltano select tap-slack --list --all
Choose the desired categories to extract. In this case, we will retrieve all data under the "channels" category:
$ meltano select tap-slack channels "*"
Confirm the selected data:
$ meltano select tap-slack --list
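The selection above is stored as the rule channels.*, a glob-style pattern over stream and attribute names. The following sketch approximates how such rules pick attributes, assuming shell-style wildcard matching and a leading ! for exclusion rules. The function is_selected and the attribute names id and topic are hypothetical, and Meltano's real matching logic may differ in detail.

```python
from fnmatch import fnmatch


def is_selected(stream, attribute, rules):
    """Return True if stream.attribute matches an include rule and no '!' exclude rule."""
    name = f"{stream}.{attribute}"
    included = any(fnmatch(name, rule) for rule in rules if not rule.startswith("!"))
    excluded = any(fnmatch(name, rule[1:]) for rule in rules if rule.startswith("!"))
    return included and not excluded


rules = ["channels.*"]
print(is_selected("channels", "id", rules))  # attribute of the selected stream
print(is_selected("users", "id", rules))     # stream not covered by any rule
```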
Adding Loaders
"Loaders" are plugins used for the "L" (Load) part of ELT. Use the following command to check the list of available Loaders:
$ meltano discover loaders
In this case, we will load the extracted data from Slack into BigQuery. Run the following command to add the BigQuery Loader:
$ meltano add loader target-bigquery
If the destination is not present in the Loader list, check if there is a Singer target for the destination. If not, you can add a custom plugin.
Configuring Loader
Set credentials_path to the full path of the client secrets file for a service account that has access to BigQuery, for example ~/mymeltano/my_project/client_secrets.json.
$ meltano config target-bigquery set credentials_path <FULL/PATH/TO/CREDENTIAL FILE>
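Because the setting expects a full path, it can help to expand ~ and confirm the file is readable JSON before storing it. This is a minimal sketch; resolve_credentials_path is a hypothetical helper, and it does not check that the key actually grants access to BigQuery.

```python
import json
from pathlib import Path


def resolve_credentials_path(path):
    """Expand ~, make the path absolute, and check that the file holds valid JSON."""
    full = Path(path).expanduser().resolve()
    with full.open() as handle:
        json.load(handle)  # raises if the file is missing or is not valid JSON
    return str(full)
```

The returned string can then be passed as the value in the meltano config command above.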
Configure other settings:
$ meltano config target-bigquery set location asia-northeast1
$ meltano config target-bigquery set project_id <PROJECT ID>
The meltano.yml file will look like this:
version: 1
default_environment: dev
project_id: 6e1e3687-a9eb-4c67-8be1-493b37cf4f5d
plugins:
  extractors:
  - name: tap-slack
    variant: mashey
    pip_url: git+https://github.com/Mashey/tap-slack.git
  loaders:
  - name: target-bigquery
    variant: adswerve
    pip_url: git+https://github.com/adswerve/target-bigquery.git@0.11.3
environments:
- name: dev
  config:
    plugins:
      extractors:
      - name: tap-slack
        config:
          channels: '[<CHANNEL ID>]'
          start_date: '2022-06-01T00:00:00Z'
        select:
        - channels.*
      loaders:
      - name: target-bigquery
        config:
          credentials_path: <FULL/PATH/TO/CREDENTIAL FILE>
          location: asia-northeast1
          project_id: <Project ID>
- name: staging
- name: prod
Running the EL Pipeline
The following command runs the EL pipeline, extracting data from Slack and loading it into BigQuery:
$ meltano run tap-slack target-bigquery
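If you want to trigger the same run from a script, for example against a specific environment, the sketch below builds the command line without executing it. The function run_command is a hypothetical helper; it assumes the top-level --environment option is placed before the run subcommand.

```python
import shlex


def run_command(extractor, loader, environment=None):
    """Build the argv list for `meltano run`, optionally pinned to an environment."""
    argv = ["meltano"]
    if environment:
        argv.append(f"--environment={environment}")
    argv += ["run", extractor, loader]
    return argv


# Print the command instead of running it; pass the list to subprocess.run to execute.
print(shlex.join(run_command("tap-slack", "target-bigquery", "prod")))
```

Building the arguments as a list rather than a single string avoids shell quoting problems when plugin names or environments come from variables.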