2022-06-08

Meltano Commands

In this article, I will introduce the main commands of Meltano.

meltano add

The meltano add command is used to add plugins to a project. It performs the following steps:

  1. Looks up the plugin definition on Meltano Hub.
  2. Adds the plugin to the plugins section of the meltano.yml file.
  3. Saves the plugin definition in the ./plugins directory.
  4. Installs the plugin using the meltano install command.

Unlike meltano install, meltano add also takes care of the plugins that the new plugin depends on, so that no dependency is left missing. For example, meltano add transformer will ask you to install the dbt plugin first.

Example of meltano add

bash
$ meltano add extractor tap-github
$ meltano add loader target-bigquery
$ meltano add transformer dbt-redshift
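
When a project needs several plugins, the same command can be repeated in a small loop. The following is a minimal sketch and not part of Meltano itself: add_plugins and the MELTANO variable are hypothetical names introduced here, and setting MELTANO to "echo meltano" only prints the commands instead of running them.

```bash
# Hypothetical helper: add several plugins, one "type name" pair per argument.
# MELTANO is an assumed override so the commands can be previewed with echo.
MELTANO="${MELTANO:-meltano}"

add_plugins() {
  for pair in "$@"; do
    # Split "extractor tap-github" into its type and its name.
    plugin_type="${pair%% *}"
    plugin_name="${pair#* }"
    # Stop at the first failure so later plugins are not added to a broken project.
    $MELTANO add "$plugin_type" "$plugin_name" || return 1
  done
}

# Preview only: the subshell prints the three commands from the example above.
(
  MELTANO="echo meltano"
  add_plugins "extractor tap-github" "loader target-bigquery" "transformer dbt-redshift"
)
```

Removing the preview override (or leaving MELTANO unset) would run the real meltano add for each pair in order.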

meltano elt

The meltano elt command runs an ELT (Extract, Load, Transform) pipeline. Each pipeline has a State ID under which the state of incremental replication is stored and looked up. When a later run uses the same State ID with the same combination of Extractor and Loader, it resumes from where the previous run left off. If no stable identifier is provided through the --state-id flag or the MELTANO_STATE_ID environment variable, a one-time State ID is generated from the current date and time, so data extraction always starts from scratch.

The output of this command is logged to the .meltano/logs/elt/{state_id}/{run_id}/elt.log file, where run_id is an automatically generated UUID.
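
The way the State ID is chosen can be sketched as a small shell function. This only illustrates the logic and is not Meltano's implementation: resolve_state_id is a hypothetical name, the precedence of the flag over the environment variable is an assumption, and so is the timestamp format of the one-time ID.

```bash
# Hypothetical sketch of how the State ID is chosen (not Meltano's actual code).
resolve_state_id() {
  if [ -n "${1:-}" ]; then
    # 1. An explicit value, as passed with --state-id, is assumed to win.
    printf '%s\n' "$1"
  elif [ -n "${MELTANO_STATE_ID:-}" ]; then
    # 2. Otherwise the MELTANO_STATE_ID environment variable is used.
    printf '%s\n' "$MELTANO_STATE_ID"
  else
    # 3. Otherwise a one-time ID is built from the current date and time.
    #    No earlier state exists under it, so extraction starts from scratch.
    #    The exact format below is an assumption.
    date -u '+%Y-%m-%dT%H%M%S'
  fi
}
```

In practice this means a scheduled pipeline should always pass the same --state-id (or export MELTANO_STATE_ID); otherwise every run falls into the third branch and re-extracts everything.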

Parameters

  • --state-id: Identifies the related ELT when saving or retrieving the state of incremental replication.
  • --transform=run: Executes the Transform step.
  • --transform=skip: Skips the Transform step (default).
  • --transform=only: Skips the EL step and executes the Transform step only.
  • --full-refresh: Executes a full refresh, ignoring previous state.
  • --force: Forces a new run even if a pipeline with the same State ID is already running.
  • --state: Manually provides a state file to the Extractor instead of searching for state based on the State ID.
  • --dump: Dumps the content of pipeline-specific generated files instead of executing the pipeline.

Example of meltano elt

bash
$ meltano elt tap-gitlab target-postgres --transform=run --state-id=gitlab-to-postgres
$ meltano elt tap-gitlab target-postgres --state-id=gitlab-to-postgres --full-refresh
$ meltano elt tap-gitlab target-postgres --state extract/tap-gitlab.state.json
$ meltano elt tap-gitlab target-postgres --state-id=gitlab-to-postgres --dump=state > extract/tap-gitlab.state.json
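
Because each run writes its log under .meltano/logs/elt/{state_id}/{run_id}/elt.log, finding the log of the latest run means picking the newest run_id directory. The following is a minimal sketch, assuming it is run from the project root; latest_elt_log is a hypothetical helper name, not a Meltano command.

```bash
# Hypothetical helper: print the most recently modified elt.log for a State ID.
# Assumes the current directory is the project root, where .meltano lives.
latest_elt_log() {
  log_root=".meltano/logs/elt/$1"
  # There is one subdirectory per run_id; ls -t sorts the newest file first.
  # Errors are silenced so an unknown State ID simply prints nothing.
  ls -t "$log_root"/*/elt.log 2>/dev/null | head -n 1
}
```

For example, tail -n 50 "$(latest_elt_log gitlab-to-postgres)" would show the end of the latest log for that State ID.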

meltano install

The meltano install command installs the project's dependencies as listed in the meltano.yml file. If specific plugins are given as arguments, only those plugins are installed. Running meltano install again upgrades plugins that are already installed to their latest versions.

Parameters

  • --include-related: Automatically installs Transformers related to the installed Extractors.
  • --clean: Completely uninstalls the plugins and reinstalls them.
  • --parallelism: Controls the number of plugins installed in parallel (default is the number of CPUs on the machine). Setting --parallelism=1 installs plugins one at a time.

Example of meltano install

bash
$ meltano install
$ meltano install extractors
$ meltano install extractor tap-gitlab
$ meltano install extractors tap-gitlab tap-adwords
$ meltano install --include-related
$ meltano install --parallelism=16
$ meltano install --clean

meltano run

The meltano run command is used to execute a series of command blocks sequentially. For example, when you specify meltano run foo_tap bar_target hoge_target, they will be executed in order from left to right. If any block fails, the entire execution is aborted.

If an active environment is defined, a State ID is automatically generated for each Extractor/Loader pair, which is used to store and retrieve the state of incremental replication in the system database. Therefore, when running the same Extractor and Loader combination next time, you can start from where the previous run left off. The generated ID has the format <environment_name>:<tap_name>-to-<target_name>. If no environment is active, meltano run does not generate a State ID and does not track the state.
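
The ID format above can be reproduced with a one-line helper, which is handy when you need to look up the state of a particular pair. This is only a sketch of the naming rule; run_state_id is a hypothetical name, not a Meltano command.

```bash
# Sketch of the ID format <environment_name>:<tap_name>-to-<target_name>.
# run_state_id is a hypothetical helper, not part of the Meltano CLI.
run_state_id() {
  printf '%s:%s-to-%s\n' "$1" "$2" "$3"
}

run_state_id dev tap-gitlab target-postgres   # prints dev:tap-gitlab-to-target-postgres
```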

Parameters

  • --dry-run: Analyzes, validates, and describes the call without executing it.
  • --no-state-update: Disables the saving of state for the call.
  • --full-refresh: Executes a full refresh, ignoring previous state.
  • --force: Forces the execution of the job even if there is a conflicting job with the same generated ID.

Example of meltano run

# Run two pipelines in series
# The auto-generated ID for the first EL pair is 'prod:tap-gitlab-to-target-postgres'
# The auto-generated ID for the second EL pair is 'prod:tap-salesforce-to-target-mysql'
$ meltano --environment=prod run tap-gitlab target-postgres tap-salesforce target-mysql

# Run the pipelines in series as a full refresh
$ meltano --environment=stg run --full-refresh tap-gitlab target-postgres tap-salesforce target-mysql ...

# Run the pipelines in series, forcing each one to run even if a conflicting job is found
$ meltano --environment=dev run --force tap-gitlab target-postgres tap-salesforce target-mysql ...

meltano job

The meltano job command defines a named job made up of one or more tasks that run in a fixed order. Passing the job name as an argument to meltano run executes that job.

Example of meltano job

# Add job "simple-demo" with two tasks
# Task 1: tap-gitlab target-postgres dbt-postgres:run
# Task 2: tap-gitlab target-csv
$ meltano job add simple-demo --tasks "[tap-gitlab target-postgres dbt-postgres:run, tap-gitlab target-csv]"

# List job "simple-demo"
$ meltano job list simple-demo --format=json

# Run job "simple-demo"
$ meltano run simple-demo

# Run job "simple-demo" and other EL pairs
$ meltano run simple-demo tap-mysql target-bigquery

# Delete job "simple-demo"
$ meltano job remove simple-demo

meltano schedule

The meltano schedule command defines ELT pipelines or jobs that are run periodically. It requires an Orchestrator plugin to be installed.

Example of meltano schedule

# Add schedule "gitlab-sync" to run job "gitlab-to-mysql" daily
$ meltano schedule add gitlab-sync --job gitlab-to-mysql --interval "@daily"

# Dry run schedule "gitlab-sync"
$ meltano schedule run gitlab-sync --dry-run

# Change schedule "gitlab-sync" job to "gitlab-to-postgres"
$ meltano schedule set gitlab-sync --job gitlab-to-postgres

# Change schedule "gitlab-sync" to weekly
$ meltano schedule set gitlab-sync --interval "@weekly"

References

https://docs.meltano.com/reference/command-line-interface

Ryusei Kakujo
