2022-11-09

How to Use Singer

Introduction

Singer is an open-source standard for writing ETL scripts that move data from a variety of sources to your desired destinations. This article explains how to set up a Singer Tap and a Singer Target and run them together as a pipeline.

Setting up a Singer Tap

A Tap in Singer is a script that pulls data out of a source and writes it to standard output as JSON-formatted messages. Taps handle the extraction part of the ETL process. This section guides you through choosing, installing, and configuring your first Singer Tap.
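To make "JSON-formatted messages" concrete: under the Singer specification, a Tap writes one JSON message per line, typically a SCHEMA message describing a stream, RECORD messages carrying rows, and a STATE message recording progress. The toy Tap below is a minimal sketch using only the Python standard library; the `users` stream, its fields, and the `updated_at` bookmark are hypothetical and not taken from any real Tap.

```python
import json
from typing import Any, Dict, Iterable, Iterator

STREAM = "users"  # hypothetical stream name

# JSON Schema describing each record in the stream.
SCHEMA = {
    "type": "object",
    "properties": {
        "id": {"type": "integer"},
        "name": {"type": "string"},
        "updated_at": {"type": "string", "format": "date-time"},
    },
}

# Hypothetical rows standing in for data pulled from a real source.
ROWS = [
    {"id": 1, "name": "Alice", "updated_at": "2022-11-01T00:00:00Z"},
    {"id": 2, "name": "Bob", "updated_at": "2022-11-02T00:00:00Z"},
]


def generate_messages(rows: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Yield Singer messages for one stream: SCHEMA, then RECORDs, then STATE."""
    # The SCHEMA message must come before any RECORD for the stream.
    yield {"type": "SCHEMA", "stream": STREAM, "schema": SCHEMA, "key_properties": ["id"]}
    bookmark = None
    for row in rows:
        yield {"type": "RECORD", "stream": STREAM, "record": row}
        # Assumes rows arrive sorted by updated_at, so the last one is the newest.
        bookmark = row["updated_at"]
    # STATE records how far this run got, for use by a later incremental run.
    yield {"type": "STATE", "value": {"bookmarks": {STREAM: {"updated_at": bookmark}}}}


def main() -> None:
    # A Tap writes each message as a single line of JSON on standard output.
    for message in generate_messages(ROWS):
        print(json.dumps(message), flush=True)


if __name__ == "__main__":
    main()
```

Because every Tap speaks this same line-oriented format, any Target that understands it can consume the output.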

Choosing a Tap

Singer supports a wide variety of Taps, each tailored for a specific source, such as databases (Postgres, MySQL, MongoDB), APIs (Salesforce, Stripe), and files (CSV, Excel). You should choose a Tap that corresponds to the data source you wish to extract from.

Installing the Tap

Use the pip command to install the specific Tap you need. For instance, to install the Tap for Salesforce, you would type: pip install tap-salesforce
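Different Taps and Targets can pin conflicting dependency versions, so it is commonly recommended to install each one into its own virtual environment rather than into one shared Python. The sketch below automates that with the standard library's `venv` module; the `~/.virtualenvs/<package>` layout and the helper names are hypothetical, and it assumes the package's console script has the same name as the package (as with tap-salesforce).

```python
import os
import subprocess
import sys
import venv
from pathlib import Path
from typing import List


def venv_executable(env_dir: Path, name: str) -> Path:
    """Return the path of an interpreter or console script inside a virtual environment."""
    # venv uses Scripts\ and .exe on Windows, bin/ everywhere else.
    if os.name == "nt":
        return env_dir / "Scripts" / f"{name}.exe"
    return env_dir / "bin" / name


def install_command(env_dir: Path, package: str) -> List[str]:
    """Build the pip command that installs `package` into `env_dir` only."""
    return [str(venv_executable(env_dir, "python")), "-m", "pip", "install", package]


def install_isolated(package: str, root: Path) -> Path:
    """Create root/<package> as a fresh venv, install the package there, return its script path."""
    env_dir = root / package
    venv.EnvBuilder(with_pip=True).create(env_dir)
    subprocess.run(install_command(env_dir, package), check=True)
    # Assumes the console script is named after the package.
    return venv_executable(env_dir, package)


if __name__ == "__main__":
    # Hypothetical layout: one environment per package under ~/.virtualenvs
    for pkg in sys.argv[1:]:
        print(install_isolated(pkg, Path.home() / ".virtualenvs"))
```

Passing `tap-salesforce target-postgres` as arguments would print the two executable paths, which can then be used in place of the bare command names when building the pipeline.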

Configuring the Tap

Once your Tap is installed, you need to configure it to connect to your data source. Configurations generally include access credentials, such as API keys, usernames, passwords, hostnames, and other source-specific settings.

For instance, to configure the Salesforce Tap, you would create a tap_config.json file (the name used in the pipeline command later in this article) with the following structure:

tap_config.json
{
  "client_id": "your_salesforce_client_id",
  "client_secret": "your_salesforce_client_secret",
  "refresh_token": "your_salesforce_refresh_token",
  "start_date": "start_date_in_RFC3339_format_e.g._YYYY-MM-DDT00:00:00Z"
}
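A missing key or a malformed date usually only surfaces once the Tap fails at runtime, so it can help to check the file first. The sketch below loads a Tap config with Python's `json` module and checks it; the required-key list simply mirrors the example above and is an assumption, since the Tap's own README is the authority on which keys it needs. The date check accepts either a plain date or a full ISO 8601 timestamp.

```python
import json
from datetime import datetime
from pathlib import Path
from typing import Any, Dict, List

# Keys taken from the example config above; check your Tap's README for the full list.
REQUIRED_KEYS = ["client_id", "client_secret", "refresh_token", "start_date"]


def config_errors(config: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a Tap config (empty if none)."""
    errors = [f"missing key: {key}" for key in REQUIRED_KEYS if not config.get(key)]
    start = config.get("start_date")
    if start:
        try:
            # Older Pythons' fromisoformat() rejects a trailing "Z", so normalize it first.
            datetime.fromisoformat(str(start).replace("Z", "+00:00"))
        except ValueError:
            errors.append(f"start_date is not an ISO 8601 date: {start!r}")
    return errors


def load_config(path: Path) -> Dict[str, Any]:
    """Read a JSON config file and raise ValueError listing every problem found."""
    with path.open(encoding="utf-8") as f:
        config = json.load(f)
    problems = config_errors(config)
    if problems:
        raise ValueError("; ".join(problems))
    return config
```

For example, `load_config(Path("tap_config.json"))` raises a `ValueError` as long as placeholder values are still in the file, and returns the parsed dictionary once real values are filled in.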

Setting up a Singer Target

A Target in Singer reads the JSON messages produced by a Tap from standard input and loads the data into a destination, such as a database or a file. Targets handle the loading part of the ETL process. This section guides you through choosing, installing, and configuring your first Singer Target.
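Under the Singer specification, a Target reads one JSON message per line and acts on each message type. The toy Target below is an illustrative sketch, not how target-postgres works internally: it remembers the schema for each stream, collects records in memory where a real Target would write them to its destination, and keeps the last STATE so it can be emitted on standard output (which Singer Targets conventionally do once the preceding records are stored). The function names are hypothetical.

```python
import json
import sys
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Optional, Tuple

Records = Dict[str, List[Dict[str, Any]]]


def consume(lines: Iterable[str]) -> Tuple[Records, Optional[Dict[str, Any]]]:
    """Process Singer messages; return records grouped by stream and the last STATE value."""
    schemas: Dict[str, Dict[str, Any]] = {}
    records: Records = defaultdict(list)
    state: Optional[Dict[str, Any]] = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        message = json.loads(line)
        kind = message["type"]
        if kind == "SCHEMA":
            schemas[message["stream"]] = message["schema"]
        elif kind == "RECORD":
            stream = message["stream"]
            if stream not in schemas:
                raise ValueError(f"RECORD for {stream!r} arrived before its SCHEMA")
            # A real Target would validate the record and write it to its destination here.
            records[stream].append(message["record"])
        elif kind == "STATE":
            state = message["value"]
        # Any other message type is ignored in this sketch.
    return dict(records), state


def main() -> None:
    """Entry point when used as a Target: read stdin, report counts, emit final state."""
    records, state = consume(sys.stdin)
    for stream, rows in records.items():
        print(f"{stream}: {len(rows)} records", file=sys.stderr)
    if state is not None:
        # Emit the final state on stdout so the caller can save it for the next run.
        print(json.dumps(state))
```

Calling `main()` under an `if __name__ == "__main__":` guard would turn the module into a script that can sit on the right-hand side of a shell pipe.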

Choosing a Target

Just like Taps, Singer supports a wide array of Targets, each for a specific destination. You should select a Target that corresponds to your desired data destination.

Installing the Target

Use the pip command to install the specific Target you need. For instance, to install the Target for Postgres, you would type: pip install target-postgres

Configuring the Target

Targets also need to be configured to connect to your desired destination. Just like with Taps, configurations generally include access credentials and other destination-specific settings.

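For instance, to configure the Postgres Target, you would create a target_config.json file with a structure like the following. The exact key names depend on which target-postgres implementation you installed, so check its README.

target_config.json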
{
  "host": "your_postgres_host",
  "port": "your_postgres_port",
  "user": "your_postgres_username",
  "password": "your_postgres_password",
  "dbname": "your_postgres_database"
}
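Before running the pipeline, it can help to confirm that these connection values actually work. In the example above the key names happen to match libpq's own connection keywords, so the file can be turned into a libpq keyword/value connection string, which psql accepts in place of a database name. The helpers below are a hypothetical, standard-library-only sketch that assumes exactly those key names.

```python
from typing import Any, Dict

# In the example config, the key names match libpq's connection keywords.
CONNINFO_KEYS = ("host", "port", "user", "password", "dbname")


def quote_value(value: Any) -> str:
    """Quote a value following libpq keyword/value rules."""
    text = str(value)
    if text and not any(ch.isspace() or ch in "'\\" for ch in text):
        return text
    # Empty values, or values containing whitespace, quotes, or backslashes,
    # must be single-quoted with ' and \ escaped by a backslash.
    escaped = text.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def to_conninfo(config: Dict[str, Any]) -> str:
    """Turn a target config dict into a conninfo string such as 'host=db port=5432 ...'."""
    return " ".join(f"{key}={quote_value(config[key])}" for key in CONNINFO_KEYS if key in config)
```

Running psql with the resulting string is a quick way to check the credentials; since the string contains the password, avoid printing it to shared logs.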

Running a Singer Pipeline

Now that you have configured your Tap and Target, you are ready to run your Singer ETL pipeline. This means running the Tap and Target together, so that data extracted from your source is loaded into your destination as it arrives.

Combining Taps and Targets

Singer allows you to connect any Tap to any Target, because they all exchange data in the same JSON message format. The connection is made by running the Tap and Target at the same time and piping the Tap's standard output (the extracted data) into the Target's standard input (the loading side).
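The shell pipe is the usual way to wire the two together, but the same connection can be reproduced from Python, for example inside a scheduler script. The sketch below follows the standard `subprocess.Popen` pattern for replacing a shell pipeline; `run_pipeline` is a hypothetical helper name, and the caller supplies both commands.

```python
import subprocess
from typing import List, Tuple


def run_pipeline(tap_cmd: List[str], target_cmd: List[str]) -> Tuple[int, int, bytes]:
    """Run the equivalent of `tap_cmd | target_cmd`.

    Returns the Tap's exit code, the Target's exit code, and whatever the Target
    wrote to stdout (Singer Targets typically emit the latest STATE there).
    """
    tap = subprocess.Popen(tap_cmd, stdout=subprocess.PIPE)
    # The Target reads the Tap's stdout directly, just like the shell `|`.
    target = subprocess.Popen(target_cmd, stdin=tap.stdout, stdout=subprocess.PIPE)
    if tap.stdout is not None:
        # Drop our copy of the pipe so the Tap gets SIGPIPE if the Target exits early.
        tap.stdout.close()
    target_out, _ = target.communicate()
    return tap.wait(), target.returncode, target_out
```

For example, `run_pipeline(["tap-salesforce", "-c", "tap_config.json"], ["target-postgres", "-c", "target_config.json"])` mirrors the shell command used in this article. One design benefit: a bare shell pipeline reports only the Target's exit status (unless `set -o pipefail` is enabled), whereas this returns both.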

Running the Pipeline

With your Tap and Target installed and configured, run the pipeline as follows:

  1. Open Terminal or Command Line

First, open your terminal or command line interface and navigate to the directory containing tap_config.json and target_config.json.

  2. Run the Singer Pipeline

Now, you can run your Singer pipeline with the following command:

bash
$ tap-salesforce -c tap_config.json | target-postgres -c target_config.json

  • tap-salesforce -c tap_config.json runs the Salesforce Tap using the configurations in tap_config.json.
  • The | (pipe) symbol takes the output of the Tap script (extracted data) and passes it as the input to the Target script.
  • target-postgres -c target_config.json runs the Postgres Target using the configurations in target_config.json.

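When you run this command, the Tap starts extracting data from Salesforce and the Target loads it into your Postgres database as it arrives. Note that many Taps, tap-salesforce included, sync nothing unless they are also given a catalog of selected streams: you typically generate one with tap-salesforce -c tap_config.json --discover > properties.json, mark the streams you want as selected, and add --properties properties.json to the Tap command.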


Ryusei Kakujo

