Introduction
Singer is an open-source ETL tool that helps you move data from a variety of sources to your desired destinations. This article shows how to set up a Singer Tap and a Singer Target and how to run them together as a pipeline.
Setting up a Singer Tap
A Tap in Singer is a script that pulls data out of a source and writes it out as JSON-formatted data. Taps handle the extraction part of the ETL process. This section guides you through choosing, installing, and configuring your first Singer Tap.
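To make "JSON-formatted data" concrete: under the Singer specification, a Tap writes one JSON message per line to standard output, typically SCHEMA, RECORD, and STATE messages. The following is a minimal sketch of that output, not the code of any real Tap. The stream name `users`, its fields, and the function name `emit_messages` are hypothetical.

```python
import json


def emit_messages(rows):
    """Yield Singer-style messages, one JSON document per line.

    Hypothetical example: the "users" stream and its fields are made up.
    """
    # SCHEMA describes the shape of the records that follow.
    yield json.dumps({
        "type": "SCHEMA",
        "stream": "users",
        "schema": {
            "type": "object",
            "properties": {
                "id": {"type": "integer"},
                "name": {"type": "string"},
            },
        },
        "key_properties": ["id"],
    })
    # Each RECORD carries one row of extracted data.
    for row in rows:
        yield json.dumps({"type": "RECORD", "stream": "users", "record": row})
    # STATE lets a later run resume from where this one stopped.
    last_id = rows[-1]["id"] if rows else None
    yield json.dumps({"type": "STATE", "value": {"users": last_id}})


if __name__ == "__main__":
    sample_rows = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}]
    for line in emit_messages(sample_rows):
        print(line)
```

Running the script prints four lines: one SCHEMA message, two RECORD messages, and one STATE message.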
Choosing a Tap
Singer supports a wide variety of Taps, each tailored for a specific source, such as databases (Postgres, MySQL, MongoDB), APIs (Salesforce, Stripe), and files (CSV, Excel). You should choose a Tap that corresponds to the data source you wish to extract from.
Installing the Tap
Use the pip command to install the specific Tap you need. For instance, to install the Tap for Salesforce, you would type:
$ pip install tap-salesforce
Configuring the Tap
Once your Tap is installed, you need to configure it to connect to your data source. Configurations generally include access credentials, such as API keys, usernames, passwords, hostnames, and other source-specific settings.
For instance, to configure the Salesforce Tap, you would create a config.json file with the following structure:
{
  "client_id": "your_salesforce_client_id",
  "client_secret": "your_salesforce_client_secret",
  "refresh_token": "your_salesforce_refresh_token",
  "start_date": "start_date_in_YYYY-MM-DD_format"
}
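Before running the Tap, it can help to confirm that the file parses as JSON and contains the keys shown above. The following is a minimal sketch: `missing_keys` is a hypothetical helper, and the list of required keys is taken only from the example in this article, not from the Tap's full documentation.

```python
import json

# Keys taken from the example config in this article (an assumption, not an
# exhaustive list of what tap-salesforce accepts).
REQUIRED_KEYS = ("client_id", "client_secret", "refresh_token", "start_date")


def missing_keys(config_text, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty in a JSON config."""
    config = json.loads(config_text)  # raises json.JSONDecodeError on bad JSON
    return [key for key in required if not config.get(key)]


if __name__ == "__main__":
    # This sample deliberately omits "refresh_token".
    sample = '{"client_id": "abc", "client_secret": "xyz", "start_date": "2024-01-01"}'
    print(missing_keys(sample))
```

To check a real file, pass the file's contents, for example the result of `open("config.json").read()`.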
Setting up a Singer Target
A Target in Singer consumes data from Taps and loads it into a destination, such as a database or a file. Targets handle the loading part of the ETL process. This section guides you through choosing, installing, and configuring your first Singer Target.
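As a rough illustration of what "consuming data from Taps" means, the toy Target below reads Singer messages line by line and collects records per stream in memory instead of writing them to a database. `consume` is a hypothetical name, and a real Target such as target-postgres does considerably more (schema handling, batching, writing to the destination).

```python
import io
import json
from collections import defaultdict


def consume(lines):
    """Group RECORD messages by stream and return them with the last STATE seen."""
    records = defaultdict(list)
    state = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        message = json.loads(line)
        if message["type"] == "RECORD":
            records[message["stream"]].append(message["record"])
        elif message["type"] == "STATE":
            state = message["value"]
        # SCHEMA messages are ignored in this sketch; a real Target would use
        # them to prepare the destination and validate records.
    return dict(records), state


if __name__ == "__main__":
    # Stand-in for a Tap's output; a real Target would iterate over sys.stdin.
    tap_output = io.StringIO(
        '{"type": "SCHEMA", "stream": "users", "schema": {}, "key_properties": ["id"]}\n'
        '{"type": "RECORD", "stream": "users", "record": {"id": 1}}\n'
        '{"type": "STATE", "value": {"users": 1}}\n'
    )
    print(consume(tap_output))
```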
Choosing a Target
As with Taps, Singer supports a wide array of Targets, each built for a specific destination. You should select a Target that corresponds to your desired data destination.
Installing the Target
Use the pip command to install the specific Target you need. For instance, to install the Target for Postgres, you would type:
$ pip install target-postgres
Configuring the Target
Targets also need to be configured to connect to your desired destination. Just like with Taps, configurations generally include access credentials and other destination-specific settings.
For instance, to configure the Postgres Target, you would create a config.json file with the following structure:
{
  "host": "your_postgres_host",
  "port": "your_postgres_port",
  "user": "your_postgres_username",
  "password": "your_postgres_password",
  "dbname": "your_postgres_database"
}
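Because this file holds a database password, one option is to generate it from environment variables rather than typing the values in by hand. The following is a minimal sketch under stated assumptions: the `PG*` variable names follow the convention used by PostgreSQL's own client tools, `build_target_config` is a hypothetical helper, and the keys simply mirror the example above. The port is left as a string, as in the example.

```python
import json
import os


def build_target_config(env=os.environ):
    """Build the Target config dict from PG* environment variables."""
    # Config key -> environment variable (the mapping is an assumption
    # made for this sketch).
    mapping = {
        "host": "PGHOST",
        "port": "PGPORT",
        "user": "PGUSER",
        "password": "PGPASSWORD",
        "dbname": "PGDATABASE",
    }
    missing = [var for var in mapping.values() if var not in env]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {key: env[var] for key, var in mapping.items()}


if __name__ == "__main__":
    # Demo values so the sketch runs anywhere; omit the argument to read
    # the real environment instead.
    demo_env = {
        "PGHOST": "localhost",
        "PGPORT": "5432",
        "PGUSER": "singer",
        "PGPASSWORD": "example-only",
        "PGDATABASE": "warehouse",
    }
    print(json.dumps(build_target_config(demo_env), indent=2))
```

Redirecting the printed JSON into a file produces a config the Target can read with its -c option.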
Running the Singer Pipeline
Now that you have configured your Tap and Target, you are ready to start running your Singer ETL pipeline. This process involves executing your Tap and Target scripts together so that data is seamlessly extracted from your source and loaded into your destination.
Combining Taps and Targets
Singer allows you to connect any Tap to any Target, giving you a great deal of flexibility in managing your ETL workflows. This connection is achieved by running the Tap and Target scripts concurrently and piping the output of the Tap (the extracted data) to the Target (the loading destination).
Running the Pipeline
With both the Tap and the Target installed and configured, run the pipeline as follows:
- Open Terminal or Command Line
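First, open your terminal or command line interface. Navigate to the directory where the configuration files for the Tap and Target are located. The command below assumes the Tap configuration is saved as tap_config.json and the Target configuration as target_config.json; if you saved both as config.json in separate directories, rename them or adjust the paths in the command.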
- Run the Singer Pipeline
Now, you can run your Singer pipeline with the following command:
$ tap-salesforce -c tap_config.json | target-postgres -c target_config.json
In this command:
  - tap-salesforce -c tap_config.json runs the Salesforce Tap using the configuration in tap_config.json.
  - The | (pipe) symbol takes the output of the Tap script (the extracted data) and passes it as the input to the Target script.
  - target-postgres -c target_config.json runs the Postgres Target using the configuration in target_config.json.
When you run this command, the Tap script will start extracting data from Salesforce, and the Target script will load this data into your Postgres database.
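If the pipeline needs to be started from a script rather than typed into a shell, the same pipe can be built with Python's standard subprocess module. The following is a minimal sketch: `run_pipeline` is a hypothetical helper, and the demo at the bottom substitutes two tiny Python one-liners for the Tap and Target so that it runs without Singer installed.

```python
import subprocess
import sys


def run_pipeline(tap_cmd, target_cmd):
    """Run `tap_cmd | target_cmd`.

    Returns (tap exit code, target exit code, target stdout as text).
    """
    tap = subprocess.Popen(tap_cmd, stdout=subprocess.PIPE)
    target = subprocess.Popen(target_cmd, stdin=tap.stdout, stdout=subprocess.PIPE)
    # Close our copy of the pipe so the Tap is notified if the Target exits early.
    tap.stdout.close()
    output, _ = target.communicate()
    tap_code = tap.wait()
    return tap_code, target.returncode, output.decode()


if __name__ == "__main__":
    # Stand-ins for ["tap-salesforce", "-c", "tap_config.json"] and
    # ["target-postgres", "-c", "target_config.json"].
    fake_tap = [sys.executable, "-c", "print('{\"type\": \"STATE\", \"value\": {}}')"]
    fake_target = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read())"]
    print(run_pipeline(fake_tap, fake_target))
```

Checking both exit codes matters because a shell pipeline normally reports only the status of its last command, so a failing Tap can otherwise go unnoticed.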