2022-11-09

Singer

What is Singer

Singer is an open-source ETL tool designed with simplicity and versatility in mind. It was created by Stitch Data, a company known for its data integration solutions. Singer takes a modular approach to data extraction and loading, splitting the work between two kinds of components: Taps, which pull data from a source, and Targets, which receive that data and load it into a chosen destination.

Components of Singer ETL

Singer divides an ETL pipeline into two kinds of components, Taps and Targets, each of which can be developed and swapped independently of the other.
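
In practice a Tap writes its messages to standard output and a Target reads them from standard input, so the two are usually joined with a pipe. The following is a minimal sketch of that wiring in Python. The function name `run_pipeline` is hypothetical, and the demo uses two small Python one-liners as stand-ins for a real tap and target so that it runs without installing anything.

```python
import subprocess
import sys


def run_pipeline(tap_cmd, target_cmd):
    """Pipe the tap's stdout into the target's stdin and return the target's stdout."""
    tap = subprocess.Popen(tap_cmd, stdout=subprocess.PIPE)
    target = subprocess.Popen(target_cmd, stdin=tap.stdout, stdout=subprocess.PIPE)
    # Close our copy of the pipe so the tap is notified if the target exits early.
    tap.stdout.close()
    output, _ = target.communicate()
    tap.wait()
    if tap.returncode != 0 or target.returncode != 0:
        raise RuntimeError("tap or target exited with a non-zero status")
    return output.decode()


if __name__ == "__main__":
    # Stand-ins (assumptions for illustration): the "tap" prints two lines,
    # the "target" counts the lines it receives.
    fake_tap = [sys.executable, "-c", "print('line one'); print('line two')"]
    fake_target = [sys.executable, "-c", "import sys; print(sum(1 for _ in sys.stdin))"]
    print(run_pipeline(fake_tap, fake_target).strip())
```

With real components, `tap_cmd` and `target_cmd` would be the command lines of an installed tap and target, including whatever configuration arguments those particular programs expect.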

Taps in Singer

Taps in Singer serve as data extractors. They are responsible for connecting to the data source, extracting the necessary data, and converting it into the standardized JSON-based format defined by the Singer specification. Taps can connect to a wide array of sources, such as databases, web APIs, or even files on a local filesystem.

Each Tap is designed for a specific data source and knows how to interact with it. For instance, there are different Taps for connecting to a PostgreSQL database, a MySQL database, or a Salesforce API. Singer's community has developed a rich ecosystem of Taps, covering a wide variety of data sources.

The primary output of a Tap is a stream of data records, but it also includes schema messages that describe the data structure and state messages that help track the extraction process's progress.
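
The three message types can be illustrated with a small sketch of a Tap. This is not taken from any real tap: the `users` stream, its fields, and the `last_id` bookmark are assumptions made up for the example. It emits one SCHEMA message, one RECORD message per row, and a final STATE message, each as a single line of JSON.

```python
import json
import sys


def tap_messages(rows):
    """Yield Singer messages for a hypothetical 'users' stream."""
    # SCHEMA describes the structure of the records that follow.
    yield {
        "type": "SCHEMA",
        "stream": "users",
        "key_properties": ["id"],
        "schema": {
            "type": "object",
            "properties": {
                "id": {"type": "integer"},
                "name": {"type": "string"},
            },
        },
    }
    last_id = None
    for row in rows:
        # RECORD carries one row of data for the named stream.
        yield {"type": "RECORD", "stream": "users", "record": row}
        last_id = row["id"]
    # STATE is free-form; here it bookmarks the last id that was extracted.
    yield {"type": "STATE", "value": {"users": {"last_id": last_id}}}


if __name__ == "__main__":
    sample_rows = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
    for message in tap_messages(sample_rows):
        sys.stdout.write(json.dumps(message) + "\n")
```

On a later run, a tap could read the saved state and resume from `last_id` instead of extracting everything again, which is the purpose of state messages.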

Targets in Singer

While Taps extract data, Targets are responsible for loading the extracted data into a chosen destination. A destination can be any kind of data storage or data processing system: a data warehouse such as Google BigQuery or Amazon Redshift, or something as simple as a CSV file.

Like Taps, each Target is designed for a specific destination and knows how to connect to it and load data into it. A Target receives the output of a Tap (records, schema messages, and state messages), loads the data records into the destination, and uses the schema messages to ensure the data fits the destination's structure.
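
The receiving side can be sketched in the same way. The function below is a simplified, hypothetical Target that handles a single stream: it uses the SCHEMA message to decide the CSV columns, writes each RECORD as a row, and remembers the latest STATE value. A real Target would handle several streams and validate records against the schema, which this sketch does not attempt.

```python
import csv
import io
import json


def load_to_csv(lines, out):
    """Consume Singer messages (one JSON document per line) and write records as CSV.

    Returns the most recent STATE value seen, or None if there was none.
    """
    writer = None
    state = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        message = json.loads(line)
        kind = message["type"]
        if kind == "SCHEMA":
            # Use the schema's property names as the CSV columns.
            columns = list(message["schema"]["properties"])
            writer = csv.DictWriter(out, fieldnames=columns, lineterminator="\n")
            writer.writeheader()
        elif kind == "RECORD":
            if writer is None:
                raise ValueError("RECORD received before SCHEMA")
            writer.writerow(message["record"])
        elif kind == "STATE":
            state = message["value"]
    return state


if __name__ == "__main__":
    demo_lines = [
        '{"type": "SCHEMA", "stream": "users", "key_properties": ["id"], '
        '"schema": {"type": "object", "properties": '
        '{"id": {"type": "integer"}, "name": {"type": "string"}}}}',
        '{"type": "RECORD", "stream": "users", "record": {"id": 1, "name": "Alice"}}',
        '{"type": "STATE", "value": {"users": {"last_id": 1}}}',
    ]
    buffer = io.StringIO()
    final_state = load_to_csv(demo_lines, buffer)
    print(buffer.getvalue(), end="")
    print(final_state)
```

In a real pipeline `lines` would be `sys.stdin`, and the returned state would be saved so that the next run of the Tap can pick up where this one stopped.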

Advantages of Singer

  • Excellent core specification of Singer
    The Singer specification is well-documented and well-maintained, and many taps adhere to it. Because taps and targets exchange data as JSON, incompatible output formats are avoided and any tap can be combined with any target.

  • Abundance of Singer taps
    There are around 200 taps built on the Singer specification. Most were written and contributed by developers who needed a specific data source, and have been reviewed by the Singer project.

Disadvantages of Singer

  • Challenges in tap standardization
    Because each tap is developed as its own independent project, supported features and use cases are often inconsistent from one tap to another. When a tap lacks something you need, you may have to maintain a custom fork.

  • Lack of maintenance
    After Stitch was acquired by Talend, the Singer project was left largely unattended, so taps may break when a source's API changes and deprecated taps may remain in use. In addition, the dependencies of different taps can conflict with each other, and because versions are not kept consistent across taps, each tap needs its own standalone environment to avoid conflicts when several are run side by side.
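
One common way to get such standalone environments is a separate Python virtual environment per tap or target. The sketch below uses the standard-library `venv` module. The helper name `create_plugin_env` and the plugin names in the demo are assumptions for illustration, and the demo skips pip installation to stay quick.

```python
import tempfile
import venv
from pathlib import Path


def create_plugin_env(base_dir, plugin_name, with_pip=True):
    """Create a dedicated virtual environment for one tap or target."""
    env_dir = Path(base_dir) / plugin_name
    # clear=True recreates the environment if the directory already exists.
    venv.EnvBuilder(with_pip=with_pip, clear=True).create(str(env_dir))
    return env_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        # Hypothetical plugin names; each one gets its own isolated environment.
        for name in ["tap-example", "target-example"]:
            env = create_plugin_env(base, name, with_pip=False)
            print(env.name, (env / "pyvenv.cfg").exists())
```

With `with_pip=True`, each environment gets its own pip, so a tap and its pinned dependencies can be installed there without affecting the libraries that any other tap or target uses.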

References

https://www.singer.io/
https://towardsdatascience.com/should-you-build-on-singer-taps-16bb1e45ef09
https://blog.panoply.io/etl-with-singer-a-tutorial
https://github.com/singer-io/getting-started

Ryusei Kakujo

