2023-06-30

How to Implement Separate Development and Production Environments in Snowflake

Separate Dev and Prod Environments in Snowflake

In the world of software development, the concepts of development and production environments are fundamental. A development environment is a space where developers work on new features or fix bugs, testing their changes without affecting the actual live system, or "production" environment. This separation helps ensure the stability and security of the production environment by preventing untested or potentially unstable code from reaching it.

Separate environments are also crucial in the realm of data management and data warehouse platforms like Snowflake. Establishing separate environments for development, testing, and production allows teams to experiment with data, test new code, and implement changes without risking disruption to the live production database. In Snowflake, this typically involves creating different databases or accounts for each environment.

One Database per Environment

One approach to separate environments in Snowflake is to use a single Snowflake account and create one database for each environment. Following this structure, you would create databases like DEV_MY_DATABASE, TEST_MY_DATABASE, and PROD_MY_DATABASE for development, testing, and production environments, respectively.

Creating separate databases for each environment provides a clear boundary between them, reducing the risk of accidental data manipulation in the wrong environment. It also offers an intuitive naming convention, helping teams easily identify and switch between databases.

One Schema per Environment

An alternative approach is to have a single database with a separate schema for each environment. In this scenario, you might have MY_DATABASE with schemas DEV_MY_SCHEMA, TEST_MY_SCHEMA, and PROD_MY_SCHEMA.

Creating separate schemas for each environment within a single database can help keep the data organized and accessible. Each schema acts as a separate namespace within the database, allowing different environments to coexist without interfering with each other.

However, there are significant drawbacks to this approach. There is a risk of accidentally running development code on the production schema, which could result in severe data issues. Also, securing each schema can be tedious and prone to errors. Consequently, this method is generally not recommended unless the team is highly disciplined and experienced in managing data security at the schema level.

Schema Naming Conventions

A well-thought-out naming convention makes it easier for developers, data analysts, and other team members to understand the purpose of a given schema at a glance.

Consider the following prefixes for schema names:

  • LND
    To indicate a landing schema used to hold newly ingested data.
  • RAW
    To suggest a raw staging area where data first arrives before any processing or transformation.
  • INT
    To denote an integration area where raw data is combined and cleaned before analysis.
  • MRT
    To signify a data mart holding conformed and cleaned data ready for reporting.

These prefixes provide clarity about the schema's role in the data lifecycle. While these prefixes are just examples, they represent common practices in data warehousing and can be a good starting point for designing your schema naming convention.

One Account per Environment

One Account per Environment approach might work for some very large systems as it maintains strict isolation. However, this approach has significant drawbacks:

  • Management of Duplicate Data
    When each environment is entirely independent, they each must be managed separately, which increases the database administrator's workload. Managing duplicate data across accounts becomes more challenging, potentially leading to inconsistency and redundancy.

  • Handling Separate Logins
    Each environment would require a separate login and password. This increases the administrative overhead and potentially slows down operations as users need to manage multiple sets of credentials.

  • Challenges in Data Cloning
    One of the severe drawbacks of having separate accounts for each environment is that rapid data cloning across accounts is not possible. This inability makes the task of data management, especially when transferring data from one environment to another, more challenging.

  • Duplicate Storage Issues
    Since data cloning isn't possible across accounts, data must be physically copied and therefore duplicated across environments. This duplication not only takes up valuable storage space but can also lead to data inconsistencies if the data is not correctly synchronized across all environments.

References

https://www.analytics.today/blog/snowflake-accounts-best-practice
https://www.propeldata.com/blog/how-to-set-up-development-and-production-environments-in-snowflake

Ryusei Kakujo

researchgatelinkedingithub

Focusing on data science for mobility

Bench Press 100kg!