2023-02-16

Building an AWS Environment for Hosting dbt Docs using Terraform

In this article, I will show how to build an AWS environment for hosting dbt docs with Terraform. The goal is to serve the dbt documentation uploaded to S3 through CloudFront, with Basic authentication implemented on CloudFront using Lambda@Edge.

Terraform Code

You can find the Terraform code in the following GitHub repository:

https://github.com/ryuseikakujo/dbt-docs-terraform/tree/main

Directory Structure

The directory structure is as follows:

.
├── backend.tf
├── cloudfront.tf
├── iam.tf
├── lambda.tf
├── lambda_edge
│   └── index.py
├── lambda_edge_archive
├── provider.tf
├── s3.tf
└── variables.tf

Construction Steps

We will build the environment for hosting dbt docs on AWS in the following order:

  1. Write the Lambda@Edge function for Basic authentication.
  2. Build the infrastructure with Terraform.

Writing the Lambda@Edge Function

The Lambda@Edge function responsible for Basic authentication is written in index.py, as shown below:

lambda_edge/index.py
"""
Lambda@Edge for authentication
"""

import base64


def authenticate(user, password):
    return user == "admin" and password == "passW0rd"


def handler(event, context):
    # CloudFront passes the viewer request in the Lambda@Edge event;
    # header names in the "headers" dict are lowercased by CloudFront
    request = event["Records"][0]["cf"]["request"]
    headers = request["headers"]

    # Returned instead of the request when authentication fails,
    # prompting the browser to show the Basic authentication dialog
    error_response = {
        "status": "401",
        "statusDescription": "Unauthorized",
        "body": "Authentication Failed",
        "headers": {
            "www-authenticate": [
                {
                    "key": "WWW-Authenticate",
                    "value": 'Basic realm="Basic Authentication"',
                }
            ]
        },
    }

    if "authorization" not in headers:
        return error_response

    try:
        # The header value is "Basic <base64(user:password)>";
        # decode it and split it into the credentials
        auth_values = headers["authorization"][0]["value"].split(" ")
        auth = base64.b64decode(auth_values[1]).decode().split(":")
        (user, password) = (auth[0], auth[1])
        # Returning the request lets CloudFront continue processing it
        return request if authenticate(user, password) else error_response
    except Exception:
        return error_response

Building the Infrastructure with Terraform

We will use Terraform to build the necessary infrastructure.

Writing the Source Code

Write the Terraform code to create the infrastructure. You can refer to the source code in the following repository:

https://github.com/ryuseikakujo/dbt-docs-terraform/tree/main

Here, I will explain only the key points.

Multi-Region Setup

The provider.tf file contains the AWS provider configuration. Since we are using Lambda@Edge, the Lambda function must be created in the N. Virginia region (us-east-1). We therefore define an additional, aliased provider for us-east-1 and reference it in the Lambda resource.

provider.tf
provider "aws" {
  region = "ap-northeast-1"

  default_tags {
    tags = {
      Env       = "dev"
      App       = "my-app"
      ManagedBy = "Terraform"
    }
  }
}

provider "aws" {
  alias  = "virginia"
  region = "us-east-1"

  default_tags {
    tags = {
      Env       = "dev"
      App       = "my-app"
      ManagedBy = "Terraform"
    }
  }
}
lambda.tf
resource "aws_lambda_function" "cloudfront_auth" {
  provider         = aws.virginia
  .
  .
  .
}

https://io.traffine.com/en/articles/terraform-aws-multi-regions

Lambda Zipping

Using the archive_file data source in lambda.tf, you can generate a Zip file of the specified directory.

lambda.tf
data "archive_file" "function_source" {
  type        = "zip"
  source_dir  = "lambda_edge"
  output_path = "lambda_edge_archive/function.zip"
}

The generated Zip file is referenced during Lambda definition.

lambda.tf
resource "aws_lambda_function" "cloudfront_auth" {
  .
  .
  .
  filename         = data.archive_file.function_source.output_path
  source_code_hash = data.archive_file.function_source.output_base64sha256
}

Lambda Versioning

CloudFront requires the Lambda@Edge association to reference a specific published version of the function, not $LATEST. For more information on Lambda versioning, please refer to the following article:

https://io.traffine.com/en/articles/version-and-alias-in-lambda

By setting publish = true in the Lambda resource, a new version is published whenever the function is updated.

lambda.tf
resource "aws_lambda_function" "cloudfront_auth" {
  .
  .
  .
  publish          = true
}
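
For reference, once the pieces above are combined, the complete function definition might look like the following sketch. Lambda@Edge also requires an execution role that can be assumed by both lambda.amazonaws.com and edgelambda.amazonaws.com. The resource names, function name, and runtime below are assumptions; the actual definitions are in iam.tf and lambda.tf in the repository.

iam.tf / lambda.tf (sketch)
# Execution role for Lambda@Edge; it must trust both the Lambda and
# Lambda@Edge service principals (resource names are assumptions)
data "aws_iam_policy_document" "lambda_edge_assume" {
  statement {
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["lambda.amazonaws.com", "edgelambda.amazonaws.com"]
    }
  }
}

resource "aws_iam_role" "lambda_edge" {
  name               = "cloudfront-auth-lambda-edge"
  assume_role_policy = data.aws_iam_policy_document.lambda_edge_assume.json
}

resource "aws_iam_role_policy_attachment" "lambda_edge_logs" {
  role       = aws_iam_role.lambda_edge.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole"
}

resource "aws_lambda_function" "cloudfront_auth" {
  provider         = aws.virginia      # Lambda@Edge must be created in us-east-1
  function_name    = "cloudfront-auth" # function name is an assumption
  role             = aws_iam_role.lambda_edge.arn
  runtime          = "python3.9"       # runtime is an assumption
  handler          = "index.handler"   # index.py -> handler()
  filename         = data.archive_file.function_source.output_path
  source_code_hash = data.archive_file.function_source.output_base64sha256
  publish          = true              # publish a version for CloudFront to reference
}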

CloudFront then references the published version through the function's qualified ARN.

cloudfront.tf
resource "aws_cloudfront_distribution" "main" {
  .
  .
  .
    lambda_function_association {
      .
      .
      lambda_arn   = aws_lambda_function.cloudfront_auth.qualified_arn
    }
}
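
In addition to the qualified ARN, the association needs an event type; for Basic authentication, the function is typically attached to the viewer-request event so that every request is checked before CloudFront serves it. The following is a sketch of what the distribution might look like, assuming an S3 bucket resource named aws_s3_bucket.dbt_docs and an origin access identity (both names are assumptions); the actual configuration is in cloudfront.tf in the repository.

cloudfront.tf (sketch)
resource "aws_cloudfront_origin_access_identity" "main" {
  comment = "dbt docs"
}

resource "aws_cloudfront_distribution" "main" {
  enabled             = true
  default_root_object = "index.html"

  # S3 origin, reachable only through the origin access identity
  origin {
    domain_name = aws_s3_bucket.dbt_docs.bucket_regional_domain_name
    origin_id   = "s3-dbt-docs"

    s3_origin_config {
      origin_access_identity = aws_cloudfront_origin_access_identity.main.cloudfront_access_identity_path
    }
  }

  default_cache_behavior {
    target_origin_id       = "s3-dbt-docs"
    viewer_protocol_policy = "redirect-to-https"
    allowed_methods        = ["GET", "HEAD"]
    cached_methods         = ["GET", "HEAD"]

    forwarded_values {
      query_string = false
      cookies {
        forward = "none"
      }
    }

    # Run the Basic authentication function on every viewer request
    lambda_function_association {
      event_type   = "viewer-request"
      include_body = false
      lambda_arn   = aws_lambda_function.cloudfront_auth.qualified_arn
    }
  }

  restrictions {
    geo_restriction {
      restriction_type = "none"
    }
  }

  viewer_certificate {
    cloudfront_default_certificate = true
  }
}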

Terraform Apply

Execute terraform apply to build the environment.

Storing dbt Docs in S3

Generate the dbt documentation in the dbt project directory and upload it to the S3 bucket with the following commands:

bash
$ dbt docs generate
$ aws s3 cp target/manifest.json s3://<s3 bucket name created by terraform>/
$ aws s3 cp target/run_results.json s3://<s3 bucket name created by terraform>/
$ aws s3 cp target/index.html s3://<s3 bucket name created by terraform>/
$ aws s3 cp target/catalog.json s3://<s3 bucket name created by terraform>/

Document Verification

When you access the CloudFront distribution's domain, you will be prompted for Basic authentication.

After authentication, you will be able to view the documents.

Additionally, direct access to the S3 bucket is denied.
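
Direct access is denied because the bucket is not public; a typical way to set this up is to block public access and grant s3:GetObject only to the CloudFront origin access identity. Below is a sketch of such a configuration, using the resource names assumed in the CloudFront sketch above; the actual definition is in s3.tf in the repository.

s3.tf (sketch)
resource "aws_s3_bucket" "dbt_docs" {
  bucket = "my-dbt-docs-bucket" # bucket name is an assumption
}

# Block all forms of public access to the bucket
resource "aws_s3_bucket_public_access_block" "dbt_docs" {
  bucket                  = aws_s3_bucket.dbt_docs.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

# Allow reads only from the CloudFront origin access identity
data "aws_iam_policy_document" "dbt_docs" {
  statement {
    actions   = ["s3:GetObject"]
    resources = ["${aws_s3_bucket.dbt_docs.arn}/*"]

    principals {
      type        = "AWS"
      identifiers = [aws_cloudfront_origin_access_identity.main.iam_arn]
    }
  }
}

resource "aws_s3_bucket_policy" "dbt_docs" {
  bucket = aws_s3_bucket.dbt_docs.id
  policy = data.aws_iam_policy_document.dbt_docs.json
}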
