2022-10-09

How to apply SQL scripts in Terraform

Introduction

When building a database such as RDS with Terraform, you may want to execute SQL scripts such as creating tables with the CREATE command or adding data with the INSERT command. This article describes how to execute SQL commands in Terraform.

SQL statement you want to apply

For example, suppose you want to apply the following ddl.sql to a DB resource in order to build a database to store purchasing data.

ddl.sql
CREATE TABLE IF NOT EXISTS items
(
  id char(36) NOT NULL,
  name varchar(36) NOT NULL,
  description varchar(255) NOT NULL,
  price int NOT NULL,
  PRIMARY KEY (id)
)
ENGINE=InnoDB  DEFAULT CHARSET=utf8;

CREATE TABLE IF NOT EXISTS sales
(
  id int UNSIGNED NOT NULL AUTO_INCREMENT,
  item_id char(36) NOT NULL,
  amount int UNIQUE NOT NULL,
  purchased_at bigint(14) UNSIGNED NOT NULL
  PRIMARY KEY (id)
)
ENGINE=InnoDB  DEFAULT CHARSET=utf8;

ALTER TABLE items ADD FOREIGN KEY (item_id) REFERENCES items (id) ON DELETE RESTRICT ON UPDATE RESTRICT;

Also, suppose you want to add sample data like data.sql after creating the table.

data.sql
INSERT INTO items (id, name, description, price)
VALUES
	(
    '5b0338ac-b12e-42cb-b2c5-a5a4c6533403',
    'Pencil',
    'This is a pencil.',
    12
  ),
	(
    '839156ab-6320-4c4b-a38a-43d34aa57aaa',
    'Eraser',
    'This is an eraser.',
    11
  );

INSERT INTO sales (item_id, amount, purchased_at)
VALUES
	(
    '5b0338ac-b12e-42cb-b2c5-a5a4c6533403',
    5,
    1661089540000
  ),
	(
    '839156ab-6320-4c4b-a38a-43d34aa57aaa',
    1,
    1661089540000
  );

null_resource

In this article, I will introduce Terraform code to build Aurora Serverless v1 as a database and issue SQL from it via the Data API. The directory structure is as follows.

.
├── rds.tf
├── ddl.sql
└── data.sql

To execute SQL in Terraform, use the resource null_resource and specify the SQL file, execution command, and environment variables. In this case, SQL is executed via the Data API, so the code is as follows.

rds.tf
resource "aws_rds_cluster" "main" {
  .
  .
  .
}

resource "null_resource" "db_ddl" {
  triggers = {
    file = filesha1("ddl.sql")
  }
  provisioner "local-exec" {
    command = <<-EOF
			while read line; do
				echo "$line"
				aws rds-data execute-statement --resource-arn "$DB_ARN" --database  "$DB_NAME" --secret-arn "$SECRET_ARN" --sql "$line"
			done  < <(awk 'BEGIN{RS=";\n"}{gsub(/\n/,""); if(NF>0) {print $0";"}}' ddl.sql)
			EOF
    environment = {
      DB_ARN     = aws_rds_cluster.main.arn
      DB_NAME    = aws_rds_cluster.main.database_name
      SECRET_ARN = aws_secretsmanager_secret.rds.arn
    }
    interpreter = ["bash", "-c"]
  }
}

resource "null_resource" "sample_data" {
  triggers = {
    file = filesha1("data.sql")
  }
  provisioner "local-exec" {
    command = <<-EOF
			while read line; do
				echo "$line"
				aws rds-data execute-statement --resource-arn "$DB_ARN" --database  "$DB_NAME" --secret-arn "$SECRET_ARN" --sql "$line"
			done  < <(awk 'BEGIN{RS=";\n"}{gsub(/\n/,""); if(NF>0) {print $0";"}}' data.sql)
			EOF
    environment = {
      DB_ARN     = aws_rds_cluster.main.arn
      DB_NAME    = aws_rds_cluster.main.database_name
      SECRET_ARN = aws_secretsmanager_secret.rds.arn
    }
    interpreter = ["bash", "-c"]
  }

  depends_on = [
    null_resource.db_ddl
  ]
}

If you want to connect to MYSQL and issue SQL instead of via the Data API, the code would be as follows

resource "null_resource" "db_ddl" {
  triggers = {
    file = filesha1("ddl.sql")
  }
- provisioner "local-exec" {
-   command = <<-EOF
-		while read line; do
-			echo "$line"
-			aws rds-data execute-statement --resource-arn "$DB_ARN" --database  "$DB_NAME" --secret-arn "$SECRET_ARN" - -sql "$line"
-		done  < <(awk 'BEGIN{RS=";\n"}{gsub(/\n/,""); if(NF>0) {print $0";"}}' ddl.sql)
-		EOF
-   environment = {
-     DB_ARN     = aws_rds_cluster.main.arn
-     DB_NAME    = aws_rds_cluster.main.database_name
-     SECRET_ARN = aws_secretsmanager_secret.rds.arn
-   }
-   interpreter = ["bash", "-c"]
- }
+ provisioner "local-exec" {
+   command = "mysql --host=${aws_rds_cluster.main.endpoint} --port=3306 --user=${aws_rds_cluster.main.master_username} --password=${aws_rds_cluster.main.master_password} --database=${aws_rds_cluster.main.database_name} < ${file(${ddl.sql)}"
+ }
}

Also, if you want to run data.sql only in the dev environment, add count as follows

resource "null_resource" "sample_data" {
+  count = terraform.workspace == "dev" ? 1 : 0

  triggers = {
    file = filesha1("data.sql")
  }
  provisioner "local-exec" {
    command = <<-EOF
			while read line; do
				echo "$line"
				aws rds-data execute-statement --resource-arn "$DB_ARN" --database  "$DB_NAME" --secret-arn "$SECRET_ARN" --sql "$line"
			done  < <(awk 'BEGIN{RS=";\n"}{gsub(/\n/,""); if(NF>0) {print $0";"}}' data.sql)
			EOF
    environment = {
      DB_ARN     = aws_rds_cluster.main.arn
      DB_NAME    = aws_rds_cluster.main.database_name
      SECRET_ARN = aws_secretsmanager_secret.rds.arn
    }
    interpreter = ["bash", "-c"]
  }

  depends_on = [
    null_resource.db_ddl
  ]
}

Completed code

The final code will look like this

resource "aws_rds_cluster" "main" {
  cluster_identifier                  = local.aurora_cluster_identifier
  database_name                       = local.aurora_database_name
  master_username                     = local.aurora_master_username
  master_password                     = local.aurora_master_password
  db_cluster_parameter_group_name     = aws_rds_cluster_parameter_group.main.name
  db_subnet_group_name                = data.aws_db_subnet_group.main.name
  deletion_protection                 = true
  skip_final_snapshot                 = false
  engine                              = "aurora-mysql"
  engine_version                      = "5.7.mysql_aurora.2.07.1"
  engine_mode                         = "serverless"
  backup_retention_period             = 7
  storage_encrypted                   = true
  iam_database_authentication_enabled = false
  enable_http_endpoint                = true

  scaling_configuration {
    min_capacity = 2
    max_capacity = 256
    auto_pause   = true
  }

  lifecycle {
    ignore_changes = [
      availability_zones,
      engine_version,
      master_username,
      master_password,
    ]
  }
}

resource "null_resource" "db_ddl" {
  triggers = {
    file = filesha1("ddl.sql")
  }
  provisioner "local-exec" {
    command = <<-EOF
			while read line; do
				echo "$line"
				aws rds-data execute-statement --resource-arn "$DB_ARN" --database  "$DB_NAME" --secret-arn "$SECRET_ARN" --sql "$line"
			done  < <(awk 'BEGIN{RS=";\n"}{gsub(/\n/,""); if(NF>0) {print $0";"}}' ddl.sql)
			EOF
    environment = {
      DB_ARN     = aws_rds_cluster.main.arn
      DB_NAME    = aws_rds_cluster.main.database_name
      SECRET_ARN = aws_secretsmanager_secret.rds.arn
    }
    interpreter = ["bash", "-c"]
  }
}

resource "null_resource" "sample_data" {
  count = terraform.workspace == "dev" ? 1 : 0

  triggers = {
    file = filesha1("data.sql")
  }
  provisioner "local-exec" {
    command = <<-EOF
			while read line; do
				echo "$line"
				aws rds-data execute-statement --resource-arn "$DB_ARN" --database  "$DB_NAME" --secret-arn "$SECRET_ARN" --sql "$line"
			done  < <(awk 'BEGIN{RS=";\n"}{gsub(/\n/,""); if(NF>0) {print $0";"}}' data.sql)
			EOF
    environment = {
      DB_ARN     = aws_rds_cluster.main.arn
      DB_NAME    = aws_rds_cluster.main.database_name
      SECRET_ARN = aws_secretsmanager_secret.rds.arn
    }
    interpreter = ["bash", "-c"]
  }

  depends_on = [
    null_resource.db_ddl
  ]
}

References

https://stackoverflow.com/questions/45394458/how-to-apply-sql-scripts-on-rds-with-terraform
https://advancedweb.hu/how-to-run-sql-scripts-against-the-rds-data-api-with-terraform/

Ryusei Kakujo

researchgatelinkedingithub

Focusing on data science for mobility

Bench Press 100kg!