Modularising Terraform IaC across microservice environments - amazon-web-services

I am trying to refactor my Terraform IaC setup so that it repeats less code and is quicker to change. I am working on a serverless microservice application, so, for example, I am running a few instances of aws-ecs-autoscaling and aws-ecs. I have develop and production environments, and within each one a modules folder where each microservice module is defined. Please see the image for a mock folder structure.
As you can see, there are many repeated folders. In the main.tf of the dev and prod environments, each module is called and its variables assigned.
For example:
ecs-autoscaling-microservice-A main.tf
resource "aws_appautoscaling_target" "dev_ecs_autoscaling_microservice_A_target" {
max_capacity = 2
min_capacity = 1
resource_id = "service/${var.ecs_cluster.name}/${var.ecs_service.name}"
scalable_dimension = "ecs:service:DesiredCount"
service_namespace = "ecs"
}
resource "aws_appautoscaling_policy" "dev_ecs_autoscaling_microservice_A_memory" {
name = "dev_ecs_autoscaling_microservice_A_memory"
policy_type = "TargetTrackingScaling"
resource_id = aws_appautoscaling_target.dev_ecs_autoscaling_microservice_A_target.resource_id
scalable_dimension = aws_appautoscaling_target.dev_ecs_autoscaling_microservice_A_target.scalable_dimension
service_namespace = aws_appautoscaling_target.dev_ecs_autoscaling_microservice_A_target.service_namespace
target_tracking_scaling_policy_configuration {
predefined_metric_specification {
predefined_metric_type = "ECSServiceAverageMemoryUtilization"
}
target_value = 80
}
}
resource "aws_appautoscaling_policy" "dev_ecs_autoscaling_microservice_A_cpu" {
name = "dev_ecs_autoscaling_microservice_A_cpu"
policy_type = "TargetTrackingScaling"
resource_id = aws_appautoscaling_target.dev_ecs_autoscaling_microservice_A_target.resource_id
scalable_dimension = aws_appautoscaling_target.dev_ecs_autoscaling_microservice_A_target.scalable_dimension
service_namespace = aws_appautoscaling_target.dev_ecs_autoscaling_microservice_A_target.service_namespace
target_tracking_scaling_policy_configuration {
predefined_metric_specification {
predefined_metric_type = "ECSServiceAverageCPUUtilization"
}
target_value = 60
}
}
DEVELOP main.tf
module "ecs_autoscaling_microservice_A" {
source = "./modules/ecs-autoscaling-microservice-A"
ecs_cluster = module.ecs_autoscaling_microservice_A.ecs_cluster_A
ecs_service = module.ecs_autoscaling_microservice_A.ecs_service_A
}
My question is: what is the best way to remove all of these duplicated modules, so that instead of having an ECS module for each microservice in both the prod and dev environments, I can have just one ECS module that can be reused for any microservice in any environment? See the image for the required folder structure. Is this possible, or am I wasting my time? I was thinking of using some kind of for_each where each microservice is defined beforehand with its own mapped variables, but I would like some guidance please. Thanks in advance!

I suggest you read the excellent series of blog posts on Terraform by Yevgeniy Brikman, which cleared up my understanding of Terraform:
https://blog.gruntwork.io/a-comprehensive-guide-to-terraform-b3d32832baca
This exact question seems to be touched on in this one: https://blog.gruntwork.io/how-to-create-reusable-infrastructure-with-terraform-modules-25526d65f73d
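As a rough, untested sketch of the for_each idea you mention (it requires Terraform 0.13 or later for for_each on module blocks, and all names and variables below are hypothetical, not taken from your code), each environment's root module could call a single shared ecs-autoscaling module once per microservice:

# environments/dev/main.tf (hypothetical layout)
variable "microservices" {
  type = map(object({
    ecs_cluster_name = string
    ecs_service_name = string
    cpu_target       = number
    memory_target    = number
  }))
}

module "ecs_autoscaling" {
  source   = "../../modules/ecs-autoscaling"
  for_each = var.microservices

  # each microservice gets its own instance of the shared module
  ecs_cluster_name = each.value.ecs_cluster_name
  ecs_service_name = each.value.ecs_service_name
  cpu_target       = each.value.cpu_target
  memory_target    = each.value.memory_target
}

The prod root module would contain the same module block with its own microservices map (for example supplied via terraform.tfvars), so the autoscaling resources are defined in only one place.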

Related

Terraform loop through multiple providers (accounts) - invocation through module

I have a use case where I need help using for_each to loop through multiple providers (AWS accounts & regions). This is a module, and the TF will be using a hub and spoke model.
Below is the TF pseudo-code I would like to achieve.
module.tf
---------
app_accounts = [
  { "account" : "53xxxx08", "app_vpc_id" : "vpc-0fxxxxxfec8", "role" : "xxxxxxx", "profile" : "child1" },
  { "account" : "53xxxx08", "app_vpc_id" : "vpc-0fxxxxxfec8", "role" : "xxxxxxx", "profile" : "child2" }
]
Below are the provider and resource files; please ignore the variables and output files, as they are not relevant here.
provider.tf
------------
provider "aws" {
for_each = var.app_accounts
alias = "child"
profile = each.value.role
}
Here is the main resource block, where I want to associate multiple child accounts with a single master account, so I want to iterate through the loop.
resource "aws_route53_vpc_association_authorization" "master" {
provider = aws.master
vpc_id = vpc_id
zone_id = zone_id
}
resource "aws_route53_zone_association" "child" {
provider = aws.child
vpc_id = vpc_id
zone_id = zone_id
}
Any idea on how to achieve this, please? Thanks in advance.
The typical way to achieve your goal in Terraform is to define a shared module representing the objects that should be present in a single account and then to call that module once for each account, passing a different provider configuration into each.
terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
    }
  }
}

provider "aws" {
  alias = "master"
  # ...
}

provider "aws" {
  alias   = "example1"
  profile = "example1"
}

module "example1" {
  source = "./modules/account"

  account    = "53xxxx08"
  app_vpc_id = "vpc-0fxxxxxfec8"

  providers = {
    aws        = aws.example1
    aws.master = aws.master
  }
}

provider "aws" {
  alias   = "example2"
  profile = "example2"
}

module "example2" {
  source = "./modules/account"

  account    = "53xxxx08"
  app_vpc_id = "vpc-0fxxxxxfec8"

  providers = {
    aws        = aws.example2
    aws.master = aws.master
  }
}
The ./modules/account directory would then contain the resource blocks describing what should exist in each individual account. For example:
terraform {
  required_providers {
    aws = {
      source                = "hashicorp/aws"
      configuration_aliases = [aws, aws.master]
    }
  }
}

variable "account" {
  type = string
}

variable "app_vpc_id" {
  type = string
}

resource "aws_route53_zone" "example" {
  # (omitting the provider argument will associate
  # with the default provider configuration, which
  # is different for each instance of this module)
  # ...
}

resource "aws_route53_vpc_association_authorization" "master" {
  provider = aws.master

  vpc_id  = var.app_vpc_id
  zone_id = aws_route53_zone.example.id
}

resource "aws_route53_zone_association" "child" {
  provider = aws.master

  vpc_id  = var.app_vpc_id
  zone_id = aws_route53_zone.example.id
}
(I'm not sure if you actually intended var.app_vpc_id to be the VPC specified for those zone associations, but my goal here is only to show the general pattern, not to show a fully-working example.)
Using a shared module in this way allows you to avoid repeating the definitions for each account separately, and keeps each account-specific setting specified in only one place (either in a provider "aws" block or in a module block).
There is no way to make this more dynamic within the Terraform language itself. However, if you expect to be adding and removing accounts regularly and want to make it more systematic, you could use code generation for the root module to mechanically produce the provider and module blocks for each account. This ensures that they all remain consistent and that you can update them all together in case you need to change the interface of the shared module in a way that will affect all of the calls.

Issue with Creating Application Auto Scaling with AWS Lambda using Terraform

I'm converting some CloudFormation into Terraform that creates a Lambda and then sets up Provisioned Concurrency and Application Auto Scaling for the Lambda. When Terraform runs the aws_appautoscaling_target resource, it fails with the following message:
Error: Error creating application autoscaling target: ValidationException: Unsupported service namespace, resource type or scalable dimension
I haven't found too many examples of the aws_appautoscaling_target resource being used with Lambdas. Is this no longer supported? For reference, I'm running Terraform version 1.0.11 and I'm using AWS provider version 3.66.0. I'm posting my Terraform below. Thanks.
data "archive_file" "foo_create_dist_pkg" {
source_dir = var.lambda_file_location
output_path = "foo.zip"
type = "zip"
}
resource "aws_lambda_function" "foo" {
function_name = "foo"
description = "foo lambda"
handler = "foo.main"
runtime = "python3.8"
publish = true
role = "arn:aws:iam::${local.account_id}:role/serverless-role"
memory_size = 256
timeout = 900
depends_on = [data.archive_file.foo_create_dist_pkg]
source_code_hash = data.archive_file.foo_create_dist_pkg.output_base64sha256
filename = data.archive_file.foo_create_dist_pkg.output_path
}
resource "aws_lambda_provisioned_concurrency_config" "foo_provisioned_concurrency" {
function_name = aws_lambda_function.foo.function_name
provisioned_concurrent_executions = 15
qualifier = aws_lambda_function.foo.version
}
resource "aws_appautoscaling_target" "autoscale_foo" {
max_capacity = var.PCMax
min_capacity = var.PCMin
resource_id = "function:${aws_lambda_function.foo.function_name}"
scalable_dimension = "lambda:function:ProvisionedConcurrency"
service_namespace = "lambda"
}
You need to publish your Lambda to get a new version. This can be done by setting publish = true in the aws_lambda_function resource (which you already do). Publishing gives your function a numeric version, which then needs to be included in the resource_id of the aws_appautoscaling_target:
resource "aws_appautoscaling_target" "autoscale_foo" {
max_capacity = var.PCMax
min_capacity = var.PCMin
resource_id = "function:${aws_lambda_function.foo.function_name}:${aws_lambda_function.foo.version}"
scalable_dimension = "lambda:function:ProvisionedConcurrency"
service_namespace = "lambda"
}
Alternatively, you can create an aws_lambda_alias and use that in the aws_appautoscaling_target instead of the Lambda version. Nevertheless, this also requires the function to be published.
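For illustration, a rough sketch of that alias variant (the alias name "live" and the resource name foo_live are made up, not from the question):

resource "aws_lambda_alias" "foo_live" {
  name             = "live"
  function_name    = aws_lambda_function.foo.function_name
  function_version = aws_lambda_function.foo.version
}

resource "aws_appautoscaling_target" "autoscale_foo" {
  max_capacity       = var.PCMax
  min_capacity       = var.PCMin
  # scale the alias rather than a specific numeric version
  resource_id        = "function:${aws_lambda_function.foo.function_name}:${aws_lambda_alias.foo_live.name}"
  scalable_dimension = "lambda:function:ProvisionedConcurrency"
  service_namespace  = "lambda"
}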

Dynamically add resources in Terraform

I set up a Jenkins pipeline that launches Terraform on every run to create a new EC2 instance in our VPC and register it to our private hosted zone on R53 (which is created at the same time).
I also managed to save the state in S3 so it doesn't fail due to the hosted zone being re-created.
The main issue I have is that at every run Terraform keeps replacing the previous instance with the new one instead of adding it to the pool of instances.
How can I avoid this?
Here's a snippet of my code:
terraform {
  backend "s3" {
    bucket = "<redacted>"
    key    = "<redacted>/terraform.tfstate"
    region = "eu-west-1"
  }
}

provider "aws" {
  region = "${var.region}"
}

data "aws_ami" "image" {
  # limit search criteria for performance
  most_recent = "${var.ami_filter_most_recent}"
  name_regex  = "${var.ami_filter_name_regex}"
  owners      = ["${var.ami_filter_name_owners}"]

  # filter on tag purpose
  filter {
    name   = "tag:purpose"
    values = ["${var.ami_filter_purpose}"]
  }

  # filter on tag os
  filter {
    name   = "tag:os"
    values = ["${var.ami_filter_os}"]
  }
}

resource "aws_instance" "server" {
  # use extracted ami from image data source
  ami                    = data.aws_ami.image.id
  availability_zone      = data.aws_subnet.most_available.availability_zone
  subnet_id              = data.aws_subnet.most_available.id
  instance_type          = "${var.instance_type}"
  vpc_security_group_ids = ["${var.security_group}"]
  user_data              = "${var.user_data}"
  iam_instance_profile   = "${var.iam_instance_profile}"

  root_block_device {
    volume_size = "${var.root_disk_size}"
  }

  ebs_block_device {
    device_name = "${var.extra_disk_device_name}"
    volume_size = "${var.extra_disk_size}"
  }

  tags = {
    Name = "${local.available_name}"
  }
}

resource "aws_route53_zone" "private" {
  name = var.hosted_zone_name

  vpc {
    vpc_id = var.vpc_id
  }
}

resource "aws_route53_record" "record" {
  zone_id = aws_route53_zone.private.zone_id
  name    = "${local.available_name}.${var.hosted_zone_name}"
  type    = "A"
  ttl     = "300"
  records = [aws_instance.server.private_ip]

  depends_on = [
    aws_route53_zone.private
  ]
}
The outcome is that my previously created instance is destroyed and a new one is created; what I want is to keep adding instances with this code.
Thank you.
Your code declares only one instance, aws_instance.server. Because your backend is in S3, every pipeline run shares the same global state, so any change to the instance's properties modifies (or replaces) that single instance rather than adding a new one. The same goes for aws_route53_record.record and anything else in your script.
If you want different pipelines to reuse the same exact script, you should either use different workspaces, or create different TF states for each pipeline. The other alternative is to redefine your TF script to take a map of instances as an input variable and use for_each to create different instances.
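A minimal sketch of that for_each approach (the servers map variable is hypothetical, and most of your existing arguments are omitted for brevity):

variable "servers" {
  # one entry per instance you want to keep around
  type = map(object({
    instance_type = string
  }))
}

resource "aws_instance" "server" {
  for_each = var.servers

  ami           = data.aws_ami.image.id
  subnet_id     = data.aws_subnet.most_available.id
  instance_type = each.value.instance_type

  tags = {
    Name = each.key
  }
}

Adding a new key to the map on the next pipeline run then creates an additional instance instead of replacing the existing one.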
If those instances should be identical, you should manage their count using an aws_autoscaling_group and its desired capacity.
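If an Auto Scaling group fits your case, a rough sketch could look like this (the launch template is my assumption, since your AMI and user_data would need to move into it):

resource "aws_launch_template" "server" {
  name_prefix   = "server-"
  image_id      = data.aws_ami.image.id
  instance_type = var.instance_type
  user_data     = base64encode(var.user_data)
}

resource "aws_autoscaling_group" "server" {
  # scale the fleet by adjusting desired_capacity instead of adding resources
  desired_capacity    = 3
  min_size            = 1
  max_size            = 5
  vpc_zone_identifier = [data.aws_subnet.most_available.id]

  launch_template {
    id      = aws_launch_template.server.id
    version = "$Latest"
  }
}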

How to fix "An Unknown Error Occurred" when creating multiple Google Cloud SQL instances with private IP simultaneously?

Our cloud backend setup contains 5 Cloud SQL for Postgres instances. We manage our infrastructure using Terraform. We currently connect to them from GKE using a public IP and the Cloud SQL proxy container.
In order to simplify our setup we wish to get rid of the proxy containers by moving to a private IP. I tried following the Terraform guide. While creating a single instance works fine, trying to create 5 instances simultaneously ends with 4 failed and one successful.
The error which appears in the Google Cloud Console for the failed instances is "An Unknown Error occurred".
Following is the code which reproduces it. Pay attention to the count = 5 line:
resource "google_compute_network" "private_network" {
provider = "google-beta"
name = "private-network"
}
resource "google_compute_global_address" "private_ip_address" {
provider = "google-beta"
name = "private-ip-address"
purpose = "VPC_PEERING"
address_type = "INTERNAL"
prefix_length = 16
network = "${google_compute_network.private_network.self_link}"
}
resource "google_service_networking_connection" "private_vpc_connection" {
provider = "google-beta"
network = "${google_compute_network.private_network.self_link}"
service = "servicenetworking.googleapis.com"
reserved_peering_ranges = ["${google_compute_global_address.private_ip_address.name}"]
}
resource "google_sql_database_instance" "instance" {
provider = "google-beta"
count = 5
name = "private-instance-${count.index}"
database_version = "POSTGRES_9_6"
depends_on = [
"google_service_networking_connection.private_vpc_connection"
]
settings {
tier = "db-custom-1-3840"
availability_type = "REGIONAL"
ip_configuration {
ipv4_enabled = "false"
private_network = "${google_compute_network.private_network.self_link}"
}
}
}
provider "google-beta" {
version = "~> 2.5"
credentials = "credentials.json"
project = "PROJECT_ID"
region = "us-central1"
zone = "us-central1-a"
}
I tried several alternatives:
Waiting a minute after creating the google_service_networking_connection and then creating all the instances simultaneously, but I got the same error.
Creating an address range and a google_service_networking_connection per instance, but I got an error that google_service_networking_connection cannot be created simultaneously.
Creating an address range per instance and a single google_service_networking_connection which links to all of them, but I got the same error.
Found an ugly yet working solution. There is a bug in GCP: it does not prevent simultaneous creation requests for instances even though they cannot all complete. There is neither documentation about it nor a meaningful error message. It appears in the Terraform Google provider issue tracker as well.
One alternative is adding a dependency between the instances. This allows their creation to complete successfully, but each instance takes several minutes to create, which adds up to many wasted minutes. If we instead add an artificial delay of 60 seconds between instance creations, we manage to avoid the failures. Notes:
The needed delay depends on the instance tier. For example, for db-f1-micro, 30 seconds were enough; they were not enough for db-custom-1-3840.
I am not sure what the exact number of seconds needed for db-custom-1-3840 is; 30 seconds were not enough, 60 were.
Following is the code sample that resolves the issue. It shows only 2 instances, because due to depends_on limitations I could not use the count feature, and showing the full code for 5 instances would be very long. It works the same way for 5 instances:
resource "google_compute_network" "private_network" {
provider = "google-beta"
name = "private-network"
}
resource "google_compute_global_address" "private_ip_address" {
provider = "google-beta"
name = "private-ip-address"
purpose = "VPC_PEERING"
address_type = "INTERNAL"
prefix_length = 16
network = "${google_compute_network.private_network.self_link}"
}
resource "google_service_networking_connection" "private_vpc_connection" {
provider = "google-beta"
network = "${google_compute_network.private_network.self_link}"
service = "servicenetworking.googleapis.com"
reserved_peering_ranges = ["${google_compute_global_address.private_ip_address.name}"]
}
locals {
db_instance_creation_delay_factor_seconds = 60
}
resource "null_resource" "delayer_1" {
depends_on = ["google_service_networking_connection.private_vpc_connection"]
provisioner "local-exec" {
command = "echo Gradual DB instance creation && sleep ${local.db_instance_creation_delay_factor_seconds * 0}"
}
}
resource "google_sql_database_instance" "instance_1" {
provider = "google-beta"
name = "private-instance-delayed-1"
database_version = "POSTGRES_9_6"
depends_on = [
"google_service_networking_connection.private_vpc_connection",
"null_resource.delayer_1"
]
settings {
tier = "db-custom-1-3840"
availability_type = "REGIONAL"
ip_configuration {
ipv4_enabled = "false"
private_network = "${google_compute_network.private_network.self_link}"
}
}
}
resource "null_resource" "delayer_2" {
depends_on = ["google_service_networking_connection.private_vpc_connection"]
provisioner "local-exec" {
command = "echo Gradual DB instance creation && sleep ${local.db_instance_creation_delay_factor_seconds * 1}"
}
}
resource "google_sql_database_instance" "instance_2" {
provider = "google-beta"
name = "private-instance-delayed-2"
database_version = "POSTGRES_9_6"
depends_on = [
"google_service_networking_connection.private_vpc_connection",
"null_resource.delayer_2"
]
settings {
tier = "db-custom-1-3840"
availability_type = "REGIONAL"
ip_configuration {
ipv4_enabled = "false"
private_network = "${google_compute_network.private_network.self_link}"
}
}
}
provider "google-beta" {
version = "~> 2.5"
credentials = "credentials.json"
project = "PROJECT_ID"
region = "us-central1"
zone = "us-central1-a"
}
provider "null" {
version = "~> 1.0"
}
In case someone lands here with a slightly different case (creating a google_sql_database_instance in a private network results in an "Unknown error"):
1. Launch one Cloud SQL instance manually (this seems to enable servicenetworking.googleapis.com and some other APIs for the project).
2. Run your manifest.
3. Terminate the instance created in step 1.
It works for me after that.
¯\_(ツ)_/¯
I landed here with a slightly different case, same as @Grigorash Vasilij (creating a google_sql_database_instance in a private network results in an "Unknown error").
I was using the UI to deploy an SQL instance on a private VPC, and for some reason that throws an "Unknown error" as well. I finally solved it by using the gcloud command instead (why that works and not the UI, I don't know; maybe the UI is not doing the same thing as the command):
gcloud --project=[PROJECT_ID] beta sql instances create [INSTANCE_ID] \
  --network=[VPC_NETWORK_NAME] \
  --no-assign-ip
Follow this for more details.

Avoid Update of resources in Terraform

Currently we are using the Blue/Green deployment model for our application with Terraform, and our TF files have resources for both Blue and Green, as seen below:
resource "aws_instance" "green_node" {
count = "${var.node_count * var.keep_green * var.build}"
lifecycle = {
create_before_destroy = true
}
ami = "${var.green_ami_id}"
instance_type = "${lookup(var.instance_type,lower(var.env))}"
security_groups = "${split(",", lookup(var.security_groups, format("%s-%s", lower(var.env),var.region)))}"
subnet_id = "${element(split(",", lookup(var.subnets, format("%s-%s", lower(var.env),var.region))), count.index)}"
iam_instance_profile = "${var.iam_role}"
key_name = "${var.key_name}"
associate_public_ip_address = "false"
tags {
Name = "node-green-${var.env}-${count.index + 1}"
}
user_data = "${data.template_cloudinit_config.green_node.rendered}"
}
resource "aws_instance" "blue_node" {
count = "${var.node_count * var.keep_blue * var.build}"
lifecycle = {
create_before_destroy = true
}
ami = "${var.blue_ami_id}"
instance_type = "${lookup(var.instance_type,lower(var.env))}"
security_groups = "${split(",", lookup(var.security_groups, format("%s-%s", lower(var.env),var.region)))}"
subnet_id = "${element(split(",", lookup(var.subnets, format("%s-%s", lower(var.env),var.region))), count.index)}"
iam_instance_profile = "${var.iam_role}"
key_name = "${var.key_name}"
associate_public_ip_address = "false"
tags {
Name = "node-blue-${var.env}-${count.index + 1}"
}
user_data = "${data.template_cloudinit_config.blue_node.rendered}"
}
My question: is there a way to update the Green resources without updating the Blue resources, and vice versa, without using a targeted plan? For example, if we update the security groups (var.security_groups), which is a common variable, the update will apply to both Blue and Green, and I would have to do a targeted plan (seen below) to prevent the Blue resources from being updated with the new security groups:
terraform plan -out=green.plan -target=<green_resource_name>
This is a good question.
If you need to make the blue/green stack work as you expect and reduce the complexity of the code, you can use Terraform modules and set a variable to control which color you will update.
The stacks then share the module whenever you need to update the blue or green resources. Define a variable, such as TF_VAR_stack_color, set to blue or green.
Add ${var.stack_color} to the name of any resources you create or update in the modules.
module "nodes" {
source = "modules/nodes"
name = "${var.name}-${var.stack_color}-${var.others}"
...
}
So you can deploy the blue resources with the command below without impacting the running green resources:
TF_VAR_stack_color=blue terraform plan
or
terraform plan -var stack_color=blue
With Terraform modules, you don't need to write the aws_instance resource twice for the blue and green nodes.
I would also recommend splitting the blue and green resources into separate state files (for example, via different backend configurations passed to terraform init), so they become totally separate stacks.
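As an illustration only (assuming a partially configured S3 backend whose key is supplied at init time, and the stack_color variable above; the key names are made up), each color could be initialized against its own state key and planned independently:

# Blue stack: its own state key and its own variable value
terraform init -reconfigure -backend-config="key=blue/terraform.tfstate"
terraform plan -var stack_color=blue

# Green stack: a separate state key, so blue is never touched
terraform init -reconfigure -backend-config="key=green/terraform.tfstate"
terraform plan -var stack_color=green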