Deploying resources to multiple regions with Terraform 0.12/0.13 (amazon-web-services)

We have a rather complex environment with many AWS accounts across multiple regions, all connected to a transit network via VPN tunnels.
At the moment we deploy Customer Gateways (CGWs) via a "VPC" module for each VPC in a region. Deploying the first VPC is fine, but subsequent VPC deploys fail because the CGW already exists, so we have to import it before we can continue. That isn't ideal, and I think there's also a risk that tearing down one VPC might try to destroy a CGW that is still being used by other VPNs.
What I want to do is deploy the CGWs separately from the VPC and have the VPC do a data lookup for the CGW.
I've been thinking that we could use our "base" job to provision the CGWs defined in the variables file, but nothing I've tried has worked so far.
The variable definition would be:
variable "region_data" {
type = list(object({
region = string
deploy_cgw = bool
gateways = any
}))
default = [
{
region = "eu-west-1"
deploy_cgw = true
gateways = [
{
name = "gateway1"
ip = "1.2.3.4"
},
{
name = "gateway2"
ip = "2.3.4.5"
}
]
},
{
region = "us-east-1"
deploy_cgw = true
gateways = [
{
name = "gateway1"
ip = "2.3.4.5"
},
{
name = "gateway2"
ip = "3.4.5.6"
}
]
}
]
}
I've tried a few things, like:
locals {
regions = [for region in var.region_data : region if region.deploy_cgw]
cgws = flatten([
for region in local.regions : [
for gateway in region.gateways : {
region = region.region
name = gateway.name
ip = gateway.ip
}
]
])
}
provider "aws" {
region = "eu-west-1"
alias = "eu-west-1"
}
provider "aws" {
region = "us-east-1"
alias = "us-east-1"
}
module "cgw" {
source = "../../../modules/customer-gateway"
for_each = { for cgw in local.cgws: "${cgw.region}.${cgw.name}" => cgw }
name_tag = each.value.name
ip_address = each.value.ip
providers = {
aws = "aws.${each.value.region}"
}
}
But with this I get:
Error: Invalid provider configuration reference
on main.tf line 439, in module "cgw":
439: aws = "aws.${each.value.region}"
A provider configuration reference must not be given in quotes.
If I move the AWS provider into the module and pass the region as a parameter, I get the following:
Error: Module does not support for_each
on main.tf line 423, in module "cgw":
423: for_each = { for cgw in local.testing : "${cgw.region}.${cgw.name}" => cgw }
Module "cgw" cannot be used with for_each because it contains a nested
provider configuration for "aws", at
I've done quite a bit of research, and I understand that the last error reflects something Terraform takes a tough stance on.
Is what I'm asking possible?

for_each can't be used on modules that have provider configurations defined within them. I was disappointed to find this out too. Terraform enforces this because nested providers cause real problems: if the provider configuration goes away, you are left with orphaned resources in the state that you can't manage, and your plans will fail. It is, however, entirely possible in Pulumi (https://www.pulumi.com/). I'm sick of the limitations in Terraform and will be moving to Pulumi. But that's not what you asked, so I'll move on.
Definitely don't keep importing it. You'll end up with multiple parts of your Terraform configuration managing the same resource.
Just create the CGW once per region, then pass its ID into your VPC module. You can't iterate over providers, so have one module call per provider. In other words, each module call uses for_each over all the VPCs in the same account and region.
# One customer gateway per region, each created through that region's
# provider alias (aws.east and aws.west are assumed to be defined elsewhere).
resource "aws_customer_gateway" "east" {
  provider   = aws.east
  bgp_asn    = 65000
  ip_address = "172.83.124.10"
  type       = "ipsec.1"
}

resource "aws_customer_gateway" "west" {
  provider   = aws.west
  bgp_asn    = 65000
  ip_address = "172.83.128.10"
  type       = "ipsec.1"
}

module "east" {
  source = "../../../modules/customer-gateway"

  # for_each needs a map (or a set of strings); the keys become instance keys.
  for_each = {
    east1 = { name = "east1", ip = "1.2.3.4" }
    east2 = { name = "east2", ip = "1.2.3.5" }
  }

  name_tag   = each.value.name
  ip_address = each.value.ip
  cgw_id     = aws_customer_gateway.east.id

  # Provider references must be written without quotes.
  providers = {
    aws = aws.east
  }
}

module "west" {
  source = "../../../modules/customer-gateway"

  for_each = {
    west1 = { name = "west1", ip = "1.2.3.4" }
    west2 = { name = "west2", ip = "1.2.3.5" }
  }

  name_tag   = each.value.name
  ip_address = each.value.ip
  cgw_id     = aws_customer_gateway.west.id

  providers = {
    aws = aws.west
  }
}
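For the layout the question describes (CGWs created by the "base" job, VPCs looking them up), a minimal sketch might look like the following. It assumes Terraform 0.13 or later (for module for_each), that the customer-gateway module tags each gateway with Name = name_tag, and that each VPC attaches through a virtual private gateway. The module names cgw_eu_west_1 and cgw_us_east_1 and the variables cgw_name and vpn_gateway_id are hypothetical and do not appear in the question.

```hcl
# --- base job (root module) ---
# One static module call per region. for_each only iterates over the
# gateways of that region, so the provider reference stays static.
module "cgw_eu_west_1" {
  source   = "../../../modules/customer-gateway"
  for_each = { for cgw in local.cgws : cgw.name => cgw if cgw.region == "eu-west-1" }

  name_tag   = each.value.name
  ip_address = each.value.ip

  providers = {
    aws = aws.eu-west-1
  }
}

module "cgw_us_east_1" {
  source   = "../../../modules/customer-gateway"
  for_each = { for cgw in local.cgws : cgw.name => cgw if cgw.region == "us-east-1" }

  name_tag   = each.value.name
  ip_address = each.value.ip

  providers = {
    aws = aws.us-east-1
  }
}

# --- inside the VPC module ---
variable "cgw_name" { type = string }       # hypothetical input
variable "vpn_gateway_id" { type = string } # hypothetical input

# Look up the existing gateway by its Name tag instead of creating it.
# Assumes the customer-gateway module sets that tag from name_tag.
data "aws_customer_gateway" "this" {
  filter {
    name   = "tag:Name"
    values = [var.cgw_name]
  }
}

resource "aws_vpn_connection" "this" {
  customer_gateway_id = data.aws_customer_gateway.this.id
  vpn_gateway_id      = var.vpn_gateway_id
  type                = "ipsec.1"
}
```

Because the VPC module only reads the gateway, tearing down a VPC would not touch a CGW that other VPNs still use. Adding a region still means adding one provider block and one module block by hand.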

Related

Terraform loop through multiple providers (accounts) - invocation through module

I have a use case where I need help using for_each to loop through multiple providers (AWS accounts and regions). This is a module, and the Terraform configuration will use a hub-and-spoke model.
Below is the Terraform pseudocode I would like to achieve.
module.tf
---------
app_accounts = [
{ "account" : "53xxxx08", "app_vpc_id" : "vpc-0fxxxxxfec8", "role" : "xxxxxxx", "profile" : "child1"},
{ "account" : "53xxxx08", "app_vpc_id" : "vpc-0fxxxxxfec8", "role" : "xxxxxxx", "profile" : "child2"}
]
Below are the provider and resource files. Please ignore the variables and output files, as they are not relevant here.
provider.tf
------------
provider "aws" {
for_each = var.app_accounts
alias = "child"
profile = each.value.role
}
Here is the main resource block, where I want to associate multiple child accounts with a single master account, so I want to iterate through the loop:
resource "aws_route53_vpc_association_authorization" "master" {
provider = aws.master
vpc_id = vpc_id
zone_id = zone_id
}
resource "aws_route53_zone_association" "child" {
provider = aws.child
vpc_id = vpc_id
zone_id = zone_id
}
Any idea on how to achieve this, please? Thanks in advance.
The typical way to achieve your goal in Terraform is to define a shared module representing the objects that should be present in a single account and then to call that module once for each account, passing a different provider configuration into each.
terraform {
required_providers {
aws = {
source = "hashicorp/aws"
}
}
}
provider "aws" {
alias = "master"
# ...
}
provider "aws" {
alias = "example1"
profile = "example1"
}
module "example1" {
source = "./modules/account"
account = "53xxxx08"
app_vpc_id = "vpc-0fxxxxxfec8"
providers = {
aws = aws.example1
aws.master = aws.master
}
}
provider "aws" {
alias = "example2"
profile = "example2"
}
module "example2" {
source = "./modules/account"
account = "53xxxx08"
app_vpc_id = "vpc-0fxxxxxfec8"
providers = {
aws = aws.example2
aws.master = aws.master
}
}
The ./modules/account directory would then contain the resource blocks describing what should exist in each individual account. For example:
terraform {
required_providers {
aws = {
source = "hashicorp/aws"
configuration_aliases = [ aws, aws.master ]
}
}
}
variable "account" {
type = string
}
variable "app_vpc_id" {
type = string
}
resource "aws_route53_zone" "example" {
# (omitting the provider argument will associate
# with the default provider configuration, which
# is different for each instance of this module)
# ...
}
resource "aws_route53_vpc_association_authorization" "master" {
provider = aws.master
vpc_id = var.app_vpc_id
zone_id = aws_route53_zone.example.id
}
resource "aws_route53_zone_association" "child" {
provider = aws.master
vpc_id = var.app_vpc_id
zone_id = aws_route53_zone.example.id
}
(I'm not sure if you actually intended var.app_vpc_id to be the VPC specified for those zone associations, but my goal here is only to show the general pattern, not to show a fully-working example.)
Using a shared module in this way allows you to avoid repeating the definitions for each account separately, and keeps each account-specific setting specified in only one place (either in a provider "aws" block or in a module block).
There is no way to make this more dynamic within the Terraform language itself. If you expect to add and remove accounts regularly and want to make this more systematic, you could use code generation for the root module to mechanically produce the provider and module block for each account. That ensures they all remain consistent and that you can update them together if you need to change the interface of the shared module in a way that affects all of the calls.

How to deploy aws infrastructure in multiple regions using terraform?

I want to deploy my infrastructure to multiple regions. Right now I am doing this using provider aliases, so a single state file is created.
main.tf
module "mumbai" {
source = "./site-to-site-vpn-setup"
providers = { aws = aws.mumbai }
vpc_cidr = local.vpc_cidr
cg_ip_address = lookup(local.region_map, "mumbai", "") == "" ? "" : lookup(local.region_map, "mumbai", "").cg_ip_address
}
module "seoul" {
source = "./site-to-site-vpn-setup"
providers = { aws = aws.seoul }
vpc_cidr = local.vpc_cidr
cg_ip_address = lookup(local.region_map, "seoul", "") == "" ? "" : lookup(local.region_map, "seoul", "").cg_ip_address
}
providers.tf
provider "aws" {
alias = "mumbai"
region = "ap-south-1"
}
provider "aws" {
alias = "seoul"
region = "ap-northeast-2"
}
Now suppose, for example, that I want to increase the compute resources in all regions, but first apply the change to, say, two regions for testing. With the current structure, terraform apply applies it in all regions. One solution would be to create a separate directory for each region, and thus a separate state file for each. How can this be optimized further, or what better approaches are there?
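The per-region directory layout mentioned in the question could look roughly like the sketch below: one small root module per region, each with its own state, all calling the same shared module. The directory name, the S3 bucket name, and the state key are hypothetical, and the two input variables stand in for the values the question derives from local.vpc_cidr and local.region_map.

```hcl
# regions/mumbai/main.tf (hypothetical path: one root module per region)

terraform {
  backend "s3" {
    bucket = "example-terraform-state"         # hypothetical bucket name
    key    = "site-to-site-vpn/mumbai.tfstate" # one state object per region
    region = "ap-south-1"
  }
}

# No alias is needed here: this root module only manages one region,
# and the child module inherits the default provider configuration.
provider "aws" {
  region = "ap-south-1"
}

variable "vpc_cidr" {
  type = string
}

variable "cg_ip_address" {
  type = string
}

module "vpn" {
  source        = "../../site-to-site-vpn-setup"
  vpc_cidr      = var.vpc_cidr
  cg_ip_address = var.cg_ip_address
}
```

Running terraform apply inside regions/mumbai then only touches that region's state, so a change can be rolled out to a couple of regions before the rest. With the single-state layout from the question, terraform apply -target=module.mumbai limits an apply to one module call, although Terraform's documentation describes -target as meant for exceptional situations rather than routine use.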

Error AWS Terraform VPC Peering while running TF Import

I have created a VPC peering connection between two AWS accounts. The VPC for account A is in us-east-1 and the VPC for account B is in us-west-2.
The peering connection is active and working fine!
I now need to add it to the Terraform code for both accounts.
I am adding it to account B first.
This is what I have done so far:
# VPC peering connection #
# (3) #
##########################
resource "aws_vpc_peering_connection" "this_3" {
count = var.create_peering_3 ? 1 : 0
peer_owner_id = var.peer_account_id_3
peer_vpc_id = var.vpc_peer_id_3
vpc_id = module.vpc-us-west-2.vpc_id
auto_accept = var.auto_accept_peering_3
}
and these are the variables:
##########################
# VPC peering connection #
# (3) #
##########################
variable "peer_account_id_3" {
description = "AWS owner account ID"
default = "**account*A**"
}
variable "vpc_peer_id_3" {
description = "Peer VPC ID"
default = "vpc-029***"
}
variable "peer_cidr_block_3" {
description = "Peer VPC CIDR block"
default = "192.168.0.0/16"
}
variable "auto_accept_peering_3" {
description = "Auto accept peering connection"
default = true
}
variable "create_peering_3" {
description = "Create peering connection, 0 to not create"
default = true
type = bool
}
variable "this_vpc_id_3" {
description = "This VPC ID"
default = "vpc-0e2**"
}
variable "private_route_table_ids_3" {
type = list(string)
description = "A list of private route tables"
default = ["rtb-0**", "rtb-04**"]
}
variable "public_route_table_ids_3" {
type = list(string)
description = "A list of public route tables"
default = ["rtb-0f**"]
}
variable "peering_id_3" {
description = "Provide already existing peering connection id"
default = "pcx-0878***"
}
Now when I run terraform plan, it wants to create the peering connection, which I do not want because it already exists!
I want to see no changes in my plan!
I have also tried using the tf import command:
terraform import aws_vpc_peering_connection.this_3 pcx-0878******
but it gives me this error:
Error: Cannot import non-existent remote object
While attempting to import an existing object to
aws_vpc_peering_connection.this_3, the provider detected that no object exists
with the given id. Only pre-existing objects can be imported; check that the
id is correct and that it is associated with the provider's configured region
or endpoint, or use "terraform apply" to create a new remote object for this
resource.
I do not know how to fix this
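Confirm that you are using the right credentials for account B, and that the provider is configured for the region where account B's VPC lives (us-west-2), since the error says the ID must be associated with the provider's configured region.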
provider "aws" {
alias = "account_b"
region = "us-west-2"
access_key = "my-access-key"
secret_key = "my-secret-key"
}
resource "aws_vpc_peering_connection" "this_3" {
provider = aws.account_b
count = var.create_peering_3 ? 1 : 0
peer_owner_id = var.peer_account_id_3
peer_vpc_id = var.vpc_peer_id_3
vpc_id = module.vpc-us-west-2.vpc_id
auto_accept = var.auto_accept_peering_3
}
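Then try running the import again. Because the resource uses count, the import address needs the instance index, for example: terraform import 'aws_vpc_peering_connection.this_3[0]' pcx-0878******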
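Since the two VPCs are in different regions, the requester-side resource may also need to say so. The sketch below is a hedged variant of the configuration above, assuming account B is the requester; the profile name is hypothetical and replaces the hard-coded keys. The AWS provider documents that auto_accept must be false when peer_region is set, with the accepter side managed through aws_vpc_peering_connection_accepter.

```hcl
provider "aws" {
  alias   = "account_b"
  region  = "us-west-2" # region of account B's VPC
  profile = "account-b" # hypothetical named profile instead of inline keys
}

resource "aws_vpc_peering_connection" "this_3" {
  provider = aws.account_b
  count    = var.create_peering_3 ? 1 : 0

  peer_owner_id = var.peer_account_id_3
  peer_vpc_id   = var.vpc_peer_id_3
  peer_region   = "us-east-1" # region of account A's VPC
  vpc_id        = module.vpc-us-west-2.vpc_id

  # Cross-region peering cannot be auto-accepted from the requester side.
  auto_accept = false
}
```

If account A turns out to be the requester, the account B side would instead be described by an aws_vpc_peering_connection_accepter resource that references the existing connection ID.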

Terraform cycle with AWS and Kubernetes provider

My Terraform code describes some AWS infrastructure to build a Kubernetes cluster including some deployments into the cluster. When I try to destroy the infrastructure using terraform plan -destroy I get a cycle:
module.eks_control_plane.aws_eks_cluster.this[0] (destroy)
module.eks_control_plane.output.cluster
provider.kubernetes
module.aws_auth.kubernetes_config_map.this[0] (destroy)
data.aws_eks_cluster_auth.this[0] (destroy)
Destroying the infrastructure by hand using just terraform destroy works fine. Unfortunately, Terraform Cloud uses terraform plan -destroy to plan the destruction first, which causes this to fail. Here is the relevant code:
Excerpt from the eks_control_plane module:
resource "aws_eks_cluster" "this" {
count = var.enabled ? 1 : 0
name = var.cluster_name
role_arn = aws_iam_role.control_plane[0].arn
version = var.k8s_version
# https://docs.aws.amazon.com/eks/latest/userguide/control-plane-logs.html
enabled_cluster_log_types = var.control_plane_log_enabled ? var.control_plane_log_types : []
vpc_config {
security_group_ids = [aws_security_group.control_plane[0].id]
subnet_ids = [for subnet in var.control_plane_subnets : subnet.id]
}
tags = merge(var.tags,
{
}
)
depends_on = [
var.dependencies,
aws_security_group.node,
aws_iam_role_policy_attachment.control_plane_cluster_policy,
aws_iam_role_policy_attachment.control_plane_service_policy,
aws_iam_role_policy.eks_cluster_ingress_loadbalancer_creation,
]
}
output "cluster" {
value = length(aws_eks_cluster.this) > 0 ? aws_eks_cluster.this[0] : null
}
aws-auth Kubernetes config map from aws_auth module:
resource "kubernetes_config_map" "this" {
count = var.enabled ? 1 : 0
metadata {
name = "aws-auth"
namespace = "kube-system"
}
data = {
mapRoles = jsonencode(
concat(
[
{
rolearn = var.node_iam_role.arn
username = "system:node:{{EC2PrivateDNSName}}"
groups = [
"system:bootstrappers",
"system:nodes",
]
}
],
var.map_roles
)
)
}
depends_on = [
var.dependencies,
]
}
Kubernetes provider from root module:
data "aws_eks_cluster_auth" "this" {
count = module.eks_control_plane.cluster != null ? 1 : 0
name = module.eks_control_plane.cluster.name
}
provider "kubernetes" {
version = "~> 1.10"
load_config_file = false
host = module.eks_control_plane.cluster != null ? module.eks_control_plane.cluster.endpoint : null
cluster_ca_certificate = module.eks_control_plane.cluster != null ? base64decode(module.eks_control_plane.cluster.certificate_authority[0].data) : null
token = length(data.aws_eks_cluster_auth.this) > 0 ? data.aws_eks_cluster_auth.this[0].token : null
}
And this is how the modules are called:
module "eks_control_plane" {
source = "app.terraform.io/SDA-SE/eks-control-plane/aws"
version = "0.0.1"
enabled = local.k8s_enabled
cluster_name = var.name
control_plane_subnets = module.vpc.private_subnets
k8s_version = var.k8s_version
node_subnets = module.vpc.private_subnets
tags = var.tags
vpc = module.vpc.vpc
dependencies = concat(var.dependencies, [
# Ensure that VPC including all security group rules, network ACL rules,
# routing table entries, etc. is fully created
module.vpc,
])
}
# aws-auth config map module. Creating this config map will allow nodes and
# Other users to join the cluster.
# CNI and CSI plugins must be set up before creating this config map.
# Enable or disable this via `aws_auth_enabled` variable.
# TODO: Add Developer and other roles.
module "aws_auth" {
source = "app.terraform.io/SDA-SE/aws-auth/kubernetes"
version = "0.0.0"
enabled = local.aws_auth_enabled
node_iam_role = module.eks_control_plane.node_iam_role
map_roles = [
{
rolearn = "arn:aws:iam::${var.aws_account_id}:role/Administrator"
username = "admin"
groups = [
"system:masters",
]
},
{
rolearn = "arn:aws:iam::${var.aws_account_id}:role/Terraform"
username = "terraform"
groups = [
"system:masters",
]
}
]
}
Removing the aws_auth config map, which means not using the Kubernetes provider at all, breaks the cycle. The problem is obviously that Terraform tries to destroy the Kubernetes cluster, which the Kubernetes provider requires. Manually removing the resources step by step using multiple terraform apply steps works fine, too.
Is there a way to tell Terraform to destroy all Kubernetes resources first, so that the provider is no longer required, and then destroy the EKS cluster?
You can control the order of destruction using the depends_on meta-argument, as you already did in parts of your Terraform code.
If you add depends_on to every resource that needs to be destroyed first and have it depend on the EKS cluster, Terraform will destroy those resources before the cluster.
You can also visualize your configuration and its dependencies with the terraform graph command to help you decide which dependencies need to be created.
https://www.terraform.io/docs/cli/commands/graph.html
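Applied to the configuration in the question, the suggestion above might be sketched as follows. It reuses the dependencies input that the aws_auth module already feeds into depends_on on kubernetes_config_map. The local aws_auth_map_roles is hypothetical and stands for the role list shown in the question. I have not verified that this removes the cycle from terraform plan -destroy, because the Kubernetes provider configuration itself also reads from the cluster output.

```hcl
module "aws_auth" {
  source  = "app.terraform.io/SDA-SE/aws-auth/kubernetes"
  version = "0.0.0"

  enabled       = local.aws_auth_enabled
  node_iam_role = module.eks_control_plane.node_iam_role
  map_roles     = local.aws_auth_map_roles # hypothetical local holding the role list

  # The module passes var.dependencies to depends_on on the config map,
  # so referencing the cluster output here makes the config map depend on
  # the cluster. On destroy, dependents are removed before what they depend on.
  dependencies = [
    module.eks_control_plane.cluster,
  ]
}
```

On Terraform 0.13 or later, depends_on = [module.eks_control_plane] directly on the module block is another way to express the same ordering, provided the called module does not contain its own provider configuration.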

How to fix "An Unknown Error Occurred" when creating multiple Google Cloud SQL instances with private IP simultaneously?

Our cloud backend setup contains 5 Cloud SQL for Postgres instances. We manage our infrastructure using Terraform. We connect to them from GKE using a public IP and the Cloud SQL proxy container.
In order to simplify our setup we wish to get rid of the proxy containers by moving to a private IP. I tried following the Terraform guide. While creating a single instance works fine, trying to create 5 instances simultaneously ends with 4 failed instances and one successful one.
The error that appears in the Google Cloud Console on the failed instances is "An Unknown Error occurred".
The following code reproduces it. Pay attention to the count = 5 line:
resource "google_compute_network" "private_network" {
provider = "google-beta"
name = "private-network"
}
resource "google_compute_global_address" "private_ip_address" {
provider = "google-beta"
name = "private-ip-address"
purpose = "VPC_PEERING"
address_type = "INTERNAL"
prefix_length = 16
network = "${google_compute_network.private_network.self_link}"
}
resource "google_service_networking_connection" "private_vpc_connection" {
provider = "google-beta"
network = "${google_compute_network.private_network.self_link}"
service = "servicenetworking.googleapis.com"
reserved_peering_ranges = ["${google_compute_global_address.private_ip_address.name}"]
}
resource "google_sql_database_instance" "instance" {
provider = "google-beta"
count = 5
name = "private-instance-${count.index}"
database_version = "POSTGRES_9_6"
depends_on = [
"google_service_networking_connection.private_vpc_connection"
]
settings {
tier = "db-custom-1-3840"
availability_type = "REGIONAL"
ip_configuration {
ipv4_enabled = "false"
private_network = "${google_compute_network.private_network.self_link}"
}
}
}
provider "google-beta" {
version = "~> 2.5"
credentials = "credentials.json"
project = "PROJECT_ID"
region = "us-central1"
zone = "us-central1-a"
}
I tried several alternatives:
Waiting a minute after creating the google_service_networking_connection and then creating all the instances simultaneously, but I got the same error.
Creating an address range and a google_service_networking_connection per instance, but I got an error that google_service_networking_connection cannot be created simultaneously.
Creating an address range per instance and a single google_service_networking_connection which links to all of them, but I got the same error.
I found an ugly yet working solution. There is a bug in GCP: it does not prevent simultaneous creation of instances, even though the creation cannot complete. There is neither documentation about it nor a meaningful error message. It appears in the Terraform Google provider issue tracker as well.
One alternative is adding a dependency between the instances, which allows their creation to complete successfully. However, each instance takes several minutes to create, so the total time adds up. If we instead add an artificial delay of 60 seconds between instance creations, we manage to avoid the failures. Notes:
The number of seconds of delay needed depends on the instance tier. For example, for db-f1-micro, 30 seconds were enough. They were not enough for db-custom-1-3840.
I am not sure what the exact number of seconds needed for db-custom-1-3840 is. 30 seconds were not enough, 60 were.
The following code sample resolves the issue. It shows only 2 instances because, due to depends_on limitations, I could not use the count feature, and showing the full code for 5 instances would be very long. It works the same for 5 instances:
resource "google_compute_network" "private_network" {
provider = "google-beta"
name = "private-network"
}
resource "google_compute_global_address" "private_ip_address" {
provider = "google-beta"
name = "private-ip-address"
purpose = "VPC_PEERING"
address_type = "INTERNAL"
prefix_length = 16
network = "${google_compute_network.private_network.self_link}"
}
resource "google_service_networking_connection" "private_vpc_connection" {
provider = "google-beta"
network = "${google_compute_network.private_network.self_link}"
service = "servicenetworking.googleapis.com"
reserved_peering_ranges = ["${google_compute_global_address.private_ip_address.name}"]
}
locals {
db_instance_creation_delay_factor_seconds = 60
}
resource "null_resource" "delayer_1" {
depends_on = ["google_service_networking_connection.private_vpc_connection"]
provisioner "local-exec" {
command = "echo Gradual DB instance creation && sleep ${local.db_instance_creation_delay_factor_seconds * 0}"
}
}
resource "google_sql_database_instance" "instance_1" {
provider = "google-beta"
name = "private-instance-delayed-1"
database_version = "POSTGRES_9_6"
depends_on = [
"google_service_networking_connection.private_vpc_connection",
"null_resource.delayer_1"
]
settings {
tier = "db-custom-1-3840"
availability_type = "REGIONAL"
ip_configuration {
ipv4_enabled = "false"
private_network = "${google_compute_network.private_network.self_link}"
}
}
}
resource "null_resource" "delayer_2" {
depends_on = ["google_service_networking_connection.private_vpc_connection"]
provisioner "local-exec" {
command = "echo Gradual DB instance creation && sleep ${local.db_instance_creation_delay_factor_seconds * 1}"
}
}
resource "google_sql_database_instance" "instance_2" {
provider = "google-beta"
name = "private-instance-delayed-2"
database_version = "POSTGRES_9_6"
depends_on = [
"google_service_networking_connection.private_vpc_connection",
"null_resource.delayer_2"
]
settings {
tier = "db-custom-1-3840"
availability_type = "REGIONAL"
ip_configuration {
ipv4_enabled = "false"
private_network = "${google_compute_network.private_network.self_link}"
}
}
}
provider "google-beta" {
version = "~> 2.5"
credentials = "credentials.json"
project = "PROJECT_ID"
region = "us-central1"
zone = "us-central1-a"
}
provider "null" {
version = "~> 1.0"
}
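On newer Terraform versions, the same staggering can be expressed without a shell command. The sketch below swaps null_resource and local-exec for the time_sleep resource from the hashicorp/time provider. This is a substitution on my part, not what the answer above used, and it assumes Terraform 0.13 or later. It refers to the network and peering resources defined in the sample above and shows only the second instance.

```hcl
terraform {
  required_providers {
    time = {
      source = "hashicorp/time"
    }
  }
}

# Waits 60 seconds after the peering connection exists before the
# second instance is allowed to start creating.
resource "time_sleep" "delayer_2" {
  depends_on      = [google_service_networking_connection.private_vpc_connection]
  create_duration = "60s"
}

resource "google_sql_database_instance" "instance_2" {
  provider         = google-beta
  name             = "private-instance-delayed-2"
  database_version = "POSTGRES_9_6"

  depends_on = [
    google_service_networking_connection.private_vpc_connection,
    time_sleep.delayer_2,
  ]

  settings {
    tier              = "db-custom-1-3840"
    availability_type = "REGIONAL"

    ip_configuration {
      ipv4_enabled    = false
      private_network = google_compute_network.private_network.self_link
    }
  }
}
```

Unlike sleep in local-exec, this does not depend on a shell being available on the machine that runs Terraform. The 60-second value is taken from the notes above and may need tuning per instance tier.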
In case someone lands here with a slightly different case (creating google_sql_database_instance in a private network results in an "Unknown error"):
1. Launch one Cloud SQL instance manually (this seems to enable servicenetworking.googleapis.com and some other APIs for the project).
2. Run your manifest.
3. Terminate the instance created in step 1.
It worked for me after that.
¯_(ツ)_/¯
I landed here with a slightly different case, the same as @Grigorash Vasilij
(creating google_sql_database_instance in a private network results in an "Unknown error").
I was using the UI to deploy a SQL instance on a private VPC, and for some reason that throws an "Unknown error" as well. I finally solved it by using the gcloud command instead (why does that work and not the UI? I don't know; maybe the UI is not doing the same thing as the command):
gcloud --project=[PROJECT_ID] beta sql instances create [INSTANCE_ID] \
  --network=[VPC_NETWORK_NAME] \
  --no-assign-ip
follow this for more details