Terraform AWS EKS Cluster Deployment Error - amazon-web-services

I have been trying to deploy an EKS cluster in the us-east-1 region, and one of the availability zones, us-east-1e, does not support the setup, which causes my cluster creation to fail.
Please see the error below and let me know if there is a way to skip the us-east-1e AZ within the Terraform deployment.
Plan: 26 to add, 0 to change, 0 to destroy.
This plan was saved to: development.tfplan
To perform exactly these actions, run the following command to apply:
terraform apply "development.tfplan"
(base) _C0DL:deploy-eks-cluster-using-terraform-master snadella001$ terraform apply "development.tfplan"
data.aws_availability_zones.available_azs: Reading... [id=2020-12-04 22:10:40.079079 +0000 UTC]
data.aws_availability_zones.available_azs: Read complete after 0s [id=2020-12-04 22:10:47.208548 +0000 UTC]
module.eks-cluster.aws_eks_cluster.this[0]: Creating...
Error: error creating EKS Cluster (eks-ha): UnsupportedAvailabilityZoneException: Cannot create cluster 'eks-hia' because us-east-1e, the targeted availability zone, does not currently have sufficient capacity to support the cluster. Retry and choose from these availability zones: us-east-1a, us-east-1b, us-east-1c, us-east-1d, us-east-1f
{
  RespMetadata: {
    StatusCode: 400,
    RequestID: "0f2ddbd1-107f-490e-b45f-6985e1c7f1f8"
  },
  ClusterName: "eks-ha",
  Message_: "Cannot create cluster 'eks-hia' because us-east-1e, the targeted availability zone, does not currently have sufficient capacity to support the cluster. Retry and choose from these availability zones: us-east-1a, us-east-1b, us-east-1c, us-east-1d, us-east-1f",
  ValidZones: [
    "us-east-1a",
    "us-east-1b",
    "us-east-1c",
    "us-east-1d",
    "us-east-1f"
  ]
}

on .terraform/modules/eks-cluster/cluster.tf line 9, in resource "aws_eks_cluster" "this":
   9: resource "aws_eks_cluster" "this" {
Please find the EKS cluster configuration below:
# create EKS cluster
module "eks-cluster" {
source = "terraform-aws-modules/eks/aws"
version = "12.1.0"
cluster_name = var.cluster_name
cluster_version = "1.17"
write_kubeconfig = false
availability-zones = ["us-east-1a", "us-east-1b", "us-east-1c"]## tried but does not work
subnets = module.vpc.private_subnets
vpc_id = module.vpc.vpc_id
worker_groups_launch_template = local.worker_groups_launch_template
# map developer & admin ARNs as kubernetes Users
map_users = concat(local.admin_user_map_users, local.developer_user_map_users)
}
# get EKS cluster info to configure Kubernetes and Helm providers
data "aws_eks_cluster" "cluster" {
name = module.eks-cluster.cluster_id
}
data "aws_eks_cluster_auth" "cluster" {
name = module.eks-cluster.cluster_id
}
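For reference, a minimal sketch (the usual wiring, assumed rather than taken from the original post) of how these two data sources feed the Kubernetes provider:
provider "kubernetes" {
  host                   = data.aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority[0].data)
  token                  = data.aws_eks_cluster_auth.cluster.token
  # depending on the kubernetes provider version, load_config_file = false may also be needed
}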
#################
# Private subnet
#################
resource "aws_subnet" "private" {
count = var.create_vpc && length(var.private_subnets) > 0 ? length(var.private_subnets) : 0
vpc_id = local.vpc_id
cidr_block = var.private_subnets[count.index]
# availability_zone = ["us-east-1a", "us-east-1b", "us-east-1c"]
availability_zone = length(regexall("^[a-z]{2}-", element(var.azs, count.index))) > 0 ? element(var.azs, count.index) : null
availability_zone_id = length(regexall("^[a-z]{2}-", element(var.azs, count.index))) == 0 ? element(var.azs, count.index) : null
assign_ipv6_address_on_creation = var.private_subnet_assign_ipv6_address_on_creation == null ? var.assign_ipv6_address_on_creation : var.private_subnet_assign_ipv6_address_on_creation
ipv6_cidr_block = var.enable_ipv6 && length(var.private_subnet_ipv6_prefixes) > 0 ? cidrsubnet(aws_vpc.this[0].ipv6_cidr_block, 8, var.private_subnet_ipv6_prefixes[count.index]) : null
tags = merge(
{
"Name" = format(
"%s-${var.private_subnet_suffix}-%s",
var.name,
element(var.azs, count.index),
)
},
var.tags,
var.private_subnet_tags,
)
}
variable "azs" {
description = "A list of availability zones names or ids in the region"
type = list(string)
default = []
#default = ["us-east-1a", "us-east-1b","us-east-1c","us-east-1d"]
}

module "vpc" {
source = "terraform-aws-modules/vpc/aws"
version = "2.44.0"
name = "${var.name_prefix}-vpc"
cidr = var.main_network_block
# azs = data.aws_availability_zones.available_azs.names
azs = ["us-east-1a", "us-east-1b", "us-east-1c"]
private_subnets = [
# this loop will create a one-line list as ["10.0.0.0/20", "10.0.16.0/20", "10.0.32.0/20", ...]
# with a length depending on how many Zones are available
for zone_id in data.aws_availability_zones.available_azs.zone_ids :
cidrsubnet(var.main_network_block, var.subnet_prefix_extension, tonumber(substr(zone_id, length(zone_id) - 1, 1)) - 1)
]
}
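As far as I can tell, the terraform-aws-modules/eks module has no availability-zones input (which is presumably why the attempt above does not work); the cluster's zones follow the subnets you pass in, which in turn follow the VPC module's azs. One way to skip us-east-1e, assuming the zone list comes from the data source as in the commented-out azs line, is to exclude it at the data-source level. A sketch of how data "aws_availability_zones" "available_azs" could be declared:
data "aws_availability_zones" "available_azs" {
  state = "available"
  # drop us-east-1e before the zone list ever reaches the VPC / EKS modules
  exclude_names = ["us-east-1e"]
}
With that in place, azs = data.aws_availability_zones.available_azs.names (and the subnet CIDR loop over its zone_ids) no longer see us-east-1e, so the EKS cluster is only placed in supported zones.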

Related

limit AZs aws_availability_zones using terraform aws vpc module

CIDR = 10.50.0.0/16
variable "region" {
default = "us-east-1"
description = "AWS region"
}
data "aws_availability_zones" "available" {}
us-east-1 has 6 AZs:
["us-east-1a", "us-east-1b", "us-east-1c", "us-east-1d", "us-east-1e", "us-east-1f"]
I want to create 1 public and 1 private subnet per configured AZ.
I have 3 environments (dev/stage/prod):
for the dev env, I want to create subnets in 3 availability zones;
for the stage env, in 4 availability zones;
for the prod env, in all availability zones (the us-east-1 region has 6 availability zones).
local.tf
locals {
selected_azs = map(data.avaialbility_zones.name[3])
}
vpc.tf
module "vpc" {
source = "terraform-aws-modules/vpc/aws"
name = var.vpc_name
cidr = var.vpc_cidr
azs = data.aws_availability_zones.available.names
private_subnets = var.ath_private_subnet_block
public_subnets = var.ath_public_subnet_block
enable_nat_gateway = local.natgw_states[var.natgw_configuration].enable_nat_gateway
single_nat_gateway = local.natgw_states[var.natgw_configuration].single_nat_gateway
one_nat_gateway_per_az = local.natgw_states[var.natgw_configuration].one_nat_gateway_per_az
tags = var.resource_tags
}
variable.tf
variable "az_throttle_limit" {
type = number
default = 0
description = "number of AZs to limit to, 0 for all"
}
Any advice on reading availability zones? How can I control this from a local value?
By default this will create a subnet for every availability zone in the current region.
Summary:
Target AZs: all “opt-in-not-required” AZs (us-east-1a, us-east-1b, us-east-1c, us-east-1d, us-east-1e, us-east-1f).
The AZs should not be a static list; they should be fetched automatically from AWS.
Configurable: limit the number of AZs to limit the resources used (especially in non-production environments).
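For the first bullet, the aws_availability_zones data source can express the opt-in requirement directly. A sketch of how the data "aws_availability_zones" "available" block could be written (not from the original post):
data "aws_availability_zones" "available" {
  state = "available"
  # keep only zones that do not require opt-in (us-east-1a ... us-east-1f)
  filter {
    name   = "opt-in-status"
    values = ["opt-in-not-required"]
  }
}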
You can have a new variable called env, with a local value holding different AZs for each env:
variable "env" {
type = string
default = "dev"
}
locals {
selected_azs = {
"dev" = [for i in range(3): data.aws_availability_zones.available.names[i]]
"stage" = [for i in range(4): data.aws_availability_zones.available.names[i]]
"prod" = data.aws_availability_zones.available.names
}
}
then use it:
azs = local.selected_azs[var.env]
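Alternatively, to honour the az_throttle_limit variable defined earlier instead of a per-env map, a sketch along these lines should work (slice, min and length are Terraform built-ins; the local name limited_azs is mine):
locals {
  all_azs = data.aws_availability_zones.available.names
  # 0 means no limit; otherwise keep at most az_throttle_limit zones
  limited_azs = var.az_throttle_limit > 0 ? slice(local.all_azs, 0, min(var.az_throttle_limit, length(local.all_azs))) : local.all_azs
}
and then azs = local.limited_azs in the VPC module.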

How do AWS ECS Fargate availability zones work?

Two main questions, with Terraform code.
Is the ALB for ECS Fargate for routing to other availability zones, or for routing to containers?
If I create a subnet per availability zone (us-east-2a, 2b, 2c, so the number is 3 and I create 3 subnets) and map them to an ECS cluster with an ALB, does the availability zone apply?
I'm trying to build infrastructure like the image below.
resource "aws_vpc" "cluster_vpc" {
tags = {
Name = "ecs-vpc"
}
cidr_block = "10.30.0.0/16"
}
data "aws_availability_zones" "available" {
}
resource "aws_subnet" "cluster" {
vpc_id = aws_vpc.cluster_vpc.id
count = length(data.aws_availability_zones.available.names)
cidr_block = "10.30.${10 + count.index}.0/24"
availability_zone = data.aws_availability_zones.available.names[count.index]
tags = {
Name = "ecs-subnet"
}
}
resource "aws_internet_gateway" "cluster_igw" {
vpc_id = aws_vpc.cluster_vpc.id
tags = {
Name = "ecs-igw"
}
}
resource "aws_route_table" "public_route" {
vpc_id = aws_vpc.cluster_vpc.id
route {
cidr_block = "0.0.0.0/0"
gateway_id = aws_internet_gateway.cluster_igw.id
}
tags = {
Name = "ecs-route-table"
}
}
resource "aws_route_table_association" "to-public" {
count = length(aws_subnet.cluster)
subnet_id = aws_subnet.cluster[count.index].id
route_table_id = aws_route_table.public_route.id
}
resource "aws_ecs_cluster" "staging" {
name = "service-ecs-cluster"
}
resource "aws_ecs_service" "staging" {
name = "staging"
cluster = aws_ecs_cluster.staging.id
task_definition = aws_ecs_task_definition.service.arn
desired_count = 1
launch_type = "FARGATE"
network_configuration {
security_groups = [aws_security_group.ecs_tasks.id]
subnets = aws_subnet.cluster[*].id
assign_public_ip = true
}
load_balancer {
target_group_arn = aws_lb_target_group.staging.arn
container_name = var.app_name
container_port = var.container_port
}
}
resource "aws_lb" "staging" {
name = "alb"
subnets = aws_subnet.cluster[*].id
load_balancer_type = "application"
security_groups = [aws_security_group.lb.id]
access_logs {
bucket = aws_s3_bucket.log_storage.id
prefix = "frontend-alb"
enabled = true
}
tags = {
Environment = "staging"
Application = var.app_name
}
}
... other specific components, like the lb_target_group, are omitted
Is the ALB for ECS Fargate for routing to other availability zones, or for routing to containers?
Not really. It is there to provide a single, fixed endpoint (URL) for your ECS service. The ALB will automatically distribute incoming connections from the internet across your ECS tasks. They can be in one or multiple AZs. In your case it is only 1 AZ, since you are using desired_count = 1. This means that you will have only 1 ECS task running in a single AZ.
If I create a subnet per availability zone (us-east-2a, 2b, 2c, so the number is 3 and I create 3 subnets) and map them to an ECS cluster with an ALB, does the availability zone apply?
Yes, because your ALB is enabled for the same subnets as your ECS service through aws_subnet.cluster[*].id. But as explained for the first question, you will have only 1 task in one AZ.
My intent is to build infrastructure that has three availability zones and also deploy AWS Fargate across the three availability zones.
As explained before, your desired_count = 1, so you will not have ECS tasks across 3 AZs.
Also, you are creating only public subnets, while your schematic diagram shows that the ECS services should be in private ones.
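If the intent is tasks in all three AZs, a hedged sketch given the resources above: raising desired_count to 3 lets the Fargate scheduler spread tasks across the subnets (and therefore AZs) passed in network_configuration, and the private placement in the diagram would need private subnets such as the hypothetical ones below (CIDRs are illustrative; the NAT route they need is omitted):
resource "aws_subnet" "cluster_private" {
  count             = length(data.aws_availability_zones.available.names)
  vpc_id            = aws_vpc.cluster_vpc.id
  cidr_block        = "10.30.${20 + count.index}.0/24"
  availability_zone = data.aws_availability_zones.available.names[count.index]
  tags = {
    Name = "ecs-private-subnet"
  }
}
The ECS service would then use subnets = aws_subnet.cluster_private[*].id with assign_public_ip = false.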

Terraform error creating subnet dependency

I'm trying to get a DocumentDB cluster up and running, and to have it run within a private subnet I have created.
Running the config below without the depends_on, I get the following error message, as the subnet hasn't been created:
Error: error creating DocDB cluster: DBSubnetGroupNotFoundFault: DB subnet group 'subnet-0b97a3f5bf6db758f' does not exist.
status code: 404, request id: 59b75d23-50a4-42f9-99a3-367af58e6e16
I added the depends_on setup to wait for the subnet to be created, but I'm still running into an issue.
resource "aws_docdb_cluster" "docdb" {
cluster_identifier = "my-docdb-cluster"
engine = "docdb"
master_username = "myusername"
master_password = "mypassword"
backup_retention_period = 5
preferred_backup_window = "07:00-09:00"
skip_final_snapshot = true
apply_immediately = true
db_subnet_group_name = aws_subnet.eu-west-3a-private
depends_on = [aws_subnet.eu-west-3a-private]
}
On running terraform apply I am getting an error on the config:
Error: error creating DocDB cluster: DBSubnetGroupNotFoundFault: DB subnet group 'subnet-0b97a3f5bf6db758f' does not exist.
status code: 404, request id: 8b992d86-eb7f-427e-8f69-d05cc13d5b2d
on main.tf line 230, in resource "aws_docdb_cluster" "docdb":
230: resource "aws_docdb_cluster" "docdb"
A DB subnet group is a logical resource in itself that tells AWS where it may schedule a database instance in a VPC. It does not refer to the subnets directly, which is what you're trying to do there.
To create a DB subnet group you should use the aws_db_subnet_group resource. You then refer to it by name directly when creating database instances or clusters.
A basic example would look like this:
resource "aws_vpc" "example" {
cidr_block = "10.0.0.0/16"
}
resource "aws_subnet" "eu-west-3a" {
vpc_id = aws_vpc.example.id
availability_zone = "eu-west-3a"
cidr_block = "10.0.1.0/24"
tags = {
AZ = "a"
}
}
resource "aws_subnet" "eu-west-3b" {
vpc_id = aws_vpc.example.id
availability_zone = "eu-west-3b"
cidr_block = "10.0.2.0/24"
tags = {
AZ = "b"
}
}
resource "aws_db_subnet_group" "example" {
name = "main"
subnet_ids = [
aws_subnet.eu-west-3a.id,
aws_subnet.eu-west-3b.id
]
tags = {
Name = "My DB subnet group"
}
}
resource "aws_db_instance" "example" {
allocated_storage = 20
storage_type = "gp2"
engine = "mysql"
engine_version = "5.7"
instance_class = "db.t2.micro"
name = "mydb"
username = "foo"
password = "foobarbaz"
parameter_group_name = "default.mysql5.7"
db_subnet_group_name = aws_db_subnet_group.example.name
}
The same thing applies to Elasticache subnet groups which use the aws_elasticache_subnet_group resource.
It's also worth noting that adding depends_on to a resource that already references the dependent resource via interpolation does nothing. The depends_on meta-argument is only for resources that don't expose a parameter that would provide this dependency information directly.
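Since the question is about DocumentDB rather than RDS, the equivalent resource there is aws_docdb_subnet_group. A minimal sketch reusing the two subnets from the example above:
resource "aws_docdb_subnet_group" "example" {
  name       = "docdb-main"
  subnet_ids = [
    aws_subnet.eu-west-3a.id,
    aws_subnet.eu-west-3b.id,
  ]
}
The cluster then refers to it by name, e.g. db_subnet_group_name = aws_docdb_subnet_group.example.name.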
It seems the value in the parameter is wrong. A db_subnet_group created somewhere else gives id/arn as its output, so you need to use the id value, although the depends_on clause looks okay.
db_subnet_group_name = aws_db_subnet_group.eu-west-3a-private.id
That would be correct; you can also try using arn in place of id.
Thanks,
Ashish

Terraform AWS - How to change default route to an existing route table from nat-gateway to ec2?

I have a private subnet with a default route targeting a NAT gateway. Both were created by Terraform.
Now I have other code to launch an EC2 instance to use as a NAT in my VPC (as the managed NAT gateway became very expensive). I'm trying to change the default route in my route table to this new EC2 instance and am getting the error below:
Error: Error applying plan:
1 error occurred:
* module.ec2-nat.aws_route.defaultroute_to_ec2-nat: 1 error occurred:
* aws_route.defaultroute_to_ec2-nat: Error creating route: RouteAlreadyExists: The route identified by 0.0.0.0/0 already exists.
status code: 400, request id: 408deb59-d223-4c9f-9a28-209e2e0478e9
I know this route already exists, but how do I change this existing route to a new target, in this case my new EC2 instance's network interface?
Thanks for your help.
Following is the code I'm using:
#####################
# FIRST TERRAFORM
# create the internet gateway
resource "aws_internet_gateway" "this" {
count = "${var.create_vpc && length(var.public_subnets) > 0 ? 1 : 0}"
vpc_id = "${aws_vpc.this.id}"
tags = "${merge(map("Name", format("%s", var.name)), var.igw_tags, var.tags)}"
}
# Add default route (0.0.0.0/0) to internet gateway
resource "aws_route" "public_internet_gateway" {
count = "${var.create_vpc && length(var.public_subnets) > 0 ? 1 : 0}"
route_table_id = "${aws_route_table.public.id}"
destination_cidr_block = "0.0.0.0/0"
gateway_id = "${aws_internet_gateway.this.id}"
timeouts {
create = "5m"
}
}
#####################
# SECOND TERRAFORM
# Spin EC2 to run as NAT
resource "aws_instance" "ec2-nat" {
count = "${var.instance_qtd}"
ami = "${data.aws_ami.nat.id}"
availability_zone = "${var.region}a"
instance_type = "${var.instance_type}"
key_name = "${var.aws_key_name}"
vpc_security_group_ids = ["${var.sg_ec2}","${var.sg_ops}"]
subnet_id = "${var.public_subnet_id}"
iam_instance_profile = "${var.iam_instance_profile}"
associate_public_ip_address = true
source_dest_check = false
tags = {
Name = "ec2-nat-${var.brand}-${var.role}-${count.index}"
Brand = "${var.brand}"
Role = "${var.role}"
Type = "ec2-nat"
}
}
# Add default route (0.0.0.0/0) to aws_instance.ec2-nat
variable "default_route" {
default = "0.0.0.0/0"
}
resource "aws_route" "defaultroute_to_ec2-nat" {
route_table_id = "${var.private_route_id}"
destination_cidr_block = "${var.default_route}"
instance_id = "${element(aws_instance.ec2-nat.*.id, 0)}"
}
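One possible approach (an assumption on my part, not from the original thread): since the 0.0.0.0/0 route in the private route table already exists outside this configuration, stop trying to create it and instead take it over with terraform import, using the aws_route import ID format of route table ID and destination joined by an underscore (the route table ID below is a placeholder):
terraform import module.ec2-nat.aws_route.defaultroute_to_ec2-nat rtb-XXXXXXXX_0.0.0.0/0
After the import, terraform apply should change the existing route's target from the NAT gateway to the EC2 instance rather than failing with RouteAlreadyExists. Alternatively, change the target in whichever configuration originally created that default route, so two configurations are not both claiming 0.0.0.0/0 in the same route table.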

Terraform - Creating resources in one transaction / setting rollback policies

I'm using Terraform with AWS as a provider.
In one of my networks I accidentally configured wrong values, which led to a failure in resource creation.
So the situation was that some of the resources were up and running, but I would prefer that the whole process were executed as one transaction.
I'm familiar with the output Terraform gives in such cases:
Terraform does not automatically rollback in the face of errors.
Instead, your Terraform state file has been partially updated with any
resources that successfully completed. Please address the error above
and apply again to incrementally change your infrastructure.
My question is: is there still a way to set up a rollback policy for cases where some resources were created and some failed?
Below is a simple example to reproduce the problem.
In the local variable 'az_list', just change the value from 'names' to 'zone_ids':
az_list = "${data.aws_availability_zones.available.zone_ids}"
And a VPC will be created with some default security groups and route tables, but without subnets.
resources.tf:
provider "aws" {
region = "${var.region}"
}
### Local data ###
data "aws_availability_zones" "available" {}
locals {
#In order to reproduce an error: Change 'names' to 'zone_ids'
az_list = "${data.aws_availability_zones.available.names}"
}
### Vpc ###
resource "aws_vpc" "base_vpc" {
cidr_block = "${var.cidr}"
instance_tenancy = "default"
enable_dns_hostnames = "false"
enable_dns_support = "true"
}
### Subnets ###
resource "aws_subnet" "private" {
vpc_id = "${aws_vpc.base_vpc.id}"
cidr_block = "${cidrsubnet( var.cidr, 8, count.index + 1 + length(local.az_list) )}"
availability_zone = "${element(local.az_list, count.index)}"
count = 2
}
resource "aws_subnet" "public" {
vpc_id = "${aws_vpc.base_vpc.id}"
cidr_block = "${cidrsubnet(var.cidr, 8, count.index + 1)}"
availability_zone = "${element(local.az_list, count.index)}"
count = 2
map_public_ip_on_launch = true
}
variables.tf:
variable "region" {
description = "Name of region"
default = "ap-south-1"
}
variable "cidr" {
description = "The CIDR block for the VPC"
default = "10.0.0.0/16"
}
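As far as I know, Terraform itself has no transactional apply or rollback policy; as the quoted message says, the state is partially updated and you either fix the config and re-apply or terraform destroy the partially created resources. As a side note on the reproduction itself (my observation, not part of the original question): the zone_ids variant fails because aws_subnet's availability_zone argument expects zone names such as ap-south-1a, while zone IDs belong in availability_zone_id. A hypothetical variant of the private subnet above that accepts zone IDs would look like this:
resource "aws_subnet" "private_by_zone_id" {
  vpc_id               = "${aws_vpc.base_vpc.id}"
  cidr_block           = "${cidrsubnet(var.cidr, 8, count.index + 1 + length(local.az_list))}"
  availability_zone_id = "${element(local.az_list, count.index)}"
  count                = 2
}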