ECR VPC endpoint for Fargate not working as expected

I am using Fargate for a task that runs every hour. As the Docker image is 1.5 GB, I want to use an ECR VPC endpoint to reduce the AWS data transfer fees.
The Fargate tasks run in a private subnet. The route table of one of the private subnets is the following (where eigw is an egress-only internet gateway and nat-01 is a NAT gateway in the public subnet):
Destination Target
10.50.0.0/16 local
0.0.0.0/0 nat-01ec80c2754229321
2a05:d012:43e:de00::/56 local
::/0 eigw-0a0c583a8390d5736
Expected behavior: right now, the Fargate task takes around 1 minute to start, due to the time it takes to docker pull the image. I expect that with the ECR VPC endpoints in place, the startup time would go down.
Actual behavior: there is not a single second of difference, which means I probably did something wrong!
My terraform setup:
The VPC, subnets and route tables:
module "vpc" {
source = "terraform-aws-modules/vpc/aws"
version = "~> 2.21.0"
name = "dev-vpc"
cidr = "10.10.0.0/16"
azs = ["eu-west-3a", "eu-west-3b"]
private_subnets = ["10.10.0.0/20", "10.10.32.0/20"]
public_subnets = ["10.10.128.0/20", "10.10.160.0/20"]
enable_nat_gateway = true
single_nat_gateway = true
reuse_nat_ips = false
enable_vpn_gateway = false
enable_dns_hostnames = true
create_database_subnet_group = true
enable_ipv6 = true
assign_ipv6_address_on_creation = true
private_subnet_assign_ipv6_address_on_creation = false
public_subnet_ipv6_prefixes = [0, 1]
private_subnet_ipv6_prefixes = [2, 3]
database_subnet_ipv6_prefixes = [4, 5]
database_subnets = ["10.10.64.0/20", "10.10.80.0/20"]
tags = {
ManagedByTerraform = "true"
EnvironmentType = "dev"
}
}
# the SG for the VPC endpoints
resource "aws_security_group" "vpce" {
name = "dev-vpce-sg"
vpc_id = module.vpc.vpc_id
ingress {
from_port = 443
to_port = 443
protocol = "tcp"
cidr_blocks = [module.vpc.vpc_cidr_block]
}
tags = {
Environment = "dev"
}
}
# all the VPC endpoints needed (from AWS documentation)
resource "aws_vpc_endpoint" "ecr_endpoint" {
vpc_id = module.vpc.vpc_id
private_dns_enabled = true
service_name = "com.amazonaws.eu-west-3.ecr.dkr"
vpc_endpoint_type = "Interface"
security_group_ids = [
aws_security_group.vpce.id,
]
subnet_ids = module.vpc.private_subnets
tags = {
Name = "dkr-endpoint"
Environment = "dev"
}
}
resource "aws_vpc_endpoint" "ecr_api_endpoint" {
vpc_id = module.vpc.vpc_id
private_dns_enabled = true
service_name = "com.amazonaws.eu-west-3.ecr.api"
vpc_endpoint_type = "Interface"
security_group_ids = [
aws_security_group.vpce.id,
]
subnet_ids = module.vpc.private_subnets
tags = {
Name = "ecr-api-endpoint"
Environment = "dev"
}
}
resource "aws_vpc_endpoint" "s3" {
vpc_id = module.vpc.vpc_id
service_name = "com.amazonaws.eu-west-3.s3"
vpc_endpoint_type = "Gateway"
route_table_ids = module.vpc.private_route_table_ids
policy = <<-EOF
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "",
"Effect": "Allow",
"Principal": "*",
"Action": [
"s3:PutObjectAcl",
"s3:PutObject",
"s3:ListBucket",
"s3:GetObject",
"s3:Delete*"
],
"Resource": [
"arn:aws:s3:::prod-eu-west-3-starport-layer-bucket",
"arn:aws:s3:::prod-eu-west-3-starport-layer-bucket/*"
]
}
]
}
EOF
tags = {
Name = "s3-endpoint"
Environment = "dev"
}
}
resource "aws_vpc_endpoint" "logs" {
vpc_id = module.vpc.vpc_id
private_dns_enabled = true
service_name = "com.amazonaws.eu-west-3.logs"
vpc_endpoint_type = "Interface"
security_group_ids = [
aws_security_group.vpce.id,
]
subnet_ids = module.vpc.private_subnets
tags = {
Name = "logs-endpoint"
Environment = "dev"
}
}
resource "aws_vpc_endpoint" "ecs_agent" {
vpc_id = module.vpc.vpc_id
private_dns_enabled = true
service_name = "com.amazonaws.eu-west-3.ecs-agent"
vpc_endpoint_type = "Interface"
security_group_ids = [
aws_security_group.vpce.id,
]
subnet_ids = module.vpc.private_subnets
tags = {
Name = "ecs-agent"
Environment = "dev"
}
}
resource "aws_vpc_endpoint" "ecs_telemetry" {
vpc_id = module.vpc.vpc_id
private_dns_enabled = true
service_name = "com.amazonaws.eu-west-3.ecs-telemetry"
vpc_endpoint_type = "Interface"
security_group_ids = [
aws_security_group.vpce.id,
]
subnet_ids = module.vpc.private_subnets
tags = {
Name = "telemetry"
Environment = "dev"
}
}
resource "aws_vpc_endpoint" "ecs_endpoint" {
vpc_id = module.vpc.vpc_id
private_dns_enabled = true
service_name = "com.amazonaws.eu-west-3.ecs"
vpc_endpoint_type = "Interface"
security_group_ids = [
aws_security_group.vpce.id,
]
subnet_ids = module.vpc.private_subnets
tags = {
Name = "ecs-endpoint"
Environment = "dev"
}
}
Can you let me know what could be wrong in my setup?
I have zero knowledge of network engineering, so please let me know if you need further information.

Related

I'm struggling to deploy my EKS node/node-group using terraform

I was getting this error first:
NodeCreationFailure: Instances failed to join the kubernetes cluster
and I didn't have my private subnets tagged right. I found examples online where they tagged their VPC and subnets a certain way, so I copied that, and now I'm getting this error:
Error: Cycle: aws_eks_cluster.eks, aws_subnet.private_subnet
This is frustrating, but here's my main.tf file condensed to all of the relevant resource blocks. This is my entire VPC section, since I feel like it could be anything in here based on other posts. Also for context, I'm trying to deploy the cluster inside private subnets.
resource "aws_vpc" "vpc" {
cidr_block = "10.1.0.0/16"
tags = {
"kubernetes.io/cluster/${var.cluster_name}" = "shared"
}
}
resource "aws_subnet" "public_subnet" {
count = length(var.azs)
vpc_id = aws_vpc.vpc.id
cidr_block = var.public_cidrs[count.index]
availability_zone = var.azs[count.index]
map_public_ip_on_launch = true
tags = {
Name = "${var.name}-public-subnet-${count.index + 1}"
}
}
resource "aws_subnet" "private_subnet" {
count = length(var.azs)
vpc_id = aws_vpc.vpc.id
cidr_block = var.private_cidrs[count.index]
availability_zone = var.azs[count.index]
map_public_ip_on_launch = false
tags = {
"kubernetes.io/cluster/${aws_eks_cluster.eks.name}" = "shared"
"kubernetes.io/role/internal-elb" = "1"
}
}
resource "aws_internet_gateway" "internet_gateway" {
vpc_id = aws_vpc.vpc.id
tags = {
Name = "${var.name}-internet-gateway"
}
}
resource "aws_route_table" "public_rt" {
vpc_id = aws_vpc.vpc.id
tags = {
Name = "${var.name}-public-rt"
}
}
resource "aws_route" "default_route" {
route_table_id = aws_route_table.public_rt.id
destination_cidr_block = "0.0.0.0/0"
gateway_id = aws_internet_gateway.internet_gateway.id
}
resource "aws_route_table_association" "public_assoc" {
count = length(var.public_cidrs)
subnet_id = aws_subnet.public_subnet[count.index].id
route_table_id = aws_route_table.public_rt.id
}
resource "aws_eip" "nat_eip" {
count = length(var.public_cidrs)
vpc = true
depends_on = [aws_internet_gateway.internet_gateway]
tags = {
Name = "${var.name}-nat-eip-${count.index + 1}"
}
}
resource "aws_nat_gateway" "nat_gateway" {
count = length(var.public_cidrs)
allocation_id = aws_eip.nat_eip[count.index].id
subnet_id = aws_subnet.public_subnet[count.index].id
depends_on = [aws_internet_gateway.internet_gateway]
tags = {
Name = "${var.name}-NAT-gateway-${count.index + 1}"
}
}
Here are all of my resource blocks related to my cluster and nodes:
resource "aws_iam_role" "eks_cluster" {
name = "${var.name}-eks-cluster-role"
assume_role_policy = <<POLICY
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Service": "eks.amazonaws.com"
},
"Action": "sts:AssumeRole"
}
]
}
POLICY
}
resource "aws_iam_role_policy_attachment" "amazon_eks_cluster_policy" {
policy_arn = "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy"
role = aws_iam_role.eks_cluster.name
}
resource "aws_eks_cluster" "eks" {
name = var.cluster_name
role_arn = aws_iam_role.eks_cluster.arn
## k8s Version
version = var.k8s_version
vpc_config {
endpoint_private_access = true
endpoint_public_access = false
subnet_ids = [
aws_subnet.private_subnet[0].id,
aws_subnet.private_subnet[1].id,
aws_subnet.private_subnet[2].id,
]
}
depends_on = [
aws_iam_role_policy_attachment.amazon_eks_cluster_policy
]
}
resource "aws_iam_role" "nodes_eks" {
name = "role-node-group-eks"
assume_role_policy = <<POLICY
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Service": "ec2.amazonaws.com"
},
"Action": "sts:AssumeRole"
}
]
}
POLICY
}
resource "aws_iam_role_policy_attachment" "amazon_eks_worker_node_policy_eks" {
policy_arn = "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy"
role = aws_iam_role.nodes_eks.name
}
resource "aws_iam_role_policy_attachment" "amazon_eks_cni_policy_eks" {
policy_arn = "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"
role = aws_iam_role.nodes_eks.name
}
resource "aws_iam_role_policy_attachment" "amazon_ec2_container_registry_read_only" {
policy_arn = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
role = aws_iam_role.nodes_eks.name
}
resource "aws_eks_node_group" "nodes_eks" {
cluster_name = aws_eks_cluster.eks.name
node_group_name = "${var.name}-node-group"
node_role_arn = aws_iam_role.nodes_eks.arn
subnet_ids = [
aws_subnet.private_subnet[0].id,
aws_subnet.private_subnet[1].id,
aws_subnet.private_subnet[2].id,
]
remote_access {
ec2_ssh_key = aws_key_pair.bastion_auth.id
}
scaling_config {
desired_size = 3
max_size = 6
min_size = 3
}
ami_type = "AL2_x86_64"
capacity_type = "ON_DEMAND"
disk_size = 20
instance_types = [var.instance_type]
labels = {
role = "nodes-group-1"
}
version = var.k8s_version
depends_on = [
aws_iam_role_policy_attachment.amazon_eks_worker_node_policy_eks,
aws_iam_role_policy_attachment.amazon_eks_cni_policy_eks,
aws_iam_role_policy_attachment.amazon_ec2_container_registry_read_only,
]
}
In the private subnet resource, you are referencing your EKS cluster in the tag ${aws_eks_cluster.eks.name}, which creates a dependency of this resource on the EKS cluster.
resource "aws_subnet" "private_subnet" {
count = length(var.azs)
vpc_id = aws_vpc.vpc.id
cidr_block = var.private_cidrs[count.index]
availability_zone = var.azs[count.index]
map_public_ip_on_launch = false
tags = {
"kubernetes.io/cluster/${aws_eks_cluster.eks.name}" = "shared" <- this creates dependency
"kubernetes.io/role/internal-elb" = "1"
}
}
On the other side, you are referencing the same private subnets when you create the EKS cluster, which creates a dependency of the cluster on the private subnets.
resource "aws_eks_cluster" "eks" {
name = var.cluster_name
role_arn = aws_iam_role.eks_cluster.arn
## k8s Version
version = var.k8s_version
vpc_config {
endpoint_private_access = true
endpoint_public_access = false
subnet_ids = [
aws_subnet.private_subnet[0].id, <- this creates dependency
aws_subnet.private_subnet[1].id, <- this creates dependency
aws_subnet.private_subnet[2].id, <- this creates dependency
]
}
depends_on = [
aws_iam_role_policy_attachment.amazon_eks_cluster_policy
]
}
As a result, you get a dependency cycle, which causes your error.
To solve it, update the tags for the private subnet to:
tags = {
"kubernetes.io/cluster/${var.cluster_name}" = "shared"
"kubernetes.io/role/internal-elb" = "1"
}
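Put together, the corrected private_subnet resource from the question would look like this (all other arguments unchanged):
resource "aws_subnet" "private_subnet" {
  count                   = length(var.azs)
  vpc_id                  = aws_vpc.vpc.id
  cidr_block              = var.private_cidrs[count.index]
  availability_zone       = var.azs[count.index]
  map_public_ip_on_launch = false
  tags = {
    # referencing the variable instead of the cluster resource breaks the cycle
    "kubernetes.io/cluster/${var.cluster_name}" = "shared"
    "kubernetes.io/role/internal-elb"           = "1"
  }
}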

Error creating EKS node-group with terraform

While I am trying to deploy EKS via Terraform, I am facing an error with node-group creation.
I am getting the following error:
Error: error waiting for EKS Node Group (Self-Hosted-Runner:Self-Hosted-Runner-default-node-group) to create:
unexpected state 'CREATE_FAILED', wanted target 'ACTIVE'.
last error: 1 error occurred:i-04db15f25be4212fb, i-07bd88adabaa103c0, i-0915982ac0f217fe4:
NodeCreationFailure: Instances failed to join the kubernetes cluster.
with module.eks.aws_eks_node_group.eks-node-group,
│ on ../../modules/aws/eks/eks-node-group.tf line 1, in resource "aws_eks_node_group" "eks-node-group":
│ 1: resource "aws_eks_node_group" "eks-node-group" {
EKS
# EKS Cluster Resources
resource "aws_eks_cluster" "eks" {
name = var.cluster-name
version = var.k8s-version
role_arn = aws_iam_role.cluster.arn
vpc_config {
security_group_ids = [var.security_group]
subnet_ids = var.private_subnets
}
enabled_cluster_log_types = var.eks-cw-logging
depends_on = [
aws_iam_role_policy_attachment.cluster-AmazonEKSClusterPolicy,
aws_iam_role_policy_attachment.cluster-AmazonEKSServicePolicy,
]
}
EKS-NODE-GROUP
resource "aws_eks_node_group" "eks-node-group" {
cluster_name = var.cluster-name
node_group_name = "${var.cluster-name}-default-node-group"
node_role_arn = aws_iam_role.node.arn
subnet_ids = var.private_subnets
capacity_type = "SPOT"
node_group_name_prefix = null #"Creates a unique name beginning with the specified prefix. Conflicts with node_group_name"
scaling_config {
desired_size = var.desired-capacity
max_size = var.max-size
min_size = var.min-size
}
update_config {
max_unavailable = 1
}
instance_types = [var.node-instance-type]
# Ensure that IAM Role permissions are created before and deleted after EKS Node Group handling.
# Otherwise, EKS will not be able to properly delete EC2 Instances and Elastic Network Interfaces.
depends_on = [
aws_eks_cluster.eks,
aws_iam_role_policy_attachment.node-AmazonEKSWorkerNodePolicy,
aws_iam_role_policy_attachment.node-AmazonEKS_CNI_Policy
]
tags = {
Name = "${var.cluster-name}-default-node-group"
}
}
IAM
# IAM
# CLUSTER
resource "aws_iam_role" "cluster" {
name = "${var.cluster-name}-eks-cluster-role"
assume_role_policy = <<POLICY
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Service": "eks.amazonaws.com"
},
"Action": "sts:AssumeRole"
}
]
}
POLICY
}
resource "aws_iam_role_policy_attachment" "cluster-AmazonEKSClusterPolicy" {
policy_arn = "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy"
role = aws_iam_role.cluster.name
}
resource "aws_iam_role_policy_attachment" "cluster-AmazonEKSServicePolicy" {
policy_arn = "arn:aws:iam::aws:policy/AmazonEKSServicePolicy"
role = aws_iam_role.cluster.name
}
# NODES
resource "aws_iam_role" "node" {
name = "${var.cluster-name}-eks-node-role"
assume_role_policy = <<POLICY
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Service": "ec2.amazonaws.com"
},
"Action": "sts:AssumeRole"
}
]
}
POLICY
}
resource "aws_iam_role_policy_attachment" "node-AmazonEKSWorkerNodePolicy" {
policy_arn = "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy"
role = aws_iam_role.node.name
}
resource "aws_iam_role_policy_attachment" "node-AmazonEKS_CNI_Policy" {
policy_arn = "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"
role = aws_iam_role.node.name
}
resource "aws_iam_role_policy_attachment" "node-AmazonEC2ContainerRegistryReadOnly" {
policy_arn = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
role = aws_iam_role.node.name
}
resource "aws_iam_instance_profile" "node" {
name = "${var.cluster-name}-eks-node-instance-profile"
role = aws_iam_role.node.name
}
Security Group
# Create Security Group
resource "aws_security_group" "cluster" {
name = "terraform_cluster"
description = "AWS security group for terraform"
vpc_id = aws_vpc.vpc1.id
# Input
ingress {
from_port = "1"
to_port = "65365"
protocol = "TCP"
cidr_blocks = [var.address_allowed, var.vpc1_cidr_block]
}
# Output
egress {
from_port = 0 # any port
to_port = 0 # any port
protocol = "-1" # any protocol
cidr_blocks = ["0.0.0.0/0"] # any destination
}
# ICMP Ping
ingress {
from_port = -1
to_port = -1
protocol = "icmp"
cidr_blocks = [var.address_allowed, var.vpc1_cidr_block]
}
tags = merge(
{
Name = "onboarding-sg",
},
var.tags,
)
}
VPC
# Create VPC
resource "aws_vpc" "vpc1" {
cidr_block = var.vpc1_cidr_block
instance_tenancy = "default"
enable_dns_support = true
enable_dns_hostnames = true
tags = merge(
{
Name = "onboarding-vpc",
},
var.tags,
)
}
# Subnet Public
resource "aws_subnet" "subnet_public1" {
vpc_id = aws_vpc.vpc1.id
cidr_block = var.subnet_public1_cidr_block[0]
map_public_ip_on_launch = "true" #it makes this a public subnet
availability_zone = data.aws_availability_zones.available.names[0]
tags = merge(
{
Name = "onboarding-public-sub",
"kubernetes.io/role/elb" = "1"
},
var.tags,
)
}
# Subnet Private
resource "aws_subnet" "subnet_private1" {
for_each = { for idx, cidr_block in var.subnet_private1_cidr_block: cidr_block => idx}
vpc_id = aws_vpc.vpc1.id
cidr_block = each.key
map_public_ip_on_launch = "false" //it makes this a public subnet
availability_zone = data.aws_availability_zones.available.names[each.value]
tags = merge(
{
Name = "onboarding-private-sub",
"kubernetes.io/role/internal-elb" = "1",
"kubernetes.io/cluster/${var.cluster-name}" = "owned"
},
var.tags,
)
}
tfvars
#General vars
region = "eu-west-1"
#Bucket vars
bucket = "tf-state"
tag_name = "test"
tag_environment = "Dev"
acl = "private"
versioning_enabled = "Enabled"
# Network EKS vars
aws_public_key_path = "~/.ssh/id_rsa.pub"
aws_key_name = "aws-k8s"
address_allowed = "/32" # Office public IP Address
vpc1_cidr_block = "10.0.0.0/16"
subnet_public1_cidr_block = ["10.0.128.0/20", "10.0.144.0/20", "10.0.160.0/20"]
subnet_private1_cidr_block = ["10.0.0.0/19", "10.0.32.0/19", "10.0.64.0/19"]
tags = {
Scost = "testing",
Terraform = "true",
Environment = "testing"
}
#EKS
cluster-name = "Self-Hosted-Runner"
k8s-version = "1.21"
node-instance-type = "t3.medium"
desired-capacity = "3"
max-size = "7"
min-size = "1"
# db-subnet-cidr = ["10.0.192.0/21", "10.0.200.0/21", "10.0.208.0/21"]
eks-cw-logging = ["api", "audit", "authenticator", "controllerManager", "scheduler"]
ec2-key-public-key = ""
"issues" : [ {
"code" : "NodeCreationFailure",
"message" : "Instances failed to join the kubernetes cluster",
What do you think I misconfigured?

Terraform AWS LB healthcheck failed

I have the following Terraform code that configures a gateway service on AWS ECS Fargate. Services that are not behind the load balancer and sit in the private network work as expected; however, the gateway behind the LB keeps failing its health check, and every 2-3 minutes the task is deprovisioned and a new one is provisioned. The Dockerfile exposes the service on port 3000.
Here's the Terraform configuration that is failing:
locals {
gateway_version = "1.0.0"
gateway_port = 3000
}
## VPC
module "vpc" {
source = "terraform-aws-modules/vpc/aws"
version = "3.11.0"
name = "${var.env}-vpc"
cidr = "20.0.0.0/16"
enable_ipv6 = true
azs = ["eu-central-1a", "eu-central-1b"]
public_subnets = ["20.0.1.0/24", "20.0.2.0/24"]
private_subnets = ["20.0.86.0/24", "20.0.172.0/24"]
elasticache_subnets = ["20.0.31.0/24", "20.0.32.0/24"]
enable_nat_gateway = true
single_nat_gateway = true
tags = {
Terraform = "true"
}
}
## Security Groups
module "sg" {
source = "terraform-aws-modules/security-group/aws"
version = "~> 4.0"
name = "${var.env}-sg-default"
description = "Default service security group"
vpc_id = module.vpc.vpc_id
ingress_cidr_blocks = ["0.0.0.0/0"]
ingress_rules = [
"all-icmp",
"http-80-tcp",
"https-443-tcp",
"mysql-tcp",
"rabbitmq-4369-tcp",
"rabbitmq-5671-tcp",
"rabbitmq-5672-tcp",
"rabbitmq-15672-tcp",
"rabbitmq-25672-tcp",
"redis-tcp"
]
egress_rules = ["all-all"]
}
module "security_group" {
source = "terraform-aws-modules/security-group/aws"
version = "~> 4.0"
name = "${var.env}-sg-lb"
description = "Security group for ALB"
vpc_id = module.vpc.vpc_id
ingress_cidr_blocks = ["0.0.0.0/0"]
ingress_rules = ["http-80-tcp", "all-icmp"]
egress_rules = ["all-all"]
}
resource "aws_security_group" "service_security_group" {
name = "${var.env}-lb-connection"
ingress {
from_port = 0
to_port = 0
protocol = "-1"
# Only allowing traffic in from the load balancer security group
security_groups = [module.security_group.security_group_id]
}
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
vpc_id = module.vpc.vpc_id
}
## ECS Cluster
resource "aws_ecs_cluster" "default" {
name = "${var.env}-cluster"
}
## ECR
data "aws_ecr_repository" "gateway_ecr" {
name = "gateway-${var.env}"
}
## ECS Task Definition
resource "aws_ecs_task_definition" "gateway_task" {
family = "${var.env}-gateway-task"
container_definitions = <<DEFINITION
[
{
"name": "${var.env}-gateway-task",
"image": "${data.aws_ecr_repository.gateway_ecr.repository_url}:${local.gateway_version}",
"networkMode": "awsvpc",
"essential": true,
"logConfiguration": {
"logDriver": "awslogs",
"options": {
"awslogs-group": "${aws_cloudwatch_log_group.gateway_logs.name}",
"awslogs-stream-prefix": "ecs",
"awslogs-region": "${var.aws-region}"
}
},
"portMappings": [
{
"containerPort": ${local.gateway_port},
"hostPort": ${local.gateway_port}
}
],
"environment": [
{
"name": "AWS_REGION",
"value": "${var.aws-region}"
},
{
"name": "PORT",
"value": "${local.gateway_port}"
},
{
"name": "STAGE",
"value": "${var.env}"
},
{
"name": "NODE_ENV",
"value": "development"
},
{
"name": "VERSION",
"value": "${local.gateway_version}"
}
],
"memory": 512,
"cpu": 256
}
]
DEFINITION
requires_compatibilities = ["FARGATE"]
network_mode = "awsvpc"
memory = 512
cpu = 256
task_role_arn = aws_iam_role.gateway_task_definition_role.arn
execution_role_arn = aws_iam_role.gateway_task_execution_role.arn
}
## ECS Service
resource "aws_ecs_service" "gateway_service" {
name = "${var.env}-gateway-service"
cluster = aws_ecs_cluster.default.id
task_definition = aws_ecs_task_definition.gateway_task.arn
launch_type = "FARGATE"
desired_count = 1
force_new_deployment = true
network_configuration {
subnets = concat(
module.vpc.public_subnets,
module.vpc.private_subnets,
)
security_groups = [
module.sg.security_group_id,
aws_security_group.service_security_group.id
]
assign_public_ip = true
}
lifecycle {
ignore_changes = [desired_count]
}
load_balancer {
target_group_arn = aws_lb_target_group.target_group.arn
container_name = aws_ecs_task_definition.gateway_task.family
container_port = local.gateway_port
}
}
## Cloudwatch Log Group
resource "aws_cloudwatch_log_group" "gateway_logs" {
name = "${var.env}-gateway-log-group"
tags = {
Name = "${var.env}-gateway-log-group"
}
}
## IAM Roles
resource "aws_iam_role" "gateway_task_definition_role" {
name = "${var.env}-gateway-task-definition-role"
assume_role_policy = data.aws_iam_policy_document.gateway_assume_role_policy.json
tags = {
Name = "${var.env}-gateway-task-definition-role"
}
}
resource "aws_iam_role" "gateway_task_execution_role" {
name = "${var.env}-gateway-task-execution-role"
assume_role_policy = data.aws_iam_policy_document.gateway_assume_role_policy.json
tags = {
Name = "${var.env}-gateway-task-execution-role"
}
}
data "aws_iam_policy_document" "gateway_assume_role_policy" {
statement {
effect = "Allow"
actions = ["sts:AssumeRole"]
principals {
type = "Service"
identifiers = ["ecs-tasks.amazonaws.com"]
}
}
}
resource "aws_iam_role_policy" "gateway_exec" {
name = "${var.env}-gateway-execution-role-policy"
role = aws_iam_role.gateway_task_execution_role.id
policy = data.aws_iam_policy_document.gateway_exec_policy.json
}
data "aws_iam_policy_document" "gateway_exec_policy" {
statement {
effect = "Allow"
resources = ["*"]
actions = [
"ecr:GetAuthorizationToken",
"ecr:BatchCheckLayerAvailability",
"ecr:GetDownloadUrlForLayer",
"ecr:BatchGetImage",
"logs:CreateLogStream",
"logs:PutLogEvents",
]
}
}
## ALB
resource "aws_lb" "alb" {
name = "${var.env}-lb"
load_balancer_type = "application"
subnets = module.vpc.public_subnets
security_groups = [module.security_group.security_group_id]
}
resource "aws_lb_target_group" "target_group" {
name = "target-group"
port = 80
protocol = "HTTP"
target_type = "ip"
vpc_id = module.vpc.vpc_id
health_check {
matcher = "200,301,302"
path = "/health"
interval = 120
timeout = 30
}
}
resource "aws_lb_listener" "listener" {
load_balancer_arn = aws_lb.alb.arn
port = 80
protocol = "HTTP"
default_action {
type = "forward"
target_group_arn = aws_lb_target_group.target_group.arn
}
}
That's the error
Task failed ELB health checks in (target-group arn:aws:elasticloadbalancing:eu-central-1:129228585726:targetgroup/target-group/5853904c0d3ad322)
After it's deployed, I see that an ECS task is started and working; however, I don't see any requests checking its health.
Your target group uses port = 80, but your ECS task definition specifies port 3000, so this is likely the reason why your ALB can't connect to your containers.
The load balancer checks whether it can reach the application on the target port, which in your case is 3000.
Replace your target group resource with one that uses the application port so the LB health checks can pass:
resource "aws_lb_target_group" "target_group" {
name = "target-group"
port = 3000
protocol = "HTTP"
target_type = "ip"
vpc_id = module.vpc.vpc_id
health_check {
matcher = "200,301,302"
path = "/health"
interval = 120
timeout = 30
}
}
The target group was not the issue -> the issue was a wrong security group which didn't allow traffic to hit port 3000.
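For reference, a minimal sketch of a service security group that would allow the ALB to reach the container port (3000 here), reusing the names from the configuration above:
resource "aws_security_group" "service_security_group" {
  name   = "${var.env}-lb-connection"
  vpc_id = module.vpc.vpc_id

  # only allow traffic from the load balancer security group to the container port
  ingress {
    from_port       = local.gateway_port # 3000
    to_port         = local.gateway_port
    protocol        = "tcp"
    security_groups = [module.security_group.security_group_id]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}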

Impossible to SSH to EC2 instance and unable to place ECS task

Given the following terraform.tf file:
provider "aws" {
profile = "default"
region = "us-east-1"
}
locals {
vpc_name = "some-vpc-name"
dev_vpn_source = "*.*.*.*/32" # Instead of * I have a CIDR block of our VPN here
}
resource "aws_vpc" "vpc" {
cidr_block = "10.0.0.0/16"
enable_dns_hostnames = true
tags = {
Name: local.vpc_name
}
}
resource "aws_subnet" "a" {
cidr_block = "10.0.0.0/17"
vpc_id = aws_vpc.vpc.id
tags = {
Name: "${local.vpc_name}-a"
}
}
resource "aws_subnet" "b" {
cidr_block = "10.0.128.0/17"
vpc_id = aws_vpc.vpc.id
tags = {
Name: "${local.vpc_name}-b"
}
}
resource "aws_security_group" "ssh" {
name = "${local.vpc_name}-ssh"
vpc_id = aws_vpc.vpc.id
tags = {
Name: "${local.vpc_name}-ssh"
}
}
resource "aws_security_group_rule" "ingress-ssh" {
from_port = 22
protocol = "ssh"
security_group_id = aws_security_group.ssh.id
to_port = 22
type = "ingress"
cidr_blocks = [local.dev_vpn_source]
description = "SSH access for developer"
}
resource "aws_security_group" "outbound" {
name = "${local.vpc_name}-outbound"
vpc_id = aws_vpc.vpc.id
tags = {
Name: "${local.vpc_name}-outbound"
}
}
resource "aws_security_group_rule" "egress" {
from_port = 0
protocol = "all"
security_group_id = aws_security_group.outbound.id
to_port = 65535
type = "egress"
cidr_blocks = ["0.0.0.0/0"]
description = "All outbound allowed"
}
module "ecs-clusters" {
source = "./ecs-clusters/"
subnets = [aws_subnet.a, aws_subnet.b]
vpc_name = local.vpc_name
security_groups = [aws_security_group.ssh, aws_security_group.outbound]
}
And the following ecs-clusters/ecs-cluster.tf file:
variable "vpc_name" {
type = string
}
variable "subnets" {
type = list(object({
id: string
}))
}
variable "security_groups" {
type = list(object({
id: string
}))
}
data "aws_ami" "amazon_linux_ecs" {
most_recent = true
owners = ["amazon"]
filter {
name = "name"
values = ["amzn2-ami-ecs*"]
}
}
resource "aws_iam_instance_profile" "ecs-launch-profile" {
name = "${var.vpc_name}-ecs"
role = "ecsInstanceRole"
}
resource "aws_launch_template" "ecs" {
name = "${var.vpc_name}-ecs"
image_id = data.aws_ami.amazon_linux_ecs.id
instance_type = "r5.4xlarge"
key_name = "some-ssh-key-name"
iam_instance_profile {
name = "${var.vpc_name}-ecs"
}
block_device_mappings {
device_name = "/dev/xvda"
ebs {
volume_type = "gp3"
volume_size = 1024
delete_on_termination = false
}
}
network_interfaces {
associate_public_ip_address = true
subnet_id = var.subnets[0].id
security_groups = var.security_groups[*].id
}
update_default_version = true
}
resource "aws_autoscaling_group" "ecs-autoscaling_group" {
name = "${var.vpc_name}-ecs"
vpc_zone_identifier = [for subnet in var.subnets: subnet.id]
desired_capacity = 1
max_size = 1
min_size = 1
protect_from_scale_in = true
launch_template {
id = aws_launch_template.ecs.id
version = aws_launch_template.ecs.latest_version
}
tag {
key = "Name"
propagate_at_launch = true
value = "${var.vpc_name}-ecs"
}
depends_on = [aws_launch_template.ecs]
}
resource "aws_ecs_capacity_provider" "ecs-capacity-provider" {
name = var.vpc_name
auto_scaling_group_provider {
auto_scaling_group_arn = aws_autoscaling_group.ecs-autoscaling_group.arn
managed_termination_protection = "ENABLED"
managed_scaling {
maximum_scaling_step_size = 1
minimum_scaling_step_size = 1
status = "ENABLED"
target_capacity = 1
}
}
depends_on = [aws_autoscaling_group.ecs-autoscaling_group]
}
resource "aws_ecs_cluster" "ecs-cluster" {
name = var.vpc_name
capacity_providers = [aws_ecs_capacity_provider.ecs-capacity-provider.name]
depends_on = [aws_ecs_capacity_provider.ecs-capacity-provider]
}
resource "aws_iam_role" "ecs-execution" {
name = "${var.vpc_name}-ecs-execution"
assume_role_policy = <<EOF
{
"Version": "2012-10-17",
"Statement": [
{
"Action": "sts:AssumeRole",
"Principal": {
"Service": "ecs-tasks.amazonaws.com"
},
"Effect": "Allow",
"Sid": ""
}
]
}
EOF
}
resource "aws_iam_role" "ecs" {
name = "${var.vpc_name}-ecs"
assume_role_policy = <<EOF
{
"Version": "2012-10-17",
"Statement": [
{
"Action": "sts:AssumeRole",
"Principal": {
"Service": "ecs-tasks.amazonaws.com"
},
"Effect": "Allow",
"Sid": ""
}
]
}
EOF
}
resource "aws_iam_role_policy_attachment" "execution-role" {
role = aws_iam_role.ecs-execution.name
policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"
}
resource "aws_iam_role_policy_attachment" "role" {
role = aws_iam_role.ecs.name
policy_arn = "arn:aws:iam::aws:policy/AmazonS3FullAccess"
}
I'm facing two problems:
I can't SSH into the EC2 instance created by the autoscaling group, despite the fact that I'm using the same SSH key and VPN to access other EC2 instances. My VPN client config includes a route to the target machine via the VPN gateway.
I can't execute a task on the ECS cluster. The task gets stuck in the provisioning status and then fails with "Unable to run task". The task is configured to use 1 GB of RAM and 1 vCPU.
What am I doing wrong?
Based on the comments.
There were two issues with the original setup:
Lack of connectivity to the ECS and ECR services, which was solved by enabling internet access in the VPC. It is also possible to use VPC interface endpoints for ECS, ECR and S3 if internet access is not desired.
Container instances did not register with ECS. This was fixed by using user_data to bootstrap the ECS instances so that they register with the ECS cluster.
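For the second point, a minimal user_data sketch, assuming the launch template and cluster name (var.vpc_name) from the question; the ECS agent reads ECS_CLUSTER from /etc/ecs/ecs.config to decide which cluster to join:
resource "aws_launch_template" "ecs" {
  # ... existing arguments from the question stay as they are ...

  # register this container instance with the ECS cluster on boot
  user_data = base64encode(<<-EOT
    #!/bin/bash
    echo "ECS_CLUSTER=${var.vpc_name}" >> /etc/ecs/ecs.config
  EOT
  )
}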

Cannot create an Elastic Beanstalk application in a custom VPC using Terraform

I am trying to create an Elastic Beanstalk application in a private subnet of a custom VPC using Terraform. The problem I have is that the creation hits a time-out.
Error:
Error waiting for Elastic Beanstalk Environment (e-xxxxxxxxx) to become ready: 2 errors occurred:
2021-01-22 16:37:56.664 +0000 UTC (e-xxxxxxxxx) : Stack named 'awseb-e-xxxxxxxxx-stack' aborted operation. Current state: 'CREATE_FAILED' Reason: The following resource(s) failed to create: [AWSEBInstanceLaunchWaitCondition].
2021-01-22 16:37:56.791 +0000 UTC (e-xxxxxxxxx) : The EC2 instances failed to communicate with AWS Elastic Beanstalk, either because of configuration problems with the VPC or a failed EC2 instance. Check your VPC configuration and try launching the environment again.
I think I am missing a connection from the VPC to the outside, but I'm not sure what it is.
I have tried to add some aws_vpc_endpoint resources:
resource "aws_vpc" "main" {
cidr_block = "10.16.0.0/16"
enable_dns_support = true
enable_dns_hostnames = true
}
resource "aws_security_group" "elasticbeanstalk_vpc_endpoint" {
name = "elasticbeanstalk-vpc-endpoint"
vpc_id = aws_vpc.main.id
ingress {
from_port = 0
to_port = 65535
protocol = "tcp"
cidr_blocks = [ "0.0.0.0/0" ]
}
egress {
from_port = 0
to_port = 65535
protocol = "tcp"
cidr_blocks = [ "0.0.0.0/0" ]
}
}
resource "aws_vpc_endpoint" "elasticbeanstalk" {
vpc_id = aws_vpc.main.id
service_name = "com.amazonaws.${var.aws_region}.elasticbeanstalk"
security_group_ids = [
aws_security_group.elasticbeanstalk_vpc_endpoint.id,
]
vpc_endpoint_type = "Interface"
}
resource "aws_security_group" "ec2_vpc_endpoint" {
name = "ec2-vpc-endpoint"
vpc_id = aws_vpc.main.id
ingress {
from_port = 0
to_port = 65535
protocol = "tcp"
cidr_blocks = [ "0.0.0.0/0" ]
}
egress {
from_port = 0
to_port = 65535
protocol = "tcp"
cidr_blocks = [ "0.0.0.0/0" ]
}
}
resource "aws_vpc_endpoint" "ec2" {
vpc_id = aws_vpc.main.id
service_name = "com.amazonaws.${var.aws_region}.ec2"
security_group_ids = [
aws_security_group.ec2_vpc_endpoint.id,
]
vpc_endpoint_type = "Interface"
}
EB environment:
resource "aws_elastic_beanstalk_environment" "endpoint" {
name = "foo-env"
setting {
namespace = "aws:ec2:vpc"
name = "VPCId"
value = aws_vpc.main.vpc_id
resource = ""
}
setting {
namespace = "aws:ec2:vpc"
name = "Subnets"
value = join(",", [ aws_subnet.private_a.id, aws_subnet.private_b.id ])
resource = ""
}
setting {
namespace = "aws:ec2:vpc"
name = "ELBScheme"
value = "internal"
resource = ""
}
setting {
namespace = "aws:elasticbeanstalk:environment"
name = "EnvironmentType"
value = "LoadBalanced"
resource = ""
}
setting {
namespace = "aws:elasticbeanstalk:environment"
name = "LoadBalancerType"
value = "application"
resource = ""
}
setting {
namespace = "aws:elasticbeanstalk:application:environment"
name = "AWS_REGION"
value = var.aws_region
resource = ""
}
# ...
depends_on = [
aws_vpc_endpoint.ec2,
aws_vpc_endpoint.elasticbeanstalk,
]
}
Subnets created like this:
resource "aws_subnet" "public" {
vpc_id = aws_vpc.main.id
cidr_block = "10.16.0.0/24"
availability_zone = "${var.aws_region}c"
map_public_ip_on_launch = true
}
resource "aws_subnet" "private_a" {
vpc_id = aws_vpc.main.id
cidr_block = "10.16.192.0/24"
availability_zone = "${var.aws_region}a"
}
resource "aws_subnet" "private_b" {
vpc_id = aws_vpc.main.id
cidr_block = "10.16.224.0/24"
availability_zone = "${var.aws_region}b"
}
Route tables:
resource "aws_route_table" "private" {
vpc_id = aws_vpc.main.id
}
resource "aws_route_table_association" "private_a" {
route_table_id = aws_route_table.private.id
subnet_id = aws_subnet.private_a.id
}
resource "aws_route_table_association" "private_b" {
route_table_id = aws_route_table.private.id
subnet_id = aws_subnet.private_b.id
}
resource "aws_route_table" "public" {
vpc_id = aws_vpc.main.id
}
resource "aws_route_table_association" "public" {
route_table_id = aws_route_table.public.id
subnet_id = aws_subnet.public.id
}
resource "aws_internet_gateway" "public" {
vpc_id = aws_vpc.main.id
}
resource "aws_route" "public_internet" {
route_table_id = aws_route_table.public.id
gateway_id = aws_internet_gateway.public.id
destination_cidr_block = "0.0.0.0/0"
}
Some observations that could contribute to your issue are listed below. A general guide on setting up EB in a private VPC is in the AWS docs.
The observations:
aws_vpc.main.vpc_id should be aws_vpc.main.id.
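For example, the corrected VPCId setting from the question becomes:
setting {
  namespace = "aws:ec2:vpc"
  name      = "VPCId"
  value     = aws_vpc.main.id
  resource  = ""
}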
in all your endpoints private_dns_enabled is false by default. It should be true for the endpoints to work seamlessly. You can add the following to your endpoints:
private_dns_enabled = true
your endpoints are not associated with any subnets. To associate them you can use:
subnet_ids = [aws_subnet.private_a.id, aws_subnet.private_b.id]
there is no interface endpoint for cloudformation:
resource "aws_vpc_endpoint" "cloudformation" {
vpc_id = aws_vpc.main.id
service_name = "com.amazonaws.${var.aws_region}.cloudformation"
security_group_ids = [
aws_security_group.elasticbeanstalk_vpc_endpoint.id,
]
subnet_ids = [aws_subnet.private_a.id, aws_subnet.private_b.id]
private_dns_enabled = true
vpc_endpoint_type = "Interface"
}
EB gets the application zip from S3, yet there is no S3 endpoint. For S3 you can use:
resource "aws_vpc_endpoint" "s3" {
vpc_id = aws_vpc.main.id
service_name = "com.amazonaws.${var.aws_region}.s3"
vpc_endpoint_type = "Gateway"
route_table_ids = [ aws_route_table.private.id ]
}
there is no VPC endpoint for the elasticbeanstalk-health service. You can use:
resource "aws_vpc_endpoint" "elasticbeanstalk-hc" {
vpc_id = aws_vpc.main.id
service_name = "com.amazonaws.${var.aws_region}.elasticbeanstalk-health"
security_group_ids = [
aws_security_group.elasticbeanstalk_vpc_endpoint.id,
]
private_dns_enabled = true
vpc_endpoint_type = "Interface"
}
if you add new endpoints, you need to update the depends_on list in your aws_elastic_beanstalk_environment accordingly.
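For example, assuming the endpoint resource names used in this answer, the depends_on in aws_elastic_beanstalk_environment would become:
depends_on = [
  aws_vpc_endpoint.ec2,
  aws_vpc_endpoint.elasticbeanstalk,
  aws_vpc_endpoint.elasticbeanstalk-hc,
  aws_vpc_endpoint.cloudformation,
  aws_vpc_endpoint.s3,
]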