How to make ALB slow_start work during ECS update service - amazon-web-services

For my high-traffic containerized app running on ECS Fargate, new containers need a slow ramp-up to avoid an out-of-memory situation immediately after startup. This is especially important during the update-service operation, when all containers are replaced at the same time.
How can I get this to work with ECS Fargate and an ALB, making sure the old containers stay around until the slow_start period for the new containers is over?
This is my current Terraform setup. I enabled slow_start, but during update service the old containers are stopped too early, so the new containers receive full traffic instantly.
resource "aws_alb_target_group" "my_target_group" {
name = "my_service"
port = 8080
protocol = "HTTP"
vpc_id = data.aws_vpc.active.id
target_type = "ip"
slow_start = 120
health_check {
enabled = true
port = 8080
path = "/healthCheck"
unhealthy_threshold = 2
healthy_threshold = 2
}
}
resource "aws_ecs_service" "my_service" {
name = "my_service"
cluster = aws_ecs_cluster.my_services.id
task_definition = aws_ecs_task_definition.my_services.arn
launch_type = "FARGATE"
desired_count = var.desired_count
deployment_maximum_percent = 400
deployment_minimum_healthy_percent = 100
enable_execute_command = true
wait_for_steady_state = true
network_configuration {
subnets = data.aws_subnets.private.ids
security_groups = [aws_security_group.my_service_container.id]
}
load_balancer {
container_name = "my-service"
container_port = 8080
target_group_arn = aws_alb_target_group.my_target_group.arn
}
lifecycle {
create_before_destroy = true
ignore_changes = [desired_count]
}
}

ECS normally sends SIGTERM to ask a container to shut down gracefully, and sends SIGKILL if it is still running 30 seconds later.
So you can handle the SIGTERM signal (there are examples of catching this signal in Python) and add a delay in your shutdown code. You then need to raise the default 30-second wait before SIGKILL by setting stopTimeout in the container definition, so ECS does not kill the container before the delay finishes.
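For illustration, a minimal sketch of how stopTimeout could be set in the task definition already referenced above (the family, image, and sizing values here are placeholders; on Fargate, stopTimeout is capped at 120 seconds):
resource "aws_ecs_task_definition" "my_services" {
  family                   = "my_service"
  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc"
  cpu                      = 512
  memory                   = 1024

  container_definitions = jsonencode([
    {
      name      = "my-service"
      image     = "my-service:latest" # placeholder image reference
      essential = true
      portMappings = [
        { containerPort = 8080, protocol = "tcp" }
      ]
      # Give the app up to 120 seconds after SIGTERM before ECS sends SIGKILL.
      stopTimeout = 120
    }
  ])
}
The extra time only helps if the application actually traps SIGTERM and delays its shutdown as described above.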

Related

Terraform aws-lb-target-group problems

My situation is as follows:
I'm creating an AWS ECS Fargate setup (an NGINX container for SFTP traffic) with Terraform, with a network load balancer in front of it. I've got most parts set up just fine and the current setup works. But now I want to add more target groups to the setup so I can allow more ports through to the container. My variable definition is as follows:
variable "sftp_ports" {
type = map
default = {
test1 = {
port = 50003
}
test2 = {
port = 50004
}
}
}
and the actual deployment is as follows:
resource "aws_alb_target_group" "default-target-group" {
name = local.name
port = var.sftp_test_port
protocol = "TCP"
target_type = "ip"
vpc_id = data.aws_vpc.default.id
depends_on = [
aws_lb.proxypoc
]
}
resource "aws_alb_target_group" "test" {
for_each = var.sftp_ports
name = "sftp-target-group-${each.key}"
port = each.value.port
protocol = "TCP"
target_type = "ip"
vpc_id = data.aws_vpc.default.id
depends_on = [
aws_lb.proxypoc
]
}
resource "aws_alb_listener" "ecs-alb-https-listenertest" {
for_each = var.sftp_ports
load_balancer_arn = aws_lb.proxypoc.id
port = each.value.port
protocol = "TCP"
default_action {
type = "forward"
target_group_arn = aws_alb_target_group.default-target-group.arn
}
}
This deploys the needed listeners and target groups just fine, but the only problem I have is how to configure the registered-targets part. The aws_ecs_service resource only allows one target group ARN, so I have no clue how to add the additional target groups in order to reach my goal. I've been wrapping my head around this problem and scoured the internet, but so far... nothing. So is it possible to configure the ECS service to contain more target group ARNs, or am I supposed to configure a single target group with multiple ports? (As far as I know that is not supported out of the box; I checked the docs as well. But it is possible to add multiple registered targets in the GUI, so I guess it is a possibility.)
I'd like to hear from you guys,
Thanks!
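For what it's worth, ECS has supported attaching a service to multiple target groups since 2019, and recent AWS provider versions accept multiple load_balancer blocks on aws_ecs_service. A minimal sketch, assuming the per-port target groups created with for_each above and hypothetical cluster, task definition, and networking resources:
resource "aws_ecs_service" "sftp" {
  name            = "sftp-proxy"                     # placeholder
  cluster         = aws_ecs_cluster.proxypoc.id      # assumed cluster resource
  task_definition = aws_ecs_task_definition.sftp.arn # assumed task definition
  desired_count   = 1
  launch_type     = "FARGATE"

  network_configuration {
    subnets         = data.aws_subnets.private.ids # placeholder
    security_groups = [aws_security_group.sftp.id] # placeholder
  }

  # One load_balancer block per target group created with for_each above.
  dynamic "load_balancer" {
    for_each = var.sftp_ports

    content {
      target_group_arn = aws_alb_target_group.test[load_balancer.key].arn
      container_name   = "nginx" # must match the container name in the task definition
      container_port   = load_balancer.value.port
    }
  }
}
Note that ECS currently allows at most 5 target groups per service.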

Target group constantly fails health check on port 80 and launches new instances when using dynamic port mapping

I have an ECS cluster and an Application Load Balancer. I have set up dynamic port mapping for Amazon ECS following AWS's docs.
The problem is that port 80 of my instance gets registered as a target in my target group, and that target always fails (it will, because the container is exposed on the ephemeral port range 32768-65535).
Because of that, the Auto Scaling group I have constantly spins up new EC2 instances and terminates existing ones.
Below is my Terraform config that creates the ALB, listener and target_group:
resource "aws_alb" "default" {
name = "${var.app_name}-${var.app_environment}-alb"
load_balancer_type = "application"
internal = true
subnets = var.loadbalancer_subnets
security_groups = [aws_security_group.load_balancer_security_group.id]
}
resource "aws_lb_listener" "default" {
load_balancer_arn = aws_alb.default.arn
port = "80"
protocol = "HTTP"
default_action {
type = "forward"
target_group_arn = aws_lb_target_group.default.arn
}
}
resource "aws_lb_target_group" "default" {
name_prefix = "rushmo"
port = 80
protocol = "HTTP"
vpc_id = var.vpc_id
target_type = "instance"
health_check {
healthy_threshold = "2"
unhealthy_threshold = "5"
interval = "300"
port = "traffic-port"
path = "/"
protocol = "HTTP"
matcher = "200,301,302"
}
}
resource "aws_autoscaling_group" "default" {
name = "${var.app_name}-${var.app_environment}-ASG"
desired_capacity = 1
health_check_type = "ELB"
health_check_grace_period = 600 # 10 min
launch_configuration = aws_launch_configuration.default.name
max_size = 1
min_size = 1
target_group_arns = [aws_lb_target_group.default.arn]
termination_policies = ["OldestInstance"]
vpc_zone_identifier = var.application_subnets
protect_from_scale_in = true
}
Note: if I manually deregister the target on port 80 from the target group, the problem with the constant termination and launching of new instances is solved, but I don't understand what I have done wrong and why port 80 shows up as a registered target instead of only the ephemeral port range.
I think the issue is due to:
health_check_type = "ELB"
This makes the ASG use the ALB's health checks against port 80 of your instances. However, since you are using ECS, those health checks should apply only to your containers, not to the instances themselves. It should therefore be:
health_check_type = "EC2"

AWS - ECS EC2 cluster running container with ALB and more than 5 ports forwarded - in Terraform

I am running an ECS cluster with about 20 containers. I have a big monolith application running in one container which needs to listen on 10 ports.
However, AWS allows a maximum of 5 load balancer target group attachments per ECS service.
Any ideas how to overcome this (if possible)? Here's what I've tried:
Defining 10+ target groups with 1 listener each. Doesn't work, since AWS allows at most 5 load_balancer definitions in aws_ecs_service; see the 2nd bullet under "Service load balancing considerations" at https://docs.aws.amazon.com/AmazonECS/latest/developerguide/service-load-balancing.html
Defining 10+ listeners with 1 target group; however, all listeners then forward to a single port on the container...
Tried without specifying port in the load_balancer definition in aws_ecs_service, but AWS complains about a missing argument.
Tried without specifying port in aws_lb_target_group, but AWS complains that the target type is "ip", so port is required...
Here's my current code:
resource "aws_ecs_service" "service_app" {
name = "my_service_name"
cluster = var.ECS_CLUSTER_ID
task_definition = aws_ecs_task_definition.task_my_app.arn
desired_count = 1
force_new_deployment = true
...
load_balancer { # Note: I've stripped the for_each to simplify reading
target_group_arn = var.tga
container_name = var.n
container_port = var.p
}
}
resource "aws_lb_target_group" "tg_mytg" {
name = "Name"
protocol = "HTTP"
port = 3000
target_type = "ip"
vpc_id = aws_vpc.my_vpc.id
}
resource "aws_lb_listener" "ls_3303" {
load_balancer_arn = aws_lb.my_lb.id
port = "3303"
protocol = "HTTP"
default_action {
type = "forward"
target_group_arn = aws_lb_target_group.tg_mytg.arn
}
}
...
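For reference, a sketch of how the stripped for_each would typically be written as a dynamic block (the variable shape here is hypothetical); ECS still rejects the service once more than 5 load_balancer blocks are attached:
resource "aws_ecs_service" "service_app" {
  name                 = "my_service_name"
  cluster              = var.ECS_CLUSTER_ID
  task_definition      = aws_ecs_task_definition.task_my_app.arn
  desired_count        = 1
  force_new_deployment = true

  # Hypothetical map of name => { target_group_arn, container_port }; at most 5 entries are accepted.
  dynamic "load_balancer" {
    for_each = var.port_mappings

    content {
      target_group_arn = load_balancer.value.target_group_arn
      container_name   = var.n
      container_port   = load_balancer.value.container_port
    }
  }
}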

ALB Health checks Targets Unhealthy

I am trying to provision an ECS cluster using Terraform, along with an ALB. The targets come up as unhealthy, and the console shows the error Health checks failed with these codes: [502].
I went through the AWS troubleshooting guide and nothing there helped.
EDIT: I have no services/tasks running on the EC2 container instances. It's a vanilla ECS cluster.
Here is my relevant code for the ALB:
# Target Group declaration
resource "aws_alb_target_group" "lb_target_group_somm" {
  name                 = "${var.alb_name}-default"
  port                 = 80
  protocol             = "HTTP"
  vpc_id               = "${var.vpc_id}"
  deregistration_delay = "${var.deregistration_delay}"

  health_check {
    path     = "/"
    port     = 80
    protocol = "HTTP"
  }

  lifecycle {
    create_before_destroy = true
  }

  tags = {
    Environment = "${var.environment}"
  }

  depends_on = ["aws_alb.alb"]
}

# ALB Listener with default forward rule
resource "aws_alb_listener" "https_listener" {
  load_balancer_arn = "${aws_alb.alb.id}"
  port              = "80"
  protocol          = "HTTP"

  default_action {
    target_group_arn = "${aws_alb_target_group.lb_target_group_somm.arn}"
    type             = "forward"
  }
}

# The ALB has a security group with ingress rules on TCP port 80 and egress rules to anywhere.
# There is a security group rule for the EC2 instances that allows ingress traffic to the ECS cluster from the ALB:
resource "aws_security_group_rule" "alb_to_ecs" {
  type      = "ingress"
  /*from_port = 32768 */
  from_port = 80
  to_port   = 65535
  protocol  = "TCP"
  source_security_group_id = "${module.alb.alb_security_group_id}"
  security_group_id        = "${module.ecs_cluster.ecs_instance_security_group_id}"
}
Has anyone hit this error and figured out how to debug or fix it?
It looks like you're trying to register the ECS cluster instances themselves with the ALB target group. That isn't how you're meant to send traffic to an ECS service via an ALB.
Instead, you should have the service join the tasks to the target group. This means that if you are using host networking, only the instances with the task deployed will be registered. If you are using bridge networking, it will add the ephemeral ports used by your tasks to the target group (allowing for multiple targets on a single instance). And if you are using awsvpc networking, it will register the ENIs of every task that the service spins up.
To do this you should use the load_balancer block in the aws_ecs_service resource. An example might look something like this:
resource "aws_ecs_service" "mongo" {
name = "mongodb"
cluster = "${aws_ecs_cluster.foo.id}"
task_definition = "${aws_ecs_task_definition.mongo.arn}"
desired_count = 3
iam_role = "${aws_iam_role.foo.arn}"
load_balancer {
target_group_arn = "${aws_lb_target_group.lb_target_group_somm.arn}"
container_name = "mongo"
container_port = 8080
}
}
If you were using bridge networking, this would mean that the tasks are accessible on the ephemeral port range of the instances, so your security group rule would need to look like this:
resource "aws_security_group_rule" "alb_to_ecs" {
type = "ingress"
from_port = 32768 # ephemeral port range for bridge networking tasks
to_port = 60999 # cat /proc/sys/net/ipv4/ip_local_port_range
protocol = "TCP"
source_security_group_id = "${module.alb.alb_security_group_id}"
security_group_id = "${module.ecs_cluster.ecs_instance_security_group_id}"
}
It looks like http://ecsInstanceIp:80 is not returning HTTP 200 OK; I would check that first. That's easy to verify if the instance is public, which it won't be most of the time; otherwise, create an EC2 instance with network access to it and make a curl request from there to confirm.
You may also check the container logs to see if the health check requests are being logged.
Hope this helps. Good luck.

How do I configure AWS network load balancers to route HTTPS traffic with Terraform?

I am trying to use API Gateway's VPC links to route traffic to an internal API over HTTPS.
But VPC links force me to change my API's load balancer from "application" to "network".
I understand that the network load balancer operates at layer 4 and as such does not know about HTTPS.
I am used to the layer 7 application load balancer, so I am not sure how I should configure, or indeed use, the network load balancer in Terraform.
Below is my attempt at configuring the network load balancer in Terraform.
The health check fails and I'm not sure what I am doing wrong.
resource "aws_ecs_service" “app” {
name = "${var.env}-${var.subenv}-${var.appname}"
cluster = "${aws_ecs_cluster.cluster.id}"
task_definition = "${aws_ecs_task_definition.app.arn}"
desired_count = "${var.desired_app_count}"
deployment_minimum_healthy_percent = 50
deployment_maximum_percent = 200
iam_role = "arn:aws:iam::${var.account}:role/ecsServiceRole"
load_balancer {
target_group_arn = "${aws_lb_target_group.app-lb-tg.arn}"
container_name = "${var.env}-${var.subenv}-${var.appname}"
container_port = 9000
}
depends_on = [
"aws_lb.app-lb"
]
}
resource "aws_lb" “app-lb" {
name = "${var.env}-${var.subenv}-${var.appname}"
internal = false
load_balancer_type = "network"
subnets = "${var.subnet_ids}"
idle_timeout = 600
tags {
Owner = ""
Env = "${var.env}"
}
}
resource "aws_lb_listener" “app-lb-listener" {
load_balancer_arn = "${aws_lb.app-lb.arn}"
port = 443
protocol = "TCP"
default_action {
type = "forward"
target_group_arn = "${aws_lb_target_group.app-lb-tg.arn}"
}
}
resource "aws_lb_target_group" “app-lb-tg" {
name = "${var.env}-${var.subenv}-${var.appname}"
port = 443
stickiness = []
health_check {
path = "/actuator/health"
}
protocol = "TCP"
vpc_id = "${var.vpc_id}"
}
For reference, this is how I previously configured my application load balancer before attempting to switch to a network load balancer:
resource "aws_ecs_service" "app" {
name = "${var.env}-${var.subenv}-${var.appname}"
cluster = "${aws_ecs_cluster.cluster.id}"
task_definition = "${aws_ecs_task_definition.app.arn}"
desired_count = "${var.desired_app_count}"
deployment_minimum_healthy_percent = 50
deployment_maximum_percent = 200
iam_role = "arn:aws:iam::${var.account}:role/ecsServiceRole"
load_balancer {
target_group_arn = "${aws_lb_target_group.app-alb-tg.arn}"
container_name = "${var.env}-${var.subenv}-${var.appname}"
container_port = 9000
}
depends_on = [
"aws_alb.app-alb"]
}
resource "aws_alb" "app-alb" {
name = "${var.env}-${var.subenv}-${var.appname}"
subnets = "${var.subnet_ids}"
security_groups = [
"${var.vpc_default_sg}",
"${aws_security_group.app_internal.id}"]
internal = false
idle_timeout = 600
tags {
Owner = ""
Env = "${var.env}"
}
}
resource "aws_lb_listener" "app-alb-listener" {
load_balancer_arn = "${aws_alb.app-alb.arn}"
port = "443"
protocol = "HTTPS"
ssl_policy = "ELBSecurityPolicy-2015-05"
certificate_arn = "${var.certificate_arn}"
default_action {
type = "forward"
target_group_arn = "${aws_lb_target_group.app-alb-tg.arn}"
}
}
resource "aws_lb_target_group" "app-alb-tg" {
name = "${var.env}-${var.subenv}-${var.appname}"
port = 80
health_check {
path = "/actuator/health"
}
protocol = "HTTP"
vpc_id = "${var.vpc_id}"
}
A network load balancer automatically performs passive health checks on non-UDP traffic that flows through it, so if that's enough then you can just remove the active health check configuration.
If you want active health checks then you can either use TCP health checks (the default), which just check that the port is open, or you can specify the HTTP/HTTPS protocol and a path. Ideally the AWS API would error when you specify a path for the health check without setting the protocol to HTTP or HTTPS, but apparently that's not the case right now.
With Terraform this would look something like this:
resource "aws_lb_target_group" "app-alb-tg" {
name = "${var.env}-${var.subenv}-${var.appname}"
port = 443
protocol = "TCP"
vpc_id = "${var.vpc_id}"
health_check {
path = "/actuator/health"
protocol = "HTTPS"
}
}
Remember that active health checks verify that the port is open on the target from the network load balancer's perspective, not just for the source traffic. This means your target must allow traffic from the subnets your NLB resides in, as well as from the security groups or CIDR ranges your source traffic originates from.
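A rough sketch of the extra rule that implies, assuming the target port 443 from the target group above; the variable holding the NLB subnet CIDRs is a placeholder:
resource "aws_security_group_rule" "nlb_health_check_to_app" {
  type              = "ingress"
  from_port         = 443
  to_port           = 443
  protocol          = "tcp"
  cidr_blocks       = var.nlb_subnet_cidrs                # placeholder: CIDRs of the subnets the NLB lives in
  security_group_id = aws_security_group.app_internal.id  # the security group attached to the targets, as in the earlier ALB setup
}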