The ELB could not be updated due to the following error: Primary taskset target group must be behind listener arn:aws:elasticloadbalancing

I'm trying to learn about ECS and Blue/Green deployments with CodePipeline.
I have been following several tutorials but I'm stuck with this error:
The ELB could not be updated due to the following error: Primary
taskset target group must be behind listener
arn:aws:elasticloadbalancing:us-east-1:XXX:listener/app/ecs-t-XXX/XXX/XXX.
This is what I have:
I created 2 listeners: one for PROD (8080) and the other for TEST (8088)
I created 2 Target Groups: one for each listener.
When I go to my Load Balancer and check the Listeners I can see them there.
I have 2 services.
The original (Service1) and the new one (Service2).
Both services have the same configuration, except for the target group under the load balancer section: Service1 has target-1 and Service2 has target-2.
Service2 has CODE_DEPLOY as its DeploymentController and REPLICA as its SchedulingStrategy (Service1 doesn't).
In CodePipeline, when I reach the deploy action, it fails with the previous message.
The Load Balancer configuration seems fine:
Listener 8080 forwards to target1
Listener 8088 forwards to target2
I checked my CodeDeploy application and deployment group.
Everything seems fine. Under Load Balancing, I have target-1 which points to the PROD listener, and target-2 which points to the TEST listener.
As for the environment, the ECS service is Service2 (i.e., the new one).
Permissions are fine.
So, what am I not seeing?
I searched for this error but I could not find an answer that worked for me.
The closest one was about not attaching the target groups to the Load Balancer. But in my case, I have them attached to the Load Balancer, and each listener is attached to its respective target group.
I'd appreciate help. I'm out of ideas.

This might be the issue:
I created 2 Target Groups: one for each listener. When I go to my Load Balancer and check the Listeners I can see them there.
If you've already pointed both listeners to their corresponding target groups, this issue will occur. Try pointing both listeners to the same target group, the one behind which the application is currently running.
More troubleshooting steps here: https://aws.amazon.com/premiumsupport/knowledge-center/ecs-blue-green-deployment/
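If it helps, here is a rough Terraform sketch of that initial state (resource names like ecs_alb, target_1 and target_2 are hypothetical): both the PROD and TEST listeners forward to the target group the running task set sits behind, and CodeDeploy rewires them to the second target group during the blue/green deployment.

resource "aws_lb_listener" "prod" {
  load_balancer_arn = aws_lb.ecs_alb.arn
  port              = 8080
  protocol          = "HTTP"

  default_action {
    type             = "forward"
    # Target group the current (primary) task set is registered in
    target_group_arn = aws_lb_target_group.target_1.arn
  }
}

resource "aws_lb_listener" "test" {
  load_balancer_arn = aws_lb.ecs_alb.arn
  port              = 8088
  protocol          = "HTTP"

  default_action {
    type             = "forward"
    # Same target group initially; CodeDeploy shifts traffic to target_2 during deployment
    target_group_arn = aws_lb_target_group.target_1.arn
  }
}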

Related

Health Check keeps failing for ECS container

I am currently trying to deploy 2 ECS services on a single EC2 instance for a test environment.
Here is what I have done so far:
Successfully created 2 Security Groups for Load Balancer and EC2 instance.
My EC2 Security Group
My ALB Security Group
Successfully created 2 different Task Definitions for my 2 applications, both Spring Boot applications. The first application runs on port 8080, and the Container Port in its Task Definition is also 8080. The second application runs on port 8081, and the Container Port in its Task Definition is also 8081.
Successfully created an ECS cluster with an Auto Scaling Group as Capacity Provider. The cluster also recognizes the Container Instance created by the Auto Scaling Group (I am using t2.micro since it is in the free tier). Attached the created Security Group to the EC2 instance.
Successfully created an ALB with 2 forward listeners (8080 and 8081) configured with 2 different Target Groups, one for each service. Attached the created Security Group to the ALB.
Here is how the ECS behaves with my services:
I attempted to create 2 new services. The first service mapped to port 8080 on the ALB, and the second one mapped to port 8081. Each of them has a different Target Group, but the Health Check configurations are the same.
Health Check Configuration for Service 1
Health Check Configuration for Service 2
The first service was deployed pretty smoothly; the health check returned success on the first try.
However, for the second service I used the exact same configuration as the first one, just a different listener port on the ALB and the application container running on a different port number as well (which I believe should not be a problem). The service attempted 10 times before failing the deployment, and I noticed this repeated error message: service <service_name> instance <instance_id> port <port_number> is unhealthy in target-group <target_group_name> due to (reason Health checks failed).
This did not happen with my first service with the same configuration. The weird thing is that when I send a request to the ALB domain name on port 8081, the application on the second service seems to be working fine without any error. It is just that the failing health check keeps throwing my service off.
I went over a bunch of posts and nothing really helped with the current situation. Also, it is frustrating that I cannot dig up any further details beyond the info in the image below.
Does anyone have any suggestions to resolve this problem? I would really appreciate it.

AWS ECS Fargate dynamically register IP to target group

I use AWS at work and I am fairly new to this.
I have multiple Services with one Task/Container running. Each Container is fundamentally the same with a few changes, it's basically for different stages/deployments. I have one target group for each, so my load balancer routes requests from specific domains to each.
For example: if host is example1.com then forward to exampleTargetGroup1 and so on.
The Problem
As you may know, each time a container is updated its IP changes, hence I have to re-register the new IP with the target group.
I have found several approaches to this problem. Most of them suggest using a Network Load Balancer for a static IP, but this doesn't work because, as I understand it, it registers the containers automatically on updates.
Another solution is to trigger a Lambda function from a CloudWatch event when the task is updated. The function grabs the IP and updates the Route 53 record. My idea was to take this approach and also deregister the old IP from the target group and register the new one.
My Questions
Is there a better solution to this, or did I misunderstand the first solutions? If the last solution is optimal for my problem, is there maybe a code sample so I won't need to figure it out?
EDIT:
Thanks to Mark B, I now know you should preferably use the AWS API or a tool like Terraform to create an ECS service and associate a target group with it.
"but this doesn't work because, as I understand it, it registers the
containers automatically on updates."
I think you are misunderstanding something here. Each ECS service should be associated with a load balancer Target Group. Whenever the service creates a task, the service will automatically add that task's IP to the target group. Whenever the service removes a task, it will also remove that task's IP from the target group. This works with both Network Load Balancers and Application Load Balancers.
You stated the following:
"I have multiple Services with one Task/Container running"
So you have one task per service, and one service per target group. From your description, your architecture should look like this:
One load balancer with multiple domains pointing at it.
In the Load Balancer listener configuration, you have each domain configured to route to a different target group.
Each ECS service configured with a task count of 1
Load balancer -> domain name 1 -> target group 1 -> ECS service 1 -> ECS task 1
Load balancer -> domain name 2 -> target group 2 -> ECS service 2 -> ECS task 2
Load balancer -> domain name 3 -> target group 3 -> ECS service 3 -> ECS task 3
etc...
In the above scenario, as long as you have each ECS service configured with the appropriate target group, each time that service redeploys a task it will automatically update the target group to point to the updated task.
In other words ECS will "dynamically register the IP to target group", exactly like you are wanting.
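Since Terraform was mentioned, here is a minimal sketch of that wiring for one of the services (all names, domains and ports are made up): the load_balancer block on the ECS service is what makes ECS register and deregister each task's IP in the target group automatically, and the listener rule routes one domain to that target group.

resource "aws_lb_listener_rule" "example1" {
  listener_arn = aws_lb_listener.https.arn
  priority     = 10

  action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.example1.arn
  }

  condition {
    host_header {
      values = ["example1.com"]
    }
  }
}

resource "aws_ecs_service" "example1" {
  name            = "example-service-1"
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.example1.arn
  desired_count   = 1
  launch_type     = "FARGATE"

  network_configuration {
    subnets         = var.private_subnet_ids
    security_groups = [aws_security_group.service.id]
  }

  # ECS keeps this target group in sync with the task's IP on every deployment
  load_balancer {
    target_group_arn = aws_lb_target_group.example1.arn
    container_name   = "app"
    container_port   = 8080
  }
}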

AWS CodeDeploy: stuck on install step

I'm running through this tutorial to create a deployment pipeline with my custom .net-based docker image.
But when I start a deployment, it gets stuck on the Install phase, so I have to stop it manually:
After that I get a couple of running tasks with different task definitions (note :1 and :4, because I've tried to run the deployment 4 times by now):
They also keep changing their state: RUNNING -> PROVISIONING -> PENDING. And the list of stopped tasks grows:
Q:
So, how do I hunt down the issue with CodeDeploy? Why is it running forever?
UPDATE:
It is connected to health checks.
UPDATE:
I'm getting this:
(service dataapi-dev-service, taskSet ecs-svc/9223370487815385540) (port 80) is unhealthy in target-group dataapi-dev-tg1 due to (reason Health checks failed with these codes: [404]).
I don't quite understand why it is failing for the newly created container, because the original one passes the health check.
While the ECS task is running, the ELB (Elastic Load Balancer) constantly health-checks the container, as configured in the target group, to check whether the container is still responding.
From your debug message, the container (api) responded to the health check path with a 404.
I suggest you configure the health check path in target group dataapi-dev-tg1 so it points to a route the container actually serves.
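For reference, a rough Terraform sketch of what that target group setup could look like (the names, the /health path and the numbers are assumptions); the key is that the health check path must be a route the api container answers with a success code rather than a 404.

resource "aws_lb_target_group" "dataapi_dev_tg1" {
  name        = "dataapi-dev-tg1"
  port        = 80
  protocol    = "HTTP"
  target_type = "ip"
  vpc_id      = var.vpc_id

  health_check {
    path                = "/health"  # must be a route the api container serves
    matcher             = "200"      # expected success code(s)
    interval            = 30
    timeout             = 5
    healthy_threshold   = 2
    unhealthy_threshold = 3
  }
}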
For those who are still hitting this issue: in my case the ECS cluster had no outbound connectivity.
Possible solutions to this problem:
make sure the security groups you use with your VPC allow outbound traffic
make sure that the route table you use with the VPC has subnet associations with the subnets you use with your load balancer (examine the route tables)
I was able to figure it out because I enabled CloudWatch during ECS cluster creation and got CannotPullContainerError. For more information on solving this problem, look into Cannot Pull Container Image Error.
Make sure your Internet Gateway is attached to your subnets through the Route Table (routes) if your Load Balancer is internet-facing.
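If it helps, a rough Terraform sketch of the outbound path the tasks need in order to pull images (resource names are hypothetical; for private subnets a NAT gateway route would take the place of the internet gateway route):

resource "aws_internet_gateway" "igw" {
  vpc_id = aws_vpc.main.id
}

resource "aws_route_table" "public" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.igw.id
  }
}

# The subnets used by the load balancer / tasks must be associated with this route table
resource "aws_route_table_association" "public_a" {
  subnet_id      = aws_subnet.public_a.id
  route_table_id = aws_route_table.public.id
}

# Allow all outbound traffic from the tasks' security group
resource "aws_security_group_rule" "ecs_egress_all" {
  type              = "egress"
  from_port         = 0
  to_port           = 0
  protocol          = "-1"
  cidr_blocks       = ["0.0.0.0/0"]
  security_group_id = aws_security_group.ecs_sg.id
}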
The error is due to a health check which detected an unhealthy target.
Make sure to check your configuration in Target group settings.

AWS ECS error: Task failed ELB health checks in Target group

I am using cloud formation template to build the infrastructure (ECS fargate cluster).
The template executed successfully and the stack was created. However, the task failed with the following error:
Task failed ELB health checks in (target-group arn:aws:elasticloadbalancing:eu-central-1:890543041640:targetgroup/prc-service-devTargetGroup/97e3566c8b307abf)
I am not sure what to look at, or where, to troubleshoot this issue.
As it is a Fargate cluster, I am not sure how to log in to the container and execute some health check queries to debug further.
Can someone please guide me further on this?
Due to this error, I am not even able to access my web app, as the ALB won't route traffic to an unhealthy target.
What I did
After some googling, I found this post:
https://aws.amazon.com/premiumsupport/knowledge-center/troubleshoot-unhealthy-checks-ecs/
However, I guess this is related to EC2 compatibility rather than Fargate. In my case, there is no EC2.
If you feel, I can paste the entire template as well.
please help
This is resolved.
It was an issue with the following points:
The Docker container port mapping to the host port was incorrect.
The ALB health check interval was too short, so the ALB gave up immediately instead of waiting for the Docker container to be up and running properly.
After making these changes it worked properly; a rough sketch of both is below.
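A hedged Terraform sketch of both fixes, with made-up names and values: with the awsvpc network mode the host port must equal the container port, and the health check gets a more forgiving interval and threshold so the container has time to start.

resource "aws_ecs_task_definition" "web" {
  family                   = "prc-service-dev"
  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc"
  cpu                      = "256"
  memory                   = "512"

  container_definitions = jsonencode([{
    name  = "web"
    image = "my-app:latest"   # placeholder image
    portMappings = [{
      containerPort = 8080    # must match what the app listens on
      hostPort      = 8080    # awsvpc requires hostPort == containerPort
      protocol      = "tcp"
    }]
  }])
}

resource "aws_lb_target_group" "web" {
  name        = "prc-service-dev-tg"
  port        = 8080
  protocol    = "HTTP"
  target_type = "ip"
  vpc_id      = var.vpc_id

  health_check {
    path                = "/"
    interval            = 60   # give the container time to come up
    timeout             = 10
    healthy_threshold   = 2
    unhealthy_threshold = 5
  }
}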
There are quite a few different possible reasons for this issue, not only the open ports:
Improper IAM permissions for the ecsServiceRole IAM role
Container instance security group
Elastic Load Balancing load balancer not configured for all Availability Zones
Elastic Load Balancing load balancer health check misconfigured
Unable to update the service servicename: Load balancer container name or port changed in task definition
Therefore AWS created a dedicated page to address the possible causes of this error:
https://docs.aws.amazon.com/en_en/AmazonECS/latest/developerguide/troubleshoot-service-load-balancers.html
Edit: in my case the health check code of my application was different. The default is 200 but you can also add a range such as 200-499.
Let me share my experience.
In my case everything was correct except the host on which the server listens: it was localhost, which makes the server unreachable from the outside world, so the health check didn't work. It should be 0.0.0.0 (or empty in some libraries).
I got this error message because the security group between the ECS service and the load balancer target group was only allowing HTTP and HTTPS traffic.
Apparently the health check happens over some other port and/or protocol, as updating the security group to allow all traffic on all ports (as suggested at https://docs.aws.amazon.com/AmazonECS/latest/userguide/create-application-load-balancer.html) made the health check work.
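A narrower alternative to opening everything, sketched in Terraform with hypothetical names: the health check typically arrives from the load balancer nodes on the target's traffic port (the container port, unless the health check port is overridden), so allowing that port from the ALB's security group is usually enough.

resource "aws_security_group_rule" "alb_to_service" {
  type                     = "ingress"
  from_port                = 8080   # the container / health check port
  to_port                  = 8080
  protocol                 = "tcp"
  source_security_group_id = aws_security_group.alb.id      # the ALB's security group
  security_group_id        = aws_security_group.service.id  # the ECS service's security group
}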
I had this exact same problem. I was able to get around the issue by:
navigate to the EC2 service
then select Target Groups in the side panel
select the target group for your load balancer
select the Health checks tab
make sure the health check for your EC2 instance is the same as the health check in the target group. This tells your ELB which endpoint to hit when conducting its health check. In my case my health check path was /health.
In my case, the ECS Fargate task runs the Docker container as a background service rather than a web app or API, so it does not listen on any port by itself (e.g. a scheduled cron job or an ActiveMQ message consumer).
In other words, it is a client and not a server node, so I made it listen on localhost for the health check only.
All I did was set the health check path in the Target Group to the path used below (/__health) and add this code in index.ts:
import express from 'express';

const app = express();
const port = process.env.PORT || 8080;

// Health check endpoint for the ALB target group
app.get('/__health', (_, res) => res.send({ ok: 'yes' }));

app.listen(port, () => {
  console.log(`Health Check: Listening at http://localhost:${port}`);
});
As mentioned by tschumann above, check the security group around the ECS cluster. If using Terraform, allow ingress to all Docker ephemeral ports with something like the below:
resource "aws_security_group" "ecs_sg" {
  name   = "ecs_security_group"
  vpc_id = "${data.aws_vpc.vpc.id}"
}

# Allow the VPC (and therefore the ALB) to reach the dynamic host ports ECS maps containers to
resource "aws_security_group_rule" "ingress_docker_ports" {
  type              = "ingress"
  from_port         = 32768
  to_port           = 61000
  protocol          = "tcp"    # a port range needs a concrete protocol; "-1" (all) ignores the ports
  cidr_blocks       = ["${data.aws_vpc.vpc.cidr_block}"]
  security_group_id = "${aws_security_group.ecs_sg.id}"
}
Possibly helpful for someone: our target group health check path was set to /, which for our services pointed to Swagger and worked well. After updating to use Springfox instead of manually generating swagger.json, / now performs a 302 redirect to /swagger-ui.html, which caused the health check to fail. Since this was a Spring Boot service, we simply pointed the health check path in the target group to /health instead (the out-of-the-box Spring status page).
The solution in iravinandan's answer is partially correct; in the last part of your Node.js router, simply add status(200) and that's it. Or you can set your own success codes under the advanced health check settings at the bottom of the page.
app.get('/__health', (request, response) => response.status(200).end(""));
Regards
My case was a React application running on FARGATE mode.
The first issue was that the Docker image was built on NodeJS, "serving" the app with:
CMD npm run start # react-scripts start
Besides not being good practice at all, it requires a lot of resources (4GB & 2vCPU were not enough), and because of that the checks were failing (this article mentions this as a probable cause).
To solve the previous issue, we modified the image into a multi-stage build with NodeJS for the building phase and NGINX for serving the content. Locally that was working great, but we hadn't realized that the default port for NGINX is 80, and you cannot use different host and container ports on FARGATE with the awsvpc network mode.
To troubleshoot it, I launched an EC2 instance with the right Security Groups to connect to the FARGATE targets on the same port where the Load Balancer was failing to perform the health check. I was able to execute curl commands against other targets, but with this unhealthy target (constantly being recycled) I received an instant Connection refused response. It wasn't a timeout, which told me that the target was not able to handle the request because it was not listening on that port. Then I realized that my container was expecting traffic on port 80 while my application was configured to work on a 3xxx port.
The solution here was to modify the default configuration of NGINX to listen to the port we wanted, re-build the image and re-launch the service.
In my case, my ECS Fargate service does not need a load balancer, so I removed the "Load Balancer" and "Security Group" and then it worked.
I had the same issue deploying a Java Spring Boot app on ECS running as Fargate. There were 3 issues I had to address to fix the problem; hopefully this can help others in the future.
The container was running on port 8080 (because of Tomcat), so the ELB, the target group and the two security groups (one on the ELB and one on ECS) must allow 8080 in their inbound rules. The task definition also had to be revised so the container maps port 8080.
The port in the target group's health check section (advanced settings) had to be explicitly changed to 8080 instead of the default 80.
I had to create a dummy health check path in the application, because pinging the root of the app at "/" was resulting in a 302 code.
Hope this helps.
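A hedged Terraform sketch of points 2 and 3 (the names and the /healthcheck path are assumptions): the target group's health check port is pinned to 8080 and the path points at a lightweight endpoint that returns 200 instead of the 302 you get from /.

resource "aws_lb_target_group" "spring_app" {
  name        = "spring-app-tg"
  port        = 8080
  protocol    = "HTTP"
  target_type = "ip"
  vpc_id      = var.vpc_id

  health_check {
    port    = "8080"          # explicit health check port (point 2 above)
    path    = "/healthcheck"  # dummy endpoint that returns 200, not a 302 (point 3 above)
    matcher = "200"
  }
}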
I have also faced the same issue while using the AWS Fargate.
Here are some possible solutions to try:
First, check that the security group attached to the service has the required inbound and outbound rules in place.
If you are using a load balancer pointing to a target group, you must open the Docker container port on the service's security group and allow inbound traffic only from the ALB's security group.
Also check the health check endpoint assigned to the target group: if it has any dependencies, it should still return a 200 status response (or whatever success codes you specified in the target group).
In my case it was a security group rule which allowed connections only from a certain IP, and this was blocking health checks from the LB. I added the VPC's CIDR as another rule to the security group and then it worked.

AWS CodeDeploy deployment failed at event BlockTraffic

I am trying to set up auto-deployment from GitHub to AWS, using EC2 behind an ELB.
After following the Tutorial: Use AWS CodeDeploy to Deploy an Application from GitHub, my deployment fails at the BlockTraffic event after trying for an hour (1h 2min last time), with error code ScriptFailed. I'm not sure how to troubleshoot the issue or where to look.
The ELB target group target health status: healthy
Health Check configuration:
Healthy threshold: 2
Unhealthy threshold: 2
Timeout: 5
Interval: 10
Success codes: 200
Don't enable the Load Balancer on the CodeDeploy deployment group for the pipeline, and you will get rid of the BlockTraffic and AllowTraffic steps.
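If the deployment group is managed with Terraform, a minimal sketch of that setting (names are hypothetical): choosing WITHOUT_TRAFFIC_CONTROL and omitting the load balancer info means the BlockTraffic and AllowTraffic hooks are skipped.

resource "aws_codedeploy_deployment_group" "example" {
  app_name              = aws_codedeploy_app.example.name
  deployment_group_name = "example-dg"
  service_role_arn      = aws_iam_role.codedeploy.arn

  # No load_balancer_info block and no traffic control => no BlockTraffic/AllowTraffic steps
  deployment_style {
    deployment_option = "WITHOUT_TRAFFIC_CONTROL"
    deployment_type   = "IN_PLACE"
  }

  ec2_tag_set {
    ec2_tag_filter {
      key   = "Name"
      type  = "KEY_AND_VALUE"
      value = "my-app-instance"   # placeholder tag
    }
  }
}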
Make sure your CodeDeploy role has sufficient access to register and deregister instances if they are behind an ELB.
The permissions below may be required:
"elasticloadbalancing:DescribeLoadBalancers",
"elasticloadbalancing:DescribeInstanceHealth",
"elasticloadbalancing:RegisterInstancesWithLoadBalancer",
"elasticloadbalancing:DeregisterInstancesFromLoadBalancer",
"elasticloadbalancing:DescribeTargetGroups",
"elasticloadbalancing:DescribeTargetHealth",
"elasticloadbalancing:RegisterTargets",
"elasticloadbalancing:DeregisterTargets"
There is an AWSCodeDeployRole managed policy that makes it very easy to cover the permissions you need to use CodeDeploy.
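For example, a hedged Terraform sketch of a CodeDeploy service role with that managed policy attached (the role name is made up):

resource "aws_iam_role" "codedeploy" {
  name = "codedeploy-service-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Service = "codedeploy.amazonaws.com" }
      Action    = "sts:AssumeRole"
    }]
  })
}

resource "aws_iam_role_policy_attachment" "codedeploy_managed" {
  role       = aws_iam_role.codedeploy.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AWSCodeDeployRole"
}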
I had the same issue and realised that in the deployment group I hadn't tagged the instances registered in the target group whose health checks I was relying on. Once they were tagged, the deployment group knew which target group it had to direct traffic to.
The issue I ran into is that if the port was not the expected one on the ELB, CodeDeploy's BlockTraffic would not know how to deregister the instance from the target group.
In my example my HTTPS ELB communicated via HTTP with port 3000 on each of my target groups. I found the specific root cause by using this guide: https://aws.amazon.com/premiumsupport/knowledge-center/codedeploy-failed-ec2-deployment/
It gave output which identified that I was using port 3000 instead of the expected port 80.
During BlockTraffic, the CodeDeploy service invokes the load balancer to deregister the instance from the target group before it starts installing the application revision.
The DeregisterTargets API call can be seen in the CloudTrail logs during the BlockTraffic lifecycle hook.
Currently CodeDeploy does not support the case where the target group uses a different port than the port used to register the instance: DeregisterTargets will not be able to deregister the instance if the port configured in the target group is different.
You need to make sure that both the target group and the instance are configured to use the same port.
BlockTraffic depends mainly on the deregistration delay on the target group, or connection draining on a Classic Load Balancer. To speed up this step, the deregistration delay / connection draining value can be reduced to a reasonable value.
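In Terraform that knob is the deregistration_delay argument on the target group (a sketch with hypothetical names; the default is 300 seconds):

resource "aws_lb_target_group" "app" {
  name                 = "app-tg"
  port                 = 80
  protocol             = "HTTP"
  vpc_id               = var.vpc_id
  deregistration_delay = 30   # seconds; lower than the 300-second default to speed up BlockTraffic
}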