AWS ALB health check failure - amazon-web-services

I have created an AWS EC2 instance and am running a Jenkins container inside it.
Here are the details:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
xxxxxxxxx docker.io/jenkins/jenkins 47 hours ago Up 47 hours ago 0.0.0.0:8080->8080/tcp, 0.0.0.0:50000->50000/tcp jenkins
After that, I configured an AWS ALB to listen on port 443 and registered the instance running the Jenkins container as the target.
The load balancer health check details are as below:
I have tried with the traffic port as well, but that also showed unhealthy.
The load balancer is able to forward requests to the instance, and I can reach the Jenkins container through the load balancer DNS name.
But the load balancer is showing my target as unhealthy.
Can someone explain why my target is not healthy and what steps I could take to make it healthy?

The Jenkins EC2 instance should be assigned a security group that allows access on the Jenkins port (8080) from the ALB. It also needs outbound internet access so that updates and plugins can be installed.
Jenkins requires authentication, so when the ALB performs its health check as an anonymous user it gets a 403 Forbidden page, while an HTTP health check expects a 200 OK by default. An ALB target group only supports HTTP and HTTPS health checks (TCP health checks are available on Network Load Balancers), so instead point the health check at a path that returns 200 to anonymous requests, or add 403 to the accepted success codes.

Use /login instead of / as the health check path.
Also verify the port number. I am using 8080.

I created a Jenkins container on ECS Fargate, and it gave me this error when I tried to put an ALB in front of it:
service jenkins-alb (port 8080) is unhealthy in target-group ecs-demo2-jenkins-alb due to (reason Health checks failed with these codes: [403]).
The cause is that when the ALB health check hits the target, Jenkins sends it to the login page, which requires authorization, so the health check fails with a 403. My workaround was to change the health check path from '/' to '/login?from=%2F', and it worked!
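The behaviour described in these answers can be reproduced locally. The following is a minimal sketch, not the ALB's actual implementation: `FakeJenkins` is a hypothetical stand-in that returns 403 for every path except the login page, and `probe` is a hypothetical helper that sends one unauthenticated GET and reports the status code, which is the value an HTTP health check compares against its success codes.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class FakeJenkins(BaseHTTPRequestHandler):
    """Hypothetical stand-in for Jenkins: 403 for anonymous requests, 200 for /login."""

    def do_GET(self):
        status = 200 if self.path.startswith("/login") else 403
        self.send_response(status)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo output quiet


def probe(host: str, port: int, path: str, timeout: float = 5.0) -> int:
    """Send one GET without credentials and return the HTTP status code.

    http.client does not follow redirects, so a 302 is reported as 302.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        return conn.getresponse().status
    finally:
        conn.close()


if __name__ == "__main__":
    server = HTTPServer(("127.0.0.1", 0), FakeJenkins)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    port = server.server_address[1]
    for path in ("/", "/login", "/login?from=%2F"):
        print(path, probe("127.0.0.1", port, path))
    server.shutdown()
    server.server_close()
```

Pointing `probe` at the private IP and port of a real target, from a machine the target's security group allows, shows the status code a health check on that path would receive.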

Related

ALB results in 504 gateway time out error with ECS

I have an httpd container running as an ECS service behind an ALB.
The container uses the dynamic port mapping feature, which means the host port is set to 0.
If I SSH into the container instance and curl localhost:<port number>, it works.
But when I use the ALB DNS name, I get a 504.
The ALB security group allows HTTP connections on port 80 from anywhere, and the instance security group allows any connection on any port from the ALB security group.
Interestingly, when I check the target group associated with the ALB, all the instances are unhealthy.
Update: I tried opening the security group of the ECS container instance to the public, and the instances were still not healthy.
You need to check the events of the ECS service and see what the exact error message is. If it states something like "port 45675 is unhealthy", check your security group configuration; fixing that should get rid of the 504 error. If it states that the health check failed (this should give a 502), SSH into the container, check which port the application is running on, and create a new service with the corrected port.
This assumes you have configured the health check for the traffic port and haven't modified it.
The httpd service generally listens on port 80, so I suggest using 80 as the container port.
504 is a Gateway Timeout error. If the above information doesn't help, take a look at the AWS guide here - https://aws.amazon.com/premiumsupport/knowledge-center/troubleshoot-http-5xx/
If you can share the error message from the ECS events, that will help in narrowing down the issue.
Adding the screenshots of the changes I made to fix the issue, I hope it helps. I am assuming you are using the default httpd image -

Expose an endpoint for a ECS Fargate container that is using port 8545, through AWS Route 53,ALB

I would like to expose the endpoint of a tool that uses port 8545 through AWS Route 53, an Application Load Balancer and ECS Fargate. I've created a Dockerfile with the following:
FROM trufflesuite/ganache-cli:latest
EXPOSE 8546
CMD ["--fork", "https://Infura_node_URL"]
For the target group, I've been using protocol HTTP, port 8546.
For the Application Load Balancer, I've set HTTP:80 to redirect to 443.
For the ECS task definition, I've set the container port to 8545.
When I run the script that connects to this container, an error occurs:
Error: Connection refused or URL couldn't be resolved: https://Infura_node_URL
If I browse to the Route 53 URL I've configured, it keeps loading until it eventually times out.
I am relatively new to networking, but I believe there might be something wrong with the protocol or the port I've set. Can someone please help?
*If I run this Docker container locally, http://localhost:8546 shows '400 Bad Request', which is the proper response.
The problem here is that the Fargate service is not allowing traffic from the load balancer. Add a rule to the Fargate service's security group that allows HTTP traffic from the ALB's security group. The source in the security group rule will be the ALB's security group ID in this case.

ALB Target groups showing Unhealthy, though my application is running fine

I have microservices deployed in containers. They are running fine, and we can access them at ALBendpoint/microservice.
But the target group attached to the ALB is showing the "Unhealthy" status.
Errors in AWS console:
None of these Availability Zones contains a healthy target. Requests are being routed to all targets.
Health checks failed with these codes: [404]
I see two issues here.
First, why does the application work when the health check fails? Here is the explanation from the AWS docs (Health checks for your target groups):
If a target group contains only unhealthy registered targets, the load balancer nodes route requests across its unhealthy targets.
Second, how can you fix the health check while the instances are draining because of failed health checks?
404 means that the health check URL was not found. Confirm the health check configuration; your health check URL should return an HTTP 200 OK response. If your instances are draining repeatedly, you can temporarily set the health check to accept HTTP 404 until your instances become healthy. Once you figure out the correct health check URL, you can set that.
Hope this helps.
In my case it was an IIS server, and I resolved it with the steps below.
Check the security groups: make sure the required ports are open from the ALB SG to the EC2 SG.
Log in to the server and check that the IIS default site has a binding on port 443 if your health check is on 443 (or on whatever port you are using for health checks).
Use the curl command to troubleshoot the issue.
To check over HTTPS, use the command below. Use -k or --insecure to skip certificate verification.
curl https://[serverIP] -k
For an HTTP test, use the command below.
curl http://[serverIP]
One reason for this could be that the ALB cannot reach the EC2 containers. I faced a similar issue in which my Drupal application was running but the target group was showing unhealthy.
To resolve this, check that the EC2 instance's security group allows the ALB's security group on port 80.
That resolved the issue for me.
I was dealing with this issue for a day. I finally realized that I had removed the default server configuration from nginx.
It is needed for the default path that the health check requests.

AWS ECS error: Task failed ELB health checks in Target group

I am using a CloudFormation template to build the infrastructure (ECS Fargate cluster).
The template executed and the stack was created successfully. However, the task failed with the following error:
Task failed ELB health checks in (target-group arn:aws:elasticloadbalancing:eu-central-1:890543041640:targetgroup/prc-service-devTargetGroup/97e3566c8b307abf)
I don't know what to look at or where to look to troubleshoot this.
As it is a Fargate cluster, I don't know how to log in to the container and run some health check queries to debug further.
Can someone please guide me on this?
Because of this error, I am not even able to access my web app, as the ALB won't route traffic to an unhealthy target.
What I did
After some googling, I found this post:
https://aws.amazon.com/premiumsupport/knowledge-center/troubleshoot-unhealthy-checks-ecs/
However, I guess this relates to the EC2 launch type rather than Fargate, and in my case there are no EC2 instances.
If needed, I can paste the entire template as well.
Please help.
This is resolved.
The problem came down to the following points:
The Docker container port mapping to the host port was incorrect.
The ALB health check interval was very short. Because of that, the ALB gave up almost immediately instead of waiting for the Docker container to be up and running.
After making these changes, it worked properly.
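To make the timing point concrete, here is a rough sketch of the arithmetic, assuming a target is marked unhealthy after `unhealthy_threshold` consecutive failed checks spaced `interval_s` apart. The class and function names are hypothetical, the numbers are example values, and `grace_s` models a grace period before failed checks are counted (ECS services have a health check grace period setting for this purpose).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HealthCheck:
    interval_s: int           # seconds between two checks
    healthy_threshold: int    # consecutive successes needed to become healthy
    unhealthy_threshold: int  # consecutive failures needed to become unhealthy

    def seconds_to_unhealthy(self) -> int:
        """Approximate time a non-responding target has before it is marked unhealthy."""
        return self.interval_s * self.unhealthy_threshold

    def seconds_to_healthy(self) -> int:
        """Approximate time a responding target needs before it is marked healthy."""
        return self.interval_s * self.healthy_threshold


def outlives_startup(check: HealthCheck, startup_s: int, grace_s: int = 0) -> bool:
    """True if the container can finish starting before it is marked unhealthy."""
    return grace_s + check.seconds_to_unhealthy() >= startup_s


if __name__ == "__main__":
    check = HealthCheck(interval_s=10, healthy_threshold=3, unhealthy_threshold=2)
    print(check.seconds_to_unhealthy())                 # 20
    print(outlives_startup(check, startup_s=45))        # False: marked unhealthy while still booting
    print(outlives_startup(check, 45, grace_s=60))      # True
```

If the application needs longer to start than this window allows, lengthening the interval, raising the unhealthy threshold, or adding a grace period are the knobs to look at.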
There are quite a few different possible reasons for this issue, not only the open ports:
Improper IAM permissions for the ecsServiceRole IAM role
Container instance security group
Elastic Load Balancing load balancer not configured for all Availability Zones
Elastic Load Balancing load balancer health check misconfigured
Unable to update the service servicename: Load balancer container name or port changed in task definition
AWS therefore maintains a dedicated page covering the possible causes of this error:
https://docs.aws.amazon.com/en_en/AmazonECS/latest/developerguide/troubleshoot-service-load-balancers.html
Edit: in my case the status code returned by my application's health check was different. The default success code is 200, but you can also specify a range such as 200-499.
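As an illustration of how such a success-code setting is interpreted, here is a small sketch that checks a status code against a matcher string made of single codes, comma-separated lists and ranges (for example "200", "200,302" or "200-499"). The function name is hypothetical and this is not AWS code, only a model of the matching rule.

```python
def matches_success_codes(status: int, matcher: str) -> bool:
    """Return True if status is covered by a matcher such as "200", "200,302" or "200-499"."""
    for part in matcher.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # A range like "200-499" is inclusive on both ends.
            low, high = part.split("-", 1)
            if int(low) <= status <= int(high):
                return True
        elif int(part) == status:
            return True
    return False


if __name__ == "__main__":
    print(matches_success_codes(403, "200"))      # False: this is the failing Jenkins case
    print(matches_success_codes(403, "200-499"))  # True
    print(matches_success_codes(302, "200,302"))  # True
```

A wide range such as 200-499 makes the check pass, but it also hides real 4xx problems, so a dedicated path that returns 200 is usually the cleaner fix.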
Let me share my experience.
In my case everything was correct except the host the server listens on. It was localhost, which makes the server unreachable from outside the container, so the health check didn't work. It should be 0.0.0.0 (or empty in some libraries).
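A minimal sketch of the difference, using only the Python standard library: the server below binds to 0.0.0.0, so it accepts connections on every network interface of the container. Binding to 127.0.0.1 instead would accept only connections that originate inside the container. The `/health` path, the handler and `start_server` are hypothetical names chosen for the example.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/health":
            body = b"ok"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, *args):
        pass  # keep the demo output quiet


def start_server(host: str = "0.0.0.0", port: int = 8080) -> HTTPServer:
    """Start the health endpoint in a background thread and return the server."""
    server = HTTPServer((host, port), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    server = start_server(port=0)  # port 0 = any free port, for the demo only
    port = server.server_address[1]
    with urllib.request.urlopen(f"http://127.0.0.1:{port}/health", timeout=5) as resp:
        print(resp.status, resp.read())
    server.shutdown()
    server.server_close()
```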
I got this error message because the security group between the ECS service and the load balancer target group was only allowing HTTP and HTTPS traffic.
Apparently the health check happens over some other port and/or protocol, as updating the security group to allow all traffic on all ports (as suggested at https://docs.aws.amazon.com/AmazonECS/latest/userguide/create-application-load-balancer.html) made the health check work.
I had this exact same problem. I was able to get around the issue as follows:
Navigate to the EC2 service.
Select Target Groups in the side panel.
Select the target group for your load balancer.
Select the Health checks tab.
Make sure the health check path configured in the target group matches the health check endpoint your EC2 instance actually serves. This tells your ELB which endpoint to request when conducting its health check. In my case the health check path was /health.
In my case, the Docker container orchestrated by ECS Fargate runs as a service and not as a web app or API. The service does not listen on any port (e.g. a scheduled cron job, an ActiveMQ message consumer, etc.).
In other words, it is a client and not a server node. So I made it listen on a port for the health check only...
All I did was add a health check path in the Target Group -
And the code below in index.ts -
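import express from 'express';
const app = express();
const port = process.env.PORT || 8080;
// Health check endpoint polled by the target group
app.get('/__health', (_, res) => res.send({ ok: 'yes' }));
// No host argument is passed, so the server is not restricted to localhost
app.listen(port, () => {
  // console.log replaces the original logger, which was never defined in this snippet
  console.log(`Health Check: Listening on port ${port}`);
});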
As mentioned by tschumann above, check the security group around the ECS cluster. If using Terraform, allow ingress on all Docker ephemeral ports with something like the below:
resource "aws_security_group" "ecs_sg" {
  name   = "ecs_security_group"
  vpc_id = "${data.aws_vpc.vpc.id}"
}
# Allow traffic from inside the VPC on the Docker ephemeral port range
resource "aws_security_group_rule" "ingress_docker_ports" {
  type              = "ingress"
  from_port         = 32768
  to_port           = 61000
  # "tcp" instead of the original "-1": with "-1" (all protocols) the port range is ignored
  protocol          = "tcp"
  cidr_blocks       = ["${data.aws_vpc.vpc.cidr_block}"]
  security_group_id = "${aws_security_group.ecs_sg.id}"
}
Possibly helpful for someone: our target group health check path was set to /, which for our services pointed to Swagger and worked well. After updating to use Springfox instead of manually generating swagger.json, / now performs a 302 redirect to /swagger-ui.html, which caused the health check to fail. Since this was a Spring Boot service, we simply pointed the health check path in the target group to /health instead (the out-of-the-box Spring status page).
The solution in iravinandan's response is partially correct, but in the last part of your Node.js router simply add status(200) and that's it. Alternatively, you can set your own success codes under the advanced settings at the bottom of the health check page.
app.get('/__health', (request, response) => response.status(200).end(""));
Regards
My case was a React application running on Fargate.
The first issue was that the Docker image was built on Node.js and "served" the app with:
CMD npm run start # react-scripts start
Besides not being good practice at all, this requires a lot of resources (4 GB and 2 vCPU were not enough), and because of that the checks were failing. (this article mentions this as a probable cause)
To solve that, we changed the image to a multi-stage build with Node.js for the build phase and NGINX for serving the content. Locally that worked great, but we hadn't realized that the default port for NGINX is 80, and you cannot use different host and container ports on Fargate with the awsvpc network mode.
To troubleshoot it, I launched an EC2 instance with the right security groups to connect to the Fargate targets on the same port on which the load balancer was failing its health check. I was able to run curl commands against other targets, but with this unhealthy target (constantly being recycled) I received an instant Connection refused response. It wasn't a timeout, which told me that the target could not handle the request because nothing was listening on that port. Then I realized that my container was expecting traffic on port 80 while my application was configured to work on a 3xxx port.
The solution was to change the default NGINX configuration to listen on the port we wanted, rebuild the image and relaunch the service.
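The distinction used in that diagnosis (an immediate "connection refused" versus a timeout) can be checked with a few lines of Python run from a machine that the target's security group allows. This is a sketch: `classify_port` is a hypothetical helper, and reading "refused" as "nothing is listening" and "timeout" as "traffic is probably being dropped, for example by a security group" is a rule of thumb rather than a guarantee.

```python
import socket


def classify_port(host: str, port: int, timeout: float = 3.0) -> str:
    """Try one TCP connection and describe the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"      # something accepted the connection
    except ConnectionRefusedError:
        return "refused"       # host reachable, but nothing listens on this port
    except socket.timeout:
        return "timeout"       # no answer at all; packets are likely being dropped
    except OSError as exc:
        return f"error: {exc}"


if __name__ == "__main__":
    # Demo against a local listener; replace host and port with the target's
    # private IP and health check port when troubleshooting.
    listener = socket.socket()
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    port = listener.getsockname()[1]
    print(classify_port("127.0.0.1", port))
    listener.close()
    print(classify_port("127.0.0.1", port))
```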
In my case, my ECS Fargate service does not need a load balancer, so I removed the "Load Balancer" and "Security Group" settings, and then it worked.
I had the same issue deploying a Java Spring Boot app on ECS running on Fargate. There were three issues I had to address to fix the problem, in case this helps others in the future.
The container was running on port 8080 (because of Tomcat), so the ELB, the target group and the two security groups (one for the ELB and one for ECS) must allow 8080 in their inbound rules. The task definition also had to be revised to map the container port to 8080.
The port in the target group's health check section (advanced settings) had to be explicitly changed to 8080 instead of the default 80.
I had to create a dummy health check path in the application because requesting the root of the app at "/" returned a 302 redirect.
Hope this helps.
I also faced the same issue while using AWS Fargate.
Here are some possible solutions to try:
1) First, check that the security group attached to the service has both outbound and inbound rules in place.
2) If you are using a load balancer pointing to a target group, you must open the Docker container port in the security group and allow inbound traffic only from the ALB security group.
3) Also check whether the health check endpoint assigned to the target group has any dependencies. It should return only a 200 status response (or whatever you have specified in the target group).
In my case it was a security group rule that allowed connections only from a certain IP, and this was blocking health checks from the LB. I added the VPC's CIDR as another rule in the security group, and then it worked.
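To see why a single-IP rule blocks the health checks, the following sketch tests whether a source address falls inside any of the CIDR blocks an inbound rule set allows. All addresses are hypothetical examples; the assumption is that health check requests arrive from the load balancer's private addresses inside the VPC, which a rule for one external IP does not cover.

```python
import ipaddress
from typing import Iterable


def source_allowed(source_ip: str, allowed_cidrs: Iterable[str]) -> bool:
    """Return True if source_ip is inside at least one of the allowed CIDR blocks."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in allowed_cidrs)


if __name__ == "__main__":
    lb_private_ip = "10.0.1.25"           # hypothetical load balancer node address
    rules = ["203.0.113.10/32"]           # hypothetical rule: one admin IP only
    print(source_allowed(lb_private_ip, rules))                    # False
    print(source_allowed(lb_private_ip, rules + ["10.0.0.0/16"]))  # True once the VPC CIDR is added
```

Using the load balancer's security group as the rule source, as other answers suggest, is narrower than allowing the whole VPC CIDR.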

EC2 instance attached to a load balancer is showing Unhealthy status

I created a load balancer and registered one of my running EC2 instances with it. After creation, I navigated to the Target Groups section under Load Balancing in the AWS Console. When I selected the target group assigned to the load balancer, it showed the registered instance's status as "Unhealthy", with a message above the registered instances pane saying "None of these Availability Zones contains a healthy target. Requests are being routed to all targets". While creating the load balancer, I selected all the subnets (Availability Zones).
The settings I used for the health check are listed below:
Protocol: HTTP
Path: /healthcheck.html
Port: traffic port
Healthy threshold: 3
Unhealthy threshold: 2
Timeout: 5
Interval: 10
Success codes: 200
So why is my registered instance's status "Unhealthy", and how can I resolve that so the status changes to "In-service"?
Unhealthy indicates that the health check is failing for the instance.
Things to check:
Check that the instance is running a web server
Check that the web page at healthcheck.html responds with a valid 200 response
Check that the instance has a security group that permits access on port 80 (HTTP)
In my case the health check on the ALB is configured as / over HTTPS.
I resolved it with the steps below.
Check the security groups: make sure the required ports are open from the ALB SG to the EC2 SG.
Log in to the server and check that the IIS default site has a binding on port 443 if your health check is on 443 (or on whatever port you are using for health checks).
Use the curl command to troubleshoot the issue.
To check over HTTPS, use the command below. Use -k or --insecure to skip certificate verification.
curl https://[serverIP] -k
For an HTTP test, use the command below.
curl http://[serverIP]
If you are sharing the load balancer among several EC2 instances that run similar services, make sure each of your services runs on a different port; otherwise your service won't be reachable and your health check won't pass.