EC2 instance attached to a load balancer is showing Unhealthy status - amazon-web-services

I created a load balancer and registered one of my running EC2 instances with it. After creation, I navigated to the Target Groups section under Load Balancing in the AWS Console. When I selected the target group assigned to the load balancer, the registered instance's status was shown as "Unhealthy", and a message above the registered instances pane said "None of these Availability Zones contains a healthy target. Requests are being routed to all targets". While creating the load balancer, I selected all the subnets (Availability Zones).
The settings I used for the health check are:
Protocol: HTTP
Path: /healthcheck.html
Port: traffic port
Healthy threshold: 3
Unhealthy threshold: 2
Timeout: 5 seconds
Interval: 10 seconds
Success codes: 200
So why is my registered instance's status "Unhealthy", and how can I fix that so the status changes to "In-service"?
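For reference, the two thresholds in these settings interact roughly as follows: a target flips to healthy only after a run of consecutive passing checks, and back to unhealthy after a run of consecutive failures. The class below is a simplified sketch of that behaviour, not AWS code; the names `TargetHealth` and `seconds_until_healthy` are made up for illustration, and it assumes a target starts out unhealthy.

```python
from dataclasses import dataclass


@dataclass
class TargetHealth:
    """Tracks one target's state from a stream of health check results."""
    healthy_threshold: int = 3
    unhealthy_threshold: int = 2
    healthy: bool = False   # assumption: a new target starts out unhealthy
    _streak: int = 0        # consecutive results that contradict `healthy`

    def record(self, check_passed: bool) -> bool:
        """Feed one health check result and return the resulting state."""
        if check_passed == self.healthy:
            # The result agrees with the current state, so the opposing streak resets.
            self._streak = 0
            return self.healthy
        self._streak += 1
        needed = self.healthy_threshold if check_passed else self.unhealthy_threshold
        if self._streak >= needed:
            self.healthy = check_passed
            self._streak = 0
        return self.healthy


def seconds_until_healthy(interval: int, healthy_threshold: int) -> int:
    """Rough estimate of how long a passing target needs before it is marked healthy."""
    return interval * healthy_threshold
```

With an interval of 10 seconds and a healthy threshold of 3, `seconds_until_healthy(10, 3)` gives 30, so a freshly fixed instance can stay "Unhealthy" for about half a minute before the console catches up.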

Unhealthy means the instance is failing the health check.
Things to check:
Check that the instance is running a web server
Check that the page at /healthcheck.html returns a valid 200 response
Check that the instance has a security group that permits inbound access on port 80 (HTTP) from the load balancer
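To make the "valid 200 response" check concrete, here is a small sketch of how a returned status code can be compared against a success codes setting. It assumes the setting is written as a single code ("200"), a comma-separated list ("200,302") or a range ("200-299"); the function name `matches_success_codes` is invented for this example.

```python
def matches_success_codes(status: int, success_codes: str) -> bool:
    """Return True if `status` is covered by a matcher such as "200", "200,302" or "200-299"."""
    for part in success_codes.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # A range like "200-299" is inclusive on both ends.
            low, high = part.split("-", 1)
            if int(low) <= status <= int(high):
                return True
        elif int(part) == status:
            return True
    return False
```

With the settings from the question (success codes: 200), any redirect, 403 or 404 from /healthcheck.html counts as a failed check.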

In my case, the ALB health check was configured for the path / over HTTPS.
I resolved it with the steps below.
Check the security groups: make sure the EC2 security group allows the required ports from the ALB security group.
Log in to the server and check that the IIS default site has a binding on port 443 if your health check uses 443 (or on whatever port your health check uses).
Use curl to troubleshoot the issue.
To check the response over HTTPS, use the command below. Add -k or --insecure to skip certificate verification.
curl https://[serverIP] -k
To test over HTTP, use:
curl http://[serverIP]
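If curl is not available on the box, the same check can be sketched in Python with only the standard library. This is an illustrative helper, not part of any AWS tooling: the function name `probe` and its parameters are assumptions. It does not follow redirects, so a 301 or 302 is reported as-is.

```python
import ssl
import urllib.error
import urllib.request


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Report 3xx responses instead of following them."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def probe(url: str, timeout: float = 5.0, insecure: bool = False) -> int:
    """Return the HTTP status code for `url`; connection errors and timeouts raise URLError."""
    context = ssl.create_default_context()
    if insecure:
        # Equivalent of curl -k: skip certificate verification (troubleshooting only).
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({}),  # talk to the server directly, ignoring proxy settings
        _NoRedirect(),
        urllib.request.HTTPSHandler(context=context),
    )
    try:
        with opener.open(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 3xx/4xx/5xx responses still show what a health check would receive.
        return err.code
```

For example, `probe("https://[serverIP]/", insecure=True)` with the server's private IP filled in should return the same status code the curl -k command shows.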

If you are sharing the load balancer among several EC2 instances that run similar services, make sure each service listens on a different port; otherwise the service won't be reachable and its health check won't pass.

Related

ALB results in 504 gateway time out error with ECS

I have an httpd container running as an ECS service behind an ALB.
The container uses the dynamic port mapping feature, which means the host port is set to 0.
If I SSH into the container instance and curl localhost:<port number>, it works.
But when I use the ALB DNS name, I get a 504.
The ALB security group allows HTTP (port 80) connections from anywhere, and the instance security group allows any connection on any port from the ALB security group.
Interestingly, when I check the target group associated with the ALB, all the instances are unhealthy.
Update: I tried opening the security group of the ECS container instance to the public, and the instances were still not healthy.
You need to check the events of the ECS service to see the exact error message. If it says something like "port 45675 is unhealthy", check your security group configuration; fixing it should get rid of the 504 error. If it says the health check failed (this should give a 502), SSH into the container, check which port the application is running on, and create a new service with the corrected port.
This assumes you have configured the health check for the traffic port and haven't modified it.
httpd generally listens on port 80, so I suggest using 80 as the container port.
504 is a Gateway Timeout error. If the above doesn't help, have a look at the AWS guide here: https://aws.amazon.com/premiumsupport/knowledge-center/troubleshoot-http-5xx/
If you can share the error message from the ECS events, that will help narrow down the issue.
I'm adding screenshots of the changes I made to fix the issue; I hope they help. I am assuming you are using the default httpd image.

AWS ALB health check failure

I have created an AWS EC2 instance and am running a Jenkins container inside it.
Here are the details:
CONTAINER ID  IMAGE                      COMMAND  CREATED       STATUS           PORTS                                             NAMES
xxxxxxxxx     docker.io/jenkins/jenkins           47 hours ago  Up 47 hours ago  0.0.0.0:8080->8080/tcp, 0.0.0.0:50000->50000/tcp  jenkins
After that, I configured an AWS ALB to listen on port 443 and registered the instance running the Jenkins container as its target.
The load balancer health check details are as below:
I have tried with the traffic port as well, but that also showed unhealthy.
After this, my load balancer is able to forward requests to the instance, and I can reach the Jenkins container through the load balancer DNS name.
But my load balancer shows the target as unhealthy.
Can someone help me understand why my target is not healthy and what steps I can take to make it healthy?
The Jenkins EC2 instance should be assigned a security group that allows access on the Jenkins port (8080) from the ALB. You should also give Jenkins full outbound internet access so that updates and plugins can be installed.
Jenkins requires authentication, so when the ALB performs its health check as an anonymous user it gets a 403 Forbidden page, while the HTTP health check expects a 200 OK. Note that ALB target groups only support HTTP and HTTPS health checks, so switching to a TCP health check is only an option behind a Network Load Balancer; with an ALB, point the health check at a page that does not require a login.
Use /login instead of / as the health check path.
Also verify the port number. I am using 8080.
I created a Jenkins container on ECS Fargate, and I got this error when I tried to put an ALB in front of it:
service jenkins-alb (port 8080) is unhealthy in target-group ecs-demo2-jenkins-alb due to (reason Health checks failed with these codes: [403]).
The cause is that when the ALB health check hits the target, it is met with the Jenkins login page, which requires authorization, so the health check fails with a 403. My workaround was to change the health check path from '/' to '/login?from=%2F', and it worked!
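When there are many such events, it can help to pull the port, target group and reason out of the message automatically. The sketch below is based only on the message shapes quoted in this thread; the function name `parse_unhealthy_event` and the returned keys are invented, and real ECS events may be worded differently.

```python
import re
from typing import Optional

# Pattern derived from the event text quoted above; the target group name is optional
# because some messages in this thread omit it.
_EVENT = re.compile(
    r"\(port (?P<port>\d+)\) is unhealthy in target-group\s+"
    r"(?:(?P<target_group>\S+)\s+)?due to \(reason (?P<reason>.+)\)"
)
_CODES = re.compile(r"codes: \[(?P<codes>[\d, ]+)\]")


def parse_unhealthy_event(message: str) -> Optional[dict]:
    """Extract port, target group, reason and status codes from an 'is unhealthy' event."""
    match = _EVENT.search(message)
    if match is None:
        return None
    reason = match.group("reason")
    codes_match = _CODES.search(reason)
    codes = [int(c) for c in codes_match.group("codes").split(",")] if codes_match else []
    return {
        "port": int(match.group("port")),
        "target_group": match.group("target_group"),
        "reason": reason,
        "status_codes": codes,
    }
```

For the Jenkins event above this yields port 8080 and status code 403, which points at the login redirect rather than at networking.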

ALB Target groups showing Unhealthy, though my application is running fine

I have microservices deployed in containers; they are running fine and we can access them at ALBendpoint/microservice.
But the target group attached to the ALB shows the "Unhealthy" status.
Errors in AWS console:
None of these Availability Zones contains a healthy target. Requests are being routed to all targets.
Health checks failed with these codes: [404]
I see two questions here.
Why does the application keep working when the health check fails? Here is the explanation from the AWS docs (Health checks for your target groups):
If a target group contains only unhealthy registered targets, the load balancer nodes route requests across its unhealthy targets.
How can you fix the health check while the instances are draining because of failed health checks?
404 means that the health check URL was not found. Confirm the health check configuration; your health check URL should return an HTTP 200 OK response. If your instances are draining repeatedly, you can temporarily set the health check success code to 404 until your instances become healthy. Once you have worked out the correct health check URL, set the path to that URL and restore the success code to 200.
Hope this helps.
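As a sketch of how that temporary change could be scripted, the helper below only builds the arguments for the ELBv2 ModifyTargetGroup call; the function name `health_check_update` is hypothetical, and the actual boto3 call is left in a comment because it needs boto3, credentials and your real target group ARN.

```python
def health_check_update(target_group_arn: str, path: str, success_codes: str = "200") -> dict:
    """Build keyword arguments for the ELBv2 ModifyTargetGroup call."""
    if not path.startswith("/"):
        raise ValueError("health check path must start with '/'")
    return {
        "TargetGroupArn": target_group_arn,
        "HealthCheckPath": path,
        "Matcher": {"HttpCode": success_codes},
    }


# Usage, assuming boto3 is installed and `arn` holds your target group ARN:
#   import boto3
#   elbv2 = boto3.client("elbv2")
#   elbv2.modify_target_group(**health_check_update(arn, "/", "404"))   # temporary
#   elbv2.modify_target_group(**health_check_update(arn, "/health"))    # once the URL is known
```

The "/health" path in the usage comment is only a placeholder for whatever URL your service actually answers with a 200.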
In my case the server was IIS, and I resolved it with the steps below.
Check the security groups: make sure the EC2 security group allows the required ports from the ALB security group.
Log in to the server and check that the IIS default site has a binding on port 443 if your health check uses 443 (or on whatever port your health check uses).
Use curl to troubleshoot the issue.
To check the response over HTTPS, use the command below. Add -k or --insecure to skip certificate verification.
curl https://[serverIP] -k
To test over HTTP, use:
curl http://[serverIP]
One reason for this could be that the ALB cannot reach the containers on the EC2 instance. I faced a similar issue: my Drupal application was running, but the target group showed it as unhealthy.
To resolve this, check that the EC2 instance's security group has an inbound rule allowing port 80 from the ALB's security group.
That resolved the issue in my case.
I was stuck on this issue for a day. I finally realized that I had removed the default server configuration from nginx.
It is needed to serve the default path that the health check requests.

AWS ECS: ECS instance is not registered to ALB target group

I created an ECS service that runs one ECS instance, and I can see the instance registered as a target of the load balancer.
Now I trigger the Auto Scaling Group (by simply incrementing the desired instance count) to launch a new instance.
The instance is launched and added to the ECS cluster. (I can see it on the ECS Instances tab.)
But the instance is not added to the ALB target group. (I expect to see 2 instances in the following image, but I only see 1.)
I can edit the Auto Scaling Group's target group like the following
Then I see the following.
But the health check fails. It seems port 80 is not reachable,
although I have port 80 open to the public in the instance's security group. (Also, the instance created by the ECS service uses dynamic port mapping, but the instance created by the Auto Scaling Group does not.)
So the Auto Scaling Group can launch a new instance, but my load balancer never sends traffic to it.
I tried https://aws.amazon.com/premiumsupport/knowledge-center/troubleshoot-unhealthy-checks-ecs/?nc1=h_ls and it shows that I can connect to port 80 from the host to the Docker container with something like curl -v http://${IPADDR}/health.
So there must be something wrong with host port 80 (the load balancer can't connect to it).
But the security group settings can't be wrong either, because the working instance and the non-working instance use the same SG.
Edit
Because I used dynamic port mapping, my web server is running on a random port.
As you can see, the instance started by the ECS service registered itself with the target group on a random port.
However, the instance started by the Auto Scaling Group registered itself with the target group on port 80.
The instance will not be put into service in the target group until it passes the health check, so you need to fix the health check first.
On your first instance the mapped port is 32769, so if this is the same target group and the same application, I assume the port on the new instance should be 32769 as well.
When you curl the endpoint with curl -I -v http://${IPADDR}/health, was the HTTP status code 200? If it is 200, the target should be healthy; if it is not, either change the status code the backend returns or update the success code in the health check.
I assume you are running ECS on both instances. ECS registers targets with the target group for each ECS service, so are you running a mix of services that requires attaching the target group to the Auto Scaling Group as well? If you are using dynamic ports, set the health check port to "traffic port".
Now, if we look at the official documentation on dynamic port mapping and on the possible causes of a 502 Bad Gateway:
Dynamic port mapping is a feature of container instances in Amazon Elastic Container Service (Amazon ECS).
Dynamic port mapping with an Application Load Balancer makes it easier to run multiple tasks on the same Amazon ECS service on an Amazon ECS cluster.
With the Classic Load Balancer, you must statically map port numbers on a container instance. The Classic Load Balancer does not allow you to run multiple copies of a task on the same instance because the ports conflict. An Application Load Balancer uses dynamic port mapping so that you can run multiple tasks from a single service on the same container instance.
A target group that you attach yourself (for example through the Auto Scaling Group) will not work with dynamic ports; you have to bind the target group to the ECS service.
dynamic-port-mapping-ecs
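To illustrate what binding the target group to the ECS service looks like in practice, the sketch below builds the two fragments involved: a task definition port mapping with host port 0 (dynamic) and the `loadBalancers` entry passed when the service is created. The helper names, the container name "web" and the ARN are placeholders; the boto3 call is shown only as a comment.

```python
def port_mapping(container_port: int, dynamic: bool = True) -> dict:
    """Port mapping for a container definition; hostPort 0 requests a dynamic host port."""
    return {
        "containerPort": container_port,
        "hostPort": 0 if dynamic else container_port,
        "protocol": "tcp",
    }


def service_load_balancers(target_group_arn: str, container_name: str, container_port: int) -> list:
    """The loadBalancers entry that lets ECS register each task's host port itself."""
    return [{
        "targetGroupArn": target_group_arn,
        "containerName": container_name,
        "containerPort": container_port,
    }]


# Usage, assuming boto3 and an existing cluster and task definition (names are hypothetical):
#   import boto3
#   boto3.client("ecs").create_service(
#       cluster="my-cluster", serviceName="web", taskDefinition="web:1", desiredCount=2,
#       loadBalancers=service_load_balancers(target_group_arn, "web", 80),
#   )
```

With this in place ECS registers each task under whatever host port it was given, so the instance should not also be registered on a fixed port 80 through the Auto Scaling Group.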
HTTP 502: Bad Gateway
Possible causes:
The load balancer received a TCP RST from the target when attempting to establish a connection.
The load balancer received an unexpected response from the target, such as "ICMP Destination unreachable (Host unreachable)", when attempting to establish a connection. Check whether traffic is allowed from the load balancer subnets to the targets on the target port.
The target closed the connection with a TCP RST or a TCP FIN while the load balancer had an outstanding request to the target. Check whether the keep-alive duration of the target is shorter than the idle timeout value of the load balancer.
The target response is malformed or contains HTTP headers that are not valid.
The load balancer encountered an SSL handshake error or SSL handshake timeout (10 seconds) when connecting to a target.
The deregistration delay period elapsed for a request being handled by a target that was deregistered. Increase the delay period so that lengthy operations can complete.
http-502-issues
It seems you already know the root cause: port 80 is failing the health check, which is why the instance never goes into service behind the ALB. Here is what you can try.
First, check that your service is listening on port 80 on the new host. You can use a tool like netcat:
nc -v localhost 80
Once you know that the service is listening, the recommended way to let your ALB reach the host is to add an inbound rule to the instance's security group that allows traffic on port 80 from the ALB's security group.
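If netcat is not installed, the same "is anything listening?" test can be sketched with Python's standard socket module. The function name `port_is_open` is just for illustration.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

Running `port_is_open("localhost", 80)` on the new host answers the first question; running it from another machine in the VPC against the instance's private IP also exercises the security group.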

AWS ECS ALB Error (Request timed out)

I am trying to learn/use AWS ECS but keep getting
service has reached a steady state.
Followed by:
service (instance i-05873e2a55ecba2f6) (port 32768) is unhealthy in target-group due to (reason Request timed out)
I'm not sure what information you need to help, but I was previously using this load balancer with EC2 instances; I am now replacing those instances with ones launched through ECS, and that is when I run into this error.
My cluster is in my default VPC and I am including all 3 subnets (East zone). The security group is my load balancer security group, which allows all traffic on ports 40 and 443. I have tried changing the security group so that it allows anyone on any port, but that doesn't work.
My host port in my task definition is 0 and my container port is 3000, which is what I exposed in the Dockerfile.
The health check is just on the target port at path "/".
This answer summarizes a checklist of points to verify when debugging this kind of error:
There might be no route for the health check path (for example /healthcheck) in the backend service
The status code returned from /healthcheck is not 200
The target port might be wrong; configure it to match the port the application listens on (for example 3000 or 8080)
The security group is not allowing traffic from the load balancer on the target port
The application is not running in the container
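The checklist can be turned into a rough triage helper that maps the reason text from the console or the ECS events to the items most worth checking first. The mapping below only encodes the cases discussed in this thread (timeouts, 403, 404); the function name `likely_causes` and the hint wording are made up for illustration.

```python
import re
from typing import List


def likely_causes(reason: str) -> List[str]:
    """Map a target health 'reason' string to checklist items worth verifying."""
    hints: List[str] = []
    if "timed out" in reason.lower():
        hints.append("security group does not allow the load balancer to reach the target port")
        hints.append("application is not running or not listening on the registered port")
    # Pick out any HTTP status codes mentioned in the reason, e.g. "[404]".
    codes = [int(c) for c in re.findall(r"\b[1-5]\d{2}\b", reason)]
    if 404 in codes:
        hints.append("health check path does not exist in the backend service")
    if 403 in codes:
        hints.append("health check path requires authentication")
    if codes and not hints:
        hints.append("status code is not in the configured success codes")
    return hints
```

For the "Request timed out" reason in the question, this points at the security group and the port before anything else.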
My problem was the same. Check the inbound rules of the security group attached to your instances; there should be a rule like this:
All traffic / All / All / "sg-xxxxxxxxxxxx"
where sg-xxxxxxxxxxxx is the security group of your Application Load Balancer.
Remember to check the outbound rules of your ALB security group as well. The target group health check request is actually issued from the ALB, so if the ALB is not allowed to talk to your target, the health check will also fail.
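A narrower alternative to "All traffic" is to allow only the ports the ALB needs. The sketch below builds the `IpPermissions` structure for the EC2 AuthorizeSecurityGroupIngress call with the ALB's security group as the source. The helper name `ingress_from_alb` is hypothetical, and the range 32768-65535 in the usage comment is an assumption about the ephemeral range used for dynamic host ports; check it for your AMI and Docker version.

```python
from typing import Optional


def ingress_from_alb(alb_security_group_id: str, from_port: int,
                     to_port: Optional[int] = None) -> list:
    """Build IpPermissions allowing TCP from the ALB's security group on a port or port range."""
    return [{
        "IpProtocol": "tcp",
        "FromPort": from_port,
        "ToPort": to_port if to_port is not None else from_port,
        # Referencing the ALB's security group instead of a CIDR keeps the rule tight.
        "UserIdGroupPairs": [{"GroupId": alb_security_group_id}],
    }]


# Usage, assuming boto3 and the security group IDs of your instance and ALB:
#   import boto3
#   boto3.client("ec2").authorize_security_group_ingress(
#       GroupId=instance_sg_id,
#       IpPermissions=ingress_from_alb(alb_sg_id, 32768, 65535),  # dynamic host ports (assumed range)
#   )
```

For a fixed host port, `ingress_from_alb(alb_sg_id, 80)` produces a single-port rule instead.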