Route 53 health check shows OK while the endpoint is down

I'm not sure how this is possible: I set up a Route 53 health check with email alerting for when our endpoint goes down, yet the check still reports healthy.
The endpoint is definitely down because the EC2 instance hosting it is powered off.
❯ telnet foo.io 443
Trying 18.18.18.18...
telnet: connect to address 18.18.18.18: Operation timed out
telnet: Unable to connect to remote host
Is it possible that the checker has cached something? We don't use anything in between, and the check is supposed to hit the EC2 instance directly.

I think you have left your health check disabled.
This is what the documentation says about disabling a health check:
Stops Route 53 from performing health checks. When you disable a health check, Route 53 stops aggregating the status of the referenced health checks.
After you disable a health check, Route 53 considers the status of the health check to always be healthy. If you configured DNS failover, Route 53 continues to route traffic to the corresponding resources.
Maybe that's why you see it passing.
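To confirm this without clicking through the console, something like the following sketch could help. It assumes the response shape of the Route 53 GetHealthCheck API, where the flag sits under HealthCheck > HealthCheckConfig > Disabled. The helper names are made up for illustration, and boto3 is imported lazily so the parsing part runs without AWS access.

```python
from typing import Any, Dict


def is_health_check_disabled(response: Dict[str, Any]) -> bool:
    """Return True if a GetHealthCheck-style response describes a disabled check."""
    config = response.get("HealthCheck", {}).get("HealthCheckConfig", {})
    # A missing "Disabled" key is treated as "not disabled" in this sketch.
    return bool(config.get("Disabled", False))


def fetch_health_check(health_check_id: str) -> Dict[str, Any]:
    """Fetch the health check with boto3 (assumed installed; needs AWS credentials)."""
    import boto3  # third-party, imported lazily on purpose

    return boto3.client("route53").get_health_check(HealthCheckId=health_check_id)


if __name__ == "__main__":
    # Offline example: a hand-written response instead of a real API call.
    sample = {"HealthCheck": {"Id": "example-id", "HealthCheckConfig": {"Disabled": True}}}
    print(is_health_check_disabled(sample))
```

If this returns True for your check, the "healthy" status is just the default that Route 53 reports for disabled checks, not a real probe result.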

Related

ALB Target groups showing Unhealthy, though my application is running fine

I have microservices deployed in containers. They are running fine, and we can access them at ALBendpoint/microservice.
But my target group, which is attached to the ALB, is showing the "Unhealthy" status.
Errors in AWS console:
None of these Availability Zones contains a healthy target. Requests are being routed to all targets.
Health checks failed with these codes: [404]
There are two issues here.
Why does the application work when the health check fails? Here is the explanation from the AWS docs ("Health checks for your target groups"):
If a target group contains only unhealthy registered targets, the load balancer nodes route requests across its unhealthy targets.
How can you fix the health check while the instances keep draining because of failed health checks?
A 404 means that the health check URL was not found, so confirm the health check configuration. The health check URL should return an HTTP 200 OK response. If your instances keep draining, you can temporarily set the health check success code to 404 until the instances become healthy. Once you have figured out the correct health check URL, set that path and switch the success code back to 200.
Hope this helps.
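To make the "success codes" idea concrete, here is a small sketch of how a status code could be compared against a matcher string. It assumes the matcher syntax that ALB target groups accept for HTTP codes (a single value like "200", a list like "200,302", or a range like "200-299"). The function name is hypothetical, and this is not how AWS implements it internally.

```python
def matches_success_codes(status: int, matcher: str) -> bool:
    """Check an HTTP status against a matcher such as "200", "200,302" or "200-299"."""
    for part in matcher.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # Range form, inclusive on both ends.
            low, high = part.split("-", 1)
            if int(low) <= status <= int(high):
                return True
        elif int(part) == status:
            return True
    return False


if __name__ == "__main__":
    # The health check in the question got a 404 while the matcher expected 200.
    print(matches_success_codes(404, "200"))
    # Temporarily matching 404, as suggested above, would make the same response pass.
    print(matches_success_codes(404, "404"))
```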
In my case the target was an IIS server, and I resolved the issue with the steps below.
Check the security groups: make sure the required ports are open from the ALB SG to the EC2 SG.
Log in to the server and check that the IIS default site has port 443 open if your health check uses 443 (or whichever port you use for health checks).
Use the curl command to troubleshoot the issue.
To check over HTTPS, use the command below and look at the response. Use -k or --insecure to skip certificate verification.
curl https://[serverIP] -k
For an HTTP test, use the command below.
curl http://[serverIP]
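If curl is not available on the machine you are testing from, roughly the same check can be sketched with the Python standard library. The function name and defaults below are assumptions for illustration. Passing verify_tls=False plays the role of curl's -k, and proxies are deliberately bypassed so the request goes straight to the server.

```python
import ssl
import urllib.error
import urllib.request


def probe_status(url: str, verify_tls: bool = True, timeout: float = 5.0) -> int:
    """Return the HTTP status code of a GET request to url."""
    context = ssl.create_default_context()
    if not verify_tls:
        # Similar in spirit to curl -k: skip hostname and certificate checks.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({}),  # no proxy, talk to the server directly
        urllib.request.HTTPSHandler(context=context),
    )
    try:
        with opener.open(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise HTTPError; the status code is still the answer.
        return err.code
```

Note that urllib follows redirects, so a 302 shows up as the status of the final page. Connection failures (closed port, blocked security group) raise urllib.error.URLError instead of returning a code.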
One of the reasons for this could be that the ALB cannot reach the EC2 containers. I faced a similar issue in which my Drupal application was running but the target group was showing unhealthy.
To resolve this, check whether you have allowed the ALB's security group on port 80 in the EC2 instance's security group.
Doing this resolved the issue for me.
I was dealing with this issue for a day. I finally realized that I had removed the default server configuration from nginx.
It is needed for the default path that the health check requests.

ECS Fargate + Network Load Balancer Healthcheck

I'm experiencing an issue with the following setup:
API Gateway -> VPC Link -> Private NLB -> Target Group -> AWS ECS Fargate
If I set up the NLB's health check to be TCP/HTTP on a specified endpoint, that endpoint gets hammered to death with internal requests (no requests are coming through the API Gateway, I checked).
My problem with this behaviour, other than having the health endpoint spammed by my own architecture, is that the application's functionality is suffering (about 1 out of 4 GET requests to the API gets a slow response).
I tried changing the health check to TCP only, with the same slow responses.
I tried temporarily switching to a public ALB. I get double health checks, separated by 30 seconds, but my application responds in 100 ms on average.
So, as an example of what I mean by "double health-checks":
Health Check 1.1 at 00:00:00
Health Check 2.1 at 00:00:10
Health Check 1.2 at 00:00:30
Health Check 2.2 at 00:00:40
Any ideas?
TL;DR:
Enable the "Cross-Zone Load Balancing" NLB flag.
The issue was that the cross-zone load balancing option was not enabled.
It seems that when a request is handled by an NLB node in a different AZ from the target it is being routed to, the node first tries to resolve the target's IP within its own AZ. When that fails, it hands the request to another NLB node in the appropriate AZ, which can reach the target.
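As for the "double health-checks" timeline in the question, one plausible reading is that two load balancer nodes each run their own checks on the same 30-second interval, just with different start offsets. The sketch below only reproduces that arithmetic. The offsets (0 and 10 seconds) are taken from the example timestamps, and the function name is made up.

```python
from typing import List, Tuple


def health_check_timeline(offsets: List[int], interval: int, until: int) -> List[Tuple[int, str]]:
    """List (second, label) pairs for checks sent by independent checkers."""
    events = []
    for node, offset in enumerate(offsets, start=1):
        count = 0
        t = offset
        while t <= until:
            count += 1
            events.append((t, f"Health Check {node}.{count}"))
            t += interval
    # Sort by time so the output reads like the target's access log.
    return sorted(events)


if __name__ == "__main__":
    for second, label in health_check_timeline([0, 10], interval=30, until=40):
        print(f"{label} at 00:00:{second:02d}")
```

Under that assumption, the number of requests a target sees per interval grows with the number of nodes checking it, which would also explain why the health endpoint looks busier than the configured interval suggests.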

AWS load balancer IIS

I have configured an Application Load Balancer on AWS and pointed DNS at it with a Route 53 alias A record. Behind the LB I have 2 instances running IIS. If I set up the site on both instances, the balancer automatically rotates clients between them (round robin, as far as I know).
But if I turn off the site in IIS on one instance, the load balancer keeps sending traffic to that instance: when I go to example.com the site works one time, and when I refresh the page I get an error (because the site is turned off in IIS). Could you please tell me how I can set up the load balancer to route traffic only to the working instance when one of them is not working? Thank you.
Load balancers distribute traffic to healthy servers. If that is not happening in your case, I would recheck the health check configuration under Target Groups.
You need to modify the port/path so that health checks start failing once the site is turned off. Only then will the load balancer send all traffic to the healthy host and none to the unhealthy one.
What do the LB health checks say? If a back-end instance is not listening on the health check port, the LB marks it as unhealthy and stops forwarding requests to it. If you are using an Application Load Balancer, you can see the health check status in the target groups associated with the load balancer.
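The same status can also be read programmatically. The sketch below assumes the response shape of the elbv2 DescribeTargetHealth API (a TargetHealthDescriptions list whose entries carry Target and TargetHealth). The helper names and the sample values are hypothetical, and boto3 is only imported if you actually call the fetch function.

```python
from typing import Any, Dict, List


def unhealthy_targets(response: Dict[str, Any]) -> List[str]:
    """Summarise every target whose state is not "healthy"."""
    lines = []
    for item in response.get("TargetHealthDescriptions", []):
        health = item.get("TargetHealth", {})
        if health.get("State") != "healthy":
            target = item.get("Target", {})
            lines.append(
                f"{target.get('Id')}:{target.get('Port')} "
                f"{health.get('State')} ({health.get('Reason')}): {health.get('Description')}"
            )
    return lines


def fetch_target_health(target_group_arn: str) -> Dict[str, Any]:
    """Fetch target health with boto3 (assumed installed; needs AWS credentials)."""
    import boto3  # third-party, imported lazily on purpose

    return boto3.client("elbv2").describe_target_health(TargetGroupArn=target_group_arn)


if __name__ == "__main__":
    # Offline example with made-up instance IDs.
    sample = {
        "TargetHealthDescriptions": [
            {"Target": {"Id": "i-aaaa", "Port": 80}, "TargetHealth": {"State": "healthy"}},
            {
                "Target": {"Id": "i-bbbb", "Port": 80},
                "TargetHealth": {
                    "State": "unhealthy",
                    "Reason": "Target.ResponseCodeMismatch",
                    "Description": "Health checks failed with these codes: [404]",
                },
            },
        ]
    }
    for line in unhealthy_targets(sample):
        print(line)
```

If the instance with the stopped IIS site still shows up as healthy here, the health check is probably pointed at a port or path that keeps answering even when the site is off.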

EC2 instance attached to a load balancer is showing Unhealthy status

I created a load balancer and assigned it one of my running EC2 instances. After creation, I navigated to the Target Group section in the AWS Console under Load Balancing. When I selected the target group that was assigned to the load balancer, it showed the registered instance status as "Unhealthy", and there was a message above the registered instance pane that says "None of these Availability Zones contains a healthy target. Requests are being routed to all targets". While creating the load balancer, I selected all the subnets (availability zones).
The settings I used for the health check are listed below:
Protocol: HTTP
Path: /healthcheck.html
Port: traffic port
Healthy threshold: 3
Unhealthy threshold: 2
Timeout: 5
Interval: 10
Success codes: 200
So why does my registered instance show as "Unhealthy", and how can I resolve that so the status changes to "In-service"?
Unhealthy indicates that the health check is failing for the instance.
Things to check:
Check that the instance is running a web server
Check that the web page at healthcheck.html responds with a valid 200 response
Check that instance has a security group that permits access on Port 80 (HTTP)
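It also helps to keep in mind how the thresholds from the question interact. The sketch below is a simplified model, not AWS's actual implementation: a target changes state only after enough consecutive results disagree with its current state. The class name is made up, and the defaults mirror the settings above (healthy threshold 3, unhealthy threshold 2).

```python
class HealthTracker:
    """Track a target's state from consecutive health check results."""

    def __init__(self, healthy_threshold: int = 3, unhealthy_threshold: int = 2,
                 healthy: bool = False) -> None:
        self.healthy_threshold = healthy_threshold
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy = healthy
        self._streak = 0  # consecutive results that disagree with the current state

    def record(self, passed: bool) -> bool:
        """Record one check result and return the resulting state."""
        if passed == self.healthy:
            # The result agrees with the current state, so any streak is broken.
            self._streak = 0
        else:
            self._streak += 1
            needed = self.healthy_threshold if passed else self.unhealthy_threshold
            if self._streak >= needed:
                self.healthy = passed
                self._streak = 0
        return self.healthy


if __name__ == "__main__":
    tracker = HealthTracker()
    # Three passing checks in a row are needed before the target turns healthy.
    print([tracker.record(True) for _ in range(3)])
```

With a 10-second interval, this model means a freshly fixed instance needs roughly 30 seconds of passing checks before it stops showing as unhealthy, so give it a moment after each change.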
In my case the health check configuration on the ALB is / over HTTPS.
I resolved it with the steps below.
Check the security groups: make sure the required ports are open from the ALB SG to the EC2 SG.
Log in to the server and check that the IIS default site has port 443 open if your health check uses 443 (or whichever port you use for health checks).
Use the curl command to troubleshoot the issue.
To check over HTTPS, use the command below and look at the response. Use -k or --insecure to skip certificate verification.
curl https://[serverIP] -k
For an HTTP test, use the command below.
curl http://[serverIP]
If you are sharing the load balancer among several EC2 instances that run similar services, make sure each of your services runs on a different port; otherwise your service won't be reachable and the health check won't pass.

Why Elastic load balancer states "out of service", when I added all instances

I created an Elastic Load Balancer in my virtual private cloud.
I added all my existing instances to the Elastic Load Balancer, but it shows an "out of service" message with the hint "Instance has failed at least the UnhealthyThreshold number of health checks consecutively."
First, check the health check requests in the server's access log.
Steps:
Connect to the server using SSH.
Run the command "tail -f /var/log/apache2/access.log" to see the response codes (200 - OK, 302 - redirect).
A response code other than 200 means the health check fails.
In my case the response code was 302, which pointed to a redirect.
When I opened my URL in a browser, it redirected to x.com/login.
To fix that, I changed the ping path in the load balancer from '/' to '/login'. The response then changed from 302 to 200, and my instance is now "In Service".
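If you would rather not dig through access logs, the same 302-versus-200 difference can be seen from the client side with a request that does not follow redirects. This is only a sketch using Python's standard http.client. The function name is hypothetical, and the host, port and paths are whatever your instance and ping path use.

```python
import http.client
from typing import Optional, Tuple


def check_path(host: str, port: int, path: str, timeout: float = 5.0) -> Tuple[int, Optional[str]]:
    """Send one GET without following redirects; return (status, Location header)."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        resp.read()  # drain the body so the connection closes cleanly
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()
```

For the situation described above, check_path(host, 80, "/") would be expected to report a 302 with a Location pointing at the login page, while "/login" itself should report 200. A browser hides this difference because it follows the redirect automatically.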
Here is a list of typical reasons this might happen. A good place to start:
http://docs.aws.amazon.com/ElasticLoadBalancing/latest/DeveloperGuide/ts-elb-healthcheck.html
Problem: Instance(s) closing the connection to the load balancer.
Cause: Elastic Load Balancing terminates a connection if it is idle for more than 60 seconds. The idle connection is established when there is no read or write event taking place on both the sides of the load balancer (client to load balancer and load balancer to the back-end instance).
Solution: Set the timeout settings on your registered instances to at least 60 seconds.
Problem: Responses timing out.
Cause: When the load balancer performs a health check, the instance may be under significant load and may take longer than your configured timeout interval to respond.
Solution: Try adjusting the timeout on your health check settings.
Problem: Non-200 response received.
Cause: When the load balancer performs an HTTP/HTTPS health check, the instance must return a 200 HTTP code. Any other response code will be considered a failed health check.
Solution: Search your application logs for responses sent to the health check requests.
Problem: Failing public key authentication.
Cause: If you are using an HTTPS or SSL load balancer with back-end authentication enabled, the public key authentication will fail if the public key on the certificate does not match the public key configured on the load balancer.
Solution: Check if your SSL certificate needs to be updated. If your SSL certificate is current, try re-installing the certificate on your load balancer.