Why does Elastic Load Balancing report 'Out of Service'? - amazon-web-services

I am trying to set up Elastic Load Balancing (ELB) in AWS to split the requests between multiple instances. I have created several images of my webserver based on the same AMI, and I am able to ssh into each individually and access the site via each distinct public DNS.
I have added each of my instances to the load balancer, but they all come back with the Status: Out of Service because they failed the health check. I'm mostly confused because I can access each instance from its public DNS, but I get a timeout whenever I visit the load balancer DNS name.
I've been trying to read through all the docs and googling it, but I'm stuck. Any pointers or links in the right direction would be greatly appreciated.

I contacted AWS support about this same issue. Apparently their system doesn't know how to handle cases were all of the instances behind the ELB are stopped for an extended amount of time. AWS support can manually refresh the statuses, if you need them up immediately.
The suggested fix it to de-register the ec2 instances from the ELB instead of just stopping them and re-register them when you start again.

Health check is (by default) made by accessing index.html on each instance incorporated in load balancer. If you don't have index.html in document root of instance - default health check will fail. You can set custom protocol, port and path for health check when creating elastic load balancer.

Finally I got this working. The issue was with the Amazon Security Groups, because I've restricted the access to port 80 to few machines on my development area and the load balancer could not access the apache server on the instance. Once the load balancer gained access to my instance, it gets In Service.
I checked it with tail -f /var/log/apache2/access.log in my instance, to verify if the load balancer was trying to access my server, and to see the answer the server is giving to the load balancer.
Hope this helps.

If your web server is running fine, then it means the health check goes on a url that doesn't return 200.
A trick that works for me : go on the instance, type curl localhost:80/pathofyourhealthcheckurl
After you can adapt your health check url to always have a 200 response.

In my case, the rules on security groups assigned to the instance and the load balancer were not allowing traffic to pass between the two. This caused the health check to fail.

I to faced same issue , i changed Ping Protocol from https to ssl .. it worked !
Go to Health Check --> click on Edit Health Check -- > change Ping protocol from HTTPS to SSL
Ping Target SSL:443
Timeout 5 seconds
Interval 30 seconds
Unhealthy Threshold 5
Healthy Threshold 10

For anyone else that sees this thread as this isn't listed:
Check that the health check is checking the port that the responding server is listening on.
E.g. node.js running on port 3000 -> Point healthcheck to port 3000;
Not port 80 or 443. Those are what your ALB will be using.
I spent a morning on this. Yes.

I would like to provide you a general way to solve this problem. When you have set up you web server like apache or nginx, try to read the access log file to see what happened. In my occasion, it report 401 error because I have add the basic auth in nginx. Of course, just like #ivankoni remind, it may because of the document you check is not exist.

I was working on the AWS Tutorial on hosting a web app and ran into this problem. Step 7b states the following:
"Set Ping Path to /. This sends queries to your default page, whether
it is named index.html or something else."
They could have put the forward slash in quotations like this "/". Make sure you have that in your health checks and not this "/." .

Adding this because I've spent hours trying to figure it out...
If you configured your health check endpoint but it still says Out of Service, it might be because your server is redirecting the request (i.e. returning a 301 or 302 response).
For example, if your endpoint is supposed to be /app/health/ but you only enter /app/health (no trailing slash) into the health check endpoint field on your ELB, you will not get a 200 response, so the health check will fail.

I had a similar issue. The problem appears to have been caused due to my using a HTTP health check and also using .htaccess to password protect the site.

I got the same error, in my case had to copy the particular html file from s3 bucket to "/var/www/html" location. The same html referenced in load balancer path.
The issue resolved after copying html file.

I had this issue too, and it was due to both my inbound and outbound rule for the Load Balancer's Security Group only allowing HTTP traffic on port 80. I needed to add another rule for HTTPS traffic on port 443.

I was also facing that same issue,
where ELB (Classic-Load-Balancer) try to request /index.html not / (root) while health check.
If it unable to find /index.html resource it says 'OutOfService'. Be Sure index.html should be available.

Related

GCP external http load balancer 502 server error:"failed_to_connect_to_backend"

I have configured a http external load balancer on GCP and all my vm instances are healthy in backend.
But when i am trying to access my server(installed on VM) from frontend static IP that is reserved at load balancer it is giving me 502 status error.
As a result of which i am unable to launch my application server using load balancer. Help me fix this issue.
Thanking you in advance.
To troubleshoot 502 response from the Load Balancer due to "failed_to_connect_to_backend." I would check the followings:
Usually, "failed_to_connect_to_backend" error message indicates that the load balancer is failing to connect to backends, investigating URL map rules is also a good point to start. I would also suggest reviewing your Load Balancer's URL map to make sure that Host rules, Path matcher, and Path rules are correctly defined and comply with descriptions in this article.
Also check if the backend instances are exhausting their resources, If a backend server is overwhelmed, it will refuse incoming requests, potentially causing the load balancer to give up on it and return the specific 502 error you're experiencing. Also, check the output on how many established connections are present at any one time using 'netstat' and watch command.
I would also recommend testing again with the HTTP(S) request directly to the instance, request the same URL that reporting 502. You might do this test in another VM instance in your VPC network.
maybe you should check if the time taken for the API to return the response is exceeded the timeout that will trigger the 502. The default value is 30 seconds.
Ref: https://cloud.google.com/load-balancing/docs/backend-service#timeout-setting

HTTP ERROR 408 - After setting up kubernetes , along with AWS ELB and NGINX Ingress

I am finding it extremely hard to debug this issue, I have a Kubernetes cluster setup along with services for the pods, they are connected to the Nginx ingress and connected to was elb classic, which also connects to the AWS route53 DNS my domain name is connected to. Everything works fine with that, but then am facing an issue where my domains do not behave the way I would like them to.
My domains in the Nginx-ingress-rules are connected to a service which sends alive page when hit with a domain, now when I do that I get this page.
Please help me what what to do to resolve this quickly, thanks in advance!
Talk to you soon
enter image description here
While you are using web servers behind ELB you must know that they generate a lot of 408 responses due to their health checks.
Possible solutions:
1. Set RequestReadTimeout header=0 body=0
This disables the 408 responses if a request times out.
2. Disable logging for the ELB IP addresses with:
SetEnvIf Remote_Addr "10\.0\.0\.5" exclude_from_log
CustomLog logs/access_log common env=!exclude_from_log
3. Set up different port for ELB health check.
4. Adjust your request timeout higher than 60.
5. Make sure that the idle time configured on the Elastic Loadbalancer is slightly lower than the idle timeout configured for the Apache httpd running on each of the instances.
Take a look: amazon-aws-http-408, haproxy-elb, 408-http-elb.

ALB Target groups showing Unhealthy, though my application is running fine

I have microservices deployed in containers, which are running fine and we are able to access with ALBendpoint/microservice.
But my target group which attached to ALB is showing the "Unhealthy" status.
Errors in AWS console:
None of these Availability Zones contains a healthy target. Requests are being routed to all targets.
Health checks failed with these codes: [404]
I am seeing two issues here.
Why the application is running fine when the healthcheck fails. here is the explanation from AWS Docs:
If a target group contains only unhealthy registered targets, the load balancer nodes route requests across its unhealthy targets. Health checks for your target groups
How could you fix the health check while the instances are draining because of failed healthchecks.
404 means that the health check URL is not found. Confirm the health check configuration. your health check URL should respond HTTP 200 OK response. If your instances are draining repeatedly, you can temporarily set the health check rule to match HTTP 404 until your instances becomes healthy. Once you figure out the correct health check URL, you can set that.
Hope this helps.
In my case IIS server and resolved with the below steps.
Check the security groups - whether we have opened the required ports from ALB SG to EC2 SG.
Login to server and check does IIS server's default site has 443 port opened if your health-check is on 443. (whatever port you are using for health checks).
Use the curl command to troubleshoot the issue.
If you would like to check on HTTPS use the below command to check the response. Use -k or --insecure to ignore the SSL issue.
curl https://[serverIP] -k
For HTTP test use the below command.
curl http://[serverIP]
One of the reason fo this could be that ALB can not access the EC2 containers. I faced a similar issue in which my Drupal application was running but target group was showing unhealthy.
To resolve this, please check whether you have added ALB's security group on port 80 in EC2's security group.
By doing this, the issue will be resolved.
I was dealing with this issue for 1 day. I finally realized that I had removed the default server configuration from nginx.
this is needed to for de default path that the healthCheck checks

AWS load balancer and maintenance page

I'm using AWS Load Balancer with 3 EC2 servers, and I'm trying to serve a Maintenance page when site is under maintenance.
This page need to return 503 HTTP code, because it is a proper code for a maintenance mode and will prevent possible problems with SEO.
When I return 503 code from any of my servers, Load Balancer makes it "Not In Service", and when all servers return 503, website returns a blank page (because all servers are disconnected).
My questions are:
1) Is there any way to serve a custom static page with a message for visitors from Load balancer if there is no healthy servers?
2) Or how to configure Load Balancer's Health Check that it will not consider 503 as a reason to mark server as "unhealthy"?
Thanks!
I've been searching for a quick way to do this. We need to return a 503 error to the world during DB upgrade, but white list a few IPs of developers so they can test it before opening back up to public.
Found a one spot solution::
Go to the Loader Balancer in EC2 and select the load balancer you would like to target. Below, you should see Listeners. Click on a listener, and edit the rule. Create a rule like this:
Now everyone gets a pretty maintenance page returned with a 503 error code, and only two IP addresses in the first rule will be able to browse to the site. Order is important, where the two IP exceptions are on top, then it goes down the list. The last item is always there by default.
Listener Rules for Your Application Load Balancer:
https://docs.aws.amazon.com/elasticloadbalancing/latest/application/listener-update-rules.html
You could implement an additional route in your app server, let's say /hcm (for health check maintenance), that always responds 200 OK. When it's time for maintenance, you programmatically modify the ELB health check to use the /hcm target which returns 200 OK rather than / or /index.html, which both return 503 Service Unavailable. Revert these changes when exiting maintenance.
Might not meet your 503 requirement but a good option for this is using s3 and dns failover: https://aws.amazon.com/blogs/aws/create-a-backup-website-using-route-53-dns-failover-and-s3-website-hosting/
The load balancer will serve a 503 for you when you no longer have any healthy server behind it so you should not do anything special.
If you return anything but a 200 on the health check, ELB will take the machine out of the load balancer after it fails the configured number of health checks.
So to recap, you can potentially serve 503 from your app when in maintenance, but you have to return 200 for health checks all the time. If you don't care about the content of the page, you can simply remove the machines from the load balancer (or fail health checks) and the LB will do the right thing for you.

Instance status is OutOfService in Load balancer

I have created a load balancer in amazon AWS.I created the load balancer in order to set up an ssl in server which already had another domain with SSL.The load balancer was working fine till today.But sometime before I noticed that the status of the instance has changed to Outofservice.
Im new to aws and couldnt find what is going wrong.
My health check is set as
Please help out.
Here is my checklist to troubleshoot this type of issue
Is the Security group of your instance OK ? ELB needs to have access to your instance for the health check
Is your Web / App server correctly running on the instance ? Does it accept connection requests ?
Is the HTTP return code of your health check URL 200 ? If your healthcheck URL returns anything else (a 30x redirect for example), ELB will consider your instance invalid. You can check this with curl -I on Linux instances.
HTH
--Seb