HTTP ERROR 408 after setting up Kubernetes with AWS ELB and NGINX Ingress

I am finding it extremely hard to debug this issue. I have a Kubernetes cluster set up, along with Services for the pods. They are exposed through the NGINX Ingress, which sits behind an AWS Classic ELB, which in turn is pointed to by my domain name via AWS Route 53 DNS. Everything works fine at that level, but my domains do not behave the way I would like them to.
The domains in my NGINX Ingress rules point to a Service that returns an "alive" page when hit with the domain, but when I visit one of the domains I get the page below instead.
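For context, the kind of Ingress rule I mean looks roughly like this (a minimal sketch, not my exact manifest; the host and Service names are placeholders):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-ingress
  annotations:
    kubernetes.io/ingress.class: nginx
spec:
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: alive-page-service   # Service that returns the "alive" page
                port:
                  number: 80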
Please help me figure out what to do to resolve this quickly. Thanks in advance!
[Screenshot: browser error page showing HTTP ERROR 408]

If you are running web servers behind an ELB, be aware that the ELB health checks themselves can generate a lot of 408 responses.
Possible solutions:
1. Set RequestReadTimeout header=0 body=0
This disables the 408 responses if a request times out.
2. Disable logging for the ELB IP addresses with:
SetEnvIf Remote_Addr "10\.0\.0\.5" exclude_from_log
CustomLog logs/access_log common env=!exclude_from_log
3. Set up a different port for the ELB health check.
4. Raise your request timeout above 60 seconds.
5. Make sure that the idle timeout configured on the Elastic Load Balancer is slightly lower than the idle timeout configured for the Apache httpd running on each of the instances.
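For Apache httpd, options 1, 2 and 5 together could look roughly like this (a sketch, assuming mod_reqtimeout and mod_setenvif are loaded; 10.0.0.5 stands in for your ELB's private IP, and 75 is just an example value above the ELB's default 60-second idle timeout):

# Option 1: don't emit 408s when the ELB opens connections it never sends a request on
RequestReadTimeout header=0 body=0

# Option 2: keep ELB health-check traffic out of the access log
SetEnvIf Remote_Addr "10\.0\.0\.5" exclude_from_log
CustomLog logs/access_log common env=!exclude_from_log

# Option 5: keep Apache's idle (keep-alive) timeout above the ELB's idle timeout
KeepAliveTimeout 75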
Take a look: amazon-aws-http-408, haproxy-elb, 408-http-elb.

Related

Getting 5xx error with AWS Application Load Balancer - fluctuating healthy and unhealthy target group

My web application on AWS EC2 + load balancer sometimes shows 500 errors. How do I know if the error is on the server side or the application side?
I am using a Route 53 domain with SSL on my URL. I set the ALB to redirect requests on port 80 to 443 and to forward requests on port 443 to the target group (the EC2). However, the target group sometimes returns 5xx error codes when handling requests. Please see the screenshots of the metrics and configuration for the ALB.
Target Group Metrics
Target Group Configuration
Load Balancer Metrics
Load Balancer Listeners
EC2 Metrics
Right now the web application is running unsteadily; sometimes it returns a 502 or 503 Service Unavailable (it seems like a connection timeout).
I have set the ALB idle timeout to 4000 seconds.
ALB configuration
The application is using Nuxt.js + PHP7.0 + MySQL + Apache 2.4.54.
I have set the Apache prefork MPM MaxClients to 1000, which should be enough to handle the requests to the application.
The EC2 is a t2.large instance; the CPU and memory look sufficient to handle the load.
It seems that if I request the IP address directly rather than the domain, the number of 5xx errors drops significantly (but they still occur).
I also have a WordPress application hosted on this EC2 on a subdomain (CNAME). I have never encountered any 5xx errors on that subdomain, which makes me suspect the errors come from my application code rather than the server.
Is the 5xx error from my application or from the server?
I also tried adding another EC2 to the target group, to see if having at least one healthy instance would handle the requests. However, the application uses a third-party API with a strict IP whitelist policy, and from my research, the Elastic IP I got from AWS cannot be attached to two different EC2 instances.
First of all, if your application is prone to stutters, increase the health check retries and timeouts; that addresses the flapping healthy/unhealthy status from your title.
From what I can see in your screenshots, most of your 5xx responses come from either the server or the application (you obviously know better which is the culprit, since you have access to their logs).
To answer your question about 5xx errors coming from the LB itself: this happens right after the LB kicks out an unhealthy instance. If there is no instance to replace it (which shouldn't be the case, because you're supposed to have an ASG if you enable evaluation of target health for the LB), the LB can't produce meaningful output and falls back to a 5xx response.
This should be enough information for you to make adjustments and investigate the logs.
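As a sketch of the first point (the ARN and the numbers below are placeholders, not values taken from your setup), the target group's health check can be relaxed with the AWS CLI:

aws elbv2 modify-target-group \
  --target-group-arn arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/my-tg/0123456789abcdef \
  --health-check-interval-seconds 30 \
  --health-check-timeout-seconds 10 \
  --healthy-threshold-count 3 \
  --unhealthy-threshold-count 5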

Google cloud load balancer causing error 502 - failed_to_pick_backend

I get a 502 error when I use the Google Cloud load balancer with CDN. The thing is, I am pretty sure I must have done something wrong setting up the load balancer, because when I remove it, my website runs just fine.
This is how I configured my load balancer: [screenshot of the load balancer configuration]
Should I use an HTTP or HTTPS health check? When I set up an HTTPS health check, my website was up for a bit and then went down again.
I have checked this link; they seem to have the same problem, but the fix is not working for me.
I have followed a tutorial from the OpenLiteSpeed forum to set Keep-Alive Timeout (secs) = 60 in the server admin panel and configured the instance to accept long-lived connections, but it is still not working for me.
I have added these two firewall rules, following these Google Cloud docs, to allow the Google health check IPs, but it still didn't work:
https://cloud.google.com/load-balancing/docs/health-checks#fw-netlb
https://cloud.google.com/load-balancing/docs/https/ext-http-lb-simple#firewall
When I check the load balancer log messages, I see an error saying failed_to_pick_backend. I have tried to re-configure the load balancer, but it didn't help.
I just started to learn Google Cloud and my knowledge is really limited, it would be greatly appreciated if someone could show me step by step how to solve this issue. Thank you!
Posting an answer based on OP's finding, to improve visibility for other users.
The solution to the 502 - failed_to_pick_backend error was changing the load balancer from the HTTP to the TCP protocol and, at the same time, changing the health check from HTTP to TCP as well.
After that, the LB passes through all incoming connections as it should and the error disappeared.
Here's some more info about the various types of health checks and how to choose the correct one.
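A rough sketch of that change with the gcloud CLI (the names are placeholders; flags differ slightly for regional load balancers):

# Create a TCP health check instead of the HTTP(S) one
gcloud compute health-checks create tcp my-tcp-hc --port 443

# Point the backend service at the new health check
gcloud compute backend-services update my-backend-service \
    --health-checks my-tcp-hc --global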
The error message that you're facing is "failed_to_pick_backend".
This error means that the HTTP response code was generated because a GFE (Google Front End) was not able to establish a connection to a backend instance, or was not able to identify a viable backend instance to connect to.
I noticed in the image that your health check failed, causing the aforementioned error messages. This health check failure could be due to:
Web server software not running on backend instance
Web server software misconfigured on backend instance
Server resources exhausted and not accepting connections:
- CPU usage too high to respond
- Memory usage too high, process killed or can't malloc()
- Maximum amount of workers spawned and all are busy (think mpm_prefork in Apache)
- Maximum established TCP connections
Check whether the running services respond with a 200 (OK) to the health check probes, and verify your Backend Service timeout. The Backend Service timeout works together with the configured health check values to define the amount of time an instance has to respond before being considered unhealthy.
Additionally, you can check this troubleshooting guide, which covers several error messages (including this one).
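For example (placeholder names again), the backend service timeout can be inspected and raised with gcloud:

# Show the current timeout (in seconds)
gcloud compute backend-services describe my-backend-service --global --format="value(timeoutSec)"

# Raise it if instances legitimately need longer to respond
gcloud compute backend-services update my-backend-service --timeout=60s --global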
Those experienced with Kubernetes from other platforms may be confused as to why their Ingresses are calling their backends "UNHEALTHY".
Health checks are not the same thing as Readiness Probes and Liveness Probes.
Health checks are an independent utility used by GCP's Load Balancers and perform the exact same function, but are defined elsewhere. Failures here will lead to 502 errors.
https://console.cloud.google.com/compute/healthChecks

Enabling SSL on Rails 4 with AWS Load Balancer, Nginx and Puma

I have tried unsuccessfully to configure SSL for my project.
My AWS load balancer is configured correctly and accepts the certificate keys. I have configured the listeners to route both port 80 traffic and port 443 traffic to my port 80 on the instance.
I would imagine that no further modification is necessary on the instance (Nginx and Puma) since everything is routed to port 80 on the instance. I have seen examples where the certificate is installed on the instances but I understand the load balancer is the SSL termination point so this is not necessary.
When accessing via http://www.example.com everything works fine. However, accessing via https://www.example.com times out.
I would appreciate some help with the proper high-level setup.
Edit: I have not received any response to this question. I assume it is too general?
I would appreciate confirmation that my high-level reasoning is correct: I should install the certificate on the load balancer only, and configure the load balancer to accept connections on port 443 BUT route everything internally to port 80 on the web server instances.
I just stumbled over this question as I had the same problem: all requests to https://myapp.com timed out and I could not figure out why. Here, in short, is how I achieved (forced) HTTPS in a Rails app on AWS:
My app:
Rails 5 with config.force_ssl = true enabled (in production.rb), so all connections coming in over HTTP get re-routed to HTTPS by the Rails app. No need to set up complicated nginx rules. The same app used the gem 'rack-ssl-enforcer' back when it was on Rails 4.2.
Side note: AWS load balancers used to use plain HTTP GET requests to check the health of the instances (today they also support HTTPS). Therefore an exception rule has to be defined for the SSL enforcement. Rails 5: config.ssl_options = { redirect: { exclude: -> request { request.path =~ /health-check/ } } } (in production.rb), with a corresponding route to a controller in the Rails app.
Side note to the side note: in Rails 5, the initializer new_framework_defaults.rb already defines "ssl_options". Make sure to deactivate that before using the "ssl_options" rule in production.rb.
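Putting those pieces together, the relevant part of production.rb might look like this (a sketch; /health-check/ is whatever path your load balancer health check actually hits):

# config/environments/production.rb
config.force_ssl = true
# Let the load balancer's plain-HTTP health check through without a redirect to HTTPS
config.ssl_options = {
  redirect: { exclude: -> request { request.path =~ /health-check/ } }
}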
AWS:
Elastic Beanstalk set-up on AWS with a valid cert on the Load Balancer using two Listener rules:
HTTP 80 requests on LB get directed to HTTP 80 on the instances
HTTPS 443 requests on LB get directed to HTTP 80 on the instances (here the certificate needs to be applied)
You see that the Load Balancer is the termination point of SSL. All requests coming from HTTP will go through the LB and will then be redirected by the Rails App to HTTPS automatically.
The thing no one tells you
With this in place, the HTTPS requests will still time out (here I spent days figuring out why!). In the end it was an extremely simple issue: the security group of the load balancer (in the AWS console -> EC2 -> Security Groups) only accepted requests on port 80 (HTTP) -> just activate port 443 (HTTPS) as well. It should then work (at least it did for me).
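If you prefer the CLI over the console, opening port 443 on the load balancer's security group looks roughly like this (the group ID is a placeholder):

aws ec2 authorize-security-group-ingress \
  --group-id sg-0123456789abcdef0 \
  --protocol tcp --port 443 --cidr 0.0.0.0/0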
I don't know if you've solved your problem, but for whoever may find this question, here is what I did to get it working.
I've been reading all day and found a mix of two configurations that are working for me at the moment.
Basically you need to configure nginx to redirect to https, but some of the recommended configurations do nothing to the nginx config.
Basically I'm using this gist configuration:
https://gist.github.com/petelacey/e35c98f9a35063a89fa9
But from this configuration I added the command to restart the nginx server:
https://gist.github.com/KeithP/f8534c04d20c2b4e4b1d
My take on this is that by the time the eb deploy process copies the config files, nginx has already started(?), making those changes useless. Hence the need to restart it manually. If someone has a better approach, let us know.
Michael Fehr's answer worked and should be the accepted answer. I had the same problem; adding config.force_ssl = true was what I had missed. One remark: you don't need to add the ebs configuration file they say you have to add if you are using the load balancer. That can be misleading, and they don't spell it out in the docs.

AWS ELB fails healthcheck with http ports other than 80

We've been using AWS for a while and have also set up many ELBs. We have multiple sites running on multiple servers: all of the IIS 7.5 sites are on each of three web servers. We use ports 80 and 443, with all site bindings set up correctly with domains/subdomains. We have an ELB for each site.
The problem is that each ELB is currently set up with a health check of HTTP:80/, so each ELB is not really checking the health of its respective site.
What we'd like to do is setup each site to listen to a different extra port (i.e., 8082, 8083, etc.) and have each ELB's healthcheck check the site's extra port (i.e., HTTP:8082/, HTTP:8083/, etc.). The ports are opened on the firewalls and in the security groups correctly and we can hit each site on their respective extra port (i.e., http://web1.mysite.com:8082/).
AWS's documentation says you should be able to do what we're trying to do, but the health checks of the instances don't pass. I've even gone so far as to define a listener for the respective port. Just to confirm: I can set the check to HTTP:80/ and the instance comes up "InService", but when I change it to HTTP:8082/, it immediately goes out of service. This is driving me nuts, so any help would be greatly appreciated.
We determined that we can fix this by setting up each site's extra-port binding on its corresponding port with a blank host name, and by making sure the "Default Site" is turned off (in our case it was). The binding's host name had previously been set to the site's name (i.e., www.mysite.com).
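For reference, adding such an extra-port binding with a blank host name can also be scripted with appcmd (the site name and port here are just examples):

%windir%\system32\inetsrv\appcmd.exe set site /site.name:"www.mysite.com" /+bindings.[protocol='http',bindingInformation='*:8082:']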

Why does Elastic Load Balancing report 'Out of Service'?

I am trying to set up Elastic Load Balancing (ELB) in AWS to split the requests between multiple instances. I have created several images of my webserver based on the same AMI, and I am able to ssh into each individually and access the site via each distinct public DNS.
I have added each of my instances to the load balancer, but they all come back with the Status: Out of Service because they failed the health check. I'm mostly confused because I can access each instance from its public DNS, but I get a timeout whenever I visit the load balancer DNS name.
I've been trying to read through all the docs and googling it, but I'm stuck. Any pointers or links in the right direction would be greatly appreciated.
I contacted AWS support about this same issue. Apparently their system doesn't know how to handle cases where all of the instances behind the ELB are stopped for an extended amount of time. AWS support can manually refresh the statuses if you need them up immediately.
The suggested fix is to de-register the EC2 instances from the ELB instead of just stopping them, and to re-register them when you start them again.
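With the CLI, that de-register/re-register cycle looks like this (Classic ELB; the load balancer name and instance ID are placeholders):

aws elb deregister-instances-from-load-balancer --load-balancer-name my-elb --instances i-0123456789abcdef0
aws elb register-instances-with-load-balancer --load-balancer-name my-elb --instances i-0123456789abcdef0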
The health check is (by default) performed by accessing index.html on each instance in the load balancer. If you don't have index.html in the instance's document root, the default health check will fail. You can set a custom protocol, port and path for the health check when creating the Elastic Load Balancer.
Finally I got this working. The issue was with the Amazon security groups: I had restricted access to port 80 to a few machines in my development area, so the load balancer could not reach the Apache server on the instance. Once the load balancer gained access to my instance, it went In Service.
I checked it with tail -f /var/log/apache2/access.log on my instance, to verify whether the load balancer was trying to reach my server and to see the response the server gave it.
Hope this helps.
If your web server is running fine, then it means the health check is hitting a URL that doesn't return a 200.
A trick that works for me: go onto the instance and run curl localhost:80/pathofyourhealthcheckurl
Then adjust your health check URL until it always gets a 200 response.
In my case, the rules on security groups assigned to the instance and the load balancer were not allowing traffic to pass between the two. This caused the health check to fail.
I too faced the same issue; I changed the Ping Protocol from HTTPS to SSL and it worked!
Go to Health Check --> click on Edit Health Check --> change the Ping Protocol from HTTPS to SSL
Ping Target SSL:443
Timeout 5 seconds
Interval 30 seconds
Unhealthy Threshold 5
Healthy Threshold 10
For anyone else who finds this thread, since this isn't listed yet:
Check that the health check is checking the port that the responding server is listening on.
E.g. node.js running on port 3000 -> Point healthcheck to port 3000;
Not port 80 or 443. Those are what your ALB will be using.
I spent a morning on this. Yes.
I would like to offer a general way to solve this problem. Once you have set up your web server (Apache or nginx), read the access log to see what actually happened. In my case, it reported a 401 error because I had added basic auth in nginx. Of course, as @ivankoni reminded, it may also be that the document being checked does not exist.
I was working on the AWS Tutorial on hosting a web app and ran into this problem. Step 7b states the following:
"Set Ping Path to /. This sends queries to your default page, whether
it is named index.html or something else."
They could have put the forward slash in quotation marks, like "/". Make sure you have "/" in your health check and not "/.".
Adding this because I've spent hours trying to figure it out...
If you configured your health check endpoint but it still says Out of Service, it might be because your server is redirecting the request (i.e. returning a 301 or 302 response).
For example, if your endpoint is supposed to be /app/health/ but you only enter /app/health (no trailing slash) into the health check endpoint field on your ELB, you will not get a 200 response, so the health check will fail.
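A quick way to spot this from the instance itself is to check the status line with curl (the path is just an example; the output shown is illustrative):

curl -I http://localhost/app/health
# HTTP/1.1 301 Moved Permanently   <-- anything other than 200 fails the ELB health check
# Location: http://localhost/app/health/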
I had a similar issue. The problem appears to have been caused by my using an HTTP health check while also using .htaccess to password-protect the site.
I got the same error; in my case I had to copy the particular HTML file from the S3 bucket to /var/www/html, the same HTML file referenced in the load balancer's path.
The issue was resolved after copying the HTML file over.
I had this issue too, and it was because both the inbound and outbound rules for the Load Balancer's security group only allowed HTTP traffic on port 80. I needed to add another rule for HTTPS traffic on port 443.
I was also facing the same issue, where the ELB (Classic Load Balancer) requests /index.html rather than / (root) for its health check.
If it is unable to find the /index.html resource, it reports 'OutOfService'. Make sure index.html is available.