AWS Application ELB swallowing the 5XXs errors?

For the past week we have been using an "application" ELB for our applications. In the ELB monitoring we can't see any 5XX responses, even though there are many 5XX entries in our application access logs.
Could this be a configuration error?

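You are seeing 5xx responses in the application logs but not in the ELB metrics. A 5xx in the application logs was generated by the application that the load balancer connects to, not by the load balancer itself, so it is not counted as an ELB-generated error.
In other words, the ELB is not the one producing errors such as 504. For an Application Load Balancer, look at the target-level metric (HTTPCode_Target_5XX_Count) rather than the ELB-level one (HTTPCode_ELB_5XX_Count).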
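
One way to check this is to pull both CloudWatch metrics for the same time window and compare them. The sketch below only builds the request parameters. It assumes an Application Load Balancer (namespace `AWS/ApplicationELB`); the helper name `build_5xx_query` and the dimension value `app/my-alb/0123456789abcdef` are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Metric names published for Application Load Balancers.
ELB_5XX = "HTTPCode_ELB_5XX_Count"        # generated by the load balancer itself
TARGET_5XX = "HTTPCode_Target_5XX_Count"  # returned by the application (target)


def build_5xx_query(metric_name, load_balancer, hours=24, period=300, now=None):
    """Build keyword arguments for CloudWatch get_metric_statistics."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/ApplicationELB",
        "MetricName": metric_name,
        "Dimensions": [{"Name": "LoadBalancer", "Value": load_balancer}],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": period,          # seconds per data point
        "Statistics": ["Sum"],     # total count of 5XX responses per period
    }


if __name__ == "__main__":
    # Hypothetical dimension value; use the suffix of your own load balancer ARN.
    lb = "app/my-alb/0123456789abcdef"
    for name in (ELB_5XX, TARGET_5XX):
        print(build_5xx_query(name, lb))
```

With boto3 installed and credentials configured, each dictionary can be passed as `boto3.client("cloudwatch").get_metric_statistics(**query)`. If the target metric shows the 5XXs and the ELB metric stays empty, the errors come from the application.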

Related

Getting 5xx error with AWS Application Load Balancer - fluctuating healthy and unhealthy target group

My web application on AWS EC2 + load balancer sometimes shows 500 errors. How do I know if the error is on the server side or the application side?
I am using a Route 53 domain with SSL on my URL. I set the ALB to redirect requests on port 80 to 443, and to forward requests on port 443 to the target group (the EC2 instance). However, the target group sometimes returns a 5xx error code when handling a request. Please see the screenshots for the metrics and configuration of the ALB.
[Screenshots: Target Group Metrics, Target Group Configuration, Load Balancer Metrics, Load Balancer Listeners, EC2 Metrics]
Right now the web application is unstable; it sometimes returns a 502 or a 503 Service Unavailable (it looks like a connection timeout).
I have set the ALB idle timeout to 4000 seconds.
[Screenshot: ALB configuration]
The application is using Nuxt.js + PHP7.0 + MySQL + Apache 2.4.54.
I have set the Apache prefork MaxClients value to 1000, which should be enough to handle the requests to the application.
The EC2 instance is a t2.large, and its CPU and memory look sufficient for the workload.
If I request the IP address directly instead of the domain, the number of 5xx errors drops significantly (but they still occur).
I also host a WordPress application on this EC2 instance under a subdomain (CNAME). I have never encountered any 5xx errors on that subdomain site, which makes me suspect the errors come from my application code rather than from the server.
Is the 5xx error coming from my application or from the server?
I also tried adding another EC2 instance to the target group so that at least one healthy instance could handle the requests. However, the application uses a third-party API with a strict IP whitelist policy, and from what I found, an Elastic IP from AWS cannot be attached to two different EC2 instances at once.

First of all, if your application is prone to stutters, increase the health check retries and timeouts; that addresses your initial question about flapping health.
From what I see in your screenshots, most of your 5xx errors come from either the server or the application (you are in a better position to tell which, since you have access to their logs).
To answer your question about 5xx errors coming from the LB: this happens right after the LB takes an unhealthy instance out of rotation and there is no healthy one to replace it (which shouldn't be the case, because you're supposed to have an ASG if you enable evaluation of target health for the LB). With nothing to route to, the LB can't produce a meaningful response and returns a 5xx itself.
That should be enough information for you to make adjustments and investigate the logs.

AWS - Difference between ELB_5XXs and HTTP_5XXs

In AWS CloudWatch, what is the difference between ELB_5xx and Http_5xx errors? And then there is also Backend_connection_errors.

HTTP errors served to the user can occur at either the load balancer or target level.
Any metrics prefixed with ELB originate at the load balancer level, and there are many reasons why these can occur. For a 5XX error, it could be a problem connecting to your targets, or the throughput to the load balancer could have been too high (ELBs also scale based on traffic). More information about these errors is available in the Troubleshoot your Application Load Balancers document.
The Http_ errors come from the target itself: your application returned this HTTP status, and it was passed back to the client. To debug these, look in your application logs to identify the root cause.
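
The same split is visible per request in the load balancer access logs. The sketch below assumes the Application Load Balancer access log layout, where the ninth and tenth space-separated fields are `elb_status_code` and `target_status_code` (Classic Load Balancer logs use a different layout). The function name `classify_5xx` and both sample log lines are made up for illustration.

```python
def classify_5xx(log_line):
    """Return 'target', 'load_balancer', or None for one ALB access log entry."""
    # The first ten fields contain no spaces, so a plain split is enough
    # and avoids having to parse the quoted request and user-agent fields.
    fields = log_line.split(" ", 10)
    elb_status, target_status = fields[8], fields[9]
    if target_status.startswith("5"):
        return "target"           # the application itself answered with 5xx
    if elb_status.startswith("5"):
        return "load_balancer"    # no 5xx from the target; the LB generated it
    return None                   # not a 5xx response


# Hypothetical log lines, shortened after the request field.
TARGET_ERROR = (
    "https 2024-01-01T00:00:00.000000Z app/my-alb/0123456789abcdef "
    "192.0.2.10:51234 10.0.0.5:80 0.001 0.250 0.000 500 500 120 310 "
    '"GET https://example.com:443/ HTTP/1.1"'
)
LB_TIMEOUT = (
    "https 2024-01-01T00:01:00.000000Z app/my-alb/0123456789abcdef "
    "192.0.2.10:51235 10.0.0.5:80 0.001 -1 -1 504 - 120 310 "
    '"GET https://example.com:443/slow HTTP/1.1"'
)

if __name__ == "__main__":
    print(classify_5xx(TARGET_ERROR))  # target
    print(classify_5xx(LB_TIMEOUT))    # load_balancer
```

Entries classified as `target` are the ones to chase in the application logs; entries classified as `load_balancer` point at connectivity, timeouts, or capacity between the load balancer and its targets.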

GCP external http load balancer 502 server error:"failed_to_connect_to_backend"

I have configured an external HTTP load balancer on GCP, and all my VM instances in the backend are healthy.
But when I try to access my server (installed on a VM) through the frontend static IP reserved for the load balancer, I get a 502 status error.
As a result, I am unable to reach my application server through the load balancer. Please help me fix this issue.
Thanks in advance.

To troubleshoot a 502 response from the load balancer caused by "failed_to_connect_to_backend", I would check the following:
The "failed_to_connect_to_backend" error message usually indicates that the load balancer is failing to connect to the backends, so the URL map rules are a good place to start. Review your load balancer's URL map to make sure that the host rules, path matchers, and path rules are correctly defined and comply with the descriptions in this article.
Also check whether the backend instances are exhausting their resources. If a backend server is overwhelmed, it will refuse incoming requests, potentially causing the load balancer to give up on it and return the specific 502 error you're experiencing. You can also use 'netstat' together with the 'watch' command to see how many established connections are present at any one time.
I would also recommend testing again by sending the HTTP(S) request directly to the instance, using the same URL that is returning 502. You can run this test from another VM instance in your VPC network.

Maybe you should check whether the time the API takes to return a response exceeds the backend service timeout, which would trigger the 502. The default value is 30 seconds.
Ref: https://cloud.google.com/load-balancing/docs/backend-service#timeout-setting

How to change AWS ELB status to InService?

A WordPress application is deployed on AWS Elastic Beanstalk behind a load balancer. I sometimes see ELB 5XX errors. So that the instance is marked OutOfService only after a higher number of failed checks, I set the Unhealthy Threshold to 10. But the health check still sometimes fails and the health becomes Severe. I sometimes get the error "% of the requests to the ELB are failing with HTTP 5xx". I checked the ELB access logs: sometimes requests get a timeout (504) error, and after a number of consecutive 504s the ELB marks the instance OutOfService. I am trying to find out which request is failing.
What I don't know is whether it is possible to bring the instance back "InService" as quickly as possible. Sometimes the instance is OutOfService for 2-3 hours, which is really bad. Is there a good way to handle this situation? It looks like once the service is out, there is nothing I can do. I am relatively new to AWS. Please help.

To solve this issue:
1) HTTP 504 means timeout. The resource that the load balancer is accessing on your backend is failing to respond. Find the path used for the healthcheck in the AWS console.
2) In your browser, verify that you can access the healthcheck path while bypassing the load balancer. This may mean temporarily assigning an EIP to the EC2 instance. If the load balancer healthcheck is "/test/myhealthpage.php" then use "http://REPLACE_WITH_EIP/test/myhealthpage.php". For HTTPS listeners use https in the URL.
3) Debug why the path that you specified is timing out and fix it.
Note: Healthcheck paths should not be to pages that do complicated tests or operations. A healthcheck should be a quick and simple GO / NO GO type of page.
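
Step 2 can also be scripted, which makes it easier to see how long the healthcheck page takes to answer. This is a minimal sketch using only the Python standard library; the function name `probe` and the 5-second default timeout are assumptions, so set the timeout to match your own healthcheck timeout.

```python
import time
import urllib.error
import urllib.request


def probe(url, timeout=5.0):
    """Request `url` directly and return (status_code_or_None, elapsed_seconds)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as err:
        status = err.code   # the server answered, but with a 4xx/5xx status
    except (urllib.error.URLError, OSError):
        status = None       # refused, unreachable, or timed out
    return status, time.monotonic() - start


if __name__ == "__main__":
    # Placeholder from the answer above; replace with the real address.
    status, elapsed = probe("http://REPLACE_WITH_EIP/test/myhealthpage.php")
    print(f"status={status} elapsed={elapsed:.2f}s")
```

A `None` status, or an elapsed time close to the timeout, points at the same slowness the load balancer sees when it reports 504.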

AWS CloudWatch Web Server Metrics

I have a few EC2 instances with NGINX installed using both ports 80 and 443. The instances are serving different applications so I'm not using an ELB.
I would like to create a CloudWatch alarm to make sure port 80 is always returning 200 HTTP status code. I realize there are several commercial solutions for this such as New Relic, etc, but this is the task I have at hand at the moment.
None of the EC2 metrics look to be able to accomplish this, and I cannot use any ELB metrics since I have no ELB.
What's the best way to resolve this?

You can definitely do this manually: send a request yourself and publish the result as a custom metric to CloudWatch, then monitor that metric.
Or you could look into Route 53 health checks. You might get away with just configuring a health check there if you are already using Route 53:
http://docs.aws.amazon.com/Route53/latest/DeveloperGuide/dns-failover.html
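
The manual approach could look roughly like the sketch below. The metric name `Port80Healthy`, the namespace `Custom/WebServer`, and the function names are hypothetical; `put_metric_data` is the boto3 CloudWatch call for publishing custom metrics, and boto3 is imported only inside `publish` so the rest works without it.

```python
import urllib.error
import urllib.request


def http_status(url, timeout=5.0):
    """Return the final HTTP status for `url` (redirects are followed), or 0 on no response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code
    except (urllib.error.URLError, OSError):
        return 0


def build_metric(instance_id, status):
    """Build one put_metric_data datum: 1 when the check returned 200, else 0."""
    return {
        "MetricName": "Port80Healthy",  # hypothetical metric name
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Value": 1.0 if status == 200 else 0.0,
        "Unit": "Count",
    }


def publish(instance_id, url):
    """Check `url` once and send the result to CloudWatch (needs boto3 and credentials)."""
    import boto3  # imported lazily so the helpers above work without it

    datum = build_metric(instance_id, http_status(url))
    boto3.client("cloudwatch").put_metric_data(
        Namespace="Custom/WebServer",   # hypothetical namespace
        MetricData=[datum],
    )
```

Running `publish` on a schedule (for example from cron) gives a metric you can put a CloudWatch alarm on, alarming when the value drops below 1.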

Create a Route 53 health check. Supported protocols are TCP, HTTP, and HTTPS.
The HTTP/S protocol supports matching the response payload against a user-defined string, so you can react not only to connectivity problems but also to unexpected content being returned to users.
For more advanced monitoring, enable latency metrics, which collect TTFB (time to first byte) and SSL handshake times.
You can then create alarms to get alerts when one of your apps becomes inaccessible.
