Why isn't elastic beanstalk restarting my unresponsive instance?

Why isn't elastic beanstalk restarting my unresponsive instance? - amazon-web-services

I have a java spring application with a memory leak. I've actually resolved the cause of the memory leak (not properly closing jdbc connections), but I noticed that when my application became unresponsive, elastic beanstalk did not restart my instance and I had to do it manually. Why didn't it restart on its own?
As you can see, all of the requests are failing. From the aws docs
If a health check URL is configured, Elastic Load Balancing expects a GET request that it sends to return a response of 200 OK. The application fails the health check if it fails to respond within 5 seconds or if it responds with any other HTTP status code. After 5 consecutive health check failures, Elastic Load Balancing takes the instance out of service.
I assumed that "takes the instance out of service" means that it terminates the instance and replaces it with a new one. Is this not the case? What's going on here?

Related

Getting 5xx error with AWS Application Load Balancer - fluctuating healthy and unhealthy target group

My web application on AWS EC2 + load balancer sometimes shows 500 errors. How do I know if the error is on the server side or the application side?
I am using Route 53 domain and ssl on my url. I set the ALB redirect requests on port 80 to 443, and forward requests on port 443 to the target group (the EC2). However, the target group is returning 5xx error code sometimes when handling the request. Please see the screenshots for the metrics and configurations for the ALB.
Target Group Metrics
Target Group Configuration
Load Balancer Metrics
Load Balancer Listeners
EC2 Metrics
Right now the web application is running unsteady, sometimes it returns a 502 or 503 service unavailable (seems like it's a connnection timeout).
I have set up the ALB idle timeout 4000 secs.
ALB configuration
The application is using Nuxt.js + PHP7.0 + MySQL + Apache 2.4.54.
I have set the Apache prefork worker Maxclient number as 1000, which should be enough to handle the requests on the application.
The EC2 is a t2.Large resource, the CPU and Memory look enough to handle the processing.
It seems like if I directly request the IP address but not the domain, the amount of 5xx errors significantly reduced (but still exists).
I also have Wordpress application host on this EC2 in a subdomain (CNAME). I have never encountered any 5xx errors on this subdomain site, which makes me guess there might be some errors in my application code but not on the server side.
Is the 5xx error from my application or from the server?
I also tried to add another EC2 in the target group see if they can have at lease one healthy instance to handle the requests. However, the application is using a third-party API and has strict IP whitelist policy. I did some research that the Elastic IP I got from AWS cannot be attached to 2 different EC2s.

First of all, if your application is prone to stutters, increase healthcheck retries and timeouts, which will affect your initial question of flapping health.
To what I see from your screenshot, most of your 5xx are due to either server or application (you know obviously better what's the culprit since you have access to their logs).
To answer your question about 5xx errors coming from LB: this happens directly after LB kicks out unhealthy instance and if there's none to replace (which shouldn't be the case because you're supposed to have ASG if you enable evaluation of target health for LB), it can't produce meaningful output and thus crumbles with 5xx.
This should be enough information for you to make adjustments and logs investigation.

AWS ELB Load Sharing Configuration

We have 3 EC2 Instances(Apache Web Server) running under AWS ELB, it sharing load correctly but whenever any of Web Server down i.e. Web1 having some issue i.e. Disk Full or Apache Crash then still ELB trying to send request to that server which is already not responding or don't have capacity to respond, hence user who is connected to that server are getting error.
Question : Is there way to identify Fail server and force ELB to stop passing request to failed server?
FYI: Auto Scaling is not enabled.

You need to configure health checks for your ELB. When the checks are failing, the elb will stop forwarding traffic to the unhealthy instance.

How to change AWS ELB status to InService?

A WordPress application is deployed in AWS Elastic Beanstalk that has a load balancer. I see sometimes there is ELB 5XX error. To make the instance OutOfService for the higher number of unhealthy threshold count, I set Unhealthy Threshold to 10. But sometimes health check fails and health is Severe. I get sometimes the error "% of the requests to the ELB are failing with HTTP 5xx". I checked the ELB access logs and sometimes request get the timeout (504) error and after a consecutive number of 504, ELB makes the instance OutOfService. I am trying to fix which request is failing.
What I don't know, is it possible to make the instance "InService" as quickly as possible. Because sometimes instance is OutOfService for 2-3 hours, which is really bad. Is there any good way to handle this situation. I am really in trouble with this situation. Looks like after the service is out, I have nothing to do. I am relatively new to AWS. Please help.

To solve this issue:
1) HTTP 504 means timeout. The resource that the load balancer is accessing on your backend is failing to respond. Determine what the path for the healthcheck from the AWS console.
2) In your browser verify that you can access the healthcheck path going around the load balancer. This may mean temporarily assigning an EIP to the EC2 instance. If the load balancer healthcheck is "/test/myhealthpage.php" then use "http://REPLACE_WITH_EIP/test/myhealthpage.php". For HTTPS listeners use https in your path.
3) Debug why the path that you specified is timing out and fix it.
Note: Healthcheck paths should not be to pages that do complicated tests or operations. A healthcheck should be a quick and simple GO / NO GO type of page.

Site Become unavailable when one instance under loadbalancer become unhealthy

We are using AutoScaling and Elastic Load Balancer from Amazon AWS. We are running three linux(ubuntu) server under a load balancer. When one of the three instance become unhealthy(Status check fails) our site become unavailable.
But other 2 instance was healthy at that time.

We figured out the actual cause of this problem. Actually apache was taking all the memory of the instance. At one point memory exhausted on the machine, so apache was not able to serve response.
Following is another question where the problem is described.
apache2 processes stuck in sending reply - W

When AWS ElasticBeanstalk scales to another server it seems to make it available before it is ready ?

When my Java application is deployed to Tomcat on Elastic-Beanastalk it takes a while (11 minutes) because it has to copy large data files from S3 and unzip them, but that is okay because this is all done in .ebextensions and the instance doesn't report itself ready until that is completed.
However, I have it configured for Autoscaling and it seems that when it decides it needs to start a new instance there is a period before the next instance has fully deployed that Elastic-Beanstalk will direct some application requests to this new server, of course because it is not ready it returns a 503 error.
But surely all calls should only go to the original instance until the second one is ready, has anyone else noticed this ?

Whether requests are directed to the new instance or not is decided by the Elastic Load Balancer (ELB). Your autoscaled instances are behind the ELB and ELB performs periodic health checks on your EC2 instances to decide whether traffic to your instances or not. By default the health check is TCP connect on port 80. So if ELB can establish a connection to port 80 on the Tomcat server, it will start sending traffic to the instance even before it is actually "ready".
The solution is to use a custom HTTP health check instead of the default TCP check. Set up your web app to return a 200 OK on a special path say '/health_ping'. Then configure the "Application Healthcheck URL" option to "/health_ping". You can do this using the following ebextension.
Create a file called .ebextensions/01-health-check.config in your app source with the following contents. Then deploy it to your environment.
option_settings:
- namespace: aws:elasticbeanstalk:application
option_name: Application Healthcheck URL
value: /health_ping
Read more about this option setting here.
You can also configure this in the web console or using the aws cli.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js