We have 3 EC2 instances (Apache web server) running behind an AWS ELB. It shares the load correctly, but whenever one of the web servers goes down (e.g. Web1 has a full disk or Apache crashes), the ELB still tries to send requests to that server, even though it is no longer responding or has no capacity to respond, so users routed to that server get errors.
Question: Is there a way to identify the failed server and force the ELB to stop sending requests to it?
FYI: Auto Scaling is not enabled.
You need to configure health checks for your ELB. When the checks fail, the ELB will stop forwarding traffic to the unhealthy instance.
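For a Classic Load Balancer this can be done with boto3's configure_health_check call. A minimal sketch, assuming a load balancer named my-elb and a hypothetical /health page served by Apache on port 80:

import boto3

elb = boto3.client('elb')

# Mark an instance unhealthy after 2 failed checks, healthy again after 3 passes.
elb.configure_health_check(
    LoadBalancerName='my-elb',           # assumed name
    HealthCheck={
        'Target': 'HTTP:80/health',      # hypothetical health page on each web server
        'Interval': 30,                  # seconds between checks
        'Timeout': 5,                    # seconds to wait for a response
        'UnhealthyThreshold': 2,
        'HealthyThreshold': 3,
    },
)

An HTTP target is preferable to a plain TCP check here: a full disk can leave Apache accepting connections while failing to serve pages, and only an HTTP check catches that.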
My web application on AWS EC2 + load balancer sometimes shows 500 errors. How do I know if the error is on the server side or the application side?
I am using a Route 53 domain with SSL on my URL. I set the ALB to redirect requests on port 80 to 443 and forward requests on port 443 to the target group (the EC2 instance). However, the target group sometimes returns 5xx error codes when handling requests. Please see the screenshots for the ALB metrics and configuration.
[Screenshots: Target Group Metrics, Target Group Configuration, Load Balancer Metrics, Load Balancer Listeners, EC2 Metrics]
Right now the web application runs unsteadily; sometimes it returns a 502 or 503 Service Unavailable (it seems like a connection timeout).
I have set the ALB idle timeout to 4000 seconds.
[Screenshot: ALB configuration]
The application uses Nuxt.js + PHP 7.0 + MySQL + Apache 2.4.54.
I have set the Apache prefork MPM's MaxClients to 1000, which should be enough to handle the requests on the application.
The EC2 instance is a t2.large; the CPU and memory look sufficient to handle the processing.
It seems that if I request the IP address directly instead of the domain, the number of 5xx errors drops significantly (but they still occur).
I also have a WordPress application hosted on this EC2 instance under a subdomain (CNAME). I have never encountered any 5xx errors on that subdomain site, which makes me suspect the errors are in my application code rather than on the server side.
Are the 5xx errors coming from my application or from the server?
I also tried adding another EC2 instance to the target group to see if that would keep at least one healthy instance available to handle requests. However, the application uses a third-party API with a strict IP whitelist policy, and from my research the Elastic IP I got from AWS cannot be attached to two different EC2 instances.
First of all, if your application is prone to stutters, increase the health check retries and timeout; that addresses your underlying issue of flapping health status.
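For an ALB target group, this can be done with boto3's modify_target_group. A minimal sketch, assuming a hypothetical target group ARN:

import boto3

elbv2 = boto3.client('elbv2')

# Loosen the health check so brief stutters don't flap the target's health.
elbv2.modify_target_group(
    TargetGroupArn='arn:aws:elasticloadbalancing:...:targetgroup/my-tg/abc123',  # hypothetical ARN
    HealthCheckIntervalSeconds=30,
    HealthCheckTimeoutSeconds=10,   # allow slow responses before counting a failure
    UnhealthyThresholdCount=5,      # require 5 consecutive failures before marking unhealthy
    HealthyThresholdCount=2,
)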
From what I see in your screenshots, most of your 5xx errors come from either the server or the application (you obviously know better which is the culprit, since you have access to their logs).
To answer your question about 5xx errors coming from the LB: this happens right after the LB kicks out an unhealthy instance. If there is no instance to replace it (which shouldn't be the case, because you're supposed to have an ASG when you enable evaluation of target health for the LB), the LB can't produce a meaningful response and falls back to a 5xx.
This should be enough information for you to make adjustments and investigate the logs.
When I send requests using the ALB's DNS host, the listener's path, and the web service's endpoint path, I don't get a response within the expected timeframe. I know what to expect because sending requests directly to each of the tasks using their public IP addresses returns successful responses.
For example:
The ALB's DNS entry: http://myapp-alb-11111111.us-west-1.elb.amazonaws.com
The web app, "abc", listens on port 80 for requests on "/api/health".
The web app is using "abc-svc/*" as the path in the listener.
The web app was assigned a public IP address of 10.88.77.66.
Sending a GET request to 'http://10.88.77.66/api/health' is successful.
Sending a GET request to 'http://myapp-alb-11111111.us-west-1.elb.amazonaws.com/abc-svc/api/health' does not return within several minutes, which is not expected behavior.
I've looked through the logs but cannot find anything amiss. I'd appreciate any ideas or suggestions.
AWS CONFIGURATION
I have three Docker images running in ECS. Each image is assigned to a separate service, and each service has a single task. Port 80 is open in the security group from the Internet to the ALB, and port 80 is open from the ALB to each task. The ALB's listener for port 80 uses path-based routing, with a separate, unique path for each service. Each task contains a Linux-based Docker container running a Spring Boot 2 web service. Each web service's router has a "/api/health" route that expects a GET request with no parameters and returns a simple string. We are not using HTTPS or SSL at this time.
Thank you for your time and interest.
Mike
There could be a number of different reasons for this, but here are some common things to check:
- Check the health check status for each target group under the LB's target groups; if a target is unhealthy, the LB will never route traffic to it (you can inspect this programmatically, as in the sketch after this list).
- Verify that the target port is correct.
- Verify that the target group is properly associated with the LB and is not showing as unused.
- Verify the LB security group.
- Check the response from the LB: a gateway timeout means the target is not reachable; a service unavailable means the service is probably restarting.
- Check the service event logs to see whether the service is in a steady state; if not, it is restarting again and again.
- Check the service's deployment logs; if you see an unhealthy target group message, update the target group's health check path and expected status code.
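To check target health from code rather than the console, boto3's describe_target_health can be used. A minimal sketch, assuming a hypothetical target group ARN:

import boto3

elbv2 = boto3.client('elbv2')

# List each registered target and why it is (un)healthy.
resp = elbv2.describe_target_health(
    TargetGroupArn='arn:aws:elasticloadbalancing:...:targetgroup/my-tg/abc123',  # hypothetical ARN
)
for desc in resp['TargetHealthDescriptions']:
    health = desc['TargetHealth']
    print(desc['Target']['Id'], health['State'], health.get('Reason', ''), health.get('Description', ''))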
Okay, this may seem like a dumb question, but I have to say I'm a little stumped. I can't figure out why my ELB (Classic) --> EC2 health checks are failing in AWS.
The way I have things configured, my ELB's health checks ping the instance on TCP:8080.
I have 3 port listeners on my ELB, which from my understanding shouldn't matter, because listeners deal with traffic originating externally, but I'll list them just in case I'm overlooking something:
1.) HTTP:80 --> HTTP:8080
2.) HTTPS:443 --> HTTP:8080
3.) TCP:2222 --> TCP:2222
I've also tried configuring the health checks to ping TCP:2222, just in case; same deal.
Additionally, I've got security groups for my EC2 instance that allow TCP:8080 and TCP:2222 from my ELB's security group.
Is it failing because, while the EC2 instance allows the traffic, something still needs to be running on that port to serve the traffic?
The EC2 instance is currently just a plain AWS-supplied Linux AMI.
Forgive me, I'm a little new to the networking space.
Yes, something needs to be running on the server. If there is no process running on the server, listening for and responding to requests on port 8080, the health check will fail. You would want the health check to fail if the software running on your server crashed, wouldn't you? An instance isn't "healthy" if it isn't responding to requests.
Also, the security group assigned to the EC2 instance needs to allow incoming traffic on those ports (8080 and 2222) originating from the ELB's security group.
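To see the check go green before any real software is installed, a throwaway listener on port 8080 is enough. A minimal Python sketch (the port is the only detail carried over from the question):

from http.server import HTTPServer, BaseHTTPRequestHandler

class HealthHandler(BaseHTTPRequestHandler):
    # Answer every GET with 200 OK so the ELB health check passes.
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b'OK')

# Listen on all interfaces so the ELB can reach the instance on port 8080.
HTTPServer(('0.0.0.0', 8080), HealthHandler).serve_forever()

This satisfies both a TCP:8080 check (the connection succeeds) and an HTTP:8080 check (it returns 200).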
Is there any way to purposefully make an instance attached to an ELB unhealthy using boto?
I tried a few methods and none of them have worked so far.
Thanks for any help !!
No, this is not possible. There is no AWS API call that can change the health status of an instance. (Auto Scaling has this capability, but Load Balancing does not.)
You could use the deregister_instances() API call, which would effectively achieve the same result.
The Register or Deregister EC2 Instances for Your Classic Load Balancer documentation says:
Deregistering an EC2 instance removes it from your load balancer. The load balancer stops routing requests to an instance as soon as it is deregistered. If demand decreases, or you need to service your instances, you can deregister instances from the load balancer. An instance that is deregistered remains running, but no longer receives traffic from the load balancer, and you can register it with the load balancer again when you are ready.
When you deregister an instance, Elastic Load Balancing waits until in-flight requests have completed if connection draining is enabled.
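A minimal boto3 sketch of that approach, assuming a load balancer named myloadbalancer and a hypothetical instance ID:

import boto3

elb = boto3.client('elb')

# Take the instance out of rotation; it keeps running but receives no LB traffic.
elb.deregister_instances(
    LoadBalancerName='myloadbalancer',
    Instances=[{'InstanceId': 'i-0123456789abcdef0'}],  # hypothetical instance ID
)

Calling register_instances_with_load_balancer with the same arguments puts the instance back in rotation afterwards.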
Yes, we can do that in the scenario below.
Let's assume that you have a load balancer (myloadbalancer), an instance attached to it, and a ping path configuration such as the one below.
Ping Protocol: HTTP
Ping Port: 80
Ping Path: /
Just add the boto3 code below to edit the health check configuration, and you can watch the magic happen (the instance goes OutOfService).
import boto3

client = boto3.client('elb')

# Point the health check at a path that does not exist, so every check fails
# and the instance is marked OutOfService after UnhealthyThreshold failures.
client.configure_health_check(
    LoadBalancerName='myloadbalancer',
    HealthCheck={
        'Target': 'HTTP:80/hjkx',  # nonexistent path -> checks always fail
        'Interval': 30,
        'Timeout': 5,
        'UnhealthyThreshold': 5,
        'HealthyThreshold': 3
    }
)
Two other options:
1. Temporarily disable the web server / process that's responding to the health check. In our case, we were running Java webapps with an nginx proxy in front of them. Shutting down the nginx proxy made the health check fail while the Java app would still be running.
2. Temporarily firewall the port that the ELB uses to perform the health check. You could do this via a call to the AWS API, as sketched below.
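For option 2, one way to "firewall" the port via the API is to revoke the security group rule the ELB uses to reach it. A minimal sketch, where the group IDs and port are hypothetical:

import boto3

ec2 = boto3.client('ec2')

# Drop the rule that lets the ELB's security group reach the health check port.
ec2.revoke_security_group_ingress(
    GroupId='sg-0instance0000000',  # hypothetical instance security group
    IpPermissions=[{
        'IpProtocol': 'tcp',
        'FromPort': 80,   # hypothetical health check port
        'ToPort': 80,
        'UserIdGroupPairs': [{'GroupId': 'sg-0elb0000000000'}],  # hypothetical ELB security group
    }],
)

Re-adding the same rule with authorize_security_group_ingress restores health check traffic.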
When my Java application is deployed to Tomcat on Elastic Beanstalk, it takes a while (11 minutes) because it has to copy large data files from S3 and unzip them. That is okay, because this is all done in .ebextensions, and the instance doesn't report itself ready until it has completed.
However, I have Auto Scaling configured, and it seems that when it decides it needs to start a new instance, there is a period before the new instance has fully deployed during which Elastic Beanstalk directs some application requests to the new server; because it is not ready, it returns a 503 error.
But surely all calls should go only to the original instance until the second one is ready. Has anyone else noticed this?
Whether requests are directed to the new instance or not is decided by the Elastic Load Balancer (ELB). Your autoscaled instances are behind the ELB, and the ELB performs periodic health checks on your EC2 instances to decide whether to send traffic to them. By default the health check is a TCP connect on port 80. So if the ELB can establish a connection to port 80 on the Tomcat server, it will start sending traffic to the instance even before it is actually "ready".
The solution is to use a custom HTTP health check instead of the default TCP check. Set up your web app to return a 200 OK on a special path, say "/health_ping". Then configure the "Application Healthcheck URL" option to "/health_ping". You can do this using the following ebextension.
Create a file called .ebextensions/01-health-check.config in your app source with the following contents. Then deploy it to your environment.
option_settings:
  - namespace: aws:elasticbeanstalk:application
    option_name: Application Healthcheck URL
    value: /health_ping
Read more about this option setting in the Elastic Beanstalk documentation.
You can also configure this in the web console or using the aws cli.
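If you prefer doing it from code, the same option can be set through boto3's Elastic Beanstalk client. A minimal sketch, assuming a hypothetical environment name:

import boto3

eb = boto3.client('elasticbeanstalk')

# Switch the environment's health check from the default TCP:80 to HTTP GET /health_ping.
eb.update_environment(
    EnvironmentName='my-env',  # hypothetical environment name
    OptionSettings=[{
        'Namespace': 'aws:elasticbeanstalk:application',
        'OptionName': 'Application Healthcheck URL',
        'Value': '/health_ping',
    }],
)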