When I send requests using the ALB's DNS host, the listener's path, and the web services endpoint path, I don't get a response within the expected timeframe, which I've determined by successfully sending
requests directly to each of the tasks using their public ip addresses, they return successful responses.
For example:
The ALB's DNS entry: http://myapp-alb-11111111.us-west-1.elb.amazonaws.com
The web app, "abc", listens on port 80 for requests on "/api/health".
The web app is using "abc-svc/*" as the path in the listener.
The web app was assigned a public ip address of 10.88.77.66.
Sending a GET request to 'http://10.88.77.66/api/health' is successful.
Sending a GET request to 'http://myapp-alb-11111111.us-west-1.elb.amazonaws.com/abc-svc/api/health' does not return within several minutes, which is not expected behavior.
I've looked through the logs, but cannot find anything that is amiss. I'd appreciate any ideas or suggestions...
AWS CONFIGURATION
I have three docker images that are running in ECS. Each image is assigned to a separate service. Each service has a single task. Port 80 is open in the security group from the Internet to the ALB. Port 80 is open from the ALB to each task. The ALB's listener for port 80 is using path-based routing. There is a separate, unique path for each service. Each task contains a docker linux, spring boot 2, web service. Each web service's router has a "/api/health" route that expects a GET request with no parameters and returns a simple string. We are not using HTTP or SSL at this time.
Thank you for your time and interest.
Mike
There is a different reason for that but some of the common issues that you can debug
Check health check for each target group under LB target group, if its unhealthy LB will never route the traffic
Verify the target port is correct
Verify Target group associated properly with LB and is not showing unused.
Verify LB security group
Check the response from LB is it gateway timeout or service unavailbe if gateway timeout its not reachable if service unavailable probably restarting
Services Event logs, check that service is in steady-state or not, if not its mean restarting again and again
Check deployment logs of service, if you see unhealthy target group message then update the target group health path with status code
Related
My web application on AWS EC2 + load balancer sometimes shows 500 errors. How do I know if the error is on the server side or the application side?
I am using Route 53 domain and ssl on my url. I set the ALB redirect requests on port 80 to 443, and forward requests on port 443 to the target group (the EC2). However, the target group is returning 5xx error code sometimes when handling the request. Please see the screenshots for the metrics and configurations for the ALB.
Target Group Metrics
Target Group Configuration
Load Balancer Metrics
Load Balancer Listeners
EC2 Metrics
Right now the web application is running unsteady, sometimes it returns a 502 or 503 service unavailable (seems like it's a connnection timeout).
I have set up the ALB idle timeout 4000 secs.
ALB configuration
The application is using Nuxt.js + PHP7.0 + MySQL + Apache 2.4.54.
I have set the Apache prefork worker Maxclient number as 1000, which should be enough to handle the requests on the application.
The EC2 is a t2.Large resource, the CPU and Memory look enough to handle the processing.
It seems like if I directly request the IP address but not the domain, the amount of 5xx errors significantly reduced (but still exists).
I also have Wordpress application host on this EC2 in a subdomain (CNAME). I have never encountered any 5xx errors on this subdomain site, which makes me guess there might be some errors in my application code but not on the server side.
Is the 5xx error from my application or from the server?
I also tried to add another EC2 in the target group see if they can have at lease one healthy instance to handle the requests. However, the application is using a third-party API and has strict IP whitelist policy. I did some research that the Elastic IP I got from AWS cannot be attached to 2 different EC2s.
First of all, if your application is prone to stutters, increase healthcheck retries and timeouts, which will affect your initial question of flapping health.
To what I see from your screenshot, most of your 5xx are due to either server or application (you know obviously better what's the culprit since you have access to their logs).
To answer your question about 5xx errors coming from LB: this happens directly after LB kicks out unhealthy instance and if there's none to replace (which shouldn't be the case because you're supposed to have ASG if you enable evaluation of target health for LB), it can't produce meaningful output and thus crumbles with 5xx.
This should be enough information for you to make adjustments and logs investigation.
Here is what exists and works OK:
Kubernetes cluster in Google Cloud with deployed 8 workloads - basically GraphQL microservices.
Each of the workloads has a service that exposes port 80 via NEG (Network Endpoint Group). So, each workload has its ClusterIP in the 10.12.0.0/20 network. Each of the services has a custom namespace "microservices".
One of the workloads (API gateway) is exposed to the Internet via Global HTTP(S) Load Balancer. Its purpose is to handle all requests and route them to the right microservice.
Now, I needed to expose all of the workloads to the outside world so they can be reached individually without going through the gateway.
For this, I have created:
a Global Load Balancer, added backends (which referer to NEGs), configured routing (URL path defines which workload the request will go), and external IP
a Health Check that is used by Load Balancer for each of the backends
a firewall rule that allows traffic on TCP port 80 from the Google Health Check services 35.191.0.0/16, 130.211.0.0/22 to all hosts in the network.
The problem: Health Check fails and thus the load balancer does not work - it gives error 502.
What I checked:
logs show that the firewall rule allows traffic
logs for the Health Check show only changes I do to it and no other activities so I do not know what happens inside.
connected via SSH to the VM which hosts the Kubernetes node and checked that the clusterIPs (10.12.xx.xx) of each of workload return HTTP Status 200.
connected via SSH to a VM created for test purposes. From this VM I cannot reach any of the ClusterIPs (10.12.xx.xx)
It seems that for some reason traffic from the Health Check or my test VM does not get to the destination. What did I miss?
We have 3 EC2 Instances(Apache Web Server) running under AWS ELB, it sharing load correctly but whenever any of Web Server down i.e. Web1 having some issue i.e. Disk Full or Apache Crash then still ELB trying to send request to that server which is already not responding or don't have capacity to respond, hence user who is connected to that server are getting error.
Question : Is there way to identify Fail server and force ELB to stop passing request to failed server?
FYI: Auto Scaling is not enabled.
You need to configure health checks for your ELB. When the checks are failing, the elb will stop forwarding traffic to the unhealthy instance.
I have created an AWS Network Load Balancer, with TCP:80 (HTTP) listener. This listener forward requests to a Target Group called "My-TargetGroup."
I have created a Task Defintion, that points to a Docker Image of the Spring Boot service, that runs on port 8080. In ECS, When I created the ECS Service I selected "My-TargetGroup", with listener port at 80.
I can see that my ECS Service has one Task running successfully. However, I do not know how to test whether NLB is able to forward the request to the underlying spring boot service. For eg. in my Spring boot API, I have a the endpoint myapi/faq. How do I call this API through curl?. Basically, I will be calling this API end point as http/https method. So I want to now test this API as a get call through https protocol
You can try to use netcat command for a variety of connectivity tests. Here is the syntax
nc -v {host} {port}
With -v(verbose) option, you should ideally see an output if your server socket returns something on connection.
Before curl, I will suggest making the sure thing on the infrastructure side, cur l will never help you to debug the issue. curl may work in your case but normally network Load balancer work on netwrok layer so you can try with telnet.
telnet lb_endpoint 80
But what you need is to make sure is the target healthy?
So if the target is healthy application is running and LB should response and if not then check security group of LB.
If the target is unhealthy something wrong with ECS services so do need to debug LB.
I have a load balancer (LB) and an EC2 instance on AWS. My LB has my domain name associated and supports HTTP and HTTPS connections. It has a health-check configured to an endpoint on my EC2 instance (it's running node).
When trying to hit an endpoint via my domain name, the LB doesn't route traffic to my EC2 because it doesn't see it as a healthy instance. I can hit the endpoint directly with the IP address instead. What sort of response do I need to configure so that my EC2 can be recognized as healthy?
Edit: Using an application load balancer.
Edit 2: Health check configuration.
Protocol: HTTPS
Path : /callback
Port : 443
Healthy threshold : 5
Unhealthy threshold : 2
Timeout : 5
Interval : 30
Success codes : 200
You need to provide a path on the EC2 instance - you do NOT need to provide anything in DNS. It should look something like:
Protocol:HTTP
Port: 80
Path: / (or any valid URL on your host that's
a good example of your page working)
No DNS names need to be in there, remember - the ELB already knows which server(s) it's checking against, it just needs to know what to check on that server. Also make sure your security groups allow the ELB to talk to the server on the required ports.
Solved: with the application LB, all that is needed is a 200-level status code from a designated url. This means that you cannot return a simple text response like "Hello World" when they send their health check request.