I am seeing a strange issue with AWS ELB: I am getting High-Sum-HTTP-5XX from the ELB, but when I check the access log I do not see any requests with 5XX errors.
Does the ELB access log not report 5XX errors? Where can I see which requests returned a 5XX error? That would help me find the root cause. I do not see anything in my server log either.
I'm assuming you are running a CLB (Classic Load Balancer). Access log entries for HTTP 5xx errors should be analyzed using the elb_status_code and backend_status_code fields.
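As a rough illustration of comparing those two fields, here is a minimal Python sketch. It assumes the space-delimited Classic Load Balancer access log layout from the AWS documentation; the helper names (parse_clb_line, classify_5xx) and the sample line are made up for this example and are not part of any AWS tooling.

```python
import shlex

# Field order of a Classic Load Balancer access log entry (assumed from the AWS docs).
CLB_FIELDS = [
    "timestamp", "elb", "client", "backend",
    "request_processing_time", "backend_processing_time",
    "response_processing_time", "elb_status_code", "backend_status_code",
    "received_bytes", "sent_bytes", "request", "user_agent",
    "ssl_cipher", "ssl_protocol",
]


def parse_clb_line(line):
    """Split one access log line into a dict keyed by field name."""
    # shlex keeps the quoted "request" and "user_agent" fields in one piece.
    return dict(zip(CLB_FIELDS, shlex.split(line)))


def classify_5xx(entry):
    """Return 'backend', 'elb', or None depending on who produced the 5xx."""
    elb_code = entry.get("elb_status_code", "-")
    backend_code = entry.get("backend_status_code", "-")
    if not elb_code.startswith("5"):
        return None
    # A 5xx backend code means the instance answered with the error;
    # anything else means the load balancer generated it on its own.
    return "backend" if backend_code.startswith("5") else "elb"


if __name__ == "__main__":
    # Hypothetical sample line, for illustration only.
    sample = (
        '2015-05-13T23:39:43.945958Z my-loadbalancer 192.168.131.39:2817 - '
        '-1 -1 -1 504 0 0 0 "GET http://www.example.com:80/ HTTP/1.1" '
        '"curl/7.38.0" - -'
    )
    entry = parse_clb_line(sample)
    print(entry["elb_status_code"], entry["backend_status_code"], classify_5xx(entry))
```

If every 5xx line classifies as 'elb', the instances never produced the error, which would match not finding anything in the server log.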
This could be off topic, but from AWS's documentation it looks like some HTTP messages cannot be parsed by the Classic Load Balancer. This could happen if a reverse proxy on the instance sends an error that the ELB doesn't understand, in which case those responses are not recorded in the access logs. (I could see the 404 errors in the access logs.)
Related
We are trying to troubleshoot the load balancer for our applications and we cannot see why it is returning 502 errors. When I go to Logging and filter by severity or error code, we don't see anything in the logs.
The type is External HTTPS Classic and the configuration of the load balancer is this
Adding some clarifications in response to @John Hanley's comments:
I am checking the logs in Google Cloud Logging: https://console.cloud.google.com/logs/query
The error is generated in the load balancer: when I check the backend logs, I don't see the request ever arriving there. Also, other Stack Overflow posts about Cloud Load Balancing describe the same error.
Others can see the errors, as the documentation describes, in the statusDetails HTTP failure messages: https://cloud.google.com/load-balancing/docs/https/https-logging-monitoring#gcloud:-classic-mode
But we don't see any of the errors logged, nor the request with error code 502.
Edit: To clarify, we see every OK response in the logs; only the errors and warnings aren't logged.
I finally figured it out. Someone in the company had set an exclusion in the Log Router sink for HTTP codes other than 200. That's why the errors weren't being logged.
I need to monitor HTTP 5xx errors coming from an AWS ALB using CloudWatch, find out what the error is, and set up alerts using CloudWatch.
I searched for a lot of methods, but couldn't find one that works.
Can someone help me set up monitoring and alerting for 5xx errors on an ALB using CloudWatch/Lambda/Slack?
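One possible starting point is a CloudWatch alarm on the ALB's own 5xx metric, sketched below in Python. The alarm name, threshold, periods, and the SNS topic ARN are placeholders chosen for this example. The boto3 call is kept in a separate function so the parameters can be inspected without AWS credentials.

```python
def build_alb_5xx_alarm(load_balancer_dim, sns_topic_arn, threshold=10):
    """Build keyword arguments for CloudWatch put_metric_alarm.

    load_balancer_dim is the LoadBalancer dimension value, i.e. the
    "app/<name>/<id>" suffix of the load balancer ARN.
    """
    return {
        "AlarmName": "alb-elb-5xx-high",  # hypothetical alarm name
        "Namespace": "AWS/ApplicationELB",
        # 5xx responses generated by the load balancer itself; use
        # HTTPCode_Target_5XX_Count for 5xx responses returned by the targets.
        "MetricName": "HTTPCode_ELB_5XX_Count",
        "Dimensions": [{"Name": "LoadBalancer", "Value": load_balancer_dim}],
        "Statistic": "Sum",
        "Period": 60,               # seconds per datapoint (assumed value)
        "EvaluationPeriods": 5,     # assumed value
        "Threshold": float(threshold),
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # The metric is only reported when it is nonzero, so treat gaps as healthy.
        "TreatMissingData": "notBreaching",
        "AlarmActions": [sns_topic_arn],
    }


def create_alarm(params):
    """Create or update the alarm; needs boto3 and AWS credentials."""
    import boto3  # third-party dependency, imported lazily

    boto3.client("cloudwatch").put_metric_alarm(**params)


if __name__ == "__main__":
    # Placeholder identifiers, for illustration only.
    params = build_alb_5xx_alarm(
        "app/my-alb/0123456789abcdef",
        "arn:aws:sns:us-east-1:123456789012:alb-alerts",
    )
    print(params["MetricName"], params["Threshold"])
```

The alarm only reports that the 5xx count crossed a threshold. To see which requests failed, ALB access logs would still need to be enabled and inspected. An SNS topic as the alarm action is one way to fan out to a Lambda function that posts to Slack.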
We have a UI application deployed on an Apache web server on an EC2 instance (t2.small). We have 2 such instances behind an ELB (Classic Load Balancer), in 2 availability zones. The browser requests JavaScript and CSS resources from the server, but sometimes the network request for these resources fails with a 504 timeout and sometimes with a 503. When we refresh the page multiple times, we sometimes get a 504/503, and then on the next refresh it loads fine.
I don't see any error in Apache access logs, and nothing useful in ELB logs.
The following response header is returned for the 504 timeout, which I don't see for 2xx responses:
"server: awselb/2.0".
I have also tried keeping just one instance, but the issue is still reproducible.
Any debugging pointers appreciated. TIA
I have created a "Load Balancer" in Google Cloud and connected 2 virtual machines to it. When I send requests to the load balancer, sometimes they get passed to the attached virtual machines and sometimes it throws the following error, even though the health check is 100% OK at that time.
Error: Server Error
The server encountered a temporary error and could not complete your request.
Please try again in 30 seconds.
This answer was created to support the community based on the limited information delivered by the OP and the comments written above.
The most reliable way to determine the root cause of an HTTP load balancer issue is to review the log entries.
According to the official Google documentation, HTTP(S) Load Balancing log entries contain information useful for monitoring and debugging your HTTP(S) traffic.
Log entries contain the following types of information:
General information, such as severity, project ID, project number, and timestamp.
HttpRequest log fields. However, HttpRequest.protocol is not populated for HTTP(S) Load Balancing Cloud Logging logs.
A statusDetails field inside the structPayload. This field holds a string that explains why the load balancer returned the HTTP status that it did. The tables below contain further explanations of these log strings. The statusDetails field is not available for regional external HTTP(S) load balancers.
Redirects (HTTP response status code 302 Found) issued from the load balancer are not logged. Redirects issued from the backend instances are logged.
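To make the statusDetails field concrete, here is a small Python sketch that tallies failing requests by status code and statusDetails. It assumes the entries were exported from Cloud Logging as JSON objects with httpRequest.status and jsonPayload.statusDetails keys; the function name and the sample entries are made up for this example.

```python
from collections import Counter


def summarize_failures(entries, min_status=500):
    """Count log entries at or above min_status, grouped by (status, statusDetails)."""
    counts = Counter()
    for entry in entries:
        status = int(entry.get("httpRequest", {}).get("status", 0))
        if status >= min_status:
            # Assumption: exported entries carry the field under jsonPayload.
            details = entry.get("jsonPayload", {}).get("statusDetails", "unknown")
            counts[(status, details)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical entries, trimmed to the fields used here.
    sample_entries = [
        {"httpRequest": {"status": 200},
         "jsonPayload": {"statusDetails": "response_sent_by_backend"}},
        {"httpRequest": {"status": 502},
         "jsonPayload": {"statusDetails": "failed_to_pick_backend"}},
        {"httpRequest": {"status": 502},
         "jsonPayload": {"statusDetails": "failed_to_pick_backend"}},
    ]
    for (status, details), n in summarize_failures(sample_entries).items():
        print(status, details, n)
```

If a tally like this comes back empty while clients are still seeing errors, it is worth checking whether the error entries are being written to the log at all.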
To enable the log entries in an HTTP Load Balancer please follow this guide.
The message “Error: Server Error The server encountered a temporary error and could not complete your request.” could have several causes, including:
There's no firewall rule configured to allow health checks.
The software on the backends isn't running.
On this page you can find a detailed troubleshooting guide for general connectivity issues.
I found these posts related to the HTTP load balancer and 502 responses; you may find useful information in these threads.
Debugging Load Balancer issues
Compute Engine HTTP Load Balancing 502 error
Google Cloud HTTP balancer returns 502 error
Error: Server Error The server encountered a temporary error and could not complete your request. Please try again in 30 seconds. (GCP)
In my case the issue was the health check not returning 200.
It returned 302 (Found) instead when calling the default / path, redirecting to another URL that returned 200. The load balancer's health checks ignored the redirect target and deemed the node "unhealthy", so instead of routing incoming HTTP/S requests to a broken node, the load balancer removed it from rotation and returned this 502 error message to the client:
Error: Server Error The server encountered a temporary error and could not complete your request.
Please try again in 30 seconds.
Underneath my load balancer was a GKE cluster with GKE ingress -> service -> pod and no explicit liveness/readiness probes configured, so by default the health checks hit / and got a 302 (Found) redirect.
After adding those probes to the deployment manifest and pointing them to endpoints that return 200 OK (/-/healthy and /-/ready in my case, with Prometheus running inside the pod), the issue was fixed.
Unfortunately, the GKE ingress only showed an uninformative UNHEALTHY message in its annotations, so it took me a while to understand what was causing the issue.
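A quick way to reproduce what the health checker sees is to request the path without following redirects. The sketch below does this with Python's standard library. The helper names, the host, and the port are placeholders, and the "only 200 passes" rule simply mirrors the behaviour described in this answer.

```python
import http.client


def probe_status(host, port, path="/", timeout=5):
    """Return the raw HTTP status for path; http.client never follows redirects."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        return conn.getresponse().status
    finally:
        conn.close()


def passes_health_check(status):
    """Mirror the behaviour described above: a 302 redirect does not count as healthy."""
    return status == 200


if __name__ == "__main__":
    # Placeholder target; point this at the pod or node being health-checked.
    for path in ("/", "/-/healthy", "/-/ready"):
        try:
            status = probe_status("127.0.0.1", 9090, path)
            print(path, status, "healthy" if passes_health_check(status) else "unhealthy")
        except OSError as exc:
            print(path, "unreachable:", exc)
```

A browser or a redirect-following client would hide the 302, which is why checking the raw status is useful here.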
For the past week we have been using an "application" ELB for our applications. In the ELB monitoring we couldn't see any 5XX responses, even though there were many 5XXs in our application access logs.
Could it be a configuration error?
You are getting 5xx in the application logs but not in the ELB metrics. If there are 5xx entries in the application logs, they belong to the application that handled the connection.
They do not belong to the load balancer. So the ELB is not receiving a 504.