Google Load Balancer not logging errors

Google Load Balancer not logging errors - google-cloud-platform

We are trying to troubleshoot the load balancer of our applications and we cannot see why the load balancer is giving 502 errors. When I go to Logging and filter by severity or error code we don't see anything on the logs.
The type is External HTTPS Classic and the configuration of the load balancer is this
I add some aclarations per #John Hanley said:
I am checking the logs in google cloud logging: https://console.cloud.google.com/logs/query
The error is generated in the load balancer as if i check backend logs I don't see the request to have arrived to it. Also, looking at other stack overflow posts of Cloud Load Balancing it gives the same error.
Others can see the errors like the documentation says of statusDetails http failure messages: https://cloud.google.com/load-balancing/docs/https/https-logging-monitoring#gcloud:-classic-mode
But we don't see any of the errors registered, nor the petition with error code 502.
Edit: To mention that we see every OK response in the logging, just the errors and warnings aren't logged.

I have finally found out. Someone in the company set in the Log router sink an exclusion to http codes distinct to 200. That's why it wasn't being registered

Related

AWS - Difference between ELB_5XXs and HTTP_5XXs

In AWS Cloud watch, what is the difference between ELB_5xx and Http_5xx errors? And then there is also Backend_connection_errors.

HTTP errors served to the user can occur at either the load balancer or target level.
Any metrics that are prefixed with ELB originate at the load balancer level, there are many reasons why these can occur. For a 5XX error it could be a problem with connecting to your targets or could be that the throughput to the load balancer was too high (ELBs scale too based on traffic). More information about these errors are available in the Troubleshoot your Application Load Balancers document.
For the Http_ errors these will be coming from the target itself, essentially your application has returned this HTTP status back to the client. For debugging these you would look in your application logs to identify the root cause.

Google cloud load balancer causing error 502 - failed_to_pick_backend

I've got an error 502 when I use google cloud balancer with CDN, the thing is, I am pretty sure I must have done something wrong setting up the load balancer because when I remove the load balancer, my website runs just fine.
This is how I configure my load balancer
here
Should I use HTTP or HTTPS healthcheck, because when I set up HTTPS
healthcheck, my website was up for a bit and then it down again
I have checked this link, they seem to have the same problem but it is not working for me.
I have followed a tutorial from openlitespeed forum to set Keep-Alive Timeout (secs) = 60s in server admin panel and configure instance to accepts long-lived connections ,still not working for me.
I have added these 2 firewall rules following this google cloud link to allow google health check ip but still didn’t work:
https://cloud.google.com/load-balancing/docs/health-checks#fw-netlb
https://cloud.google.com/load-balancing/docs/https/ext-http-lb-simple#firewall
When checking load balancer log message, it shows an error saying failed_to_pick_backend . I have tried to re-configure load balancer but it didn't help.
I just started to learn Google Cloud and my knowledge is really limited, it would be greatly appreciated if someone could show me step by step how to solve this issue. Thank you!

Posting an answer - based on OP's finding to improve user experience.
Solution to the error 502 - failed_to_pick_backend was changing Load Balancer from HTTP to TCP protocol and at the same type changing health check from HTTP to TCP also.
After that LB passes through all incoming connections as it should and the error dissapeared.
Here's some more info about various types of health checks and how to chose correct one.

The error message that you're facing it's "failed_to_pick_backend".
This error message means that HTTP responses code are generated when a GFE was not able to establish a connection to a backend instance or was not able to identify a viable backend instance to connect to
I noticed in the image that your health-check failed causing the aforementioned error messages, this Health Check failing behavior could be due to:
Web server software not running on backend instance
Web server software misconfigured on backend instance
Server resources exhausted and not accepting connections:
- CPU usage too high to respond
- Memory usage too high, process killed or can't malloc()
- Maximum amount of workers spawned and all are busy (think mpm_prefork in Apache)
- Maximum established TCP connections
Check if the running services were responding with a 200 (OK) to the Health Check probes and Verify your Backend Service timeout. The Backend Service timeout works together with the configured Health Check values to define the amount of time an instance has to respond before being considered unhealthy.
Additionally, You can see this troubleshooting guide to face some error messages (Including this).

Those experienced with Kubernetes from other platforms may be confused as to why their Ingresses are calling their backends "UNHEALTHY".
Health checks are not the same thing as Readiness Probes and Liveness Probes.
Health checks are an independent utility used by GCP's Load Balancers and perform the exact same function, but are defined elsewhere. Failures here will lead to 502 errors.
https://console.cloud.google.com/compute/healthChecks

GCP external http load balancer 502 server error:"failed_to_connect_to_backend"

I have configured a http external load balancer on GCP and all my vm instances are healthy in backend.
But when i am trying to access my server(installed on VM) from frontend static IP that is reserved at load balancer it is giving me 502 status error.
As a result of which i am unable to launch my application server using load balancer. Help me fix this issue.
Thanking you in advance.

To troubleshoot 502 response from the Load Balancer due to "failed_to_connect_to_backend." I would check the followings:
Usually, "failed_to_connect_to_backend" error message indicates that the load balancer is failing to connect to backends, investigating URL map rules is also a good point to start. I would also suggest reviewing your Load Balancer's URL map to make sure that Host rules, Path matcher, and Path rules are correctly defined and comply with descriptions in this article.
Also check if the backend instances are exhausting their resources, If a backend server is overwhelmed, it will refuse incoming requests, potentially causing the load balancer to give up on it and return the specific 502 error you're experiencing. Also, check the output on how many established connections are present at any one time using 'netstat' and watch command.
I would also recommend testing again with the HTTP(S) request directly to the instance, request the same URL that reporting 502. You might do this test in another VM instance in your VPC network.

maybe you should check if the time taken for the API to return the response is exceeded the timeout that will trigger the 502. The default value is 30 seconds.
Ref: https://cloud.google.com/load-balancing/docs/backend-service#timeout-setting

(GCP) : Server Error The server encountered a temporary error and could not complete your request. Please try again in 30 seconds

I have created a "Load Balancer" in Google Cloud and connected 2 virtual machines to it. When I send some requests to "Load Balancer", sometimes it gets passed to virtual machines attached to load balancer and sometimes it throws following error even health check is 100% OK at that time.
Error: Server Error
The server encountered a temporary error and could not complete your request.
Please try again in 30 seconds.

This answer was created to support the community based on the limited information delivered by the OP and the comments written above.
The most accurate decision to make when you try to determine the root cause of an HTTP load balancer issue is review the log entries.
According to the official google documentation. HTTP(S) Load Balancing log entries contain information useful for monitoring and debugging your HTTP(S) traffic.
Log entries contain the following types of information:
General information, such as severity, project ID, project number, and timestamp.
HttpRequest log fields. However, HttpRequest.protocol is not populated for HTTP(S) Load Balancing Cloud Logging logs.
A statusDetails field inside the structPayload. This field holds a string that explains why the load balancer returned the
HTTP status that it did. The tables below contain further
explanations of these log strings. The statusDetails field is not
available for regional external HTTP(S) load balancers.
Redirects (HTTP response status code 302 Found) issued from the load balancer are not logged. Redirects issued from the backend
instances are logged.
To enable the log entries in an HTTP Load Balancer please follow this guide.
The message “Error: Server Error The server encountered a temporary error and could not complete your request.” Could be caused for several reason reasons including:
There's no firewall rule configured to allow health checks.
The software on the backends isn't running.
In this page you can find a detailed guide to perform a complete troubleshooting related to general connectivity issues.
I found these posts related to HTTP Load balancer and 502 response, you can find useful information in these threads.
Debugging Load Balancer issues
Compute Engine HTTP Load Balancing 502 error
Google Cloud HTTP balancer returns 502 error
Error: Server Error The server encountered a temporary error and
could not complete your request. Please try again in 30
seconds.(GCP)

In my case issue was with health check not returning 200.
It returned 302 instead (Found) when calling default / and redirected to other url with 200 (which Loadbalancer checks ignored) and deemed that node as "unhealthy" and instead to route incoming http/s request to broken node removed it out of rotation and returned that 502 error message to client.
Error: Server Error The server encountered a temporary error and could not complete your request.
Please try again in 30 seconds.
Underneath my LoadBalancer was GKE cluster with gke ingress->service-> pod and no explicit liveness/readiness probes configured so by default healthchecks hit / with 302/Found/redirect.
After adding those probes to deployment manifest and pointing them to endpoint that return OK/200 (/-/healthy, /-/ready in my case of prometheus running inside the pod)issue was fixed.
Unfortunately gke ingress had un-informative message UNHEALTY only in annotations, so it took me a while to understand what causes that issue.

AWS ELB Logs not showing 5xx errors in

I am seeing an strange issues with AWS ELB, I am getting High-Sum-HTTP-5XX from ELB but when I go to log I do not see any request in access log which have 5XX errors.
Does elb access log does not have 5XX errors reported there. Where can I see which request were having 5XX error it will help me to find root cause. I do not see anything in my server log as well.

I'm speculating, you are running a CLB (Classic Load Balancer). The access log with HTTP 5xx errors entries should be analyzed using elb_status_code and a backend_status_code
entries.
This could be off the topic but from AWS's documentation, it looks like some of these HTTP messages cannot be parsed by Classic Load Balancer (This could happen if there is reverse proxy in place on the instance that is sending an error that the ELB doesn't understand and hence are not recorded in the access logs. I could see the 404 errors in the access logs).

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js