GCP external http load balancer 502 server error:"failed_to_connect_to_backend"

GCP external http load balancer 502 server error:"failed_to_connect_to_backend" - google-cloud-platform

I have configured a http external load balancer on GCP and all my vm instances are healthy in backend.
But when i am trying to access my server(installed on VM) from frontend static IP that is reserved at load balancer it is giving me 502 status error.
As a result of which i am unable to launch my application server using load balancer. Help me fix this issue.
Thanking you in advance.

To troubleshoot 502 response from the Load Balancer due to "failed_to_connect_to_backend." I would check the followings:
Usually, "failed_to_connect_to_backend" error message indicates that the load balancer is failing to connect to backends, investigating URL map rules is also a good point to start. I would also suggest reviewing your Load Balancer's URL map to make sure that Host rules, Path matcher, and Path rules are correctly defined and comply with descriptions in this article.
Also check if the backend instances are exhausting their resources, If a backend server is overwhelmed, it will refuse incoming requests, potentially causing the load balancer to give up on it and return the specific 502 error you're experiencing. Also, check the output on how many established connections are present at any one time using 'netstat' and watch command.
I would also recommend testing again with the HTTP(S) request directly to the instance, request the same URL that reporting 502. You might do this test in another VM instance in your VPC network.

maybe you should check if the time taken for the API to return the response is exceeded the timeout that will trigger the 502. The default value is 30 seconds.
Ref: https://cloud.google.com/load-balancing/docs/backend-service#timeout-setting

Related

Google cloud load balancer causing error 502 - failed_to_pick_backend

I've got an error 502 when I use google cloud balancer with CDN, the thing is, I am pretty sure I must have done something wrong setting up the load balancer because when I remove the load balancer, my website runs just fine.
This is how I configure my load balancer
here
Should I use HTTP or HTTPS healthcheck, because when I set up HTTPS
healthcheck, my website was up for a bit and then it down again
I have checked this link, they seem to have the same problem but it is not working for me.
I have followed a tutorial from openlitespeed forum to set Keep-Alive Timeout (secs) = 60s in server admin panel and configure instance to accepts long-lived connections ,still not working for me.
I have added these 2 firewall rules following this google cloud link to allow google health check ip but still didn’t work:
https://cloud.google.com/load-balancing/docs/health-checks#fw-netlb
https://cloud.google.com/load-balancing/docs/https/ext-http-lb-simple#firewall
When checking load balancer log message, it shows an error saying failed_to_pick_backend . I have tried to re-configure load balancer but it didn't help.
I just started to learn Google Cloud and my knowledge is really limited, it would be greatly appreciated if someone could show me step by step how to solve this issue. Thank you!

Posting an answer - based on OP's finding to improve user experience.
Solution to the error 502 - failed_to_pick_backend was changing Load Balancer from HTTP to TCP protocol and at the same type changing health check from HTTP to TCP also.
After that LB passes through all incoming connections as it should and the error dissapeared.
Here's some more info about various types of health checks and how to chose correct one.

The error message that you're facing it's "failed_to_pick_backend".
This error message means that HTTP responses code are generated when a GFE was not able to establish a connection to a backend instance or was not able to identify a viable backend instance to connect to
I noticed in the image that your health-check failed causing the aforementioned error messages, this Health Check failing behavior could be due to:
Web server software not running on backend instance
Web server software misconfigured on backend instance
Server resources exhausted and not accepting connections:
- CPU usage too high to respond
- Memory usage too high, process killed or can't malloc()
- Maximum amount of workers spawned and all are busy (think mpm_prefork in Apache)
- Maximum established TCP connections
Check if the running services were responding with a 200 (OK) to the Health Check probes and Verify your Backend Service timeout. The Backend Service timeout works together with the configured Health Check values to define the amount of time an instance has to respond before being considered unhealthy.
Additionally, You can see this troubleshooting guide to face some error messages (Including this).

Those experienced with Kubernetes from other platforms may be confused as to why their Ingresses are calling their backends "UNHEALTHY".
Health checks are not the same thing as Readiness Probes and Liveness Probes.
Health checks are an independent utility used by GCP's Load Balancers and perform the exact same function, but are defined elsewhere. Failures here will lead to 502 errors.
https://console.cloud.google.com/compute/healthChecks

Error: Server Error The server encountered a temporary error and could not complete your request. Please try again in 30 seconds.(GCP)

I've configured a HTTP(S) Load balancer as per the documentation on https://cloud.google.com/compute/docs/load-balancing/http/
When I try to access the site via the Public IP address associated with the Load balancer. I'm getting a 502 response with the message:
Error: Server Error
The server encountered a temporary error and could not complete your request.
Please try again in 30 seconds.
I believe this is coming from the load balancer.
Anyone have any insight into what might be going on, what more I should be looking at?

Have a look at the documentation Troubleshooting HTTP(S) Load Balancing section Unexplained 502 errors:
If 502 errors persist longer than a few minutes after you complete the
load balancer configuration, it's likely that either:
There's no firewall rule configured to allow health checks.
The software on the backends isn't running.
To verify that health check traffic reaches your backend VMs, enable
health check logging and search for successful log entries.
To create an ingress rule that allows traffic from the Google Cloud health checking systems (130.211.0.0/22 and 35.191.0.0/16) you can use Cloud Console or this command:
gcloud compute firewall-rules create fw-allow-health-check \
--network=default \
--action=allow \
--direction=ingress \
--source-ranges=130.211.0.0/22,35.191.0.0/16 \
--target-tags=allow-health-check \
--rules=tcp
In this command target tag allow-health-check used to identify VMs.

I had the same problem. After a day of searching, it was a health checker problem. The health test was on TCP, I changed it to HTTP, the problem was solved.

Could it be that the load balancer depends on you using the URL and not an IP address?
There are a couple of reasons that might be the case.
The URL points to the load balancer and the load balancer has a list of server IP addresses that service that URL; then it picks a server and forwards the request. To do that, it must receive the oritinal URL because the load balancer may be serving multiple sets of servers.
If the IP address points to the load balancer, it won't know which set of servers to choose from. If the IP address points to a server, the load balancer will be bypassed.
That's as much as I can think of...
Jam

(GCP) : Server Error The server encountered a temporary error and could not complete your request. Please try again in 30 seconds

I have created a "Load Balancer" in Google Cloud and connected 2 virtual machines to it. When I send some requests to "Load Balancer", sometimes it gets passed to virtual machines attached to load balancer and sometimes it throws following error even health check is 100% OK at that time.
Error: Server Error
The server encountered a temporary error and could not complete your request.
Please try again in 30 seconds.

This answer was created to support the community based on the limited information delivered by the OP and the comments written above.
The most accurate decision to make when you try to determine the root cause of an HTTP load balancer issue is review the log entries.
According to the official google documentation. HTTP(S) Load Balancing log entries contain information useful for monitoring and debugging your HTTP(S) traffic.
Log entries contain the following types of information:
General information, such as severity, project ID, project number, and timestamp.
HttpRequest log fields. However, HttpRequest.protocol is not populated for HTTP(S) Load Balancing Cloud Logging logs.
A statusDetails field inside the structPayload. This field holds a string that explains why the load balancer returned the
HTTP status that it did. The tables below contain further
explanations of these log strings. The statusDetails field is not
available for regional external HTTP(S) load balancers.
Redirects (HTTP response status code 302 Found) issued from the load balancer are not logged. Redirects issued from the backend
instances are logged.
To enable the log entries in an HTTP Load Balancer please follow this guide.
The message “Error: Server Error The server encountered a temporary error and could not complete your request.” Could be caused for several reason reasons including:
There's no firewall rule configured to allow health checks.
The software on the backends isn't running.
In this page you can find a detailed guide to perform a complete troubleshooting related to general connectivity issues.
I found these posts related to HTTP Load balancer and 502 response, you can find useful information in these threads.
Debugging Load Balancer issues
Compute Engine HTTP Load Balancing 502 error
Google Cloud HTTP balancer returns 502 error
Error: Server Error The server encountered a temporary error and
could not complete your request. Please try again in 30
seconds.(GCP)

In my case issue was with health check not returning 200.
It returned 302 instead (Found) when calling default / and redirected to other url with 200 (which Loadbalancer checks ignored) and deemed that node as "unhealthy" and instead to route incoming http/s request to broken node removed it out of rotation and returned that 502 error message to client.
Error: Server Error The server encountered a temporary error and could not complete your request.
Please try again in 30 seconds.
Underneath my LoadBalancer was GKE cluster with gke ingress->service-> pod and no explicit liveness/readiness probes configured so by default healthchecks hit / with 302/Found/redirect.
After adding those probes to deployment manifest and pointing them to endpoint that return OK/200 (/-/healthy, /-/ready in my case of prometheus running inside the pod)issue was fixed.
Unfortunately gke ingress had un-informative message UNHEALTY only in annotations, so it took me a while to understand what causes that issue.

Does Google Cloud HTTPS load balancer log back end errors?

Looking for way to debug why backend for NIFI is failing. I created a NIFI cluster (verison 1.9.0, HDF 3.1.1.4, AMBARI 2.7.3) on Google cloud. Created HTTPS load balancer terminating https front end, and back end is the instance group for SSL enabled NIFI cluster. Getting a 502 back end error in the browser when I hit the url for the load balancer. Is there a way for Google Cloud to log the error ? There must be an error returned somewhere to troubleshoot the root cause. I don't see messages in the nifi log or the vm instance /var/log/messages. Stackdriver hasn't shown me errors. I created the keystore and truststore and followed the NIFI SSL enable instructions. It might be related to the SSL configs, or possibly firewall rules are not correct. But I am looking for some more helpful information to find the error.

If I am understanding the question properly, you are looking for a way to get HTTPS load balancer logs due to back end errors and your intention is to find out the root cause.Load balancer basically return 502 error due to unhealthy backend services or for unhealthy backend VM 's.If your stackdriver logging is enabled, you might get this log using advanced filter or can search by selecting the load balancer name and look for/search 502:
Advanced filter for 502 responses due to failures to connect to backends:
resource.type="http_load_balancer"
resource.labels.url_map_name="[URL Map]"
httpRequest.status=502
jsonPayload.statusDetails="failed_to_connect_to_backend"
Advanced filter for 502 responses due to backend timeouts:
resource.type="http_load_balancer"
resource.labels.url_map_name="[URL Map]"
httpRequest.status=502
jsonPayload.statusDetails="backend_timeout"
Advanced filter for 502 responses due to prematurely closed connections:
resource.type="http_load_balancer"
resource.labels.url_map_name="[URL Map]"
httpRequest.status=502
jsonPayload.statusDetails="backend_connection_closed_before_data_sent_to_client"
The URL Map is same as the name of the load balancer for HTTP(S) for cloud console.If we create the various components of the load balancer manually, need to use the URL Map for advanced filter.
Most common root causes for "failed_to_connect_to_backend" are: 1. Firewall blocking traffic, 2. Web server software not running on backend instance, 3. Web server software misconfigured on backend instance, 4. Server resources exhausted and not accepting connections (CPU usage too high to respond, Memory usage too high, process killed ,the maximum amount of workers spawned and all are busy, Maximum established TCP connections), 5. Poorly written server implementation struggling under load or non-standard behavior.
Most common root causes for “backend_timeout” are 1. the backend instance took longer than the Backend Service timeout to respond, meaning either the application is overloaded or the Backend Service Timeout is set too low, 2. The backend instance didn't respond at all (Crashing during a request).
Most Common Root causes for” backend_connection_closed_before_data_sent_to_client” is usually caused because the keepalive configuration parameter for the web server software running on the backend instance is less than the fixed (10 minute) keepalive (HTTP idle) timeout of the GFE. There are some situations where the backend may close a connection too soon while the GFE is still sending the HTTP request.

The previous response was spot on. The nifi ssl configuration is misconfigured, causing the backend health check to fail with a bad certificate. I will open a new question to address the nifi ssl configuration.

http 502 errors when new instance is being created in a group

We are using cross region load balancing. When we get heavy traffic all at once, within 1 region, it begins to spin up new instances. While it is starting new instances, we get random HTTP 502 errors. Screenshots of configurations below. Is there any way to avoid the 502 errors while it is scaling up?
Image links of configuration below.
Instance Group Configuration (same setting on all regions)
Load Balancer
Thanks in advance for the help!

HTTP load balancer and the instances will have different external IPs.
1) Try accessing through one instance's external IP first to make sure the backend works. If it doesn't work, usually it's firewall settings problem.
2) HTTP 502 from load balancer usually indicates the health check of the load balancer thought the backend is unhealthy, check your health check config then.
See another similar question Google Load-balancer randomly failing requests to backend

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js