Load balancer giving failed_to_pick_backend with internet network endpoint group - google-cloud-platform

I have a load balancer set up to point to an external URL via an internet network endpoint group (internet NEG).
Now the load balancer returns a 502 status code and I see failed_to_pick_backend in the logs. The monitoring tab of the load balancer also shows INVALID_BACKEND next to the internet NEG I've defined. I've attached screenshots of the view for clarity; the latter one is the one that's failing. I've checked the NEGs and they seem identical.
All the suggestions so far mention fixing health checks, but according to the docs, internet NEGs do not support health checks.
I was able to create a working setup through the UI, but when I replicate the setup via Terraform, things start to fail. The only difference I saw was that in the setup created via the UI, the corresponding forwarding rule had ipVersion: IPV4. That was not possible to set through Terraform, since it accepts either ipVersion or ip and I gave the resource an ip.
So, what could cause failed_to_pick_backend and INVALID_BACKEND with a setup like mine?

I found the answer to my question in another post: https://serverfault.com/a/1065279/965524
google_compute_global_network_endpoint needs to be created before google_compute_backend_service, so you need to set depends_on = [google_compute_global_network_endpoint.endpoint] on your google_compute_backend_service. Otherwise you will hit errors like those described in the question.
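
For illustration, here is a minimal Terraform sketch of that ordering. It assumes an internet NEG of type INTERNET_FQDN_PORT. The resource labels (neg, endpoint, backend), the names, the FQDN example.com, and port 443 are placeholders, not values from the question, and a real backend service will likely need more settings than shown here.

```hcl
# Internet NEG that holds the external endpoint.
resource "google_compute_global_network_endpoint_group" "neg" {
  name                  = "external-neg" # hypothetical name
  network_endpoint_type = "INTERNET_FQDN_PORT"
  default_port          = 443
}

# The external endpoint itself (placeholder FQDN and port).
resource "google_compute_global_network_endpoint" "endpoint" {
  global_network_endpoint_group = google_compute_global_network_endpoint_group.neg.name
  fqdn                          = "example.com"
  port                          = 443
}

resource "google_compute_backend_service" "backend" {
  name     = "external-backend" # hypothetical name
  protocol = "HTTPS"

  backend {
    group = google_compute_global_network_endpoint_group.neg.id
  }

  # The backend block only references the NEG, so Terraform has no implicit
  # dependency on the endpoint. This forces the endpoint to be created first.
  depends_on = [google_compute_global_network_endpoint.endpoint]
}
```

The explicit depends_on matters because the backend service never references the endpoint resource directly, so Terraform is otherwise free to create the two in either order.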

Related

Cannot access a public ALB

I have been trying to troubleshoot some connection issues, and I'm struggling with a relatively simple setup.
On my (relatively new) AWS account, I create a new Application Load balancer. I configure it in the following way:
Internet facing
Use the default VPC that came with the account
Across all availability zones
Uses default security group for VPC
Listens on HTTP:80 and returns a fixed response (status 404)
When I then try to use the newly assigned DNS name, it just hangs. When using curl -v I can see it says:
Trying :80
dig also responds with 3 IPs (I'm assuming for the different zones).
It feels like I'm missing something obvious, but I'm struggling to find it myself.
Can anyone see what I may be missing?
Can you please share a screenshot of the default security group and the LB configuration?
I am almost sure that the default security group allows ALL inbound traffic, but only from itself (the same security group).
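
If that turns out to be the cause, one possible fix is to allow inbound HTTP to the load balancer's security group. The Terraform fragment below is only a sketch under that assumption. The variable alb_security_group_id and the resource label alb_http_in are hypothetical, and opening port 80 to 0.0.0.0/0 is appropriate only for a load balancer that is meant to be public.

```hcl
# Hypothetical input: the ID of the security group attached to the ALB.
variable "alb_security_group_id" {
  type = string
}

# Allow inbound HTTP from anywhere to the load balancer.
resource "aws_security_group_rule" "alb_http_in" {
  type              = "ingress"
  from_port         = 80
  to_port           = 80
  protocol          = "tcp"
  cidr_blocks       = ["0.0.0.0/0"]
  security_group_id = var.alb_security_group_id
}
```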

aws application load balancer metrics not showing

I have created an aws application load balancer. I am trying to test something out on it and I have configured the target group and everything. When I try to hit the load balancer I get a bad gateway error (502), which is expected. However these metrics are not showing up in the monitoring section of the load balancer. I submitted around 5 requests.
Furthermore, even after registering an ECS service, I still get a bad gateway error. This is what I see on the load balancer/target groups after registering the service:
I have also allowed all traffic inbound and outbound from the two security groups (the security group used by the ECS service and the security group used by the load balancer)
However under the registered target when creating the ECS service I specified two availability zones, but it shows only one registered.
I figured it out, and it's kind of silly. My VPN/network was blocking the calls going out to the ALB. I'm not sure why; maybe some sort of network policy. The URL looks something like this: my-lb-1123366532.us-west-1.elb.amazonaws.com. I wasted almost a day trying to figure this out. I'm just putting it out here in case it helps someone.

gcp classic loadbalancer vs modern loadbalancer doesn't work with websocket

We are having some issues with getting websockets to work with a load balancer in google cloud. We narrowed it down to a difference between the classic load balancer (works fine) and the Https Loadbalancer with advanced traffic management that is selected by default but marked as a preview (does not work).
We have an instance group that definitely supports websockets. I.e. we can connect to it via the ip address.
We set up a load balancer and went for the one with traffic management. That worked fine for normal requests but all the websocket requests fail with a 502. We did not select http/2 (which is documented as not working for this). We tried all sorts of things to get this working. Even though it is documented that this should work out of the box it clearly doesn't.
$ websocat wss://lb.tryformation.com/websocket/messages
websocat: WebSocketError: Received unexpected status code (502 Bad Gateway)
websocat: error running
As a last resort, I then set up a classic lb with the same configuration, same instance group, same health check, same certificate, etc. And this worked on the first try.
So, clearly the new style loadbalancer does not work as advertised when it comes to websockets. The question is: why? Is this a known issue or is there something I should configure to get websockets working with that?
We're fine using the classic lb as it works. But I would like to understand the issue.
FWIW:
Assuming you're using GCP's Global External HTTP(S) "modern" Load Balancer, the documentation under GCP CLB Overview > WebSocket support states:
The global external HTTP(S) load balancer with advanced traffic management capability does not support Websockets. Websockets work with the global external HTTP(S) load balancer (classic) and regional external HTTP(S) load balancer as expected.
If you're using the regional "modern" LB, keep in mind that these "modern" Load Balancers are still in Preview. I'm sure you've seen this, but I'm noting it because I've had experience with GCP products in the past that claimed to "support websockets" while in Preview but didn't work correctly until they were available in GA.
Since you didn't provide more details, it's impossible to reproduce the problem, and hence to conclude anything; there are just too many variables.
From your description it looks like an issue with traffic management in HTTPS load balancing. If you can reproduce it, you can report it on Google's IssueTracker under the load balancing component. Describe the issue in more detail and provide detailed reproduction steps and, if possible, the setup you used (or any other relevant details). After that, someone will get back to you :)

"ELB health is failing or not available for all instances" + "Request timed out" on Elastic Beanstalk

I have been trying for two days to get rid of this error. It also often says "100.0 % of the requests are failing with HTTP 5xx". I have been reading about troubleshooting from here https://aws.amazon.com/premiumsupport/knowledge-center/elb-fix-failing-health-checks-alb/ but nothing is working. I have tried changing the health check path from '/' to '/healthCheck' as that has worked for some other people.
INFO:
I am using an application load balancer so that I can use HTTPS. I am using t3.micro although I have tried t3.small and t3.medium.
Here are my load balancer settings in the configuration part of console:
My security group for the instance has the same two inbound rules with source 0.0.0.0/0 and allows all outbound traffic to 0.0.0.0/0.
And here is some target group info:
Where is the best place to look for this error?
Based on the comments.
The cause of the issue is undetermined. It was therefore decided to create a new EB environment in an effort to address the problems.
I had a similar problem, not sure yet how prevalent it will be but it happened to me twice after a number of broken deploys during testing. I even created a specific endpoint to return 200 without luck. It seems to be more of a load balancer problem than an instance problem. A new environment for me cleared up the problem.

http 502 errors when new instance is being created in a group

We are using cross region load balancing. When we get heavy traffic all at once, within 1 region, it begins to spin up new instances. While it is starting new instances, we get random HTTP 502 errors. Screenshots of configurations below. Is there any way to avoid the 502 errors while it is scaling up?
Image links of configuration below.
Instance Group Configuration (same setting on all regions)
Load Balancer
Thanks in advance for the help!
HTTP load balancer and the instances will have different external IPs.
1) Try accessing through one instance's external IP first to make sure the backend works. If it doesn't work, usually it's firewall settings problem.
2) An HTTP 502 from the load balancer usually indicates that the load balancer's health check considers the backend unhealthy, so check your health check config next.
See another similar question: Google Load-balancer randomly failing requests to backend