Slow response when "Unmanaged Instance Group" added to HTTPS Load Balancer - google-cloud-platform

The HTTPS Load Balancer proxy works great with managed instance groups but not with unmanaged instance groups. We have added a few unmanaged instance groups to the backend and have instructed the proxy to direct specific traffic to them, e.g. https://test.example.com to an unmanaged instance group. When testing is done we can stop the instances in the unmanaged instance groups; stopping individual VM instances within a managed group is not possible.
Everything is working as expected. However, the browser takes 10-15 seconds (not always, but mostly) to display the page and randomly receives 500 errors. It seems that either the instances in the unmanaged group being stopped, or some housekeeping the load balancer does, causes the long response times.
Any help or suggestions to fix the response time would be highly appreciated. Accessing the web server directly, bypassing the load balancer, works as expected, but HTTPS can't be used that way since only the proxy server has the SSL certificate.

I'm taking an educated guess here based on your detailed description of symptoms.
As you noticed, there's something going on "behind the scenes" of the load balancer: either health checks are failing, or some other mechanism responsible for telling the load balancer that the test backend is shut off is misbehaving.
This shouldn't be happening and it looks like a bug.
At this point I think the best way forward is to report a new issue at Google's Issue Tracker and include a detailed description of what happens. You may link to this question too :)
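In the meantime, it may be worth double-checking that the health check attached to the backend service fronting the unmanaged group reacts quickly when its instances are stopped. The snippet below is only a Terraform sketch of what that could look like; the resource names, port and /healthz path are assumptions, not taken from your setup.

resource "google_compute_health_check" "test_backend" {
  name                = "test-backend-hc"      # placeholder name
  check_interval_sec  = 5
  timeout_sec         = 5
  healthy_threshold   = 2
  unhealthy_threshold = 2                      # roughly 10s until a stopped VM leaves rotation
  http_health_check {
    port         = 80                          # assumed serving port
    request_path = "/healthz"                  # assumed health endpoint
  }
}

resource "google_compute_backend_service" "test_backend" {
  name          = "test-backend"               # placeholder name
  protocol      = "HTTP"
  timeout_sec   = 30
  health_checks = [google_compute_health_check.test_backend.id]
  backend {
    group = google_compute_instance_group.unmanaged.self_link   # the unmanaged instance group
  }
}

Depending on the interval and thresholds, a stopped backend can keep receiving traffic for a while, which might explain part of the delay, but the random 500s still look like something to raise in the Issue Tracker.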

Related

aws application load balancer metrics not showing

I have created an AWS Application Load Balancer. I am trying to test something out on it and I have configured the target group and everything. When I try to hit the load balancer I get a bad gateway error (502), which is expected. However, these metrics are not showing up in the monitoring section of the load balancer. I submitted around 5 requests.
Furthermore, even after registering an ECS service, I still get a bad gateway. This is what I see on the load balancer/target groups after registering the service.
I have also allowed all traffic inbound and outbound on the two security groups (the security group used by the ECS service and the security group used by the load balancer).
Also, when creating the ECS service I specified two availability zones, but only one shows up under registered targets.
I figured it out and it's kind of silly. My VPN/network was blocking the call going out to the ALB. I'm not sure why; maybe some sort of network policy. The URL looks something like this: my-lb-1123366532.us-west-1.elb.amazonaws.com. I wasted almost a day trying to figure this out. I'm just putting it out here in case it helps someone.
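For anyone comparing notes on the security-group part of that setup: a common way to wire an ALB to ECS tasks is to allow only the load balancer's security group as a source on the target port, rather than all traffic. This is just a hedged Terraform sketch; the security group references and port are placeholders, not taken from the question.

resource "aws_security_group_rule" "alb_to_ecs" {
  type                     = "ingress"
  from_port                = 8080                                # assumed container/target port
  to_port                  = 8080
  protocol                 = "tcp"
  security_group_id        = aws_security_group.ecs_service.id   # placeholder: ECS tasks' SG
  source_security_group_id = aws_security_group.alb.id           # placeholder: the ALB's SG
}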

Load balancer giving failed_to_pick_backend with internet network endpoint group

I have a load balancer setup pointing to an external URL via an internet network endpoint group (internet NEG).
Now the load balancer returns a 502 status code and I see failed_to_pick_backend in the logs. Also, the monitoring tab of the load balancer shows INVALID_BACKEND next to the internet NEG I've defined. I've attached screenshots of the view for clarity; the latter one is the one that's failing. I've checked the NEGs and they seem identical.
All the suggestions so far mention fixing health checks, but as the docs state, internet NEGs do not support health checks.
I was able to create a working setup through the UI, but when replicating it via Terraform, things start to fail. The only difference I saw was that in the setup done via the UI the forwarding rule had ipVersion: IPV4, which was not possible to set through Terraform since it takes either an IP version or an IP address, and I gave the resource an IP address.
So, what could cause failed_to_pick_backend and INVALID_BACKEND with a setup like mine?
I found the answer to my question from another post: https://serverfault.com/a/1065279/965524
google_compute_global_network_endpoint needs to be created before google_compute_backend_service, so you need to set depends_on = [google_compute_global_network_endpoint.endpoint] on your google_compute_backend_service. Otherwise you will hit errors like the ones described in the question.
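A minimal Terraform sketch of that ordering fix, assuming an FQDN-based internet NEG; the resource names, FQDN and ports are placeholders rather than the asker's actual configuration:

resource "google_compute_global_network_endpoint_group" "external_api" {
  name                  = "external-api-neg"
  network_endpoint_type = "INTERNET_FQDN_PORT"
  default_port          = 443
}

resource "google_compute_global_network_endpoint" "endpoint" {
  global_network_endpoint_group = google_compute_global_network_endpoint_group.external_api.name
  fqdn                          = "api.example.com"       # placeholder external service
  port                          = 443
}

resource "google_compute_backend_service" "external_api" {
  name                  = "external-api-backend"
  protocol              = "HTTPS"
  load_balancing_scheme = "EXTERNAL"
  backend {
    group = google_compute_global_network_endpoint_group.external_api.id
  }
  # Without this, Terraform may create the backend service before the endpoint
  # exists, which is when failed_to_pick_backend / INVALID_BACKEND shows up.
  depends_on = [google_compute_global_network_endpoint.endpoint]
}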

gcp classic loadbalancer vs modern loadbalancer doesn't work with websocket

We are having some issues getting websockets to work with a load balancer in Google Cloud. We narrowed it down to a difference between the classic load balancer (works fine) and the HTTPS load balancer with advanced traffic management that is selected by default but marked as a preview (does not work).
We have an instance group that definitely supports websockets, i.e. we can connect to it via the IP address.
We set up a load balancer and went for the one with traffic management. That worked fine for normal requests, but all the websocket requests fail with a 502. We did not select HTTP/2 (which is documented as not working for this). We tried all sorts of things to get this working. Even though it is documented that this should work out of the box, it clearly doesn't.
$ websocat wss://lb.tryformation.com/websocket/messages
websocat: WebSocketError: Received unexpected status code (502 Bad Gateway)
websocat: error running
As a last resort, I then set up a classic lb with the same configuration, same instance group, same health check, same certificate, etc. And this worked on the first try.
So, clearly the new-style load balancer does not work as advertised when it comes to websockets. The question is: why? Is this a known issue, or is there something I should configure to get websockets working with it?
We're fine using the classic lb as it works. But I would like to understand the issue.
FWIW:
Assuming you're using GCP's Global External HTTP(S) "modern" Load Balancer, the documentation states under GCP CLB Overview > WebSocket support:
The global external HTTP(S) load balancer with advanced traffic management capability does not support Websockets. Websockets work with the global external HTTP(S) load balancer (classic) and regional external HTTP(S) load balancer as expected.
If you're using the regional "modern" LB, keep in mind that these "modern" load balancers are still in Preview. I'm sure you've seen this, but I'm only noting it because I've had experience with GCP products in the past that claimed to "support websockets" while in Preview but didn't work correctly until available in GA.
Since you didn't provide more details, it's impossible to reproduce this, and hence to conclude anything; there are just too many variables.
From your description it looks like some issue with traffic management in HTTPS load balancing. If you can reproduce it, report it at Google's Issue Tracker under the load balancing component and describe the issue in more detail; provide detailed reproduction steps and, if possible, the setup you used (or any other details). After that someone will get back to you :)
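For anyone recreating the comparison in Terraform (an assumption; the question doesn't say how the load balancers were built), the classic/modern split usually shows up as the load_balancing_scheme on the backend service and forwarding rule: "EXTERNAL" is the classic LB, while "EXTERNAL_MANAGED" selects the one with advanced traffic management. A rough sketch with placeholder names, using a long backend timeout so long-lived websocket connections aren't cut off:

resource "google_compute_backend_service" "websocket_backend" {
  name                  = "websocket-backend"                        # placeholder name
  protocol              = "HTTP"
  timeout_sec           = 3600                                       # keep long-lived websocket connections open
  load_balancing_scheme = "EXTERNAL"                                 # classic; "EXTERNAL_MANAGED" is the new LB
  health_checks         = [google_compute_health_check.ws.id]        # placeholder health check
  backend {
    group = google_compute_instance_group_manager.ws.instance_group  # placeholder instance group
  }
}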

Is it possible to make an AWS Classic Load Balancer session sticky forever?

This question is for the infrastructure pros; I hope this reaches someone who can help.
I'm currently using a setup with one EC2 instance behind a Classic Load Balancer on AWS, running a websocket Express-based server. I always planned to scale my application, so I started it behind a LB.
Now it's time to start up another instance, but I have a major problem: my websocket leaves a program running on the server - even when the user is off the website - and shows the program's log to the user again when he comes back to the website.
Of course, if the user connects to another instance behind the load balancer, he will not be able to access a program running on a different instance. So the only solution is to always connect a user to the same EC2 instance.
I searched a lot but didn't find anything related, besides sticky sessions based on cookies. The problem with that solution is that it expires after some time, and I want my user to be able to access the program log again no matter how much time passes in between.
So my question is: is there a way to stick a user's connection to the same EC2 instance using an AWS Classic Load Balancer?
In a way that new users follow the standard algorithm, being connected to the least-used instance, while existing users keep going to the same EC2 instance on every new connection. Is that possible?
Otherwise I won't be able to scale my application, because the main purpose of this server is to connect this running program with a specific user.
I don't think you can customize CLB for that. But ALB just recently introduced Application Cookie Stickiness:
Application Load Balancer (ALB) now supports Application-based cookie stickiness. This new feature helps customers ensure that clients connect to the same load balancer target for the duration of their session using application cookies. This enables customers to achieve a consistent client-server experience with greater controls such as the flexibility to set custom cookie names and criteria for client-target stickiness within a target group.
Thus maybe, if you can migrate from CLB to ALB, application-level cookies could be a solution to your issue.
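In Terraform, application-based stickiness is configured on the ALB target group roughly like this. It's only a sketch: the names, port and cookie are placeholders, and the cookie must be one your Express app already sets. Note that stickiness duration caps out at 7 days, so it is still not literally "forever".

resource "aws_lb_target_group" "websocket" {
  name     = "websocket-tg"                   # placeholder name
  port     = 3000                             # assumed app port
  protocol = "HTTP"
  vpc_id   = aws_vpc.main.id                  # placeholder VPC reference
  stickiness {
    enabled         = true
    type            = "app_cookie"
    cookie_name     = "connect.sid"           # assumed Express session cookie name
    cookie_duration = 604800                  # 7 days, the maximum
  }
}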

Setting up a loadbalancer behind a proxy server on Google Cloud Compute engine

I am looking to build a scalable REST web service on Google Cloud Compute Engine but have a couple of requirements that I am not sure how best to implement.
Structure so far:
2 Instances running a REST webservice connected to a MySQL Cloud database.
(number of instances to scale up in the future)
Load balancer to split requests between the two or more instances.
This part is fine.
What I need next is for the traffic (POST requests from the instances to an external web service) to come from a single IP address. I assume these requests cannot be routed back out through the public IP of the load balancer?
I get the impression the solution to this is to route all requests from the instances through a third instance running Squid. Is this the best way to do it? (side question)
Now to my main question:
I have been reading about ApiAxle, which sounds like a nice proxy for web services, giving some good access control, throttling and reporting capabilities.
Can I have an instance running ApiAxle followed by a Google Cloud load balancer which spreads the requests from the proxy across the backend instances that do the legwork and feed the responses back through the ApiAxle proxy, thus having everything go through a single IP visible to clients using the API? (This would let me add new instances to the pool to add capacity.)
And would the proxy be much of a bottleneck?
Thanks in advance.
/Dave
(New to this, so sorry if it's a stupid question; I can't find anything like this on the web.)
Sounds like you need to NAT your outbound traffic so it appears to come from one IP address. You need to do that via a third instance, since Google's LB stack doesn't provide this; GCLB works only with inbound connections on the load-balanced IP.
You can set up source NAT using advanced routing, or you can use a proxy as you suggested.
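A rough Terraform sketch of the advanced-routing variant, assuming a small gateway instance that masquerades traffic for tagged backends. Everything here (names, zone, machine type, tag) is a placeholder, and the gateway still needs an iptables MASQUERADE rule configured on the instance itself:

resource "google_compute_instance" "nat_gateway" {
  name           = "nat-gateway"
  machine_type   = "e2-small"
  zone           = "us-central1-a"
  can_ip_forward = true                        # required so it may forward other instances' traffic
  boot_disk {
    initialize_params {
      image = "debian-cloud/debian-12"
    }
  }
  network_interface {
    network = "default"
    access_config {}                           # gives the gateway the single public IP
  }
}

resource "google_compute_route" "egress_via_nat" {
  name              = "egress-via-nat"
  network           = "default"
  dest_range        = "0.0.0.0/0"
  priority          = 800                      # wins over the default internet route (priority 1000)
  tags              = ["route-via-nat"]        # apply only to backend instances carrying this tag
  next_hop_instance = google_compute_instance.nat_gateway.self_link
}

The same route-through-an-instance pattern works whether that instance is doing plain NAT or running Squid/ApiAxle; the proxy only becomes a bottleneck if its CPU or network throughput is undersized for the request volume.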