GCP classic load balancer vs. modern load balancer: modern one doesn't work with WebSockets

We are having issues getting WebSockets to work with a load balancer in Google Cloud. We narrowed it down to a difference between the classic load balancer (works fine) and the HTTPS load balancer with advanced traffic management, which is selected by default but marked as a preview (does not work).
We have an instance group that definitely supports WebSockets, i.e. we can connect to it directly via its IP address.
We set up a load balancer and went for the one with traffic management. That worked fine for normal requests, but all the WebSocket requests fail with a 502. We did not select HTTP/2 (which is documented as not working for this). We tried all sorts of things to get this working. Even though it is documented that this should work out of the box, it clearly doesn't.
$ websocat wss://lb.tryformation.com/websocket/messages
websocat: WebSocketError: Received unexpected status code (502 Bad Gateway)
websocat: error running
As a last resort, I then set up a classic lb with the same configuration, same instance group, same health check, same certificate, etc. And this worked on the first try.
So, clearly the new style loadbalancer does not work as advertised when it comes to websockets. The question is: why? Is this a known issue or is there something I should configure to get websockets working with that?
We're fine using the classic lb as it works. But I would like to understand the issue.

FWIW:
Assuming you're using GCP's Global External HTTP(S) "modern" Load Balancer, the documentation states under GCP CLB Overview > WebSocket support:
The global external HTTP(S) load balancer with advanced traffic management capability does not support Websockets. Websockets work with the global external HTTP(S) load balancer (classic) and regional external HTTP(S) load balancer as expected.
If you're using the regional "modern" LB, keep in mind that these "modern" load balancers are still in Preview. I'm sure you've seen this, but I'm noting it because I've had experience with GCP products in the past that claimed to "support websockets" while in Preview but didn't work correctly until they were available in GA.
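When comparing the two load balancers, it can help to check exactly what comes back from the handshake. A successful WebSocket handshake is answered with status 101 plus `Upgrade: websocket` and `Connection: Upgrade` headers; the 502 in the question means the handshake never completed. Below is a minimal sketch of that check in plain Java. The class and method names (`UpgradeCheck`, `isWebSocketUpgrade`) are hypothetical, and it assumes you have already captured the status code and response headers by some other means.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class UpgradeCheck {
    // Returns true only if the response looks like a completed WebSocket handshake:
    // status 101 with "Upgrade: websocket" and a Connection header containing "upgrade".
    public static boolean isWebSocketUpgrade(int status, Map<String, String> headers) {
        // HTTP header names are case-insensitive, so copy into a case-insensitive map.
        Map<String, String> h = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        h.putAll(headers);
        String upgrade = h.get("Upgrade");
        String connection = h.get("Connection");
        return status == 101
                && upgrade != null
                && upgrade.trim().equalsIgnoreCase("websocket")
                && connection != null
                && connection.toLowerCase(Locale.ROOT).contains("upgrade");
    }

    public static void main(String[] args) {
        // Illustrative values only: a successful handshake vs. the 502 from the question.
        System.out.println(isWebSocketUpgrade(101,
                Map.of("Upgrade", "websocket", "Connection", "Upgrade")));
        System.out.println(isWebSocketUpgrade(502, Map.of()));
    }
}
```

Running the same check against the instance IP, the classic LB, and the modern LB shows which hop drops the upgrade.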

Since you didn't provide more details, it's impossible to reproduce this, and hence to conclude anything; there are just too many variables.
From your description it looks like an issue with traffic management in HTTPS load balancing. If you can reproduce it, you can report it at Google's IssueTracker under the load balancing component and describe the issue in more detail. Provide detailed reproduction steps and, if possible, the setup that you used (or any other details). After that, someone will get back to you :)

Related

Load balancer giving failed_to_pick_backend with internet network endpoint group

I have a load balancer setup pointing to an external url via internet network endpoint group (internet NEG)
Now the load balancer returns 502 status code & I see failed_to_pick_backend in the logs. Also the monitoring tab of the load balancer shows INVALID_BACKEND next to the internet NEG I've defined. I've attached screenshots of the view for clarity, latter one is the one that's failing. I've checked the NEGs and they seem identical.
All the suggestions so far mention fixing health checks, but as seen from the docs, internet NEGs do not support health checks.
I was able to create a working setup through the UI, but when replicating the setup via Terraform, things start to fail. The only difference I saw was that in the setup done via the UI, the forwarding rule had ipVersion: IPV4, but that was not possible to set through Terraform since it takes either ipVersion or ip, and I gave the resource an ip.
So, what could cause failed_to_pick_backend & INVALID_BACKEND with a setup like mine?
I found the answer to my question from another post: https://serverfault.com/a/1065279/965524
google_compute_global_network_endpoint needs to be created before google_compute_backend_service is created, so you need to set depends_on = [google_compute_global_network_endpoint.endpoint] on your google_compute_backend_service. Otherwise you will hit errors like those described in the question.

Load Balancers methodology (L4, L7) and AWS

Recently I have been trying to catch up with the knowledge that I'm missing around Load Balancing internals and I have found this great article
But it made me think about more questions than before;)
Till now I understood that if we talk about L4 LB we can differentiate:
Terminating LB: for each incoming connection, it creates a new connection to the backend.
Passthrough LB: can be split further into NAT and direct routing (possibly with tunneling).
One question that came to mind is how this fits into the AWS world: what type of LB is the AWS Network Load Balancer in that case?
The next thing is about L7 LB's.
Does a layer 7 LB also rely on NAT or direct routing? Or is it completely beyond that? While there is quite a lot of material about layer 4, layer 7 is really poorly covered in terms of proper articles on internals. I know only the top products like haproxy or nginx, but still don't get the difference between them :(
I will be very thankful to anyone who can give me at least some advice on how to connect the dots there :)
what type of LB is AWS Network Load Balancer in that case?
Network Load Balancer is a layer 4 load balancer. See the information provided in the Amazon docs:
A Network Load Balancer functions at the fourth layer of the Open Systems Interconnection (OSI) model. It can handle millions of requests per second. After the load balancer receives a connection request, it selects a target from the target group for the default rule. It attempts to open a TCP connection to the selected target on the port specified in the listener configuration.
https://docs.aws.amazon.com/elasticloadbalancing/latest/network/introduction.html
For the second question.
Does layer 7 LB also relies on NAT, or Direct routing? Or it's completely beyond that?
In the Amazon world, the Application Load Balancer is the one designed for layer 7 load balancing (the older Classic Load Balancer can also handle HTTP/HTTPS), and it works as described in the article:
"The client makes a single HTTP/2 TCP connection to the load balancer. The load balancer then proceeds to make two backend connections"
So from the client's point of view there is a direct connection to the load balancer only; the load balancer terminates that connection and spreads the requests arriving on it across its own connections to the backend pool.
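The quoted behaviour can be illustrated with a toy model. This is a minimal sketch, not how an ALB is actually implemented: it only shows that a layer 7 balancer picks a backend per request, so requests arriving on one client connection can leave on different backend connections. The class name `L7Sketch`, the backend names, and the round-robin policy are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class L7Sketch {
    private final List<String> backends;
    private int next = 0;

    public L7Sketch(List<String> backends) {
        this.backends = new ArrayList<>(backends);
    }

    // Choose a backend for a single HTTP request.
    // Round-robin is an assumed policy; this sketch ignores the path, but a
    // layer 7 balancer could inspect it (or headers, cookies) when routing.
    public String route(String requestPath) {
        String backend = backends.get(next);
        next = (next + 1) % backends.size();
        return backend;
    }

    public static void main(String[] args) {
        L7Sketch lb = new L7Sketch(List.of("backend-a", "backend-b"));
        // Two requests that arrived on the same client connection
        // are sent to different backends.
        for (String path : List.of("/index.html", "/api/items")) {
            System.out.println(path + " -> " + lb.route(path));
        }
    }
}
```

A passthrough layer 4 balancer cannot do this, because it forwards packets or connections without parsing the requests inside them.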

Slow response when "Unmanaged Instance Group" added to HTTPS Load Balancer

The HTTPS Load Balancer proxy works great with a Managed Instance Group but not with an unmanaged instance group. We have added a few Unmanaged Instance Groups to the backend and have instructed the proxy to direct specific traffic to an unmanaged group, e.g. https://test.example.com goes to the unmanaged instance group. When the testing is done we can stop the instances in the unmanaged instance groups. However, stopping individual VM instances within a managed group is not possible.
Everything is working as expected. However, the browser takes 10-15 seconds (not always, but mostly) to display the page and randomly receives a 500 error. It seems that instances in the unmanaged group are stopped, or the Load Balancer does some housekeeping which takes long to respond.
Any help or suggestions to fix the response time would be highly appreciated. Accessing the web server directly, bypassing the load balancer, works as expected, but HTTPS can't be used that way as only the proxy server has the SSL certificate.
I'm taking an educated guess here based on your detailed description of symptoms.
As you noticed, there's something going on "behind the scenes" of the load balancer: either health checks are failing, or some other feature responsible for telling the load balancer that the test backend is shut off is not working.
This shouldn't be happening and it looks like a bug.
At this point I think the best way forward is to report a new issue at Google's IssueTracker and include a detailed description of what happens. You may link to this question too :)

AWS Application Load Balancer rule not working for cookies? What could have gone wrong?

I work in the dev-ops team at my company. Recently, we shifted to AWS's Application Load Balancer, and we are forwarding requests based on a cookie's value. For some reason, the rule isn't working, and AWS doesn't provide logs with information on why a rule failed.
There could be 2 reasons for this, that we can think of:
Load balancer isn't able to read the cookie: We don't think this should be the issue as the applications under this load balancer are able to read and also print the cookies.
The load balancer doesn't read subsequent cookies after the first request: We have raised a concern with AWS on this and they have yet to get back to us.
Meanwhile, can anyone point to any possible issues which we might be overlooking?

Amazon ELB forwarding http request changes request.RemoteAddress

We are using Amazon EC2 services to host our Play application in production. I have a quite important problem with the Elastic Load Balancer. In my application I need the request's remote address, and I am using the Play framework controller's request.remoteAddress property for it. However, on Amazon it always holds the load balancer's IP address, which is misleading as we cannot track the real remote address.
Is there a setting in the Amazon ELB configuration for forwarded requests? In Apache I think there is a solution for that, but I have skimmed through the ELB documentation and could not find any clue.
I think you can use this when you are behind a proxy or load balancer:
String ip = Http.Request.current().headers.get("x-forwarded-for").value();
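The X-Forwarded-For value can hold several comma-separated addresses when a request passes through more than one proxy, so it is usually worth parsing rather than using raw. Below is a minimal sketch in plain Java, independent of Play. The class and method names (`ForwardedFor`, `parse`, `clientIp`) are hypothetical helpers, and it assumes the leftmost entry is the original client. That entry is supplied by the client side and can be spoofed, so treat it as a hint unless every proxy in front of you is trusted.

```java
import java.util.ArrayList;
import java.util.List;

public class ForwardedFor {
    // Split an X-Forwarded-For value ("client, proxy1, proxy2") into trimmed entries.
    public static List<String> parse(String headerValue) {
        List<String> hops = new ArrayList<>();
        if (headerValue == null) {
            return hops;
        }
        for (String part : headerValue.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                hops.add(trimmed);
            }
        }
        return hops;
    }

    // Return the leftmost entry, or fall back to the socket's remote address
    // when the header is missing or empty (e.g. when no proxy is in front).
    public static String clientIp(String headerValue, String remoteAddress) {
        List<String> hops = parse(headerValue);
        return hops.isEmpty() ? remoteAddress : hops.get(0);
    }

    public static void main(String[] args) {
        // Example addresses are illustrative only.
        System.out.println(clientIp("203.0.113.7, 10.0.0.1", "10.0.0.1"));
        System.out.println(clientIp(null, "10.0.0.1"));
    }
}
```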
I think I found an elegant solution: Play Framework has support for this called XForwardedSupport. I am planning to test it shortly.
Details of XForwardedSupport are here