Load balancing gRPC requests using one of the AWS Load Balancers

I'm trying to work out whether I could use one of the (A/E/N)LBs to load balance gRPC traffic. A simple round robin would suffice in our case.
I've read that the ALB doesn't fully support HTTP/2 and therefore can't be used with gRPC. Specifically, the lack of support for sending HTTP/2 traffic downstream and for trailer headers was mentioned. Is that still true?
I couldn't find any definitive answers with regard to NLBs or "classic" ELBs. Any hints?

As of October 29, 2020, Application Load Balancers support end-to-end HTTP/2 and gRPC load balancing. From the announcement:
To use the feature on your ALB, choose HTTPS as your listener protocol, gRPC as the protocol version for your target group and register instance or IP as targets for the configured target group. ALB provides rich content based routing features that will let you inspect gRPC calls and route them to the appropriate target group based on the service and method requested. Within a target group, ALB will use gRPC specific health checks to determine availability of targets and provide gRPC specific access logs to monitor your traffic.
The support for gRPC and end-to-end HTTP/2 is available for existing and new Application Load Balancers at no extra charge in all AWS Regions. To learn more, please refer to the blog post, demo, and the ALB documentation.
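In API terms, this amounts to creating a target group with the protocol version set to GRPC and attaching it to an HTTPS listener. A minimal boto3 sketch; the VPC ID, load balancer ARN and certificate ARN are placeholders:

```python
import boto3

elbv2 = boto3.client("elbv2", region_name="eu-west-1")

# Target group that speaks gRPC to the backends; the VPC ID is a placeholder.
target_group = elbv2.create_target_group(
    Name="grpc-backend",
    Protocol="HTTPS",                 # protocol towards the targets
    ProtocolVersion="GRPC",           # enables gRPC end to end
    Port=50051,
    VpcId="vpc-0123456789abcdef0",
    TargetType="ip",
    HealthCheckPath="/grpc.health.v1.Health/Check",
    Matcher={"GrpcCode": "0"},        # gRPC status code 0 = OK
)
tg_arn = target_group["TargetGroups"][0]["TargetGroupArn"]

# HTTPS listener on an existing ALB; the ARNs are placeholders.
elbv2.create_listener(
    LoadBalancerArn="arn:aws:elasticloadbalancing:eu-west-1:123456789012:loadbalancer/app/my-alb/abc123",
    Protocol="HTTPS",
    Port=443,
    Certificates=[{"CertificateArn": "arn:aws:acm:eu-west-1:123456789012:certificate/abc-123"}],
    DefaultActions=[{"Type": "forward", "TargetGroupArn": tg_arn}],
)
```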

Before that update, using gRPC on AWS had some major challenges. Without full HTTP/2 support on the AWS Application Load Balancer, you had to spin up and manage your own load balancers. Neither the NLB nor the ELB was a viable alternative on AWS, due to issues with traffic to and from the same host, dynamic port mappings, SSL termination complications, and sub-optimal client- and server-side round-robining of TCP connections.
gRPC demonstrated performance improvements; however, it took considerable infrastructure effort to adopt, whether by running load balancers such as NGINX or Envoy, or by setting up a service mesh with something like Istio. Another possibility is thick-client load balancing, though that also requires additional service-discovery infrastructure such as Consul or ZooKeeper.
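For illustration, gRPC's built-in client-side round-robin policy looks roughly like this in Python; the DNS target name and the generated stub are hypothetical:

```python
import grpc

# Thick-client load balancing sketch: the channel resolves every A record
# behind the DNS name and round-robins RPCs across the resulting subchannels.
channel = grpc.insecure_channel(
    "dns:///my-grpc-service.internal:50051",
    options=[("grpc.lb_policy_name", "round_robin")],
)

# With generated stubs this would be used as:
# stub = my_service_pb2_grpc.MyServiceStub(channel)
# response = stub.MyMethod(my_service_pb2.MyRequest())
```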
AWS also announced a new service called AWS App Mesh. AWS App Mesh supports HTTP/2 and gRPC services, so gRPC services can now model and manage their inter-service communications using AWS App Mesh.
Reference:
https://aws.amazon.com/about-aws/whats-new/2019/11/aws-app-mesh-now-supports-http2-and-grpc-services/
https://aws.amazon.com/app-mesh/
https://docs.aws.amazon.com/app-mesh/latest/userguide/what-is-app-mesh.html

Related

Enforce AES-256 on AWS Elastic Beanstalk

Disclaimer: I am fully aware that AES-128 is considered secure, but we have weird governmental requirements.
We run a server that provides a WebSocket interface to our clients, deployed as an Elastic Beanstalk application on AWS. It has an Application Load Balancer in front of it which handles the HTTPS termination. We have a strange requirement on our system where all channels need to use more than 200 bits of encryption.
When our clients (which are IoT devices) establish the connection, the negotiated encryption ends up being AES-128 (because all security policies in AWS accept AES-128 and the devices do too).
The only way to enforce AES-256 on the server side is to use the Classic Load Balancer and add the ciphers ourselves. However, the Classic Load Balancer does not support WebSocket.
Is there any possible way of circumventing this? Or do we need to add our own encryption on top of our channel to fulfill the requirements?
I believe that the best you could do with an Application Load Balancer (ALB) is to configure it to use the ELBSecurityPolicy-FS-1-2-Res-2020-10 security policy; however, based on the table in the docs, it will still be possible to negotiate ECDHE-ECDSA-AES128-GCM-SHA256 and ECDHE-RSA-AES128-GCM-SHA256, which allow AES-128 as the encryption method.
Another option would be to put a WebSocket API Gateway in front, but the ciphers are pretty much the same, and you might have to deal with throttling in that case, which is probably not ideal considering the IoT clients.
Putting CloudFront in front of the ALB is not going to cut it either, as it takes the same approach and the ciphers in its security policies are essentially the same.
The security policies of the Network Load Balancer (NLB) are actually the same as those of the ALB.
Essentially, all the relevant AWS services rely on the same security policies.
Which leads us to the two final options:
trying somehow to force it on the client side, which is most likely not possible
or replacing the ALB with a Network Load Balancer (which supports WebSockets), as suggested by @Mark B, setting up TCP listeners on it and handling TLS yourself server-side in your EB application. The details vary based on your application platform, but you should be able to enforce stricter (AES-256-only) ciphers, as in the sketch below.
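A minimal sketch of that last option in Python, assuming the NLB forwards raw TCP to the application; the certificate/key paths and port are placeholders, and the point is that set_ciphers() lets you accept AES-256-GCM suites only:

```python
import socket
import ssl

# Terminate TLS in the application behind an NLB TCP listener and accept
# AES-256-GCM suites only. Certificate/key paths and the port are placeholders.
context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
context.load_cert_chain(certfile="server.crt", keyfile="server.key")
context.set_ciphers("ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384")
# set_ciphers() does not restrict TLS 1.3 suites, so cap at TLS 1.2 for the
# purpose of this illustration.
context.maximum_version = ssl.TLSVersion.TLSv1_2

with socket.create_server(("0.0.0.0", 8443)) as listener:
    with context.wrap_socket(listener, server_side=True) as tls_listener:
        conn, addr = tls_listener.accept()      # TLS handshake happens here
        print("negotiated cipher:", conn.cipher())
        conn.close()
```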

Does the global layer 7 HTTP(S) load balancer have an option for rate limiting?

I use the global HTTP(S) load balancer for backend services running on a Kubernetes cluster. I couldn't find any information on how to limit the number of requests from one IP within a time window. There is Cloud Armor, but it only provides simple IP-, region-, and header-based access control. Could you please share how I can perform IP-based rate limiting on the global HTTP(S) load balancer on Google Cloud to provide defence against DoS attacks?
Edit:
The backend service running on the Kubernetes cluster is a Symfony server with a web interface. I want to use Cloud CDN for the server, so I had to use the gce ingress class instead of ingress-nginx. On Google Cloud, a gce ingress creates a global HTTP(S) load balancer, while ingress-nginx creates a TCP load balancer.
With ingress-nginx, I could simply use the nginx.ingress.kubernetes.io/limit-rps annotation, which helps limit a flood of HTTP requests. I want a similar configuration on my global HTTP(S) load balancer. In the current setup, I observed that a flood of HTTP requests sent to the load balancer is forwarded to the Symfony server, and at some point request latency increases, which makes the liveness probe for the pod fail.
Cloud Armor is a WAF that you can configure to protect your service against DoS attacks, especially by blocking specific IPs.
Rate limiting isn't meant to protect your service against DDoS. Indeed, if the attack floods your rate-limiting service, neither your valid IPs nor the bad IPs will be served, because your service is flooded: it's a denial of service.
Rate limiting helps preserve resources for legitimate users. But some users may try to get around a constraint by using your APIs in a different (wrong/abusive) manner.
For example, say you have a paid API to export all customers and a free API to request a single customer. A user can say, "Hey, I don't want to pay, I will call the single-customer API in a loop to create my daily report!" That is a misuse of the single-customer API, and you can protect against this misuse with rate limiting.
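Purely as an illustration of the concept (not a GCP feature), a per-client token bucket in the backend might look like the sketch below; in practice you would enforce quotas at the edge with one of the services mentioned next:

```python
import time
from collections import defaultdict

# Conceptual per-client token bucket, shown in the backend only to illustrate
# the idea; the limits are arbitrary placeholders.
RATE = 5      # requests replenished per second
BURST = 10    # maximum burst size

_buckets = defaultdict(lambda: {"tokens": float(BURST), "last": time.monotonic()})

def allow_request(client_ip: str) -> bool:
    bucket = _buckets[client_ip]
    now = time.monotonic()
    bucket["tokens"] = min(BURST, bucket["tokens"] + (now - bucket["last"]) * RATE)
    bucket["last"] = now
    if bucket["tokens"] >= 1:
        bucket["tokens"] -= 1
        return True
    return False  # caller should answer with HTTP 429 Too Many Requests
```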
To enforce quotas at the edge, you can use Cloud Endpoints and ESP (Extensible Service Proxy). I wrote an article with ESP deployed on Cloud Run, but you can deploy it on K8s as well.
You can also use API Gateway, which is the managed version of ESP and will soon be pluggable into the HTTPS load balancer (to use it in addition to the WAF protection).

Are there two levels of load balancing when using Istio Destination Rules?

As far as I understand, Istio DestinationRules can define load-balancing policies for reaching a subset of a service, e.g. a subset based on different versions of the service. So the DestinationRules are the first level of load balancing.
The request will eventually reach a K8s Service, which is generally implemented by kube-proxy. Kube-proxy does simple load balancing across the pods behind it. That is the second level of load balancing.
Is there a way to remove the second load balancer? For example, could we create many Service instances that offer the same service and are load-balanced by DestinationRules, and then have only one pod per Service instance, so that kube-proxy does not apply any load balancing?
According to istio documentation:
Istio’s traffic routing rules let you easily control the flow of traffic and API calls between services. Istio simplifies configuration of service-level properties like circuit breakers, timeouts, and retries, and makes it easy to set up important tasks like A/B testing, canary rollouts, and staged rollouts with percentage-based traffic splits. It also provides out-of-box failure recovery features that help make your application more robust against failures of dependent services or the network.
Istio’s traffic management model relies on the Envoy proxies that are deployed along with your services. All traffic that your mesh services send and receive (data plane traffic) is proxied through Envoy, making it easy to direct and control traffic around your mesh without making any changes to your services.
If you’re interested in the details of how the features described in this guide work, you can find out more about Istio’s traffic management implementation in the architecture overview. The rest of this guide introduces Istio’s traffic management features.
This means that the Istio service mesh communicates via the Envoy proxy, which in turn relies on Kubernetes networking.
Consider an example where a VirtualService, using the Istio ingress gateway, load-balances its traffic to two different services based on labels. Those services can then each have multiple pods.
Istio load balancing in this case works only at layer 7, which results in a route to a specific destination (one of the services); it relies on Kubernetes to handle connections and the rest, including the Service's round-robin load balancing (layer 4) when there are multiple pods.
The advantage of having a single Service with multiple pods is obviously easier configuration and management. With one pod per Service, each Service would need to be configured separately and would lose its ability to scale.
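For reference, a DestinationRule with two label-based subsets could be applied like this with the Kubernetes Python client; the reviews service, namespace and version labels are placeholder examples:

```python
from kubernetes import client, config

config.load_kube_config()
api = client.CustomObjectsApi()

# Hypothetical DestinationRule: two subsets of the same "reviews" service,
# selected by the "version" label. Istio (layer 7) chooses the subset; the
# pods inside a subset are still balanced by the cluster's data plane.
destination_rule = {
    "apiVersion": "networking.istio.io/v1beta1",
    "kind": "DestinationRule",
    "metadata": {"name": "reviews", "namespace": "default"},
    "spec": {
        "host": "reviews.default.svc.cluster.local",
        "trafficPolicy": {"loadBalancer": {"simple": "ROUND_ROBIN"}},
        "subsets": [
            {"name": "v1", "labels": {"version": "v1"}},
            {"name": "v2", "labels": {"version": "v2"}},
        ],
    },
}

api.create_namespaced_custom_object(
    group="networking.istio.io",
    version="v1beta1",
    namespace="default",
    plural="destinationrules",
    body=destination_rule,
)
```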
There is a great video on YouTube which partially covers this topic:
Life of a packet through Istio by Matt Turner.
I highly recommend watching it, as it explains how Istio works at a fundamental level.

AWS Application Load Balancer WebSocket metadata based stickiness?

We have a cluster of a certain service. The clients connect to the cluster via WebSocket. The clients are assigned to nodes based on the group they belong to (let's call it a "conference").
In other words, a whole group of clients (a conference) is served by one particular node. So the target node should be selected based on some metadata sent when initiating the WebSocket connection.
Client Tom Hanks connects to Actors conference -> LB routes to node EU Server
Client Tom Hanks connects to Tesla fans conference -> LB routes to node USA Server
Client Ada Zizkova connects to Actors conference -> LB routes to node EU Server
Client Ada Zizkova connects to Tesla fans conference -> LB routes to node USA Server
...
Notice that this is NOT HTTP-session-based stickiness. The HTTP session is the same one for the same user.
All this is what we would like to have. But currently we are on the simple AWS Elastic Load Balancer, and we are about to implement this stickiness in-house and bypass the ELB.
Before doing that, I am looking into whether the ALB could do what I described above. I can't find anything, just this: Does an Application Load Balancer support WebSockets? Which looks like general connection stickiness. See the AWS docs here.
How can I do metadata-based WebSocket stickiness with the ALB? (Or with something else in AWS?)
For most applications, you can use the AWS ELB (Classic Load Balancer) "sticky sessions" feature.
By default, a Classic Load Balancer routes each request independently to the registered instance with the smallest load. However, you can use the sticky session feature (also known as session affinity), which enables the load balancer to bind a user's session to a specific instance. This ensures that all requests from the user during the session are sent to the same instance.
The key to managing sticky sessions is to determine how long your load balancer should consistently route the user's request to the same instance.
Also, the WebSockets connections are inherently sticky. If the client requests a connection upgrade to WebSockets, the target that returns an HTTP 101 status code to accept the connection upgrade is the target used in the WebSockets connection. After the WebSockets upgrade is complete, cookie-based stickiness is not used.
For more information, read the following doc on the AWS website:
https://docs.aws.amazon.com/elasticloadbalancing/latest/classic/elb-sticky-sessions.html
Alternatively, you can use the AWS ALB (Application Load Balancer); the ALB supports WebSockets.
Just replace the ELB with the ALB and enable sticky sessions.
The Application Load Balancer is designed to handle streaming, real-time, and WebSocket workloads in an optimized fashion. Instead of buffering requests and responses, it handles them in streaming fashion. This reduces latency and increases the perceived performance of your application.
For more information about AWS ALB, read the following doc on the AWS website:
https://docs.aws.amazon.com/elasticloadbalancing/latest/application/introduction.html
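If you go the ALB route, stickiness is enabled per target group; a minimal boto3 sketch, where the target group ARN is a placeholder:

```python
import boto3

elbv2 = boto3.client("elbv2", region_name="eu-west-1")

# Enable duration-based (load-balancer cookie) stickiness on the target group
# that receives the WebSocket upgrades. The ARN is a placeholder.
elbv2.modify_target_group_attributes(
    TargetGroupArn="arn:aws:elasticloadbalancing:eu-west-1:123456789012:targetgroup/websocket-tg/abc123",
    Attributes=[
        {"Key": "stickiness.enabled", "Value": "true"},
        {"Key": "stickiness.type", "Value": "lb_cookie"},
        {"Key": "stickiness.lb_cookie.duration_seconds", "Value": "86400"},
    ],
)
```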

Is HTTPS->HTTP behind load balancer considered secure?

I have a secure web API in the AWS cloud and I'm trying to figure out the best way to put it behind a load balancer without compromising security.
Right now, all communications are conventionally encrypted end-to-end. The API server has a Let's Encrypt certificate, which is used to encrypt all messages exchanged with clients. Unless the encryption is broken, nobody besides the server and its clients can view the raw contents of messages.
If I start using a load balancer and allow multiple instances of my server to run concurrently, I'll have to give up on LE and use centralized certificate management (e.g. ACM). AWS conveniently supports attaching ACM-generated certificates to load balancer HTTPS listeners. This is especially useful for automatic renewal. However, the load balancer would then strip the encryption layer, and all communications with the instances of my server would travel unencrypted from that point on.
I'm not too comfortable having my raw data traveling in a public cloud. Still, I'd welcome a second opinion on this.
My question therefore is: is it considered secure to have the load balancer strip the HTTPS encryption layer and forward all traffic as plain HTTP to internal server instances?
Since I can guess the answer, I would appreciate any suggestions on how to deploy load balancing securely.
I consider it secure because each AWS VPC is isolated from the others.
The traffic of one VPC cannot be captured in another VPC. Of course, whether AWS's VPC technology itself is secure remains to be seen, as others have said.
Also check out the Elastic Beanstalk documentation about secure end-to-end encryption. It says that:
Terminating secure connections at the load balancer and using HTTP on the backend may be sufficient for your application. Network traffic between AWS resources cannot be listened to by instances that are not part of the connection, even if they are running under the same account.
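If you would still rather keep the traffic encrypted all the way to the instances, the ALB can also re-encrypt towards the targets: create the target group with the HTTPS protocol and have the instances present a certificate (it can be self-signed, since the ALB does not validate it). A boto3 sketch with placeholder names:

```python
import boto3

elbv2 = boto3.client("elbv2", region_name="eu-west-1")

# Target group that re-encrypts traffic from the ALB to the instances over
# HTTPS instead of plain HTTP. The name, VPC ID and health check path are
# placeholders.
elbv2.create_target_group(
    Name="api-backend-tls",
    Protocol="HTTPS",
    Port=443,
    VpcId="vpc-0123456789abcdef0",
    TargetType="instance",
    HealthCheckProtocol="HTTPS",
    HealthCheckPath="/healthz",
)
```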