AWS Application Load Balancer Web Socket Issues - amazon-web-services

I have used EB to create an environment with Tomcat 8, Java 8 configured as load balanced, auto scaling.
Deployed a WebSocket server on this.
On the EC2 running instance, my websocket client (tyrus API) is able to communicate over websocket, for example ws://ip/chat
Now I need a TCP connection (count) based auto scaling strategy, for which switched to Application Load Balancer (ALB) with a target group pointing to this EC2 instance.
Stickiness has been enabled and ALB is using HTTP listener on port 80 with listener rule "/chat" pointing to this target group.
All involved SG have All TCP in and out traffic enabled for testing.
Invoking ws://ELB/chat results doesnt work resulting in a 404:
Caused by: org.glassfish.tyrus.core.HandshakeException: Response code was not 101: 404
Any inputs on how this should be configured.
Final aim is to be able to communicate with WebSocket server over ALB and then auto scale based on TCP "ActiveConnectionCount"

Related

Regarding TLS termination at application load balancer AWS

I have application in container , which runs on protocol HTTP with port 1429. Conainer is deployed in AWS EKS. I have conferred the ALB with certificate. Listener port is HTTPS and port 443.
I need to terminate TLS at ALB and forward request http to 1429.
I configured ingress target port as 1429.
I am getting target TLS Negotiation Error in cloud watch metrics.
Any suggestions on this.
I would double check that the target group protocol is set to HTTP. Seeing as your application is deployed to EKS you could port-forward to the port in question and make a HTTP curl request to ensure that no TLS errors are thrown and that the request is handled as expected.

AWS: Using the application load balancer with ECS - Requests not reaching tasks

When I send requests using the ALB's DNS host, the listener's path, and the web services endpoint path, I don't get a response within the expected timeframe, which I've determined by successfully sending
requests directly to each of the tasks using their public ip addresses, they return successful responses.
For example:
The ALB's DNS entry: http://myapp-alb-11111111.us-west-1.elb.amazonaws.com
The web app, "abc", listens on port 80 for requests on "/api/health".
The web app is using "abc-svc/*" as the path in the listener.
The web app was assigned a public ip address of 10.88.77.66.
Sending a GET request to 'http://10.88.77.66/api/health' is successful.
Sending a GET request to 'http://myapp-alb-11111111.us-west-1.elb.amazonaws.com/abc-svc/api/health' does not return within several minutes, which is not expected behavior.
I've looked through the logs, but cannot find anything that is amiss. I'd appreciate any ideas or suggestions...
AWS CONFIGURATION
I have three docker images that are running in ECS. Each image is assigned to a separate service. Each service has a single task. Port 80 is open in the security group from the Internet to the ALB. Port 80 is open from the ALB to each task. The ALB's listener for port 80 is using path-based routing. There is a separate, unique path for each service. Each task contains a docker linux, spring boot 2, web service. Each web service's router has a "/api/health" route that expects a GET request with no parameters and returns a simple string. We are not using HTTP or SSL at this time.
Thank you for your time and interest.
Mike
There is a different reason for that but some of the common issues that you can debug
Check health check for each target group under LB target group, if its unhealthy LB will never route the traffic
Verify the target port is correct
Verify Target group associated properly with LB and is not showing unused.
Verify LB security group
Check the response from LB is it gateway timeout or service unavailbe if gateway timeout its not reachable if service unavailable probably restarting
Services Event logs, check that service is in steady-state or not, if not its mean restarting again and again
Check deployment logs of service, if you see unhealthy target group message then update the target group health path with status code

aws ECS, ECS instance is not registered to ALB target group

I create ECS service and it runs 1 ecs instance and I can see the instance is registered as a target of the load balancer.
Now I trigger a Auto Scaling Group (by just incrementing desired instance count) to launch a new instance.
The instance is launched and added to the ECS cluster. (I can see it on ECS instances tab)
But the instance is not added to the ALB target. (I expect to see 2 instances in the following image, but I only see 1)
I can edit AutoScalingGroup 's target group like the following
Then I see the following .
But the health check fails. It seems the 80 port is not reachable.
Although I have port 80 open for public in the security group for the instance. (Also, instance created from ecs service uses dynamic port mapping but instance created by ALS does not)
So AutoScalingGroup can launch new instance but my load balancer never gives traffic to the new instance.
I did try https://aws.amazon.com/premiumsupport/knowledge-center/troubleshoot-unhealthy-checks-ecs/?nc1=h_ls and it shows I can connect to port 80 from host to the docker container by something like curl -v http://${IPADDR}/health.
So it must be the case that there's something wrong with host port 80 (load balancer can't connect to it).
But it is also the case the security group setting is not wrong, because the working instance and this non working instance is using the same SG.
Edit
Because I used dynamic mapping, my webserver is running on some random port.
As you can see the instance started by ecs service has registered itself to target group with random port.
However instance started by ALB has registered itself to target group with port 80.
The instance will not be added to the target group if it's not healthy. So you need to fix the health check first.
From your first instance, your mapped port is 32769 so I assume if this is the same target group and if it is the same application then the port in new instance should be 32769.
When you curl the IP endpoint curl -I -v http://${IPADDR}/health. is the HTTP status code was 200, if it is 200 then it should be healthy if it's not 200 then update the backend http-status code or you can update health check HTTP status code.
I assume that you are also running ECS in both instances, so ECS create target group against each ECS services, are you running some mix services that you need target group in AS group? if you are running dynamic port then remove the health check path to traffic port.
Now if we look the offical possible causes for 502 bad Gateway
Dynamic port mapping is a feature of container instance in Amazon Elastic Container Service (Amazon ECS)
Dynamic port mapping with an Application Load Balancer makes it easier
to run multiple tasks on the same Amazon ECS service on an Amazon ECS
cluster.
With the Classic Load Balancer, you must statically map port numbers
on a container instance. The Classic Load Balancer does not allow you
to run multiple copies of a task on the same instance because the
ports conflict. An Application Load Balancer uses dynamic port mapping
so that you can run multiple tasks from a single service on the same
container instance.
Your created target group will not work with dynamic port, you have to bind the target group with ECS services.
dynamic-port-mapping-ecs
HTTP 502: Bad Gateway
Possible causes:
The load balancer received a TCP RST from the target when attempting to establish a connection.
The load balancer received an unexpected response from the target, such as "ICMP Destination unreachable (Host unreachable)", when attempting to establish a connection. Check whether traffic is allowed from the load balancer subnets to the targets on the target port.
The target closed the connection with a TCP RST or a TCP FIN while the load balancer had an outstanding request to the target. Check whether the keep-alive duration of the target is shorter than the idle timeout value of the load balancer.
The target response is malformed or contains HTTP headers that are not valid.
The load balancer encountered an SSL handshake error or SSL handshake timeout (10 seconds) when connecting to a target.
The deregistration delay period elapsed for a request being handled by a target that was deregistered. Increase the delay period so that lengthy operations can complete.
http-502-issues
It seems you know the root cause, which is that port 80 is failing the health check and thats why it is never added to ALB. Here is what you can try
First, check that your service is listening on port 80 on the new host. You can use command like netcat
nv -v localhost 80
Once you know that the service is listening, the recommended way to allow your ALB to connect to your host is to add a Security group inbound rule for your instance to allow traffic from your ALB security group on port 80

how to configure the aws elastic beanstalk load balancer for different layers?

I read the aws documentation inside the elastic beanstalk program where aws is responsible for scaling the servers and auto managing it. In the same documentation there is an option for changing and configuring the load balancer. In my case I want to change it to balance the requests that come to the servers on the IP network layer (L3), but it says that only HTTP and TCP can be listened and balanced.
I am developing a chat application backend that need to be developed with scaling in considerations. how can I configure the load balancer to listen on L3 ?
the chat application in order to work it must make the tcp connection with the server not the load balancer so that's why I must load the packets on the IP layer to the server so the server can establish a tcp connection with the app ( if I am wrong and I can do it on the tcp layer tell me ).
If I can't, does that give me another option or I will just be forced on using ec2 and handle all the system management overhead myself and create my own load balancer ?
ELB Classic operates at either Layer 4 or Layer 7. Those are the options.
the chat application in order to work it must make the tcp connection with the server not the load balancer so that's why I must load the packets on the IP layer to the server so the server can establish a tcp connection with the app.
You're actually incorrect about this. If you need to know the client's source IP address, you can enable the Proxy Protocol on your ELB, and support this in your server code.
When the ELB establishes each new connection to the instance, with the Proxy Protocol enabled, the ELB emits a single line preamble containing the 5-way tuple describing the external connection, which your application can interpret. Then it opens up the L4 connection's payload streams and is transparent for the remainder of the connection.
http://docs.aws.amazon.com/elasticloadbalancing/latest/classic/enable-proxy-protocol.html

Websocket timeouts using AWS Application Load Balancer

I'm getting gateway time-outs when trying to use a port specifically for websockets using an Application Load Balancer inside an Elastic Beanstalk environment.
The web application and websocket server is held within a Docker container, the application runs fine however wss://domain.com:8080 will just time out.
Here is the Load balancer listeners, using the SSL cert for wss.
The target group it points to is accepting 'Protocol' of HTTP (I've tried HTTPS) and forwards to 8080 onto an EC2 instance. Or.. It should be. (Doesn't appear to be an option for TCP on Application Load Balancers).
I've had a look over the Application Load Balancer logs and it looks like the it reaches the target group, but times out between it's connection to the EC2 instance, and I'm stumped on why.
All AWS Security Groups have been opened on all traffic for the time being, I've checked the host and found that the port is open and being listened to by Nginx which will route to the correct port to the docker container:
docker ps also shows me:
And once inside the container I can see that the port is being listened to by the Websocket server:
So it can't be the EC2 instance itself, can it? Is there an issue routing websockets via ports in an ALB?
-- Edit --
Current SG of the ALB:
The EC2 instance SG:
Accepted answer here seems to be "open Security Groups for EC2 (web server) and ALB inbound & outbound communication on required ports since websockets need two way communication."
This is incorrect and the reason why it solved the problem is coincidental.
Let me explain:
"Websockets needs two way communication..." - Sure but the TCP sessions is only ever opened from one way - from the client.
You don't have to allow any outbound connections from the EC2 instance (web server) in order to use web sockets.
Of course the ALB needs to be able to do TCP connections to the EC2 instance. But not to the client. Why? Well the ALB is accepting TCP connections (usually on port 80 and 443). It is setting up a TCP session that was initiated by the client. It is then trying to set up a new TCP session to the web server behind the ALB. This should be done on the port that you decided to have the web server listening on. The Security Group around the ALB needs to be able to do outbound connections on this port to the web server. This is the reason why "open up everything" worked. It has nothing to do with "two way communication".
You could use any ports of course but you don't need to use any other ports than 80 & 443 (such as 8080) on both the Load Balancer or the EC2.
Websockets need two way communication, make sure security groups attached to all resources (EC2 & ALB) allow both inbound & outbound communication on required ports.