Websocket timeouts using AWS Application Load Balancer - amazon-web-services

I'm getting gateway time-outs when trying to use a port specifically for websockets using an Application Load Balancer inside an Elastic Beanstalk environment.
The web application and websocket server is held within a Docker container, the application runs fine however wss://domain.com:8080 will just time out.
Here is the Load balancer listeners, using the SSL cert for wss.
The target group it points to is accepting 'Protocol' of HTTP (I've tried HTTPS) and forwards to 8080 onto an EC2 instance. Or.. It should be. (Doesn't appear to be an option for TCP on Application Load Balancers).
I've had a look over the Application Load Balancer logs and it looks like the it reaches the target group, but times out between it's connection to the EC2 instance, and I'm stumped on why.
All AWS Security Groups have been opened on all traffic for the time being, I've checked the host and found that the port is open and being listened to by Nginx which will route to the correct port to the docker container:
docker ps also shows me:
And once inside the container I can see that the port is being listened to by the Websocket server:
So it can't be the EC2 instance itself, can it? Is there an issue routing websockets via ports in an ALB?
-- Edit --
Current SG of the ALB:
The EC2 instance SG:

Accepted answer here seems to be "open Security Groups for EC2 (web server) and ALB inbound & outbound communication on required ports since websockets need two way communication."
This is incorrect and the reason why it solved the problem is coincidental.
Let me explain:
"Websockets needs two way communication..." - Sure but the TCP sessions is only ever opened from one way - from the client.
You don't have to allow any outbound connections from the EC2 instance (web server) in order to use web sockets.
Of course the ALB needs to be able to do TCP connections to the EC2 instance. But not to the client. Why? Well the ALB is accepting TCP connections (usually on port 80 and 443). It is setting up a TCP session that was initiated by the client. It is then trying to set up a new TCP session to the web server behind the ALB. This should be done on the port that you decided to have the web server listening on. The Security Group around the ALB needs to be able to do outbound connections on this port to the web server. This is the reason why "open up everything" worked. It has nothing to do with "two way communication".
You could use any ports of course but you don't need to use any other ports than 80 & 443 (such as 8080) on both the Load Balancer or the EC2.

Websockets need two way communication, make sure security groups attached to all resources (EC2 & ALB) allow both inbound & outbound communication on required ports.

Related

amqplib & AWS MQ: Socket closed abruptly during opening handshake

So the current issue I have is that before I was able to connect properly to my rabbitMQ cluster that was hosted on AWS MQ. After I changed its IP visibility to private I had to create some configuration to access the cluster from outside the VPC.
Current example of how the cluster is accessed:
mq.example.com -> Load balancer (w/target group to cluster host IP & TLS port 5671) in public VPC -> Cluster in private VPC.
I've done the same thing for the web console. Now the web console works perfectly, so the issue isn't necessarily with the load balancing or a certificate issue. I then checked out if the issue could be with the code I wrote, but that is also not the case since sometimes from inside the services it connects, but sometimes it then doesn't. It throws the error: "Socket closed abruptly during opening handshake".
I think I believe where the issue may arise from, however I don't really have a proper view on how to solve it. I believe the issue has to do with the fact that the service has go through the load balancer first before it can connect to the rabbit cluster. I just don't know what to do about it and most documentation on amqplib is obscure as it is. I haven't found any (documented) similar issue with AWS MQ & a load balancer.
So my question, specifically is: How would I be able to resolve the fact that sometimes my services connect and don't connect to the cluster when they go through the load balancer?
Good to know: I use AWS MQ for rabbit, amqplib for the client connection, amqps as the protocol, web console works with the same setup but services don't.
For people who run into this issue later on I have found a solution:
When creating a Network Load Balancer to route traffic to your cluster you have to assign it a target group. Make sure to NOT DO THIS: Do not register both port 5671 (amqps) and 443 (web console) to the same target group. During routing issues will arise like this.
Instead do the following:
Create two target groups on aws EC2:
TG1: Register: TLS - 443 (web console)
TG2: Register: TLS - 5671 (amqps)
Your NLB that is configured to simple routing & alias for IPV4 connections then needs the following listeners:
Listener 1: TLS - 443 and assign it to TG1
Listener 2: TLS - 5671 and assign it to TG2
This should then make sure whenever you connect there is no confusion for the microservice you're trying to connect to the cluster.
You can then connect to your web console with your subdomain:
eg. webconsole.example.com
and to your services: eg. amqps://cluster.example.com:5671 as host (how your host is formatted depends on the library you're using for the clientside)

AWS Security Group connection tracking failing for responses with a body in ASP.NET Core app running in ECS + Fargate

In my application:
ASP.NET Core 3.1 with Kestrel
Running in AWS ECS + Fargate
Services run in a public subnet in the VPC
Tasks listen only in the port 80
Public Network Load Balancer with SSL termination
I want to set the Security Group to allow inbound connections from anywhere (0.0.0.0/0) to port 80, and disallow any outbound connection from inside the task (except, of course, to respond to the allowed requests).
As Security Groups are stateful, the connection tracking should allow the egress of the response to the requests.
In my case, this connection tracking only works for responses without body (just headers). When the response has a body (in my case, >1MB file), they fail. If I allow outbound TCP connections from port 80, they also fail. But if I allow outbound TCP connections for the full range of ports (0-65535), it works fine.
I guess this is because when ASP.NET Core + Kestrel writes the response body it initiates a new connection which is not recognized by the Security Group connection tracking.
Is there any way I can allow only responses to requests, and no other type of outbound connection initiated by the application?
So we're talking about something like that?
Client 11.11.11.11 ----> AWS NLB/ELB public 22.22.22.22 ----> AWS ECS network router or whatever (kubernetes) --------> ECS server instance running a server application 10.3.3.3:8080 (kubernetes pod)
Do you configure the security group on the AWS NLB or on the AWS ECS? (I guess both?)
Security groups should allow incoming traffic if you allow 0.0.0.0/0 port 80.
They are indeed stateful. They will allow the connection to proceed both ways after it is established (meaning the application can send a response).
However firewall state is not kept for more than 60 seconds typically (not sure what technology AWS is using), so the connection can be "lost" if the server takes more than 1 minute to reply. Does the HTTP server take a while to generate the response? If it's a websocket or TCP server instead, does it spend whole minutes at times without sending or receiving any traffic?
The way I see it. We've got two stateful firewalls. The first with the NLB. The second with ECS.
ECS is an equivalent to kubernetes, it must be doing a ton of iptables magic to distribute traffic and track connections. (For reference, regular kubernetes works heavily with iptables and iptables have a bunch of -very important- settings like connection durations and timeouts).
Good news is. If it breaks when you open inbound 0.0.0.0:80, but it works when you open inbound 0.0.0.0:80 + outbound 0.0.0.0:*. This is definitely an issue due to the firewall dropping the connection, most likely due to losing state. (or it's not stateful in the first place but I'm pretty sure security groups are stateful).
The drop could happen on either of the two firewalls. I've never had an issue with a single bare NLB/ELB, so my guess is the problem is in the ECS or the interaction of the two together.
Unfortunately we can't debug that and we have very little information about how this works internally. Your only option will be to work with the AWS support to investigate.

AWS EC2 security group https vs tcp vs ssh

I am confused about configuring the EC2 security group settings.
There are three options (TCP, SSH, HTTPS) and each of them requires you to add an IP/port number.
For context, in my work I'm usually running Flask apps over EC2 and I only want particular people to view them. My question is understanding the difference between TCP, SSH, and HTTPs but more importantly which of these are important for me to configure.
Within the EC2 Console, under Security Groups:
SSH and HTTPS in the Type dropdown, are presets which set the port to 22 and 443 respectively.
TCP is the protocol. Both SSH and HTTPS are TCP.
If you're running a server which you want to expose on a non standard port, you can select Custom TCP Rule, then set the port acordingly.
You should probably have one security group that allows SSH traffic, then assign this security group to the EC2 instances you wish to shell into:
Then have a separate security group that allows the webserver traffic, in this case I also have one for port 80, aswell as 443:
Of course you will then need a server running on that EC2 instance to receive the traffic. This might be a reverse proxy like nginx, which then proxies traffic to the correct port for your app server (run your flask app with something like gunicorn in production).
If nginx and gunicorn are running on the same box, and say gunicorn serves on port 8000, then you wouldn't need a security group for this as it's loopback traffic. Your nginx configuration points to port 8000.
However if you have a separate EC2 instance running gunicorn, you might wish to set up a secuirty group for this to allow internal traffic from your VPC CIDR range:
I only want particular people to view them
This is probably a job for authentication on the app, as oppose to security groups, unless your certain of the public IPs from which you wish people to connect.
In the above examples above a Source of 0.0.0.0/0 is allowing traffic from anywhere to reach that port. The console has a convenient dropdown which lets you set My IP if you only want to allow traffic from the IP you're using to connect to the console. Otherwise you'd need to manually calculate the CIDR blocks.
Hope this helps. It probably raises more questions.
Https/Http are important for you. Both are used with websites. Https is http over SSL, meaning more secure than http. You just need these.
Http/https uses TCP port 80 and 443 by default.
SSH is used to securely access a Unix based server.

AWS Application Load Balancer not working

I have a EC2 cluster with just one EC2 instance, where two services are running:
api1, listening at port 8080
api2, listening at port 9090
If I make requests against EC2 instance and those ports, both APIs work fine.
Now, I want to create a load balancer so I can make requests against http://{load_balancer_ip}/api1 and http://{load_balancer_ip}/api2, but I'm not able to.
I have created two target groups, both with just one instance (the only one I have)
TargetGroup1: Port 8080 and the EC2 instance registered on port 8080
TargetGroup2: Port 9090 and the EC2 instance registered on port 9090
Then, I have created a load balancer with one listener on port 80 and these two path rules:
When /api1, forward to TargetGroup1
When /api2, forward to TargetGroup2
When I make requests against http://{load_balancer_ip}/api1 or http://{load_balancer_ip}/api2 nothing happens; I don't get any response.
What am I missing?
Ok, I found what's happening thanks to this question's first comment:
AWS Application Load Balancer (ALB) path based routing not functioning as expected
Load balancer is not rewriting the URL and my APIs are listening at /, but load balancer is redirecting all the path /api1.
Solved!
(I couldn't mark it as duplicated because question above does not have any accepted answer)

AWS forward port 8000 from elb to port 8000 of EC2

I have en ELB with multiple EC2 instances registered in target groups. I am using port a php application which is running properly. It has SSL.
I want to use port 8000 for my node application. What I would like to do is I want to forward my-elb-address:8000 to any-ec2-ip:8000. So when i access the domain attached to ELB witjh port 8000 it would forward that to ec2 with port 8000. How can I accomplish this? Is their any other way of ELB listening and forwarding multiple ports?
I have added listener for port 80,443 and 8000 in my ELB. Please help
Classic ELB
Using the "classic" ELB you can define custom rules for forwarding the ports in the AWS dashboard:
Mind that the requests will be forwarded to all the available instances, which means in the example above (supposing php is running on the 80, node.js on the 8000) all the instances must have both the services running. If the services are instead on different instances you will need two different load balancers, one per port.
Application ELB
Another option is to use an "application" ELB (ALB).
This option will allow to have single load balancer with fine-grained rules that will allow, for each protocol, to forward the request to a set of instances.
create a "default" ALB
add a new target group (see entry under the Load Balancing section in the sidebar) listening on your custom port
register the instances running your node.js application (right click on the target group)
bind the target group to the listeners of your ALB
Another solution could be, specifying path-based rules, to use only one port (443) and forward only the requests under /to_nodejs to the port 8000.