AWS Security Group connection tracking failing for responses with a body in ASP.NET Core app running in ECS + Fargate - amazon-web-services

In my application:
ASP.NET Core 3.1 with Kestrel
Running in AWS ECS + Fargate
Services run in a public subnet in the VPC
Tasks listen only in the port 80
Public Network Load Balancer with SSL termination
I want to set the Security Group to allow inbound connections from anywhere (0.0.0.0/0) to port 80, and disallow any outbound connection from inside the task (except, of course, to respond to the allowed requests).
As Security Groups are stateful, the connection tracking should allow the egress of the response to the requests.
In my case, this connection tracking only works for responses without body (just headers). When the response has a body (in my case, >1MB file), they fail. If I allow outbound TCP connections from port 80, they also fail. But if I allow outbound TCP connections for the full range of ports (0-65535), it works fine.
I guess this is because when ASP.NET Core + Kestrel writes the response body it initiates a new connection which is not recognized by the Security Group connection tracking.
Is there any way I can allow only responses to requests, and no other type of outbound connection initiated by the application?

So we're talking about something like that?
Client 11.11.11.11 ----> AWS NLB/ELB public 22.22.22.22 ----> AWS ECS network router or whatever (kubernetes) --------> ECS server instance running a server application 10.3.3.3:8080 (kubernetes pod)
Do you configure the security group on the AWS NLB or on the AWS ECS? (I guess both?)
Security groups should allow incoming traffic if you allow 0.0.0.0/0 port 80.
They are indeed stateful. They will allow the connection to proceed both ways after it is established (meaning the application can send a response).
However firewall state is not kept for more than 60 seconds typically (not sure what technology AWS is using), so the connection can be "lost" if the server takes more than 1 minute to reply. Does the HTTP server take a while to generate the response? If it's a websocket or TCP server instead, does it spend whole minutes at times without sending or receiving any traffic?
The way I see it. We've got two stateful firewalls. The first with the NLB. The second with ECS.
ECS is an equivalent to kubernetes, it must be doing a ton of iptables magic to distribute traffic and track connections. (For reference, regular kubernetes works heavily with iptables and iptables have a bunch of -very important- settings like connection durations and timeouts).
Good news is. If it breaks when you open inbound 0.0.0.0:80, but it works when you open inbound 0.0.0.0:80 + outbound 0.0.0.0:*. This is definitely an issue due to the firewall dropping the connection, most likely due to losing state. (or it's not stateful in the first place but I'm pretty sure security groups are stateful).
The drop could happen on either of the two firewalls. I've never had an issue with a single bare NLB/ELB, so my guess is the problem is in the ECS or the interaction of the two together.
Unfortunately we can't debug that and we have very little information about how this works internally. Your only option will be to work with the AWS support to investigate.

Related

amqplib & AWS MQ: Socket closed abruptly during opening handshake

So the current issue I have is that before I was able to connect properly to my rabbitMQ cluster that was hosted on AWS MQ. After I changed its IP visibility to private I had to create some configuration to access the cluster from outside the VPC.
Current example of how the cluster is accessed:
mq.example.com -> Load balancer (w/target group to cluster host IP & TLS port 5671) in public VPC -> Cluster in private VPC.
I've done the same thing for the web console. Now the web console works perfectly, so the issue isn't necessarily with the load balancing or a certificate issue. I then checked out if the issue could be with the code I wrote, but that is also not the case since sometimes from inside the services it connects, but sometimes it then doesn't. It throws the error: "Socket closed abruptly during opening handshake".
I think I believe where the issue may arise from, however I don't really have a proper view on how to solve it. I believe the issue has to do with the fact that the service has go through the load balancer first before it can connect to the rabbit cluster. I just don't know what to do about it and most documentation on amqplib is obscure as it is. I haven't found any (documented) similar issue with AWS MQ & a load balancer.
So my question, specifically is: How would I be able to resolve the fact that sometimes my services connect and don't connect to the cluster when they go through the load balancer?
Good to know: I use AWS MQ for rabbit, amqplib for the client connection, amqps as the protocol, web console works with the same setup but services don't.
For people who run into this issue later on I have found a solution:
When creating a Network Load Balancer to route traffic to your cluster you have to assign it a target group. Make sure to NOT DO THIS: Do not register both port 5671 (amqps) and 443 (web console) to the same target group. During routing issues will arise like this.
Instead do the following:
Create two target groups on aws EC2:
TG1: Register: TLS - 443 (web console)
TG2: Register: TLS - 5671 (amqps)
Your NLB that is configured to simple routing & alias for IPV4 connections then needs the following listeners:
Listener 1: TLS - 443 and assign it to TG1
Listener 2: TLS - 5671 and assign it to TG2
This should then make sure whenever you connect there is no confusion for the microservice you're trying to connect to the cluster.
You can then connect to your web console with your subdomain:
eg. webconsole.example.com
and to your services: eg. amqps://cluster.example.com:5671 as host (how your host is formatted depends on the library you're using for the clientside)

AWS EC2 security group https vs tcp vs ssh

I am confused about configuring the EC2 security group settings.
There are three options (TCP, SSH, HTTPS) and each of them requires you to add an IP/port number.
For context, in my work I'm usually running Flask apps over EC2 and I only want particular people to view them. My question is understanding the difference between TCP, SSH, and HTTPs but more importantly which of these are important for me to configure.
Within the EC2 Console, under Security Groups:
SSH and HTTPS in the Type dropdown, are presets which set the port to 22 and 443 respectively.
TCP is the protocol. Both SSH and HTTPS are TCP.
If you're running a server which you want to expose on a non standard port, you can select Custom TCP Rule, then set the port acordingly.
You should probably have one security group that allows SSH traffic, then assign this security group to the EC2 instances you wish to shell into:
Then have a separate security group that allows the webserver traffic, in this case I also have one for port 80, aswell as 443:
Of course you will then need a server running on that EC2 instance to receive the traffic. This might be a reverse proxy like nginx, which then proxies traffic to the correct port for your app server (run your flask app with something like gunicorn in production).
If nginx and gunicorn are running on the same box, and say gunicorn serves on port 8000, then you wouldn't need a security group for this as it's loopback traffic. Your nginx configuration points to port 8000.
However if you have a separate EC2 instance running gunicorn, you might wish to set up a secuirty group for this to allow internal traffic from your VPC CIDR range:
I only want particular people to view them
This is probably a job for authentication on the app, as oppose to security groups, unless your certain of the public IPs from which you wish people to connect.
In the above examples above a Source of 0.0.0.0/0 is allowing traffic from anywhere to reach that port. The console has a convenient dropdown which lets you set My IP if you only want to allow traffic from the IP you're using to connect to the console. Otherwise you'd need to manually calculate the CIDR blocks.
Hope this helps. It probably raises more questions.
Https/Http are important for you. Both are used with websites. Https is http over SSL, meaning more secure than http. You just need these.
Http/https uses TCP port 80 and 443 by default.
SSH is used to securely access a Unix based server.

Tcp level Information on Ec2

I'm trying to get TCP timestamp from the packets for clock skewing purposes on my application which is hosted on EC2. In my network I have an ALB.
So my question is how do I get TCP level packet information in my app ? Since ALB filters out all the OSI Layers except application level (HTTP)
If the only reason to get access to TCP packet is to detect timestamp and correct clock drift, I would suggest to configure your EC2 instance to use NTP time server instead.
https://aws.amazon.com/blogs/aws/keeping-time-with-amazon-time-sync-service/
That being said, the ALB is not "removing" TCP information from network packets. HTTP connections made to your application are still transported over IP and TCP. If you need low level access to network packets from an app, I would suggest to look at the pCAP library which is used by TCPDUMP and many other tool to capture network traffic on an interface.
https://www.tcpdump.org/
[UPDATED to include comments]
It is important to understand the TCP connection between your client and the ALB is terminated at the ALB level. The ALB creates a second TCP connection to forward HTTP requests to your EC2 instance. The ALB does not remove information from TCP/IP, it just creates a second, independent and new connection. Usually the only information you want to propagate from the initial TCP connection is the source IP address. The ALB, like most load balancers and proxies, captures this information from the original connection (the one received from the client) and embed the information in an HTTP header called X-Forwarded-For.
This is documented at https://docs.aws.amazon.com/elasticloadbalancing/latest/classic/x-forwarded-headers.html
If you want to capture other information from the original connection, I am afraid it will not be possible using ALB. (but I also would be very curious about the use case, i.e. WHAT you're trying to achieve)

Websocket timeouts using AWS Application Load Balancer

I'm getting gateway time-outs when trying to use a port specifically for websockets using an Application Load Balancer inside an Elastic Beanstalk environment.
The web application and websocket server is held within a Docker container, the application runs fine however wss://domain.com:8080 will just time out.
Here is the Load balancer listeners, using the SSL cert for wss.
The target group it points to is accepting 'Protocol' of HTTP (I've tried HTTPS) and forwards to 8080 onto an EC2 instance. Or.. It should be. (Doesn't appear to be an option for TCP on Application Load Balancers).
I've had a look over the Application Load Balancer logs and it looks like the it reaches the target group, but times out between it's connection to the EC2 instance, and I'm stumped on why.
All AWS Security Groups have been opened on all traffic for the time being, I've checked the host and found that the port is open and being listened to by Nginx which will route to the correct port to the docker container:
docker ps also shows me:
And once inside the container I can see that the port is being listened to by the Websocket server:
So it can't be the EC2 instance itself, can it? Is there an issue routing websockets via ports in an ALB?
-- Edit --
Current SG of the ALB:
The EC2 instance SG:
Accepted answer here seems to be "open Security Groups for EC2 (web server) and ALB inbound & outbound communication on required ports since websockets need two way communication."
This is incorrect and the reason why it solved the problem is coincidental.
Let me explain:
"Websockets needs two way communication..." - Sure but the TCP sessions is only ever opened from one way - from the client.
You don't have to allow any outbound connections from the EC2 instance (web server) in order to use web sockets.
Of course the ALB needs to be able to do TCP connections to the EC2 instance. But not to the client. Why? Well the ALB is accepting TCP connections (usually on port 80 and 443). It is setting up a TCP session that was initiated by the client. It is then trying to set up a new TCP session to the web server behind the ALB. This should be done on the port that you decided to have the web server listening on. The Security Group around the ALB needs to be able to do outbound connections on this port to the web server. This is the reason why "open up everything" worked. It has nothing to do with "two way communication".
You could use any ports of course but you don't need to use any other ports than 80 & 443 (such as 8080) on both the Load Balancer or the EC2.
Websockets need two way communication, make sure security groups attached to all resources (EC2 & ALB) allow both inbound & outbound communication on required ports.

Request time out when pinging server on AWS

In order to check the health of a server I have, I want to write a function I can call in order to check whether my service is online.
I used command prompt to ping the IP address of the server, however all of the packets were lost due to request time outs.
I'm guessing I don't need to have a dedicated function related to handle being pinged, and I believe that it is due to the server security protocols denying the request. Currently the server only allows inbound traffic of HTTP requests, and I believe this to be the problem.
For an AWS instance, what protocol rule do I need to add in order to accept ping requests?
In the Security Group for the EC2 instance you should allow inbound ICMP.