Why is AWS NACL stateless? - amazon-web-services

From what I read, stateless firewalls are used more for packet filtering. Why is AWS NACL stateless?
NACLs force too big a range of ports to be opened for the ephemeral ports.
Is there a way to create stateful firewalls on AWS other than Security Groups? Security Groups feel too granular and may get omitted by mistake.

Network Access Control Lists (ACLs) mimic traditional firewalls implemented on hardware routers. Such routers are used to separate subnets and allow the creation of separate zones, such as a DMZ. They purely filter based upon the content of the packet. That is their job.
Security Groups are an added capability in AWS that provides firewall-like capabilities at the resource level. (To be accurate, they are attached to Elastic Network Interfaces, ENIs). They are stateful, meaning that they allow return traffic to flow.
In general, the recommendation is to leave NACLs at their default settings (allow all traffic IN & OUT). They should only be changed if there is a specific need to block certain types of traffic at the subnet level.
Security Groups are the ideal way to control stateful traffic going in and out of a VPC-attached resource. They are THE way to create stateful firewalls. There is no other such capability provided by a VPC. If you wanted something different, you could route traffic through an Amazon EC2 instance acting as a NAT and then you would have full control over how it behaves.

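A NACL is stateless. That means return traffic is not allowed automatically: if you allow some traffic (TCP or other) inbound, the corresponding outbound traffic has to be allowed explicitly (if you want the replies to get out, of course). Note that a custom NACL denies all inbound and outbound traffic until you add rules, whereas the default NACL allows everything.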
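To make the ephemeral port issue from the question concrete, here is a minimal sketch of a stateless filter. It is a toy model, not the AWS implementation; the `Rule` class, the `stateless_allows` helper, and the port numbers are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical allow rule covering an inclusive port range."""
    port_from: int
    port_to: int


def stateless_allows(rules, port):
    """A stateless filter only checks the packet against its rule list.

    It keeps no memory of earlier packets, so a reply is judged on its own.
    """
    return any(r.port_from <= port <= r.port_to for r in rules)


# Hypothetical custom NACL for a subnet that hosts an HTTPS server.
inbound = [Rule(443, 443)]
outbound_missing = []                     # no outbound rules at all
outbound_ephemeral = [Rule(1024, 65535)]  # replies go to client ephemeral ports

client_port = 51515  # assumed ephemeral source port picked by the client OS

request_ok = stateless_allows(inbound, 443)
reply_blocked = not stateless_allows(outbound_missing, client_port)
reply_ok = stateless_allows(outbound_ephemeral, client_port)

print(request_ok, reply_blocked, reply_ok)
```

The request is accepted in both cases, but the reply only leaves the subnet once the wide ephemeral range is opened outbound, which is exactly the complaint in the question.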

Related

Downside of using public subnet with strict whitelisting vs using private subnet for AWS cloud resources

AWS recommends using private subnets for private resources.
Use private subnets for your instances if they should not be accessed directly from the internet. Use a bastion host or NAT gateway for internet access from an instance in a private subnet.
However, I want to understand the rationale for why this is better than putting the resource, let's say an EC2 instance, in a public subnet and then adding a very strict security group to prevent public access. How is that the less secure approach? Or is it technically the same outcome security-wise?
I've never heard of a security group failing, so if you properly configure your security group with a restricted list of IP addresses/ports, you should be secure.
BUT
In a typical cloud-deployed application, you do not have or want strictly-controlled access. Instead, the typical cloud-deployed application is a web-app that exposes port 80 to the world.
And once you expose any port to the world, your security is entirely dependent on what is listening to that port. Do you have a vulnerability in your web-server? You've now given your attacker the ability to access resources inside your network. If your server has AWS access keys, then the attacker has them as well.
The goal of putting your servers in a private subnet, with a load balancer in front of them, is to reduce your attack surface. It's presumably less likely that attackers will be able to find an exploit in an ALB (versus Apache, nginx, or whatever you're using), and presumably more likely that AWS will be able to mitigate any such exploit faster than you can (because they don't need to wait for patches to become available from an external maintainer).
Of course, the code you wrote could have an exploit that's triggered from a standard HTTP(S) request. However, even in this case, you can reduce blast radius by controlling what your application can access. An instance with a public IP can access anything on the Internet unless you strictly control the egress rules in its security group. In a private subnet, it can only access stuff within the VPC unless you deliberately add a route out (for example through a NAT gateway).
So, ultimately, it's a matter of simplicity: yes, you can craft a secure environment where every host is on the Internet. That was, in fact, the way that AWS worked prior to the introduction of VPCs. But it's easier to rely on the VPC to provide a base level of security (just like, in non-cloud deployments, you rely on the corporate firewall to provide a base level of security).

Why is it that an NLB in AWS does not require a Security Group?

In AWS, while configuring CLB and ALB type of Load balancers, it is mandatory to associate a Security Group. This association helps in limiting the type of traffic to the Load balancer. Why is a Security Group not required for an NLB? Is it not a security risk? I know the best guess here could be - "AWS designed it this way" but their documentation does not seem to explain the reasoning / advantage on omitting security group configuration for NLB.
NLB is not an exception. NAT gateway also does not have SGs.
The major difference between ALB, CLB and NLB (and NAT) is that their network interfaces (ENI) have different Source/dest. check setting.
For ALB and CLB, the Source/dest. check is true. For NLB and NAT gateway, the option is false. Although I don't know the technical reasons why there are no SGs for NLB and NAT, I think a part of the reason could be due to the Source/dest. check settings:
Indicates whether source/destination checks are performed, where the instance must be the source or destination of any traffic it sends or receives.
Thus, in my view the reason is the intended purpose of NAT and NLB, rather than a technical inability of AWS to provide SGs on them. Their main purpose is to act as a proxy. Neither NLB nor NAT generally interferes with the traffic; they mostly just pass it through. It's up to the destinations to determine whether the traffic is allowed or not. Thus neither NAT nor NLB uses SGs. The only way to block incoming traffic to them is through NACLs.
In contrast, ALB and CLB take active part in the transfer of traffic as they inspect all requests. Therefore, they also have ability to decide whether the traffic is allowed or not.
I guess a security group is not required for a Network Load Balancer (NLB) because it behaves transparently by preserving the source IP for the associated target instances. That is, you can still specify security groups - but at the target level directly instead of the load balancer. So conceptually, it does not make much of a difference (when using EC2 instances behind an NLB) where the SGs are specified. Although, some people point out it might be tricky to restrict the IP range for the NLB health check. [1] Moreover, I think it might be more convenient to specify security group rules once (centrally) at the load balancer instead of attaching a specific security group to each EC2 instance which is a target of an NLB. These two can be seen as shortcomings of the NLB compared to the other two load balancers.
Technically, the NLB is built on a completely new technology compared to the ALB/CLB. Some of the differences are pointed out on reddit by an AWS employee [2]:
At a high level, Classic (CLB) and Application (ALB) Load Balancers are a collection of load balancing resources connected to your VPC by a collection of Elastic Network Interfaces (ENIs). They have listeners that accept requests from clients and route them to your targets (ALB & NLB) / backends (CLB). In the same vein, a Network Load Balancer (NLB) is a similar grouping of load balancing resources connected to your VPC, but using an AWS Hyperplane ENI, instead of a regular ENI. A Hyperplane ENI is a distributed construct that integrates with EC2's Software Defined Network (SDN) to transparently connect multiple underlying load balancing resources via a single IP address.
If you have not heard the term Hyperplane before, feel free to check out the corresponding re:Invent session. [3] Hyperplane is used for NAT Gateway, PrivateLink and Lambda's improved VPC Networking [4].
Given how much Hyperplane is capable of doing, and given the fact that it is built on EC2, I see no reason why AWS could not have implemented SGs for NLBs if they wanted to. I agree with @Marcin that this is probably by design.
[1] https://forums.aws.amazon.com/thread.jspa?threadID=263245
[2] https://www.reddit.com/r/aws/comments/cwbkw4/behind_the_scenes_what_is_an_aws_load_balancer/#t1_eyb2gji
[3] https://www.youtube.com/watch?v=8gc2DgBqo9U#t=33m40s
[4] https://aws.amazon.com/de/blogs/compute/announcing-improved-vpc-networking-for-aws-lambda-functions/
NLB works at the fourth layer of the OSI model. The communication passes through the Network Load Balancer and the connection details reach the target. In this case the EC2 instance receives the client IP, so the instance security group has to allow the source clients' IPs.
ALB works at the seventh layer of the OSI model. The communication reaches the ALB listener, which then opens its own connection to the targets, so the EC2 instance receives the ALB's IPs instead of the clients' IPs.
For more details,
https://docs.aws.amazon.com/elasticloadbalancing/latest/network/introduction.html
https://docs.aws.amazon.com/elasticloadbalancing/latest/network/target-group-register-targets.html

AWS EC2 > IGW Outbound Traffic Filtering by Domain or URL

I have an EC2 instance running Windows Server, and I'm using it only via RDP. Can I somehow block outbound traffic from the browser to a specific domain (e.g. abc.example.com) or URL? I'd rather do it in the AWS Dashboard so that RDP users do not have access to whitelist this domain/URL.
How can I achieve this? Thanks!
There is no native AWS solution for this. Of course, you could allow only specific IP addresses, but the problem is that the IP addresses behind a domain may change. In the cloud, many services (such as load balancers and CDNs) change their IP addresses over time.
The ideal solution is to deploy software (running on EC2) that is able to filter outbound traffic based on domain name. There are solutions on the AWS Marketplace, and you can also filter using a dumb proxy.
The network setup would involve you creating a number of subnets containing the EC2 instances. These would have a route table forwarding all traffic (0.0.0.0/0) to a NAT.
Then for all applications that need to have their outbound traffic filtered they would update their route table to route all traffic (0.0.0.0/0) to the ENI of one of the filtering hosts (ideally in the same AZ).
More information: https://aws.amazon.com/blogs/security/how-to-set-up-an-outbound-vpc-proxy-with-domain-whitelisting-and-content-filtering/
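As a rough illustration of the decision a domain-filtering proxy makes for each request, here is a minimal sketch. The function name `is_blocked_host` and the blocklist contents are assumptions for the example; a real proxy (Squid, for instance) has its own configuration syntax for this.

```python
def is_blocked_host(host, blocked_domains):
    """Return True if host equals a blocked domain or is a subdomain of one.

    Matching is done on whole DNS labels, so "notabc.example.com" does not
    match a block on "abc.example.com".
    """
    host = host.lower().rstrip(".")
    for domain in blocked_domains:
        domain = domain.lower().strip(".")
        if host == domain or host.endswith("." + domain):
            return True
    return False


# Hypothetical blocklist taken from the question.
blocked = {"abc.example.com"}

for name in ("abc.example.com", "cdn.abc.example.com", "example.com"):
    print(name, is_blocked_host(name, blocked))
```

Because the check is made on the hostname the client asks for, it keeps working when the IP addresses behind the domain change, which is the weakness of the IP-only rules in Security Groups and NACLs.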
You can configure Access Control Lists (ACL) and Security Groups to filter outbound traffic. However, both of those tools only allow filtering based on IP address, not domain.
If you are confident that the IP addresses won't change, then you can configure these services. If you aren't interested in maintaining a blacklist, then you might need to check on some OS-level limits.
The simplest and easiest way is to implement an Aviatrix FQDN egress filter. It just serves the purpose from a centralized user interface to whitelist/blacklist the URLs in every VPC.
Next Generation Firewall (NGFW) implementation, just to achieve URL / FQDN filtering is an overkill, esp. from the cost point of view whereas proxy implementation has its complexity and doesn't provide centralized control, every VPC has to be managed separately.
The easiest way is to get an Aviatrix launch partner like SDxWORx, enable it with discounted PAYG pricing.
https://aws.amazon.com/marketplace/pp/prodview-laruhupdkcpuy/

AWS Workspace Security Group Egress Requirements

I need to restrict some workspaces internet access to approved IPs. The easiest (according to my understanding) would be to modify the d-xxxxxxxxxx_workspacesMembers security group Outbound rules. To test I just removed all Outbound rules (meaning no outbound access), but it seems like the workspace won't start up.
The short question is, where can I find a list of outbound access requirements so that I can whitelist them? All I can find are client internet requirements: https://docs.aws.amazon.com/workspaces/latest/adminguide/workspaces-port-requirements.html
The longer question is what is the best way to restrict outbound access? I'm not opposed to something like a squid proxy, but our requirements aren't that complex, a simple IP whitelist would be fine.
From my point of view, the right approach would be to use a firewall appliance or the AWS Network Firewall (or maybe an endpoint protection product) to control the traffic.
From: Security Groups for Your WorkSpaces
Do not modify or delete the _controllers and the _workspacesMembers security groups. If you modify or delete these security groups, your WorkSpaces won't function correctly, and you won't be able to recreate these groups and add them back.
Alternatively, Windows Firewall rules rolled out via GPO should also work, but that is not the best approach from my point of view.

AWS Trusted Advisor and ephemeral ports

I get "Action recommended" (Red !) when running AWS Trusted Advisor after I open ephemeral ports (1024-65535) in a Security Group to allow communication between the ALB and EC2 Container Service. Is this something I should be worried about, or should I not trust AWS Trusted Advisor?
Original Answer
Security groups are stateful, meaning that traffic initiated from the instance to another source will have all return traffic related to that outbound request (ie. ephemeral ports) allowed. It's really the NACL in VPCs where you have to actually allow ephemeral traffic as it's not stateful and doesn't understand return traffic like security groups do.
That said for ALB -> instance traffic you won't need to open those ports in the sec group because the sec group will allow traffic initiated from within the ALB (to the instance) and related ephemeral port traffic coming back to the ALB.
Your instances will simply need whatever port that's being checked (port 80/8080/etc.) since it's traffic coming from the outside. However it doesn't need anything for allowing traffic outbound to the ALB ephemeral ports since those are initiated from inside the instance as well as being attached to the incoming port allowed traffic.
Edit:
After a lot of working around with an EC2 instance to try and explain this I found a few faults in the original explanation. I'll leave the original explanation here as I think it's important to know mistakes happen.
At any rate, let's go for the more in depth answer here.
NACL (Network Access Control Lists)
These are stateless firewalls. Basically it has no idea that the outgoing ephemeral port traffic is related to the incoming HTTP traffic. It's also a priority type system. Basically you number your rules in the order you want them to be evaluated by, lowest to highest. The moment it hits a rule that matches the traffic it applies it. You can also explicitly deny traffic.
The main disadvantage here is that NACL only allows 20 rules each way (for a total of 40 rules) whereas security groups allow you 50 rules each way (for a total of 100 rules). That said, if you start to run out of security group rules for whatever reason it's always possible to take common traffic rules and apply them to the NACL. NACLs would also be something to consider in high compliance environments where you absolutely must block certain traffic as explicit DENY rules are possible versus Security Groups which are exclusively permissive rules.
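The numbered, first-match-wins evaluation described above can be sketched like this. It is a simplified model: `NaclRule` and `evaluate` are hypothetical names, and real NACL rules also match on protocol and CIDR, which are left out here.

```python
from typing import NamedTuple


class NaclRule(NamedTuple):
    """Hypothetical NACL entry: rule number, inclusive port range, action."""
    number: int
    port_from: int
    port_to: int
    action: str  # "allow" or "deny"


def evaluate(rules, port):
    """Check rules from lowest to highest number; the first match decides."""
    for rule in sorted(rules, key=lambda r: r.number):
        if rule.port_from <= port <= rule.port_to:
            return rule.action
    return "deny"  # nothing matched: the implicit final rule denies


rules = [
    NaclRule(200, 0, 65535, "allow"),  # broad allow, evaluated second
    NaclRule(100, 23, 23, "deny"),     # explicit deny for telnet, evaluated first
]

print(evaluate(rules, 23))
print(evaluate(rules, 443))
print(evaluate([], 443))
```

Rule 100 is hit before rule 200, so telnet is denied even though a later rule would allow it. That explicit DENY is the thing a security group cannot express.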
Security Groups
Security groups, unlike NACLs, can only have permissive rules. DENY is simply the lack of a permissive rule. However, under certain circumstances explained below, security groups will track traffic and automatically permit traffic in the other direction.
Security groups by default have a rule that allows all outbound traffic. The idea here is that if it's initiated from your instance a good majority of the use cases it's okay. Now if a hacker theoretically gets access to the system through a service exploit then they would now have the ability to have outbound traffic pretty much wherever they want.
What you could do here is remove the outbound traffic rule in your security group. In this case you would have the following:
Traffic originating from the instance would be denied
If an incoming rule was accepted, outbound traffic would be allowed regardless of the lack of outbound rules
If an outbound rule was added (say port 80), then a call out from the instance to an external server on port 80 would be allowed. Incoming traffic related to that connection would also be allowed.
Security Groups also track connections (which is why they are called stateful) so that related traffic in the other direction is allowed automatically. However, it only tracks this if the traffic would otherwise be denied.
For example, if you didn't remove the outbound rule that allows all access, the security group would have no need to be stateful as there's no need to add rules. It does however need to be stateful when the traffic would otherwise not be allowed. There's no real solid documentation I can find on how it does it, but I theorize that it's built around the three-way TCP handshake. Essentially it starts allowing traffic in the other direction when a SYN comes in or goes out to an allowed port. Then it fully tracks the connection once the rest of the handshake (SYN+ACK -> ACK) is completed. When connection-close packets arrive, it potentially removes the tracking.
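Here is a toy model of that tracking idea, just to show why no ephemeral port rule is needed. This is my own sketch and not how AWS implements it: the `StatefulFilter` class is hypothetical, it ignores the TCP handshake details, and it never expires entries.

```python
class StatefulFilter:
    """Toy stateful firewall: allow rules plus a table of tracked connections."""

    def __init__(self, inbound_ports, outbound_ports):
        self.inbound_ports = set(inbound_ports)    # local ports open to the world
        self.outbound_ports = set(outbound_ports)  # remote ports we may call
        self.tracked = set()  # entries are (remote_ip, remote_port, local_port)

    def inbound(self, remote_ip, remote_port, local_port):
        key = (remote_ip, remote_port, local_port)
        if key in self.tracked:
            return True  # reply to a connection the instance opened
        if local_port in self.inbound_ports:
            self.tracked.add(key)  # remember it so replies can leave
            return True
        return False

    def outbound(self, remote_ip, remote_port, local_port):
        key = (remote_ip, remote_port, local_port)
        if key in self.tracked:
            return True  # reply to a connection that was allowed in
        if remote_port in self.outbound_ports:
            self.tracked.add(key)  # remember it so replies can come back
            return True
        return False


# Instance behind a load balancer: port 80 in, no outbound rules at all.
sg = StatefulFilter(inbound_ports={80}, outbound_ports=set())

before = sg.outbound("10.0.0.5", 40000, 80)            # nothing tracked yet
accepted = sg.inbound("10.0.0.5", 40000, 80)           # request from the ALB
reply = sg.outbound("10.0.0.5", 40000, 80)             # reply to ALB port 40000
unrelated = sg.outbound("93.184.216.34", 443, 50000)   # instance calling out

print(before, accepted, reply, unrelated)
```

The reply to the load balancer's ephemeral port gets out purely because the connection is tracked, while a new connection started by the instance is still refused.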
With this in mind, it's best to be more permissive with outgoing traffic if possible when dealing with high-capacity front-facing services, as I can imagine the tracking starting to slow things down noticeably.
Recommendations
Kill the NACL rules and just allow all traffic in and out. Let the stateful security groups handle things for you.
Put the instances behind the ALB in private subnets. That will block outside traffic since there will be no route.
However you'll want a NAT Gateway that lets your private instances reach out to the internet for important things like getting package updates from distro servers.
Security group for backend instances: allow inbound traffic on whatever port the ALB forwards to. Allow all outbound traffic.
Security group for ALB: Allow inbound traffic for whatever port (80 or 443 I would assume) and allow all outbound traffic.
Create what's called a bastion instance. It's simply an EC2 instance that only allows SSH (or RDP for Windows instances). You use this as your gateway to log in to private subnet instances. This should allow all outbound traffic in the security group, and allow inbound SSH traffic only from the IPs that should be authorized to access it. This is very important because if you don't restrict IPs, random bots scanning the Amazon public IP space (usually from China or Russia, which have a huge IP space) will keep trying to connect to port 22. You just don't want to deal with that, especially since the possibility of a remote login exploit is always greater than 0%.
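If you want to sanity-check the bastion rule, something like the following sketch could flag SSH or RDP left open to the whole internet. The rule dictionaries use a simplified, made-up shape (`cidr`, `from_port`, `to_port`), not the exact structure any AWS API returns, and the addresses are documentation examples.

```python
import ipaddress


def world_open_ports(rules, sensitive_ports=(22, 3389)):
    """Return (port, cidr) pairs where a sensitive port is open to everyone."""
    findings = []
    for rule in rules:
        network = ipaddress.ip_network(rule["cidr"])
        if network.prefixlen != 0:
            continue  # restricted to a specific range, nothing to flag
        for port in sensitive_ports:
            if rule["from_port"] <= port <= rule["to_port"]:
                findings.append((port, rule["cidr"]))
    return findings


# Hypothetical inbound rules for two bastion security groups.
good_bastion = [{"cidr": "203.0.113.10/32", "from_port": 22, "to_port": 22}]
bad_bastion = [{"cidr": "0.0.0.0/0", "from_port": 22, "to_port": 22}]

print(world_open_ports(good_bastion))
print(world_open_ports(bad_bastion))
```

An empty result for the first group and a finding for the second is the difference between the recommended setup and the one the bots will keep hammering.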