I get "Action recommended" (red !) from AWS Trusted Advisor when I open the ephemeral ports (1024-65535) in a Security Group to allow communication between the ALB and EC2 Container Service. Is this something I should be worried about, or should I not trust AWS Trusted Advisor here?
Original Answer
Security groups are stateful, meaning that traffic initiated from the instance to another host will have all return traffic related to that outbound request (i.e. ephemeral ports) allowed. It's really the NACLs in VPCs where you have to explicitly allow ephemeral traffic, as they are not stateful and don't understand return traffic like security groups do.
That said for ALB -> instance traffic you won't need to open those ports in the sec group because the sec group will allow traffic initiated from within the ALB (to the instance) and related ephemeral port traffic coming back to the ALB.
Your instances will simply need whatever port that's being checked (port 80/8080/etc.) since it's traffic coming from the outside. However it doesn't need anything for allowing traffic outbound to the ALB ephemeral ports since those are initiated from inside the instance as well as being attached to the incoming port allowed traffic.
Edit:
After a lot of working around with an EC2 instance to try and explain this I found a few faults in the original explanation. I'll leave the original explanation here as I think it's important to know mistakes happen.
At any rate, let's go for the more in depth answer here.
NACL (Network Access Control Lists)
These are stateless firewalls: a NACL has no idea that the outgoing ephemeral port traffic is related to the incoming HTTP traffic. It's also a priority-based system. You number your rules in the order you want them evaluated, lowest to highest, and the moment traffic matches a rule, that rule is applied. You can also explicitly deny traffic.
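The evaluation order described above can be sketched in a few lines of Python. This is a toy model rather than anything AWS ships: the `NaclRule` class and `evaluate_nacl` function are hypothetical names, and matching is reduced to a destination port range only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NaclRule:
    number: int      # lower numbers are evaluated first
    port_from: int
    port_to: int
    action: str      # "allow" or "deny"


def evaluate_nacl(rules, port):
    """Return the action of the first matching rule, lowest number first."""
    for rule in sorted(rules, key=lambda r: r.number):
        if rule.port_from <= port <= rule.port_to:
            return rule.action
    return "deny"  # nothing matched, so the traffic is dropped


inbound = [
    NaclRule(200, 0, 65535, "allow"),  # broad allow
    NaclRule(100, 23, 23, "deny"),     # explicit deny, evaluated before rule 200
]

print(evaluate_nacl(inbound, 23))   # deny
print(evaluate_nacl(inbound, 443))  # allow
```

Because rule 100 is evaluated before rule 200, the explicit deny wins for port 23 even though the broader rule would allow it.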
The main disadvantage here is that a NACL only allows 20 rules each way by default (for a total of 40 rules), whereas security groups allow you 50 rules each way (for a total of 100 rules). That said, if you start to run out of security group rules for whatever reason, it's always possible to take common traffic rules and apply them to the NACL. NACLs are also something to consider in high-compliance environments where you absolutely must block certain traffic, as explicit DENY rules are possible there, whereas security groups only support permissive rules.
Security Groups
Security groups, unlike NACLs, can only have permissive rules. DENY is simply the lack of a permissive rule. However, under certain circumstances explained below, security groups will track traffic and automatically permit the related traffic in the other direction.
Security groups by default have a rule that allows all outbound traffic. The idea here is that if traffic is initiated from your instance, it's okay in the vast majority of use cases. However, if an attacker gets access to the system through a service exploit, they can then send outbound traffic pretty much wherever they want.
What you could do here is remove the outbound traffic rule in your security group. In this case you would have the following:
Traffic originating from the instance would be denied
If an incoming rule was accepted, outbound traffic would be allowed regardless of the lack of outbound rules
If an outbound rule was added (say port 80), then a call out from the instance to an external server on port 80 would be allowed. Incoming traffic related to that connection would also be allowed.
Security Groups also track connections (which is why they are called stateful) to allow traffic from the other direction related automatically. However it only tracks this if traffic would otherwise be denied.
For example, if you didn't remove the outbound rule that allows all access, the security group would have no need to be stateful, as there's no need to add rules. It does, however, need to be stateful when the traffic would otherwise not be allowed. There's no real solid documentation I can find on how it does this, but I theorize that it revolves around the three-way TCP handshake. Essentially it starts allowing traffic in the other direction when a SYN comes in or goes out to an allowed port. Then it fully tracks the connection once the rest of the handshake (SYN+ACK -> ACK) is completed. When packets that close the connection arrive, it potentially removes the tracking.
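The behaviour above can be illustrated with a toy model. The following is only a sketch under simplifying assumptions: `ToySecurityGroup` is a hypothetical class, rules are reduced to sets of ports, and a "connection" is just a remembered (remote IP, remote port, local port) tuple. It ignores the TCP handshake entirely and says nothing about how AWS actually implements tracking.

```python
class ToySecurityGroup:
    """Allow-only rules plus a table of tracked connections."""

    def __init__(self, inbound_ports, outbound_ports):
        self.inbound_ports = set(inbound_ports)    # local ports open to the outside
        self.outbound_ports = set(outbound_ports)  # remote ports the instance may call
        self.tracked = set()                       # (remote_ip, remote_port, local_port)

    def packet_in(self, remote_ip, remote_port, local_port):
        flow = (remote_ip, remote_port, local_port)
        if flow in self.tracked:
            return True  # reply to a connection that was already allowed
        if local_port in self.inbound_ports:
            self.tracked.add(flow)  # remember it so replies can leave
            return True
        return False

    def packet_out(self, remote_ip, remote_port, local_port):
        flow = (remote_ip, remote_port, local_port)
        if flow in self.tracked:
            return True  # reply to a connection that was already allowed
        if remote_port in self.outbound_ports:
            self.tracked.add(flow)  # remember it so replies can come back
            return True
        return False


# SSH allowed in, no outbound rules at all.
sg = ToySecurityGroup(inbound_ports=[22], outbound_ports=[])
ssh_request = sg.packet_in("203.0.113.10", 50000, 22)    # matches the inbound rule
ssh_reply = sg.packet_out("203.0.113.10", 50000, 22)     # tracked, so it may leave
new_outbound = sg.packet_out("198.51.100.7", 22, 40000)  # instance-initiated, no rule
print(ssh_request, ssh_reply, new_outbound)  # True True False
```

This reproduces the three cases listed earlier: replies to accepted inbound traffic leave without an outbound rule, while traffic originating from the instance is dropped.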
With this in mind, it's best to be more permissive with outgoing traffic where possible when dealing with high-capacity front-facing services, as I can imagine the tracking starting to slow things down to a noticeable degree.
Recommendations
Kill the NACL rules and just allow all traffic in and out. Let the stateful security groups handle things for you.
Put the instances behind the ALB in private subnets. That will block outside traffic since there will be no route.
However you'll want a NAT Gateway that lets your private instances reach out to the internet for important things like getting package updates from distro servers.
Security group for backend instances: allow inbound traffic on whatever port the ALB forwards requests to. Allow all outbound traffic.
Security group for ALB: Allow inbound traffic for whatever port (80 or 443 I would assume) and allow all outbound traffic.
Create what's called a bastion instance. It's simply an EC2 instance that only allows SSH (or RDP for Windows instances). You use this as your gateway to log in to private subnet instances. Its security group should allow all outbound traffic, and allow inbound SSH traffic only from the IPs that are authorized to access it. This is very important because if you don't restrict IPs, random bots scanning the Amazon public IP space (usually from China or Russia, which have a huge IP space) will keep trying to connect to port 22. You just don't want to deal with that, especially since the possibility of a remote login exploit is always greater than 0%.
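As a rough illustration of the security group recommendations above, the inbound rules can be written out as plain Python dictionaries in the `IpPermissions` shape that boto3's EC2 client accepts for `authorize_security_group_ingress`. Everything specific here is an assumption made for the sketch: the group IDs are placeholders, the application port and office CIDR are invented, and the backend rule that admits SSH from the bastion is implied by the text rather than stated. Nothing is sent to AWS.

```python
def tcp_rule(port, cidr=None, source_group=None):
    """Build one inbound TCP rule in the IpPermissions shape."""
    rule = {"IpProtocol": "tcp", "FromPort": port, "ToPort": port}
    if cidr is not None:
        rule["IpRanges"] = [{"CidrIp": cidr}]
    if source_group is not None:
        rule["UserIdGroupPairs"] = [{"GroupId": source_group}]
    return rule


# Placeholder values, not real resources.
ALB_SG = "sg-alb-example"
BASTION_SG = "sg-bastion-example"
OFFICE_CIDR = "203.0.113.0/24"  # the IPs allowed to reach the bastion
APP_PORT = 8080                 # the port the ALB forwards to

ingress = {
    # ALB: open to the world on the listener port.
    "alb": [tcp_rule(443, cidr="0.0.0.0/0")],
    # Backend: only the ALB on the app port, only the bastion on SSH.
    "backend": [
        tcp_rule(APP_PORT, source_group=ALB_SG),
        tcp_rule(22, source_group=BASTION_SG),
    ],
    # Bastion: SSH only from the authorized IP range.
    "bastion": [tcp_rule(22, cidr=OFFICE_CIDR)],
}

for name, rules in ingress.items():
    print(name, rules)
```

Outbound rules are left at the default (allow all) in this sketch, matching the recommendations. Note that the backend group has no rule for the ephemeral port range at all, which is the point of the original question.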
Related
Here is my security group, inbound and outbound rules for the EC2 instance in AWS.
My understanding was that if I block all outbound traffic I will not be able to SSH into the system, even if the inbound connection is allowed.
I went through a lot of documentation on this and still don't understand how the system sends data back over the SSH connection when the outbound rules don't allow it.
Does this mean, a web server will still work without any outbound rules, provided ports for inbound, let's say 80, 443 are opened ?
The SSH connection is still working because security groups are stateful which means that if a connection/traffic can get inside, it can go outside. NACLs on the other hand are stateless which means that the challenge/test happens on entry and exit of traffic.
So right now I have 4 subnets per availability zone: The internet facing "entrypoint" subnet (associated with a load balancer), the generic "service" subnet for internal computation, the "database" subnet for all things data related, and the "external request" subnet for making requests out to the internet. This defines essentially 4 classes of EC2 instances.
I am supposed to now create security groups for these 4 classes of EC2 instances. What I'm wondering is how to do that correctly (I am using terraform).
Can I create 1 security group for "ingress" (incoming) traffic, and a 2nd security group for "egress" (outgoing) traffic, for each class, for each connection type?
So basically, I want this. I want the internet entrypoint to talk to the service. The service can only respond to requests from the internet; it doesn't make any external internet requests itself. The service can talk to the database and the external requesting class. The database can only talk to the service, and the external request can only respond back to the service. The entrypoint can come in as HTTP or HTTPS (or websockets, is that just HTTPS?). It comes in on port 443. This is the load balancer. It then converts the request to HTTP and connects to the compute on port 3000. Should I have a separate port for each different connection type? (So the service layer would have 1 port for the database to respond to like 4000, 1 port for the external request layer to respond to like 5000, etc.). Or does that part matter? Let's say we have the ports thing though.
sg1 (security group 1): ingress 443 -> 3000 (load balancer -> service)
sg2: egress 3000 -> internet? is that 0.0.0.0/0? I don't want it to make free requests out, only to connected clients.
sg3: ingress 3000 -> 4000 (service -> database), specifying the database subnet
sg3: egress 4000 -> 3000 (database -> service), specifying the service subnet, etc.
Am I on the right track? I am new to this and trying to figure it out. Any guidance would be much appreciated, I've been reading the AWS docs for the past week but there's little in terms of best practices and architecture.
You can attach up to 5 security groups per ENI (Elastic Network Interface) by default. The rules from all attached groups are evaluated together whenever an inbound or outbound connection is established.
Regarding communication, security group rules establish a tunnel (allowing stateful communication) during any network communication allowing bi-directional communication as long as the initial connection was allowed by the security group.
Security groups are stateful — if you send a request from your instance, the response traffic for that request is allowed to flow in regardless of inbound security group rules. Responses to allowed inbound traffic are allowed to flow out, regardless of outbound rules.
For example:
An inbound rule allows SSH on port 22 from a specific IP address, and no outbound rules for port 22 exist. A user can SSH to the server with no connection issues, but cannot SSH from it to another server. Add outbound rules if the server should be able to initiate outbound connections; by default a security group allows all outbound traffic.
Following this example, if you have no outbound rules for HTTP/HTTPS, only responses to inbound HTTP/HTTPS connections will be allowed out. Also be aware that, for patching, the instance will not be able to download from the internet.
Regarding the source, perhaps rather than specifying subnets you can reference the logical security group name instead. This would mean if a resource in any subnet has that security group attached the target resource would allow inbound access (this only works if the connection is private host to private host).
The source of the traffic and the destination port or port range. The source can be another security group, an IPv4 or IPv6 CIDR block, a single IPv4 or IPv6 address, or a prefix list ID.
I would recommend trying to keep the resource realm within a single security group (i.e. DB server all in a single security group) primarily to reduce the overhead of management.
More information is available at the Security groups for your VPC page.
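To make the "reference a security group as the source" idea concrete, here is a small Python model of the four tiers from the question. It is a sketch only: the tier names and ports 443, 3000 and 4000 come from the question, port 5000 for the external request tier is an assumption, and `can_connect` is a hypothetical helper that only answers whether a new connection may be opened. Replies never need their own rule because security groups are stateful.

```python
# For each tier: which source tier may open connections to it, and on which port.
INBOUND = {
    "entrypoint": [("internet", 443)],   # load balancer listener
    "service": [("entrypoint", 3000)],   # load balancer -> service
    "database": [("service", 4000)],     # service -> database
    "external": [("service", 5000)],     # service -> external request tier (assumed port)
}


def can_connect(source, target, port):
    """True if the target tier has an inbound rule for this source and port."""
    return (source, port) in INBOUND.get(target, [])


print(can_connect("entrypoint", "service", 3000))   # True
print(can_connect("service", "database", 4000))     # True
print(can_connect("entrypoint", "database", 4000))  # False
print(can_connect("database", "service", 3000))     # False
```

One security group per tier is enough in this model. The database never needs a rule toward the service, since it only ever answers connections the service opened.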
From what I read, stateless firewalls are used more for packet filtering. Why is AWS NACL stateless?
NACLs force too big a range of ports to be opened for the ephemeral ports.
Is there a way to create stateful firewalls on AWS other than Security Groups? Security Groups feel too granular and may get omitted by mistake.
Network Access Control Lists (ACLs) mimic traditional firewalls implemented on hardware routers. Such routers are used to separate subnets and allow the creation of separate zones, such as a DMZ. They purely filter based upon the content of the packet. That is their job.
Security Groups are an added capability in AWS that provides firewall-like capabilities at the resource level. (To be accurate, they are attached to Elastic Network Interfaces, ENIs). They are stateful, meaning that they allow return traffic to flow.
In general, the recommendation is to leave NACLs at their default settings (allow all traffic IN & OUT). They should only be changed if there is a specific need to block certain types of traffic at the subnet level.
Security Groups are the ideal way to control stateful traffic going in and out of a VPC-attached resource. They are THE way to create stateful firewalls. There is no other such capability provided by a VPC. If you wanted something different, you could route traffic through an Amazon EC2 instance acting as a NAT and then you would have full control over how it behaves.
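NACLs are stateless. That means return traffic is not allowed automatically: if you allow some traffic (TCP or other) inbound, the matching outbound traffic has to be explicitly allowed as well (if you want it, of course). Note that a custom NACL denies all inbound and outbound traffic until you add rules.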
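A short Python sketch shows why a stateless filter ends up needing the wide ephemeral range. It is a simplified model with hypothetical helper names: each direction is checked on its own against a list of allowed destination port ranges, from the point of view of a subnet hosting a web server.

```python
def in_ranges(port, ranges):
    """True if the port falls inside any (low, high) range."""
    return any(low <= port <= high for low, high in ranges)


def nacl_round_trip(server_port, client_port, inbound_ranges, outbound_ranges):
    """A request/response pair passes only if each direction is allowed on its own."""
    request_ok = in_ranges(server_port, inbound_ranges)    # client -> server port
    response_ok = in_ranges(client_port, outbound_ranges)  # server -> client's ephemeral port
    return request_ok and response_ok


# HTTPS allowed in, nothing allowed out: the response is dropped.
print(nacl_round_trip(443, 51515, [(443, 443)], []))               # False
# Adding the ephemeral range outbound lets the response through.
print(nacl_round_trip(443, 51515, [(443, 443)], [(1024, 65535)]))  # True
```

The client picks its source port from the ephemeral range, so the outbound side cannot be narrowed to a single port the way the inbound side can.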
We are seeing extensive clock drift on our EC2 instances, to the point where various services are being affected. Elastic Beanstalk eventually rules long-running instances as unhealthy, citing clock drift and lack of NTP syncing. Why is this happening?
Related reading:
[1] http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/set-time.html#configure_ntp
[2] http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/vpc.html
"Ensure that your VPC security groups and network ACLs allow outbound UDP traffic on port 123 to avoid these issues."
Despite what the docs say, if you are locking down your inbound ACL, there is one more step. While the AWS security groups are stateful (that is, an allowed outbound connection on a given port, will also be allowed back in, without explicitly allowing said port on the inbound rules), the ACLs are not stateful. This means that if you lock down your ACLs on the inbound, you must also allow inbound UDP connections on port 123.
You can edit the /etc/ntp.conf file, and use NIST IP addresses from http://tf.nist.gov/tf-cgi/servers.cgi, instead of DNS names. Then in your ACL, you can lock down the inbound UDP rules to these IP addresses (for example: 216.229.0.179/32).
That should do it.
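For illustration, the two NACL entries described above can be written as Python dictionaries using the parameter names of boto3's `create_network_acl_entry`. This is a sketch under assumptions: the ACL ID is a placeholder, rule number 100 is arbitrary, the IP is the example from the text, and it assumes the NTP client sends from source port 123 (as classic ntpd does) so that replies arrive on port 123. Nothing is sent to AWS; the `matches` helper is hypothetical and only checks an address and port against an entry locally.

```python
import ipaddress

NACL_ID = "acl-example"               # placeholder, not a real ACL
NTP_SERVER_CIDR = "216.229.0.179/32"  # example address from the text


def ntp_entry(rule_number, egress):
    """One NACL entry allowing UDP 123 to or from the NTP server."""
    return {
        "NetworkAclId": NACL_ID,
        "RuleNumber": rule_number,
        "Protocol": "17",        # IP protocol number for UDP
        "RuleAction": "allow",
        "Egress": egress,        # True = outbound rule, False = inbound rule
        "CidrBlock": NTP_SERVER_CIDR,
        "PortRange": {"From": 123, "To": 123},
    }


# Inbound and outbound rules are numbered independently, so both can be 100.
entries = [ntp_entry(100, egress=True), ntp_entry(100, egress=False)]


def matches(entry, remote_ip, port):
    """Check whether a remote address and port fall inside an entry."""
    network = ipaddress.ip_network(entry["CidrBlock"])
    ports = entry["PortRange"]
    return ipaddress.ip_address(remote_ip) in network and ports["From"] <= port <= ports["To"]


print(matches(entries[1], "216.229.0.179", 123))  # True
print(matches(entries[1], "216.229.0.180", 123))  # False
```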
I am seeking some guidance on the best approach to take with EC2 security groups and services with dynamic IPs. I want to make use of services such as SendGrid, Elastic Cloud etc., which all use dynamic IPs over port 80/443. However, access to port 80/443 is closed with the exception of whitelisted IPs. So far the solutions I have found are:
CRON Job to ping the service, take IP's and update EC2 Security Group via EC2 API.
Create a new EC2 to act as a proxy with port 80/443 open. New server communicates with Sendgrid/ElasticCloud, inspects responses and returns parts to main server.
Are there any other better solutions?
Firstly, please bear in mind that security groups in AWS are stateful, meaning that, for example, if you open ports 80 and 443 to all destinations (0.0.0.0/0) in your outbound rules, your EC2 machines will be able to connect to remote hosts and get the response back even if there are no inbound rules for a given IP.
However, this approach works only if the connection is always initiated by your EC2 instance and the remote services are just responding. If you require connections to your EC2 instances to be initiated from the outside, you do need to specify inbound rules in the security group(s). If you know the CIDR block of their public IP addresses, that can solve the problem, as you can specify it as the source in an inbound security group rule. If you don't know the IP range of the hosts that are going to reach your machines, then access restriction at the network level is not feasible and you need to implement some form of authorisation of the requester.
P.S. Please also bear in mind that there is a soft default limit of 50 inbound or outbound rules per security group.
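If inbound whitelisting by IP does turn out to be necessary, the cron-job idea from the question boils down to computing a diff between the rules that exist and the addresses the service currently resolves to. The sketch below covers only that step, in pure Python. The function name is hypothetical, the addresses are documentation examples, and both resolving the service's IPs and applying the result through the EC2 API are left out.

```python
MAX_RULES = 50  # the per-direction limit mentioned above


def plan_update(current_cidrs, resolved_ips):
    """Return (to_add, to_remove) so the rules match the freshly resolved IPs."""
    wanted = {f"{ip}/32" for ip in resolved_ips}
    if len(wanted) > MAX_RULES:
        raise ValueError("more addresses than a single security group can hold")
    current = set(current_cidrs)
    to_add = sorted(wanted - current)     # newly seen addresses
    to_remove = sorted(current - wanted)  # addresses the service no longer uses
    return to_add, to_remove


current = ["198.51.100.10/32", "198.51.100.11/32"]
resolved = ["198.51.100.11", "198.51.100.12"]
to_add, to_remove = plan_update(current, resolved)
print(to_add)     # ['198.51.100.12/32']
print(to_remove)  # ['198.51.100.10/32']
```

Applying additions before removals avoids a window in which a still-valid address is blocked.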