Getting NIST NTP working in AWS EC2 with locked down ACLs - amazon-web-services

We are seeing extensive clock drift on our EC2 instances, to the point where various services are being affected. Elastic Beanstalk eventually marks long-running instances as unhealthy, citing clock drift and lack of NTP syncing. Why is this happening?

Related reading:
[1] http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/set-time.html#configure_ntp
[2] http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/vpc.html
"Ensure that your VPC security groups and network ACLs allow outbound UDP traffic on port 123 to avoid these issues."
Despite what the docs say, if you are locking down your inbound ACL, there is one more step. AWS security groups are stateful: an allowed outbound connection on a given port is also allowed back in, without explicitly allowing that port in the inbound rules. Network ACLs are not stateful. This means that if you lock down your inbound ACL, you must also allow inbound UDP traffic on port 123. (This assumes ntpd, which sends its requests from source port 123, so replies arrive on port 123; a client that sends from an ephemeral source port would need that port range allowed inbound instead.)
You can edit the /etc/ntp.conf file, and use NIST IP addresses from http://tf.nist.gov/tf-cgi/servers.cgi, instead of DNS names. Then in your ACL, you can lock down the inbound UDP rules to these IP addresses (for example: 216.229.0.179/32).
That should do it.
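To make the stateless behaviour concrete, here is a minimal Python sketch of how a network ACL treats an NTP round trip. It is a toy model, not the AWS API: the rule tuples and the helper names (nacl_allows, ntp_round_trip) are made up for illustration, and it assumes the client sends from source port 123, as ntpd does.

```python
from ipaddress import ip_address, ip_network

# Example NIST address taken from the answer above.
NIST_SERVER = "216.229.0.179"

# Hypothetical NACL entries: (direction, protocol, local/remote port, remote CIDR).
outbound_only = [("outbound", "udp", 123, "0.0.0.0/0")]
with_inbound = outbound_only + [("inbound", "udp", 123, NIST_SERVER + "/32")]


def nacl_allows(rules, direction, protocol, port, remote_ip):
    """Return True if any rule permits this single packet (default deny)."""
    return any(
        r_dir == direction
        and r_proto == protocol
        and r_port == port
        and ip_address(remote_ip) in ip_network(r_cidr)
        for r_dir, r_proto, r_port, r_cidr in rules
    )


def ntp_round_trip(rules, server_ip):
    """A stateless ACL checks the request and the reply independently."""
    request_ok = nacl_allows(rules, "outbound", "udp", 123, server_ip)
    reply_ok = nacl_allows(rules, "inbound", "udp", 123, server_ip)
    return request_ok and reply_ok


print(ntp_round_trip(outbound_only, NIST_SERVER))    # False: the reply is dropped
print(ntp_round_trip(with_inbound, NIST_SERVER))     # True
print(ntp_round_trip(with_inbound, "203.0.113.10"))  # False: server not in the inbound rule
```

The first case is the situation described in the question: the request leaves, but the reply never reaches the instance.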

Related

AWS Security Group rules: How does an SSH connection to EC2 still work when I have removed outbound rules?

Here is my security group, inbound and outbound rules for the EC2 instance in AWS.
My understanding was that if I block all outbound traffic, I will not be able to SSH into the system even if the inbound connection is allowed.
I went through a lot of documentation on this and still do not understand how the system sends data back over the SSH connection when the outbound rules do not allow it.
Does this mean a web server will still work without any outbound rules, provided the inbound ports, let's say 80 and 443, are open?
The SSH connection still works because security groups are stateful: if a connection is allowed in, its response traffic is allowed back out regardless of the outbound rules. NACLs, on the other hand, are stateless, which means traffic is checked against the rules both when it enters and when it leaves.
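As a rough illustration of what "stateful" means here, the following Python sketch models a security group with an inbound rule for port 22 and no outbound rules. It is only a toy model under my own assumptions (the StatefulGroup class and its method names are hypothetical, and real connection tracking is more involved), but it shows why the SSH replies get out while new outbound connections do not.

```python
class StatefulGroup:
    """Toy model of a security group: allow rules plus connection tracking."""

    def __init__(self, inbound_ports, outbound_ports):
        self.inbound_ports = set(inbound_ports)
        self.outbound_ports = set(outbound_ports)
        # Flows accepted by an inbound rule: (remote_ip, remote_port, local_port)
        self.tracked = set()

    def new_inbound(self, remote_ip, remote_port, local_port):
        """A remote client opens a connection to local_port."""
        if local_port not in self.inbound_ports:
            return False
        self.tracked.add((remote_ip, remote_port, local_port))
        return True

    def reply_out(self, remote_ip, remote_port, local_port):
        """Reply packets are allowed if they belong to a tracked flow."""
        return (remote_ip, remote_port, local_port) in self.tracked

    def new_outbound(self, remote_port):
        """The instance itself opens a connection to remote_port."""
        return remote_port in self.outbound_ports


sg = StatefulGroup(inbound_ports=[22], outbound_ports=[])
print(sg.new_inbound("198.51.100.7", 50123, 22))  # True: SSH is allowed in
print(sg.reply_out("198.51.100.7", 50123, 22))    # True: replies follow the tracked flow
print(sg.new_outbound(443))                       # False: no outbound rules
```

The same reasoning applies to a web server: with only inbound 80/443 open it can answer requests, but it cannot initiate connections of its own.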

How to properly create security groups for instance classes in AWS?

So right now I have 4 subnets per availability zone: The internet facing "entrypoint" subnet (associated with a load balancer), the generic "service" subnet for internal computation, the "database" subnet for all things data related, and the "external request" subnet for making requests out to the internet. This defines essentially 4 classes of EC2 instances.
I am supposed to now create security groups for these 4 classes of EC2 instances. What I'm wondering is how to do that correctly (I am using terraform).
Can I create 1 security group for "ingress" (incoming) traffic, and a 2nd security group for "egress" (outgoing) traffic, for each class, for each connection type?
So basically, I want this. I want the internet entrypoint to talk to the service. The service can only respond to requests from the internet; it doesn't make any external internet requests itself. The service can talk to the database and the external requesting class. The database can only talk to the service, and the external request class can only respond back to the service. The entrypoint can come in as HTTP or HTTPS (or websockets, is that just HTTPS?). It comes in on port 443. This is the load balancer. It then converts the request to HTTP and connects to the compute on port 3000. Should I have a separate port for each different connection type? (So the service layer would have 1 port for the database to respond to, like 4000, 1 port for the external request layer to respond to, like 5000, etc.) Or does that part matter? Let's say we have the ports thing though.
sg1 (security group 1): ingress 443 -> 3000 (load balancer -> service)
sg2: egress 3000 -> internet? is that 0.0.0.0/0? I don't want it to make free requests out, only to connected clients.
sg3: ingress 3000 -> 4000 (service -> database), specifying the database subnet
sg3: egress 4000 -> 3000 (database -> service), specifying the service subnet, etc.
Am I on the right track? I am new to this and trying to figure it out. Any guidance would be much appreciated, I've been reading the AWS docs for the past week but there's little in terms of best practices and architecture.
You can attach up to 5 security groups per ENI (Elastic Network Interface) by default. The rules from all attached groups are evaluated together whenever an inbound or outbound connection is established.
Regarding communication, security groups track connections (they are stateful), so traffic can flow in both directions as long as the initial connection was allowed by a security group rule.
Security groups are stateful — if you send a request from your instance, the response traffic for that request is allowed to flow in regardless of inbound security group rules. Responses to allowed inbound traffic are allowed to flow out, regardless of outbound rules.
For example:
An inbound rule allows SSH on port 22 from a specific IP address, and no outbound rule for port 22 exists. A user can SSH to the server with no connection issues, but cannot SSH from it to another server. Add outbound rules if the server should be able to initiate outbound connections; by default, a security group allows all outbound traffic.
Following this example, if you define no outbound rules for HTTP/HTTPS, only responses to inbound HTTP/HTTPS connections are allowed out. Also be aware that the instance will then be unable to download patches from the internet.
Regarding the source, rather than specifying subnets you can reference a security group instead. Any resource that has that security group attached, in any subnet, is then allowed inbound access to the target resource (this only works if the connection is private host to private host).
The source of the traffic and the destination port or port range. The source can be another security group, an IPv4 or IPv6 CIDR block, a single IPv4 or IPv6 address, or a prefix list ID.
I would recommend keeping each class of resource within a single security group (e.g. all DB servers in one security group), primarily to reduce management overhead.
More information is available at the Security groups for your VPC page.
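To illustrate referencing a security group as the source instead of a subnet, here is a small Python sketch of the tiers from the question. Everything in it is hypothetical (the instance names, IP addresses, group names and the can_connect helper); it only models how an inbound rule whose source is a group matches any sender that has that group attached.

```python
from ipaddress import ip_address, ip_network

# Hypothetical instances; ports loosely follow the question (443, 3000, 4000).
INSTANCES = {
    "lb-1": {"ip": "10.0.1.10", "groups": {"sg-entrypoint"}},
    "svc-1": {"ip": "10.0.2.10", "groups": {"sg-service"}},
    "db-1": {"ip": "10.0.3.10", "groups": {"sg-database"}},
}

# Inbound rules per group: (port, source). The source is a group name or a CIDR.
INBOUND = {
    "sg-entrypoint": [(443, "0.0.0.0/0")],
    "sg-service": [(3000, "sg-entrypoint")],
    "sg-database": [(4000, "sg-service")],
}


def source_matches(source, sender):
    """A group source matches by membership, a CIDR source by address."""
    if source.startswith("sg-"):
        return source in sender["groups"]
    return ip_address(sender["ip"]) in ip_network(source)


def can_connect(sender_name, target_name, port):
    """True if any inbound rule on the target's groups admits the sender."""
    sender, target = INSTANCES[sender_name], INSTANCES[target_name]
    return any(
        rule_port == port and source_matches(source, sender)
        for group in target["groups"]
        for rule_port, source in INBOUND[group]
    )


print(can_connect("lb-1", "svc-1", 3000))  # True: entrypoint -> service
print(can_connect("svc-1", "db-1", 4000))  # True: service -> database
print(can_connect("lb-1", "db-1", 4000))   # False: entrypoint cannot reach the database
```

Because security groups are stateful, only these inbound rules are needed for the responses to flow back; no matching egress rule per tier is required for replies.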

What does Outbound traffic mean for an AWS Security Group assigned to an AWS RDS instance?

Pressing "Launch DB Instance" in the AWS RDS management console is the equivalent of launching a server daemon, if one were to do-it-oneself.
The console also has a setting to associate a "Security Group" to the DB Instance.
The Security Group itself has settings for "Inbound" and for "Outbound" traffic.
The "Inbound" traffic means requests to the server originating from some client somewhere.
What does "Outbound" traffic mean? Are these simply the responses of the db server? In that case, wouldn't it make sense for Inbound and Outbound to always have the same port range and IP addresses?
Relation to previous questions:
This RDS instance is to be coupled with an ElasticBeanstalk instance, not a VPC.
No, outbound rules don't affect the responses the DB server sends to external requests (e.g. queries, updates, writes, etc.), since security groups are stateful:
Security groups are stateful — if you send a request from your
instance, the response traffic for that request is allowed to flow in
regardless of inbound security group rules. Responses to allowed
inbound traffic are allowed to flow out, regardless of outbound rules.
Outbound rules in a security group only govern connections that the DB instance itself initiates (that is, when it acts as a client), for example to fetch something from an external source.
AWS security groups are stateful which means you do not need to open the outbound for responses - open only inbound for requests. If you think your instances will be sending requests to certain IPs (for example: to upgrade/install a package), then you need to open the IP/port for that request.

AWS Trusted Advisor and ephemeral ports

I get "Action recommended" (Red !) when running AWS Trusted Advisor after I open ephemeral ports (1024-65535) in a Security Group to allow communication between the ALB and EC2 Container service. Is this something I should be worried about, or should I not trust AWS Trusted Advisor here?
Original Answer
Security groups are stateful, meaning that traffic initiated from the instance to another source will have all return traffic related to that outbound request (i.e. ephemeral ports) allowed. It's really the NACL in VPCs where you have to actually allow ephemeral traffic, as it's not stateful and doesn't understand return traffic like security groups do.
That said, for ALB -> instance traffic you won't need to open those ports in the security group, because the security group will allow traffic initiated from within the ALB (to the instance) and the related ephemeral port traffic coming back to the ALB.
Your instances simply need whatever port is being checked (port 80/8080/etc.) open inbound, since that traffic comes from the outside. However, they don't need any rule allowing outbound traffic to the ALB's ephemeral ports, since that traffic is initiated from inside the instance and is tied to the allowed incoming traffic.
Edit:
After a lot of working around with an EC2 instance to try and explain this I found a few faults in the original explanation. I'll leave the original explanation here as I think it's important to know mistakes happen.
At any rate, let's go for the more in depth answer here.
NACL (Network Access Control Lists)
These are stateless firewalls. Basically it has no idea that the outgoing ephemeral port traffic is related to the incoming HTTP traffic. It's also a priority type system. Basically you number your rules in the order you want them to be evaluated by, lowest to highest. The moment it hits a rule that matches the traffic it applies it. You can also explicitly deny traffic.
The main disadvantage here is that, at the time of writing, a NACL only allows 20 rules each way by default (for a total of 40 rules), whereas security groups allow 50 rules each way (for a total of 100 rules). These quotas are adjustable and have changed over time, so check the current VPC limits. That said, if you start to run out of security group rules for whatever reason, it's always possible to take common traffic rules and apply them to the NACL. NACLs are also worth considering in high-compliance environments where you absolutely must block certain traffic, as explicit DENY rules are possible, whereas security group rules are exclusively permissive.
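The rule ordering and explicit DENY described above can be sketched in a few lines of Python. This is a simplified model under my own assumptions (the rule tuples and the evaluate function are hypothetical, and only the source address and destination port are matched), not the actual AWS implementation.

```python
from ipaddress import ip_address, ip_network

# Hypothetical inbound NACL: (rule number, source CIDR, port, action).
# Listed out of order on purpose; evaluation sorts by rule number.
RULES = [
    (200, "0.0.0.0/0", 80, "allow"),
    (100, "203.0.113.0/24", 80, "deny"),
]


def evaluate(rules, source_ip, port):
    """Apply the lowest-numbered matching rule; deny if nothing matches."""
    for _number, cidr, rule_port, action in sorted(rules):
        if rule_port == port and ip_address(source_ip) in ip_network(cidr):
            return action
    return "deny"  # implicit catch-all deny at the end of every NACL


print(evaluate(RULES, "203.0.113.5", 80))   # deny: rule 100 matches first
print(evaluate(RULES, "198.51.100.7", 80))  # allow: falls through to rule 200
print(evaluate(RULES, "198.51.100.7", 22))  # deny: no rule matches
```

Swapping the two rule numbers would let the blocked range in, which is why the numbering matters.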
Security Groups
Security groups, unlike NACLs, can only have permissive rules. DENY is simply the lack of a permissive rule. However, under certain circumstances explained below, security groups will track traffic and automatically permit the related traffic in the other direction.
Security groups by default have a rule that allows all outbound traffic. The idea here is that traffic initiated from your own instance is fine in the vast majority of use cases. However, if an attacker gets access to the system through a service exploit, they can then send outbound traffic pretty much wherever they want.
What you could do here is remove the outbound traffic rule in your security group. In this case you would have the following:
Traffic originating from the instance would be denied
If an incoming rule was accepted, outbound traffic would be allowed regardless of the lack of outbound rules
If an outbound rule was added (say port 80), then a call out from the instance to an external server on port 80 would be allowed. Incoming traffic related to that connection would also be allowed.
Security Groups also track connections (which is why they are called stateful) so that related traffic in the other direction is allowed automatically. However, a connection is only tracked if its traffic would otherwise be denied.
For example, if you didn't remove the outbound rule that allows all access, the security group would have no need to be stateful, as there's no need to add rules. It does, however, need to be stateful when the traffic would otherwise not be allowed. There's no real solid documentation I can find on how it does this, but I theorize that it's built around the three-way TCP handshake. Essentially, it starts allowing traffic in the other direction when a SYN comes in or goes out to an allowed port. Then it fully tracks the connection once the rest of the handshake (SYN+ACK -> ACK) is completed. When connection-close packets arrive, it potentially removes the tracking.
With this in mind, it's best to be more permissive with outgoing traffic if possible when dealing with high-capacity, front-facing services, as I can imagine the tracking starting to slow things down noticeably.
Recommendations
Kill the NACL rules and just allow all traffic in and out. Let the stateful security groups handle things for you.
Put the instances behind the ALB in private subnets. That will block outside traffic since there will be no route.
However you'll want a NAT Gateway that lets your private instances reach out to the internet for important things like getting package updates from distro servers.
Security group for backend instances: allow inbound traffic on whatever port the ELB sends traffic to. Allow all outbound traffic.
Security group for ALB: Allow inbound traffic for whatever port (80 or 443 I would assume) and allow all outbound traffic.
Create what's called a bastion instance. It's simply an EC2 instance that only allows SSH (or RDP for Windows instances). You use this as your gateway to log in to private subnet instances. Its security group should allow all outbound traffic, and allow inbound SSH traffic only from the IPs that are authorized to access it. This is very important because if you don't restrict IPs, random bots scanning the Amazon public IP space (usually from China or Russia, which have a huge IP space) will keep trying to connect to port 22. You just don't want to deal with that, especially since the possibility of a remote login exploit is always greater than 0%.

AWS Security Group for RDS - Outbound rules

I have a security group assigned to an RDS instance which allows port 5432 traffic from our EC2 instances.
However, this security group allows all outbound traffic to all IPs.
Is this a security risk? What should be the ideal outbound security rule?
In my perspective, the outbound traffic for the RDS security group should be limited to port 5432 to our EC2 instances, is this right?
It is a good idea to have clear control over outbound connections as well.
In your RDS group: delete all outbound rules (by default, there is a rule that allows outbound connections to all ports and IPs; just delete this "all-anywhere" rule).
Your DB will receive inbound requests on port 5432 from your EC2 instance, and RDS will respond to your EC2 instance over the very same connection, so no outbound rules need to be defined in this case at all.
By default, all Amazon EC2 security groups:
Deny all inbound traffic
Allow all outbound traffic
You must configure the security group to permit inbound traffic. Such configuration should be limited to the minimal possible scope. That is, the fewest protocols necessary and smallest IP address ranges necessary.
Outbound access, however, is traditionally kept open. The reason for this is that you would normally "trust" your own systems. If they wish to access external resources, let them do so.
You are always welcome to restrict Outbound access, especially for sensitive systems. However, determining which ports to keep open may be a challenge. For example, instances may want to download Operating System updates, access Amazon S3 or send emails.
When using security groups (as opposed to ACL rules), responses to allowed inbound traffic are automatically allowed out, so the outbound rules can be empty in your case.
Is this a security risk? What should be the ideal outbound security
rule? In my perspective, the outbound traffic for the RDS security
group should be limited to port 5432 to our EC2 instances, is this
right?
It's a risk only if your RDS instance is in a public subnet inside your VPC.
Best practices for your scenario recommend a public subnet for your web server and a private subnet for all private resources (RDS, other private services, etc).
As you can see in the image, when your RDS instance is hosted inside a private subnet, there is no way to access it from outside your VPC.