AWS VPC endpoints: how do they work?

I am trying to understand how VPC endpoints work, and I am not sure that I understand the AWS documentation. For example, I have a private S3 bucket and an EKS cluster. If my bucket is private, I believe that traffic from the EKS cluster to S3 does not go through the internet, but only through the AWS network. But if my S3 bucket were public, then I would probably need to set up a VPC endpoint so that traffic does not leave AWS. I would expect the same logic with ECR: if it is private, you load images to your EKS cluster through the AWS network.
So what is the exact case in which you need to use a VPC endpoint within your AWS account (not from on-prem or another VPC)?

VPC endpoints are typically used with public AWS services (such as S3, DynamoDB, ECR, etc.) when the client applications are hosted inside your VPC and you do not want to route traffic via the public Internet, which would otherwise result in a number of hops to reach the AWS service.
Imagine a situation where you have an app running on an EC2 instance that is deployed to a private subnet of your VPC (e.g. a Pod in your EKS cluster). This app reads/writes data from/to Amazon S3. If you do not use a VPC endpoint, your traffic will first reach your NAT gateway, then go out through your VPC's Internet gateway to the public S3 endpoint. Eventually, it will hit S3. The response will travel back via the same route.
The same goes for ECR (e.g. a new instance of your Kubernetes Pod started by the kubelet). It's better (i.e. quicker) to pick the shortest route to download a Docker image from ECR rather than traverse a number of switches/routers. With a VPC endpoint, your traffic will first hit the VPC endpoint (without leaving your private subnet) and then reach e.g. ECR directly (traffic does not leave the Amazon network).
As correctly mentioned by @jarmod, one should differentiate between routing (Layer 3 in the OSI model) and authentication/authorization (Layer 7). For example, you can use a VPC endpoint to reach S3 but still not be authorized (or even authenticated) to e.g. read a file from an S3 bucket.
Hope this clarifies the idea behind using VPC endpoints.

Related

Is the connection from EC2 to AWS Service (like dynamodb) happening within the AWS Network, or via public internet?

I have a VPC with a couple of subnets containing EC2 instances.
The EC2 instances run code that invokes various AWS services like DynamoDB.
Is the connection from EC2 to AWS Service (like dynamodb) happening within the AWS Network, or via public internet?
Is there any way to control this?
Is the connection from EC2 to AWS Service (like dynamodb) happening within the AWS Network, or via public internet?
Technically the process on EC2 would be hitting the AWS DynamoDB public API which is on the Internet. The traffic would be routed through the Internet Gateway you have attached to the VPC. I think if it is all in the same region it may not actually leave the AWS data center, and you could try testing that via tools like traceroute, but I don't think there are any guarantees of that.
Is there any way to control this?
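Yes, add a VPC Endpoint to your VPC for the service you want to connect to. Traffic to that service will then go over the VPC Endpoint instead of your VPC's Internet Gateway: a gateway endpoint (S3, DynamoDB) adds a route to your route tables, while an interface endpoint relies on the DNS server in your VPC resolving the service hostname to the endpoint's private IP addresses. The traffic will then be guaranteed to stay within the AWS network.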
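As a rough illustration of adding such an endpoint, here is a minimal sketch that builds the arguments for the EC2 create_vpc_endpoint call with boto3, for a gateway endpoint to DynamoDB. The VPC ID, route table ID, and region are placeholders, and the helper names are made up for this example.

```python
def gateway_endpoint_request(vpc_id, region, service, route_table_ids):
    """Build keyword arguments for EC2's create_vpc_endpoint call."""
    return {
        "VpcEndpointType": "Gateway",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.{service}",
        "RouteTableIds": list(route_table_ids),
    }


def create_gateway_endpoint(request):
    """Send the request to AWS. Needs boto3 and valid credentials."""
    import boto3  # imported lazily so the builder above runs without it

    region = request["ServiceName"].split(".")[2]
    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.create_vpc_endpoint(**request)
    return response["VpcEndpoint"]["VpcEndpointId"]


# Placeholder IDs (assumptions); replace them with values from your account.
request = gateway_endpoint_request(
    vpc_id="vpc-0123456789abcdef0",
    region="us-east-1",
    service="dynamodb",
    route_table_ids=["rtb-0123456789abcdef0"],
)
print(request["ServiceName"])
```

Calling create_gateway_endpoint(request) would actually create the endpoint, so it is left out of the sketch. The route tables listed are the ones whose subnets should reach DynamoDB through the endpoint.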

Accessing S3 from inside EKS using boto3

I have a Python application deployed on EKS (Elastic Kubernetes Service). This application saves large files inside an S3 bucket using the AWS SDK for Python (boto3). Both the EKS cluster and the S3 bucket are in the same region.
My question is, how is communication between the two services (EKS and S3) handled by default?
Do both services communicate directly and internally through the Amazon network, or do they communicate externally via the Internet?
If they communicate via the internet, is there a step by step guide on how to establish a direct internal connection between both services?
how is communication between the two services (EKS and S3) handled by default?
By default, the network topology of your EKS cluster offers a route to the public AWS S3 endpoints.
Do both services communicate directly and internally through the Amazon network, or do they communicate externally via the Internet?
Your cluster needs to have network access to those public AWS S3 endpoints, for example through worker nodes running in a public subnet or through a NAT gateway in a private subnet.
...is there a step by step guide on how to establish a direct internal connection between both services?
Create a VPC endpoint for S3 in the VPC where your EKS cluster runs to ensure that network communication with S3 stays within the AWS network. S3 supports both interface and gateway endpoint types. Try this article to learn the basics of S3 endpoints; you can use the same method to create endpoints in the VPC where your EKS cluster runs. Requests to S3 from your pods will then use the endpoint to reach S3 within the AWS network.
You can add S3 access to your EKS node IAM role. This link shows how to add ECR registry access to the EKS node IAM role, but the process is the same for S3.
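Before creating anything, it may help to check whether the cluster's VPC already has an S3 endpoint. The sketch below uses the EC2 describe_vpc_endpoints call with the vpc-id and service-name filters. The VPC ID and the sample response are hand-written placeholders rather than real output, and the function names are hypothetical.

```python
def s3_endpoint_filters(vpc_id, region):
    """Filters for describe_vpc_endpoints: S3 endpoints in one VPC."""
    return [
        {"Name": "vpc-id", "Values": [vpc_id]},
        {"Name": "service-name", "Values": [f"com.amazonaws.{region}.s3"]},
    ]


def summarize_endpoints(response):
    """Return (endpoint id, type, state) tuples from the API response."""
    return [
        (ep["VpcEndpointId"], ep["VpcEndpointType"], ep["State"])
        for ep in response.get("VpcEndpoints", [])
    ]


def find_s3_endpoints(vpc_id, region):
    """Query AWS. Needs boto3 and valid credentials."""
    import boto3  # imported lazily so the helpers above run without it

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.describe_vpc_endpoints(
        Filters=s3_endpoint_filters(vpc_id, region)
    )
    return summarize_endpoints(response)


# Hand-written sample in the shape of the API response (not real output).
sample_response = {
    "VpcEndpoints": [
        {
            "VpcEndpointId": "vpce-0123456789abcdef0",
            "VpcEndpointType": "Gateway",
            "State": "available",
        }
    ]
}
print(summarize_endpoints(sample_response))
```

With a gateway endpoint in place, the boto3 code in the pods does not need to change, because the routing happens below the SDK.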
The other way is to make environment variables available in your container, see this link, though I would recommend the first way.

Route table for docker hub and vpc endpoints for private hosted instances: AWS

I have a Docker image which is just a Java application. The Java application reads data from DynamoDB and S3 buckets and outputs something (it's a test app). I have hosted the Docker image in a public Docker Hub repo.
In AWS, I have created a private subnet which hosts an EC2 instance via AWS ECS. To keep security high, I am using VPC Endpoints for the containers' DynamoDB and S3 bucket operations.
I have also used a NAT Gateway to allow the EC2 instance to pull Docker images from Docker Hub.
Problem:
When I remove the VPC Endpoints, the application is still able to read DynamoDB and S3 via the NAT Gateway, which means the traffic is going through the public network.
Thoughts:
I cannot whitelist the IP addresses of Docker Hub, as they can change.
Since AWS ECS handles all the docker pull etc. tasks, I do not have control to customize them.
I do not want to use AWS container registry. I prefer dockerhub.
DynamoDB/S3 private addresses are not known
Question:
How can I make sure that only Docker Hub traffic is allowed through the NAT Gateway?
How can I make sure that DynamoDB and S3 are accessed through the Endpoints only?
Thanks for your help
If you want to restrict outbound traffic over your NAT (by DNS hostname) to DockerHub only, you will need a third-party solution that can allow or deny outbound traffic before it traverses the internet.
You would install this appliance in a separate subnet which has NAT Gateway access. Then, in your existing subnet(s) for ECS, you would update the route table so that the 0.0.0.0/0 route points to this appliance (by specifying its ENI). If you check the AWS Marketplace, there may already be a solution that provides the domain filter.
Alternatively, you could automate a tool that is able to scrape the whitelisted IP addresses for DockerHub and then add them as allow-all-traffic rules in a NACL. This NACL would only be applied to the subnets that the NAT Gateway resides in.
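The NACL automation could look roughly like the sketch below, which turns a list of CIDR blocks into arguments for the EC2 create_network_acl_entry call. How the DockerHub addresses are obtained is left open here; the CIDRs shown are documentation ranges used purely as stand-ins, and the NACL ID and helper name are hypothetical.

```python
def nacl_allow_entries(nacl_id, cidrs, first_rule_number=100, step=10):
    """Build create_network_acl_entry arguments allowing all traffic
    to and from each CIDR block (one rule per direction)."""
    entries = []
    for index, cidr in enumerate(sorted(set(cidrs))):
        # NACLs are stateless, so both directions need an allow rule.
        for egress in (True, False):
            entries.append(
                {
                    "NetworkAclId": nacl_id,
                    "RuleNumber": first_rule_number + index * step,
                    "Protocol": "-1",  # -1 means all protocols
                    "RuleAction": "allow",
                    "Egress": egress,
                    "CidrBlock": cidr,
                }
            )
    return entries


# Stand-in CIDRs (documentation ranges), NOT real DockerHub addresses.
scraped_cidrs = ["203.0.113.0/24", "198.51.100.0/24", "203.0.113.0/24"]
entries = nacl_allow_entries("acl-0123456789abcdef0", scraped_cidrs)
for entry in entries:
    print(entry["RuleNumber"], entry["Egress"], entry["CidrBlock"])
```

Each dictionary could then be passed as keyword arguments to create_network_acl_entry on a boto3 EC2 client. Stale entries would also need removing whenever the address list changes, which this sketch does not handle.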
Regarding your second question, from the VPC point of view by adding the prefix list of the S3 and DynamoDB endpoints to the route table it will forward any requests that hit these API endpoints through the private route.
At this time DynamoDB does not have the ability to prevent publicly routed interaction; however, S3 does. By adding a condition on the VPC Endpoint (VPCE) to the bucket policy, you can deny any access that does not come through the listed VPC Endpoint. Be careful not to lock yourself out of console access, however: deny only the specific actions that you don't want allowed.
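A bucket policy of the kind described might look like the following sketch, which denies only object reads and writes that do not arrive through a given VPC Endpoint. The bucket name and endpoint ID are placeholders, and the choice of actions is an assumption made for illustration.

```python
import json


def vpce_only_policy(bucket, vpce_id, actions=("s3:GetObject", "s3:PutObject")):
    """Deny the listed object actions unless the request comes through
    the given VPC endpoint. Other actions are left untouched so that
    console access to the bucket itself is not blocked."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyObjectAccessOutsideVpce",
                "Effect": "Deny",
                "Principal": "*",
                "Action": list(actions),
                "Resource": f"arn:aws:s3:::{bucket}/*",
                "Condition": {
                    "StringNotEquals": {"aws:SourceVpce": vpce_id}
                },
            }
        ],
    }


# Placeholder names (assumptions); substitute your own bucket and endpoint.
policy = vpce_only_policy("example-bucket", "vpce-0123456789abcdef0")
print(json.dumps(policy, indent=2))
```

The resulting JSON string could be applied with the S3 put_bucket_policy call. Because the statement is an explicit Deny, it overrides any Allow for those actions, so it is worth testing on a non-critical bucket first.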

Does AWS File Gateway use the S3 endpoint if within a VPC?

I am planning to use AWS File Gateway in a hybrid environment where I will mount the File Gateway to an EC2 instance from within a private subnet. As per AWS documentation, all data transfer is done through HTTPS when using File Gateway.
But since my File Gateway, EC2 instance and S3 are all inside the AWS environment, will my File Gateway still transfer files over the internet to S3 service endpoint (s3.amazonaws.com) or will it leverage VPC endpoint for S3?
Note: I cannot use EFS for this purpose as it's not HIPAA compliant.
A VPC Endpoint for S3 uses a predefined IP prefix list in your subnet route tables, which hijacks all of the traffic bound for all of the IP addresses assigned to S3 in your region... so from a subnet associated with an S3 VPC endpoint, all traffic bound for any S3 address in the region is routed through the endpoint.
To state it another way, when correctly configured, an S3 VPC endpoint becomes the only way S3 can be accessed from the associated subnets, and because it's done at the IP routing layer, anything accessing S3 from those subnets will automatically and transparently use the endpoint.
The prefix list ID logically represents the range of public IP addresses used by the service. All instances in subnets associated with the specified route tables automatically use the endpoint to access the service; subnets that are not associated with the specified route tables do not use the endpoint.
http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/vpc-endpoints.html
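To see whether a given route table actually sends S3 traffic through an endpoint, one can look for routes whose destination is a prefix list and whose target is a VPC endpoint. The sketch below works on a dictionary in the shape returned by the EC2 describe_route_tables call; the sample data and all IDs are hand-written placeholders, and the function name is hypothetical.

```python
def endpoint_routes(route_table):
    """Return (prefix list id, endpoint id) pairs for routes that
    target a gateway VPC endpoint."""
    return [
        (route["DestinationPrefixListId"], route["GatewayId"])
        for route in route_table.get("Routes", [])
        if "DestinationPrefixListId" in route
        and route.get("GatewayId", "").startswith("vpce-")
    ]


# Hand-written sample in the shape of one entry of "RouteTables".
sample_route_table = {
    "RouteTableId": "rtb-0123456789abcdef0",
    "Routes": [
        {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local"},
        {
            "DestinationCidrBlock": "0.0.0.0/0",
            "NatGatewayId": "nat-0123456789abcdef0",
        },
        {
            "DestinationPrefixListId": "pl-0123456789abcdef0",
            "GatewayId": "vpce-0123456789abcdef0",
        },
    ],
}
print(endpoint_routes(sample_route_table))
```

A subnet whose route table yields an empty list here is not using a gateway endpoint and falls back to whatever its default route offers.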
In theory, if you configure your VPC Route Table to use the VPC Endpoint, then any traffic destined for S3 will be sent via the VPC Endpoint. (By the way, it might only work when connecting to S3 in the same region.)
Regardless, even if the traffic is routed through your Internet Gateway to the Amazon S3 endpoint, the traffic will not traverse the real "Internet" -- it will simply pass through the AWS edge of the Internet, never leaving the AWS data center (as long as it is in the same Region).

Can I use AWS ECR from within a private subnet

I have a private subnet inside a VPC, that cannot route to the internet. I'm trying to access amazon ECR, but getting a timeout. My guess is that ECR requires internet connection, however I cannot find any documentation that says that.
Does ECR require internet connection? Is there a way to use it from within a private subnet?
Update 2020
Interface VPC Endpoints are now supported for ECR; meaning now we can configure an endpoint from our private subnet to ECR without a NAT Gateway and still be able to pull images from it.
Documentation: Amazon ECR interface VPC endpoints (AWS PrivateLink)
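For pulling images from a subnet without internet access, the endpoints usually needed are two interface endpoints for ECR (API and Docker registry) plus a gateway endpoint for S3, where image layers are stored. The sketch below only lists the service names for a region; the helper name is hypothetical, and the list should be checked against the current ECR documentation.

```python
def ecr_endpoint_plan(region):
    """List the VPC endpoints typically needed to pull ECR images
    from a subnet with no route to the internet."""
    prefix = f"com.amazonaws.{region}"
    return [
        # ECR API calls such as authentication token requests
        {"ServiceName": f"{prefix}.ecr.api", "VpcEndpointType": "Interface"},
        # Docker registry API used by docker pull and docker push
        {"ServiceName": f"{prefix}.ecr.dkr", "VpcEndpointType": "Interface"},
        # Image layers are downloaded from S3
        {"ServiceName": f"{prefix}.s3", "VpcEndpointType": "Gateway"},
    ]


for endpoint in ecr_endpoint_plan("us-east-1"):
    print(endpoint["VpcEndpointType"], endpoint["ServiceName"])
```

Each entry would still need a VPC ID, plus subnets and security groups for the interface endpoints or route tables for the gateway endpoint, before it could be passed to create_vpc_endpoint.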
A private subnet is truly private and only in/out traffic that you specify will be allowed. S3 has VPC Endpoints that allow you to connect to S3 (http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/vpc-endpoints.html) without routing through the public internet. VPC endpoint functionality for AWS ECR has been requested (https://forums.aws.amazon.com/thread.jspa?threadID=222124) but to the best of my knowledge it is not yet currently available.
A VPC endpoint for ECR is not available, but it has been requested as the first issue on AWS's container roadmap (created 2018-11-28), to be implemented via PrivateLink.
Its state is "Coming soon".
It will cost a minimum of around $22/month (PrivateLink costs for 3 Availability Zones in us-east, excluding traffic costs), unless they state otherwise.
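The figure above can be reproduced with simple arithmetic. The sketch assumes an hourly rate of 0.01 USD per endpoint per Availability Zone and 730 hours in a month; both values are assumptions to be checked against current PrivateLink pricing, and data processing charges are not included.

```python
def monthly_endpoint_cost(hourly_rate_per_az, az_count, hours_per_month=730):
    """Fixed monthly cost of one interface endpoint, without traffic."""
    return hourly_rate_per_az * az_count * hours_per_month


# Assumed rate: 0.01 USD per AZ-hour, endpoint deployed in 3 AZs.
estimate = monthly_endpoint_cost(0.01, 3)
print(f"${estimate:.2f} per month")  # prints "$21.90 per month"
```

Note that this is the cost of a single interface endpoint; a setup that needs several of them multiplies the fixed part accordingly.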