I am planning to use AWS File Gateway in a hybrid environment, mounting a file share from the File Gateway on an EC2 instance in a private subnet. As per the AWS documentation, all data transfer is done over HTTPS when using File Gateway.
But since my File Gateway, EC2 instance, and S3 bucket are all inside the AWS environment, will my File Gateway still transfer files over the internet to the public S3 service endpoint (s3.amazonaws.com), or will it leverage a VPC endpoint for S3?
Note: I cannot use EFS for this purpose as it's not HIPAA compliant.
A VPC Endpoint for S3 uses a predefined IP prefix list in your subnet route tables, which hijacks all of the traffic bound for all of the IP addresses assigned to S3 in your region... so from a subnet associated with an S3 VPC endpoint, all traffic bound for any S3 address in the region is routed through the endpoint.
To state it another way, when correctly configured, an S3 VPC endpoint becomes the only way S3 can be accessed from the associated subnets, and because it's done at the IP routing layer, anything accessing S3 from those subnets will automatically and transparently use the endpoint.
The prefix list ID logically represents the range of public IP addresses used by the service. All instances in subnets associated with the specified route tables automatically use the endpoint to access the service; subnets that are not associated with the specified route tables do not use the endpoint.
http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/vpc-endpoints.html
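For illustration, here is a minimal boto3 sketch of creating such a gateway endpoint and associating it with a route table; the IDs and region below are placeholders.

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

endpoint = ec2.create_vpc_endpoint(
    VpcEndpointType="Gateway",
    VpcId="vpc-0123456789abcdef0",
    ServiceName="com.amazonaws.us-east-1.s3",
    # Routes for the S3 prefix list are added to these route tables automatically.
    RouteTableIds=["rtb-0123456789abcdef0"],
)
print(endpoint["VpcEndpoint"]["VpcEndpointId"])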
In theory, if you configure your VPC route table to use the VPC endpoint, then any traffic destined for S3 will be sent via the VPC endpoint. (Note that a gateway endpoint only works when connecting to S3 in the same region.)
Regardless, even if the traffic is routed through your Internet Gateway to the Amazon S3 endpoint, it will not traverse the real "Internet": it will simply pass through the AWS edge of the Internet, never leaving the AWS network (as long as the bucket is in the same Region).
I have a container running on an EC2 instance in ECS. The container hosts a Django-based application that uses S3 for file storage and RDS for its database. I have configured my VPC, subnets, VPC endpoints, Internet Gateway, roles, security groups, and other parameters such that I am able to host the site, connect to the RDS instance, and even access the site.
The issue is with the connection to S3. When I run the command python manage.py collectstatic --no-input, which should upload/update any new or modified files to S3 as part of the application setup, the program hangs and will not continue. No files are transferred to the already set up S3 bucket.
Details of the set up:
All of the below is hosted on AWS Gov Cloud
VPC and Subnets
1 VPC located in Gov Cloud East with 2 availability zones (AZ) and one private and public subnet in each AZ (4 total subnets)
The 3 default routing tables (1 for each private subnet, and 1 for the two public subnets together)
DNS hostnames and DNS resolution are both enabled
VPC Endpoints
All endpoints have the "vpce-sg" security group attached and are associated to the above vpc
s3 gateway endpoint (set up to use the two private subnet routing tables)
ecr-api interface endpoint
ecr-dkr interface endpoint
ecs-agent interface endpoint
ecs interface endpoint
ecs-telemetry interface endpoint
logs interface endpoint
rds interface endpoint
Security Groups
Elastic Load Balancer Security Group (elb-sg)
Used for the elastic load balancer
Only allows inbound traffic from my local IP
No outbound restrictions
ECS Security Group (ecs-sg)
Used for the EC2 instance in ECS
Allows all traffic from the elb-sg
Allows http:80, https:443 from vpce-sg for s3
Allows postgresql:5432 from vpce-sg for rds
No outbound restrictions
VPC Endpoints Security Group (vpce-sg)
Used for all vpc endpoints
Allows http:80, https:443 from ecs-sg for s3
Allows postgresql:5432 from ecs-sg for rds
No outbound restrictions
Elastic Load Balancer
Set up to use an Amazon Certificate Manager (ACM) HTTPS certificate with a domain managed by GoDaddy, since GovCloud Route 53 does not allow public hosted zones
Listener on http permanently redirects to https
Roles
ecsInstanceRole (Used for the EC2 instance on ECS)
Attached policies: AmazonS3FullAccess, AmazonEC2ContainerServiceforEC2Role, AmazonRDSFullAccess
Trust relationships: ec2.amazonaws.com
ecsTaskExecutionRole (Used for executionRole in task definition)
Attached policies: AmazonECSTaskExecutionRolePolicy
Trust relationships: ec2.amazonaws.com, ecs-tasks.amazonaws.com
ecsRunTaskRole (Used for taskRole in task definition)
Attached policies: AmazonS3FullAccess, CloudWatchLogsFullAccess, AmazonRDSFullAccess
Trust relationships: ec2.amazonaws.com, ecs-tasks.amazonaws.com
S3 Bucket
Standard bucket set up in the same Gov Cloud region as everything else
Troubleshooting
If I bypass the connection to S3, the application launches successfully and I can connect to the website, but since static files are supposed to be hosted on S3, much of the formatting and the images are missing.
Using a bastion instance I was able to ssh into the EC2 instance running the container and successfully test my connection to s3 from there using aws s3 ls s3://BUCKET_NAME
If I connect to a shell within the application container itself and I try to connect to the bucket using...
import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket(BUCKET_NAME)  # BUCKET_NAME is the bucket name (placeholder here)
s3.meta.client.head_bucket(Bucket=bucket.name)  # HEAD request to verify the bucket is reachable
I receive a timeout error...
File "/.venv/lib/python3.9/site-packages/urllib3/connection.py", line 179, in _new_conn
raise ConnectTimeoutError(
urllib3.exceptions.ConnectTimeoutError: (<botocore.awsrequest.AWSHTTPSConnection object at 0x7f3da4467190>, 'Connection to BUCKET_NAME.s3.amazonaws.com timed out. (connect timeout=60)')
...
File "/.venv/lib/python3.9/site-packages/botocore/httpsession.py", line 418, in send
raise ConnectTimeoutError(endpoint_url=request.url, error=e)
botocore.exceptions.ConnectTimeoutError: Connect timeout on endpoint URL: "https://BUCKET_NAME.s3.amazonaws.com/"
Based on this article I think this may have something to do with the fact that I am using the GoDaddy DNS servers which may be preventing proper URL resolution for S3.
If you're using the Amazon DNS servers, you must enable both DNS hostnames and DNS resolution for your VPC. If you're using your own DNS server, ensure that requests to Amazon S3 resolve correctly to the IP addresses maintained by AWS.
I am unsure of how to ensure that requests to Amazon S3 resolve correctly to the IP addresses maintained by AWS. Perhaps I need to set up another private DNS zone in Route 53?
I have tried a very similar set up for this application in AWS non-Gov Cloud using route53 public DNS instead of GoDaddy and there is no issue connecting to S3.
Please let me know if there is any other information I can provide to help.
AWS Region
The issue lies in how boto3 handles different AWS regions. This may be unique to usage on AWS GovCloud. Originally I did not have a region configured for S3, but according to the docs an optional setting named AWS_S3_REGION_NAME can be set:
AWS_S3_REGION_NAME (optional: default is None)
Name of the AWS S3 region to use (eg. eu-west-1)
I reached this conclusion thanks to a Stack Overflow answer I was using to try to connect to S3 manually via boto3. I noticed that it included a region_name argument when creating the session, which prompted me to make sure I had set the region appropriately in my app.settings and environment variables.
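As a minimal sketch of that manual check (the region and bucket name below are placeholders), pinning the region when creating the session looks like this:

import boto3

# Region and bucket name are placeholders; GovCloud regions are e.g. us-gov-west-1 / us-gov-east-1.
session = boto3.session.Session(region_name="us-gov-west-1")
s3 = session.resource("s3")

# With the region pinned, the HEAD request goes to the regional endpoint instead
# of hanging against the default commercial endpoint as in the traceback above.
s3.meta.client.head_bucket(Bucket="BUCKET_NAME")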
If anyone has some background on why this needs to be set for GovCloud functionality but apparently not for commercial, I would be interested to know.
Signature Version
I also had to specify AWS_S3_SIGNATURE_VERSION in app.settings so boto3 knew to use version 4 of the signature. According to the docs:
As of boto3 version 1.13.21 the default signature version used for generating presigned urls is still v2. To be able to access your s3 objects in all regions through presigned urls, explicitly set this to s3v4. Set this to use an alternate version such as s3. Note that only certain regions support the legacy s3 (also known as v2) version.
Some additional information in this Stack Overflow response details that new S3 regions deployed after January 2014 only support Signature Version 4 (see the AWS docs notice).
Apparently GovCloud is in this group of newly deployed regions.
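Putting the two together, the relevant settings in app.settings look roughly like this (bucket name and region are placeholders):

# app/settings.py (sketch; bucket name and region are placeholders)
AWS_STORAGE_BUCKET_NAME = "BUCKET_NAME"
AWS_S3_REGION_NAME = "us-gov-west-1"   # pin the GovCloud region explicitly
AWS_S3_SIGNATURE_VERSION = "s3v4"      # GovCloud only accepts Signature Version 4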
If you do not specify this, calls to the S3 bucket for static files, such as JS scripts, during operation of the web application will receive a 400 response. S3 responds with the error message:
<Error>
<Code>InvalidRequest</Code>
<Message>The authorization mechanism you have provided is not supported. Please use AWS4-HMAC-SHA256.</Message>
<RequestId>#########</RequestId>
<HostId>##########</HostId>
</Error>
I am trying to understand how VPC endpoints work and I am not sure that I understand the AWS documentation. For example, I have a private S3 bucket and an EKS cluster. Since my bucket is private, I believe that traffic from the EKS cluster to S3 does not go over the internet, only over the AWS network. But if my S3 bucket were public, then I would probably need to set up a VPC endpoint so that traffic does not leave AWS. I would expect the same logic with ECR: if it is private, you pull images into your EKS cluster over the AWS network.
So what is the exact case in which you need to use a VPC endpoint within your AWS account (not from on-prem or another VPC)?
VPC endpoints are typically used with public AWS services (such as S3, DynamoDB, ECR, etc.) when the client applications are hosted inside your VPC and you do not want to route traffic via public Internet, which would otherwise result in a number of hops to reach the AWS service.
Imagine a situation where you have an app running on an EC2 instance deployed to a private subnet of your VPC (e.g. a Pod in your EKS cluster). This app reads/writes data from/to AWS S3. If you do not use a VPC endpoint, your traffic will first reach your NAT gateway, then your VPC's Internet gateway out to the public Internet. Eventually, it will hit AWS S3. The response will travel back via the same route.
The same goes for ECR (e.g. a new instance of your Kubernetes Pod started by the kubelet). It's better (i.e. quicker) to take the shortest route to download a Docker image from ECR rather than traverse a number of switches/routers. With a VPC endpoint, your traffic will first hit the VPC endpoint (without leaving your private subnet) and then reach e.g. ECR directly (the traffic does not leave the Amazon network).
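A quick way to sanity-check that a gateway endpoint is actually in play (a sketch; the route table ID is a placeholder and boto3 credentials are assumed to be configured):

import boto3

ec2 = boto3.client("ec2")

# A gateway endpoint shows up as a route whose destination is a prefix list
# (pl-...) and whose target is the endpoint itself (vpce-...).
tables = ec2.describe_route_tables(RouteTableIds=["rtb-0123456789abcdef0"])
for table in tables["RouteTables"]:
    for route in table["Routes"]:
        if "DestinationPrefixListId" in route:
            print(route["DestinationPrefixListId"], "->", route.get("GatewayId"))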
As correctly mentioned by @jarmod, one should differentiate between routing (Layer 3 in the OSI model) and authentication/authorization (Layer 7). For example, you can use a VPC endpoint to reach AWS S3 and still not be authorized (or even authenticated) to, e.g., read a file from an S3 bucket.
Hope this clarifies the idea behind using VPC endpoints.
I have a Docker image which is just a Java application. The Java application reads data from DynamoDB and S3 buckets and outputs something (it's a test app). I have hosted the Docker image in a public Docker Hub repo.
In AWS, I have created a private subnet which hosts an EC2 instance via AWS ECS. To keep security tight, I am using VPC endpoints for the containers' DynamoDB and S3 bucket operations.
And I have used a NAT Gateway to allow the EC2 instance to pull Docker images from Docker Hub.
Problem:
When I remove the VPC endpoints, the application is still able to read DynamoDB and S3 via the NAT gateway, which means the traffic is going over the public network.
Thoughts:
Cannot whitelist the IP addresses of Docker Hub as they can change.
Since AWS ECS handles all the docker pull and related tasks, I do not have control to customize this.
I do not want to use AWS's container registry (ECR); I prefer Docker Hub.
The DynamoDB/S3 private addresses are not known.
Question:
How do I make sure that traffic to Docker Hub is only allowed via the NAT gateway?
How do I make sure that DynamoDB and S3 access goes via the VPC endpoints only?
Thanks for your help
If you want to restrict outbound traffic over your NAT (by DNS hostname) to Docker Hub only, you will need a third-party solution that can allow or deny outbound traffic before it traverses the internet.
You would install this appliance in a separate subnet which has NAT Gateway access. Then, in your existing subnet(s) for ECS, you would update the route table so the 0.0.0.0/0 route points to this appliance (by specifying its ENI). If you check the AWS Marketplace, there may be an existing solution that fulfils the domain-filtering requirement.
Alternatively, you could automate a tool that scrapes the published IP addresses for Docker Hub and adds them as allow rules in a network ACL (NACL). This NACL would only be applied to the subnets in which the NAT Gateway resides.
Regarding your second question: from the VPC's point of view, adding the prefix lists of the S3 and DynamoDB endpoints to the route table means any requests to those API endpoints are forwarded over the private route.
At this time, DynamoDB does not have the ability to prevent publicly routed interaction; S3, however, does. By adding a condition on the VPC endpoint to its bucket policy, you can deny any access that does not come through the listed VPC endpoint. Be careful not to block your own console access, however, by denying only the specific actions that you don't want allowed outside the endpoint.
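As a sketch (the endpoint ID and bucket name are placeholders), such a bucket policy applied via boto3 might look like this:

import json
import boto3

s3 = boto3.client("s3")

# Deny object-level access unless the request arrives through the listed VPC endpoint.
# Scoped to specific actions so you don't lock yourself out of the console entirely.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "DenyIfNotFromVpce",
        "Effect": "Deny",
        "Principal": "*",
        "Action": ["s3:GetObject", "s3:PutObject"],
        "Resource": "arn:aws:s3:::BUCKET_NAME/*",
        "Condition": {"StringNotEquals": {"aws:SourceVpce": "vpce-0123456789abcdef0"}},
    }],
}

s3.put_bucket_policy(Bucket="BUCKET_NAME", Policy=json.dumps(policy))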
We have a hybrid model with on-premise connected to AWS via a site-to-site VPN. There is a need to download data from S3 to on-premise in such a way that the traffic goes from on-premise to AWS and back without touching the open Internet, for security reasons. I.e. similar to this:
on-prem --VPN--> AWS private subnet --> s3 endpoint --> s3
This scheme works with interface endpoints, since they provide private DNS names that can be used from on-premise, but the classic S3 endpoint is a gateway endpoint, not an interface endpoint, so it doesn't provide private DNS names.
How can this be achieved?
In February 2021, AWS released S3 PrivateLink Interface Endpoints, which are different from S3 Gateway Endpoints.
The difference is that S3 Interface Endpoints resolve to private VPC IP addresses and are routable from outside the VPC (e.g. via VPN, Direct Connect, Transit Gateway, etc.). S3 Gateway Endpoints use public IP ranges and are only routable from resources within the VPC.
Interface Endpoints mean you can reach S3 buckets from your on-premise network via your VPN and one or more subnets, without needing a proxy in the VPC and without traversing the public Internet.
Refer to the blog announcement and the S3 privatelink user guide for more details.
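If you do not (or cannot) enable private DNS, one hedged sketch is to point the SDK client at the endpoint-specific hostname of the S3 interface endpoint; the endpoint ID, region, and bucket name below are all placeholders.

import boto3

# Endpoint-specific hostname of an S3 interface endpoint (placeholder values).
# It resolves to private IPs in the VPC, so it is reachable from on-premise over the VPN.
ENDPOINT_URL = "https://bucket.vpce-0123456789abcdef0-abcd1234.s3.us-east-1.vpce.amazonaws.com"

s3 = boto3.client("s3", region_name="us-east-1", endpoint_url=ENDPOINT_URL)
response = s3.list_objects_v2(Bucket="my-bucket", MaxKeys=5)
for obj in response.get("Contents", []):
    print(obj["Key"])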
According to the VPC endpoints documentation, S3 doesn't provide direct access through a VPN:
Endpoint connections cannot be extended out of a VPC. Resources on the other side of a VPN connection, a VPC peering connection, an AWS Direct Connect connection, or a ClassicLink connection in your VPC cannot use the endpoint to communicate with resources in the endpoint service.
However, you could route the Amazon S3 IP address ranges through your VPN connection to the VPC, explicitly allow access to the S3 buckets for your VPN's public IP addresses in the bucket policy, and deny everything else.
Please note that the Amazon S3 IP address ranges are subject to change.
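If you take that approach, the published ranges can be pulled from AWS's ip-ranges.json feed, for example (the region filter below is a placeholder):

import json
import urllib.request

# AWS publishes its public IP ranges; filter for the S3 service in one region.
with urllib.request.urlopen("https://ip-ranges.amazonaws.com/ip-ranges.json") as resp:
    data = json.load(resp)

s3_prefixes = [
    p["ip_prefix"]
    for p in data["prefixes"]
    if p["service"] == "S3" and p["region"] == "us-east-1"
]
print(s3_prefixes)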
According to the AWS documentation on NAT gateways, they cannot send traffic over VPC endpoints unless things are set up in the following manner:
A NAT gateway cannot send traffic over VPC endpoints [...]. If your instances in the private subnet must access resources over a VPC endpoint [...], use the private subnet’s route table to route the traffic directly to these devices.
Following this example in the docs, I created the following configuration for my ECS app:
VPC (vpc-app) with CIDR 172.31.0.0/16.
App subnet (subnet-app) with the following route table:
Destination | Target
----------------|-----------
172.31.0.0/16 | local
0.0.0.0/0 | nat-main
NAT Gateway (nat-main) in vpc-app in subnet default-1 with the following Route Table:
Destination | Target
----------------|--------------
172.31.0.0/16 | local
0.0.0.0/0 | igw-xxxxxxxx
Security Group (sg-app) with port 443 open for subnet-app.
VPC Endpoints (Interface type) with vpc-app, subnet-app and sg-app for the following services:
com.amazonaws.eu-west-1.ecr.api
com.amazonaws.eu-west-1.ecr.dkr
com.amazonaws.eu-west-1.ecs
com.amazonaws.eu-west-1.ecs-agent
com.amazonaws.eu-west-1.ecs-telemetry
com.amazonaws.eu-west-1.s3 (Gateway)
It's also important to mention that I've enabled DNS Resolution and DNS Hostnames for vpc-app, as well as the Enable Private DNS Name option for the ecr-dkr and ecr-api VPC endpoints.
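For reference, a minimal boto3 sketch of creating one of these interface endpoints with private DNS enabled (all IDs below are placeholders):

import boto3

ec2 = boto3.client("ec2", region_name="eu-west-1")

ec2.create_vpc_endpoint(
    VpcEndpointType="Interface",
    VpcId="vpc-0123456789abcdef0",              # vpc-app (placeholder ID)
    ServiceName="com.amazonaws.eu-west-1.ecr.dkr",
    SubnetIds=["subnet-0123456789abcdef0"],     # subnet-app (placeholder ID)
    SecurityGroupIds=["sg-0123456789abcdef0"],  # sg-app (placeholder ID)
    PrivateDnsEnabled=True,
)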
I've also tried working only with Fargate containers since they don't have the added complication of the ECS Agent, and because according to the docs:
Tasks using the Fargate launch type only require the com.amazonaws.region.ecr.dkr Amazon ECR VPC endpoint and the Amazon S3 gateway endpoint to take advantage of this feature.
This also doesn't work and every time my Fargate tasks run I see a spike in Bytes out to source under nat-main's Monitoring.
No matter what I try, the EC2 instances (and Fargate tasks) in the subnet-app are still pulling images using nat-main and not going to the local address of the ECR service.
I've restarted the ECS Agent and made sure to check all the boxes in the ECS Interface VPC Endpoints guide AND the ECR Interface Endpoints guide.
What am I missing here?
Any help would be appreciated.
After many hours of trial and error, and with lots of help from @jogold, the missing piece was found in this blog post:
The next step is to create a gateway VPC endpoint for S3. This is necessary because ECR uses S3 to store Docker image layers. When your instances download Docker images from ECR, they must access ECR to get the image manifest and S3 to download the actual image layers.
After I created the S3 Gateway VPCE, I forgot to associate it with subnet-app's route table, so although the initial request to my ECR URI was made using the internal address, the downloading of the image layers from S3 still went through the NAT Gateway.
After adding the entry, the network usage of the NAT Gateway dropped dramatically.
More information on how to setup Gateway VPCE can be found here.
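For anyone automating this, a minimal boto3 sketch of associating an existing gateway endpoint with an additional route table (IDs are placeholders):

import boto3

ec2 = boto3.client("ec2", region_name="eu-west-1")

# Add subnet-app's route table to the existing S3 gateway endpoint so image-layer
# downloads from S3 no longer go through the NAT gateway.
ec2.modify_vpc_endpoint(
    VpcEndpointId="vpce-0123456789abcdef0",      # placeholder endpoint ID
    AddRouteTableIds=["rtb-0123456789abcdef0"],  # placeholder route table ID
)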
Interface VPC endpoints work with DNS resolution, not routing.
In order for your configuration to work, you need to ensure that you checked Enable Private DNS Name when you created the endpoint. This enables you to make requests to the service using its default DNS hostname instead of the endpoint-specific DNS hostnames.
From the documentation:
When you create an interface endpoint, we generate endpoint-specific DNS hostnames that you can use to communicate with the service. For AWS services and AWS Marketplace partner services, you can optionally enable private DNS for the endpoint. This option associates a private hosted zone with your VPC. The hosted zone contains a record set for the default DNS name for the service (for example, ec2.us-east-1.amazonaws.com) that resolves to the private IP addresses of the endpoint network interfaces in your VPC. This enables you to make requests to the service using its default DNS hostname instead of the endpoint-specific DNS hostnames. For example, if your existing applications make requests to an AWS service, they can continue to make requests through the interface endpoint without requiring any configuration changes.
The alternative is to update your application to use your endpoint-specific DNS hostnames.
Note that to use private DNS names, DNS resolution and DNS hostnames must be enabled for your VPC.
Also note that in order to use ECR/ECS without a NAT gateway, you need to configure an S3 endpoint (gateway type, which requires a route table update) to allow instances to download the image layers from the underlying private Amazon S3 buckets that host them. More information in Setting up AWS PrivateLink for Amazon ECS, and Amazon ECR.