Django App in ECS Container Cannot Connect to S3 in Gov Cloud - amazon-web-services

I have a container running in an EC2 instance on ECS. The container is hosting a Django-based application that utilizes S3 and RDS for its file storage and database needs, respectively. I have appropriately configured my VPC, subnets, VPC endpoints, internet gateway, roles, security groups, and other parameters such that I am able to host the site, connect to the RDS instance, and access the site.
The issue is with the connection to S3. When I run the command python manage.py collectstatic --no-input, which should upload/update any new or modified files to S3 as part of the application setup, the program hangs and will not continue. No files are transferred to the already created S3 bucket.
Details of the set up:
All of the below is hosted on AWS Gov Cloud
VPC and Subnets
1 VPC located in Gov Cloud East with 2 availability zones (AZ) and one private and public subnet in each AZ (4 total subnets)
The 3 default routing tables (1 for each private subnet, and 1 for the two public subnets together)
DNS hostnames and DNS resolution are both enabled
VPC Endpoints
All endpoints have the "vpce-sg" security group attached and are associated to the above vpc
s3 gateway endpoint (set up to use the two private subnet routing tables)
ecr-api interface endpoint
ecr-dkr interface endpoint
ecs-agent interface endpoint
ecs interface endpoint
ecs-telemetry interface endpoint
logs interface endpoint
rds interface endpoint
Security Groups
Elastic Load Balancer Security Group (elb-sg)
Used for the elastic load balancer
Only allows inbound traffic from my local IP
No outbound restrictions
ECS Security Group (ecs-sg)
Used for the EC2 instance in ECS
Allows all traffic from the elb-sg
Allows http:80, https:443 from vpce-sg for s3
Allows postgresql:5432 from vpce-sg for rds
No outbound restrictions
VPC Endpoints Security Group (vpce-sg)
Used for all vpc endpoints
Allows http:80, https:443 from ecs-sg for s3
Allows postgresql:5432 from ecs-sg for rds
No outbound restrictions
Elastic Load Balancer
Set up to use an Amazon (ACM) certificate for an HTTPS connection with a domain managed by GoDaddy, since Gov Cloud Route 53 does not allow public hosted zones
Listener on http permanently redirects to https
Roles
ecsInstanceRole (Used for the EC2 instance on ECS)
Attached policies: AmazonS3FullAccess, AmazonEC2ContainerServiceforEC2Role, AmazonRDSFullAccess
Trust relationships: ec2.amazonaws.com
ecsTaskExecutionRole (Used for executionRole in task definition)
Attached policies: AmazonECSTaskExecutionRolePolicy
Trust relationships: ec2.amazonaws.com, ecs-tasks.amazonaws.com
ecsRunTaskRole (Used for taskRole in task definition)
Attached policies: AmazonS3FullAccess, CloudWatchLogsFullAccess, AmazonRDSFullAccess
Trust relationships: ec2.amazonaws.com, ecs-tasks.amazonaws.com
S3 Bucket
Standard bucket set up in the same Gov Cloud region as everything else
Troubleshooting
If I bypass the connection to S3, the application launches successfully and I can connect to the website, but since the static files are supposed to be hosted on S3, much of the formatting is missing and images are broken.
Using a bastion instance, I was able to SSH into the EC2 instance running the container and successfully test my connection to S3 from there using aws s3 ls s3://BUCKET_NAME
If I connect to a shell within the application container itself and try to connect to the bucket using...
import boto3

# Uses the task/instance credentials and whatever region configuration is present
s3 = boto3.resource('s3')
bucket = s3.Bucket(BUCKET_NAME)
s3.meta.client.head_bucket(Bucket=bucket.name)
I receive a timeout error...
File "/.venv/lib/python3.9/site-packages/urllib3/connection.py", line 179, in _new_conn
raise ConnectTimeoutError(
urllib3.exceptions.ConnectTimeoutError: (<botocore.awsrequest.AWSHTTPSConnection object at 0x7f3da4467190>, 'Connection to BUCKET_NAME.s3.amazonaws.com timed out. (connect timeout=60)')
...
File "/.venv/lib/python3.9/site-packages/botocore/httpsession.py", line 418, in send
raise ConnectTimeoutError(endpoint_url=request.url, error=e)
botocore.exceptions.ConnectTimeoutError: Connect timeout on endpoint URL: "https://BUCKET_NAME.s3.amazonaws.com/"
Based on this article, I think this may have something to do with the fact that I am using the GoDaddy DNS servers, which may be preventing proper URL resolution for S3.
If you're using the Amazon DNS servers, you must enable both DNS
hostnames and DNS resolution for your VPC. If you're using your own
DNS server, ensure that requests to Amazon S3 resolve correctly to the
IP addresses maintained by AWS.
I am unsure of how to ensure that requests to Amazon S3 resolve correctly to the IP addresses maintained by AWS. Perhaps I need to set up a private hosted zone in Route 53?
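As a quick check, I can see what the bucket hostname resolves to from inside the container with a small Python snippet (the regional hostname below is an assumption for GovCloud East; with an S3 gateway endpoint the name is still expected to resolve to public AWS-owned addresses, since the private path comes from the route table rather than DNS):
import socket

# Diagnostic sketch (hostname is an assumed GovCloud East regional endpoint).
# With an S3 gateway endpoint this still resolves to public AWS-owned IPs;
# the private routing happens via the subnet route table, not DNS.
host = "BUCKET_NAME.s3.us-gov-east-1.amazonaws.com"
print(socket.gethostbyname(host))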
I have tried a very similar set up for this application in AWS non-Gov Cloud using route53 public DNS instead of GoDaddy and there is no issue connecting to S3.
Please let me know if there is any other information I can provide to help.

AWS Region
The issue lies in how boto3 handles different AWS regions. This may be unique to usage on AWS GovCloud. Originally I did not have a region configured for S3, but according to the docs an optional setting named AWS_S3_REGION_NAME can be set.
AWS_S3_REGION_NAME (optional: default is None)
Name of the AWS S3 region to use (eg. eu-west-1)
I reached this conclusion thanks to a Stack Overflow answer I was using to try to connect to S3 manually via boto3. I noticed that it included a region_name argument when creating the session, which alerted me to make sure I had appropriately set the region in my app.settings and environment variables.
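For illustration, a minimal version of that manual check with the region set explicitly (the GovCloud region below is an assumption; substitute your own):
import boto3

# Creating the session with an explicit region_name makes boto3 use the
# regional S3 endpoint (BUCKET.s3.<region>.amazonaws.com) instead of the
# global s3.amazonaws.com endpoint that was timing out
session = boto3.session.Session(region_name="us-gov-east-1")
s3 = session.resource("s3")
s3.meta.client.head_bucket(Bucket="BUCKET_NAME")  # placeholder bucket name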
If anyone has some background on why this needs to be set for GovCloud functionality but apparently not for commercial, I would be interested to know.
Signature Version
I also had to specify AWS_S3_SIGNATURE_VERSION in app.settings so boto3 knew to use version 4 of the signature. According to the docs:
As of boto3 version 1.13.21 the default signature version used for generating presigned urls is still v2. To be able to access your s3 objects in all regions through presigned urls, explicitly set this to s3v4. Set this to use an alternate version such as s3. Note that only certain regions support the legacy s3 (also known as v2) version.
Some additional information in this Stack Overflow response details that new S3 regions deployed after January 2014 only support signature version 4 (see the AWS docs notice).
Apparently GovCloud is in this group of newly deployed regions.
If you do not specify this, calls to the S3 bucket for static files, such as JS scripts, during operation of the web application will receive a 400 response. S3 responds with the error message:
<Error>
<Code>InvalidRequest</Code>
<Message>The authorization mechanism you have provided is not supported. Please use AWS4-HMAC-SHA256.</Message>
<RequestId>#########</RequestId>
<HostId>##########</HostId>
</Error>
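Putting both together, a minimal app.settings sketch for django-storages (bucket name and GovCloud region are placeholders):
# django-storages settings sketch; values are placeholders
AWS_STORAGE_BUCKET_NAME = "BUCKET_NAME"
AWS_S3_REGION_NAME = "us-gov-east-1"   # set explicitly so boto3 targets the regional endpoint
AWS_S3_SIGNATURE_VERSION = "s3v4"      # GovCloud only supports Signature Version 4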

Related

Connect timeout on endpoint URL: "https://sts.us-west-2.amazonaws.com/" in AWS EKS with IRSA for RDS,S3 and security groups applied for RDS

I created a cluster where a pod should read/write data from/to RDS and S3. In order to make the connection secure, I added IRSA for S3 and RDS. An additional layer of security was added by creating a security group for the pod so that it can talk to RDS. However, after doing this, while the pod can write to RDS and S3 without any issues, the pod can read only from RDS and not from S3. I exec'd into the pod to see what was happening. When I execute aws s3 ls and aws sts get-caller-identity, I get Connect timeout on endpoint URL: "https://sts.us-west-2.amazonaws.com/" as output.
In order to implement security groups for pods, I followed https://docs.aws.amazon.com/eks/latest/userguide/security-groups-for-pods.html. I understand that when a security group is applied to a pod, source NAT is disabled, so I created a VPC endpoint for S3 (gateway endpoint). I also created an outbound rule in the pod's security group to the managed prefix list for S3. I followed the instructions in Managing Amazon S3 access with VPC endpoints and S3 Access Points for this. This didn't help with the execution of the commands that I showed earlier.
I also created an Interface VPC Endpoint for STS but that didn't work either.
I have referred to https://github.com/aws/amazon-vpc-cni-k8s/issues/1211 as well. I am already following the instructions mentioned in that post, as DNS resolution is active for my cluster.

Install the AWS Cloudwatch Agent from a S3 VPC endpoint

To keep our resources on AWS secure, we are trying to block access to the internet for our EC2 instances unless we explicitly need it. We have one EC2 instance (Ubuntu) running that we want to install the AWS CloudWatch agent on. The default way to do this is to use wget to download the installation files from an S3-internal address (as seen in the linked article).
We now want to replace the public internet access our EC2 instance has with VPC endpoints. I created one interface endpoint each for global S3 access and for S3 access in our region. Ideally, the EC2 instance would now connect through our endpoint to the S3 bucket to download the resources from the AWS address.
How can I now access the files from my EC2 instance using wget? The article lists one URL option for global S3 access and another URL for regional S3 access, but I cannot get a connection using either. Here are a few examples of URLs I tried:
wget https://accesspoint.s3-global.amazonaws.com/amazoncloudwatch-agent/ubuntu/amd64/latest/amazon-cloudwatch-agent.deb
wget https://s3.vpce-123456.s3.eu-central-1.vpce.amazonaws.com/amazoncloudwatch-agent-eu-central-1/ubuntu/amd64/latest/amazon-cloudwatch-agent.deb
wget https://amazoncloudwatch-agent-eu-central-1.vpce-123456.s3.eu-central-1.vpce.amazonaws.com/ubuntu/amd64/latest/amazon-cloudwatch-agent.deb
Note that accesspoint.s3-global.amazonaws.com is the internal private DNS entry created automatically by the global S3 service endpoint, and *.vpce-123456.s3.eu-central-1.vpce.amazonaws.com is an example of one of the DNS entries created by the regional S3 service endpoint.
Make sure that you have updated the route table of your subnet. Add the rule that routes the traffic to the gateway endpoint (since we are talking about S3).
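Assuming the S3 endpoint is a gateway endpoint, a minimal boto3 sketch of that route-table association (endpoint and route table IDs are placeholders):
import boto3

ec2 = boto3.client("ec2", region_name="eu-central-1")

# Attach the instance subnet's route table to the S3 gateway endpoint so
# S3-bound traffic is routed privately through the endpoint
ec2.modify_vpc_endpoint(
    VpcEndpointId="vpce-123456",                 # S3 gateway endpoint (placeholder)
    AddRouteTableIds=["rtb-0123456789abcdef0"],  # subnet route table (placeholder)
)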

Route table for docker hub and vpc endpoints for private hosted instances: AWS

I have a Docker image which is just a Java application. The Java application reads data from DynamoDB and S3 buckets and outputs something (it's a test app). I have hosted the Docker images in a public Docker Hub repo.
In AWS, I have created a private subnet which is hosting an EC2 instance via AWS ECS. Now, to keep security high, I am using VPC endpoints for DynamoDB and S3 bucket operations from the containers.
And I have used a NAT gateway to allow the EC2 instance to pull Docker images from Docker Hub.
Problem:
When I remove the VPC endpoints, the application is still able to read DynamoDB and S3 via the NAT gateway, which means the traffic is going through the public network.
Thoughts:
Cannot whitelist the IP addresses of Docker Hub as they can change.
Since AWS ECS handles all the docker pull tasks, I do not have control to customize.
I do not want to use the AWS container registry. I prefer Docker Hub.
The DynamoDB/S3 private addresses are not known.
Question:
How can I make sure that traffic for Docker Hub is only allowed via the NAT gateway?
How can I make sure that DynamoDB and S3 access goes via the endpoints only?
Thanks for your help
If you want to restrict outbound traffic over your NAT gateway (by DNS hostname) to Docker Hub only, you will need a third-party solution that can allow or deny outbound traffic before it traverses the internet.
You would install this appliance in a separate subnet which has NAT gateway access. Then, in your existing subnet(s) for ECS, you would update the route table so that the 0.0.0.0/0 route points to this appliance (by specifying its ENI). If you check the AWS Marketplace, there may be an existing solution that fulfils the domain-filtering requirement.
Alternatively, you could automate a tool that scrapes the whitelisted IP addresses for Docker Hub and then adds these as allow-all-traffic rules in a NACL. This NACL would only be applied to the subnets that the NAT gateway resides in.
Regarding your second question, from the VPC point of view, adding the prefix lists of the S3 and DynamoDB endpoints to the route table will forward any requests that hit these API endpoints through the private route.
At this time DynamoDB does not have the ability to prevent publicly routed interaction; however, S3 does. By adding a condition on the VPC endpoint to its bucket policy, you can deny any access that does not come through the listed VPC endpoint. Be careful not to block your own access from the console, however, by denying only the specific verbs that you don't want allowed.
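A minimal sketch of that bucket-policy condition (bucket name, endpoint ID, and the chosen verbs are placeholders):
import json
import boto3

# Deny the listed data verbs unless the request arrives through the VPC
# endpoint; management actions are left out so console access is not blocked
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "DenyOutsideVpce",
        "Effect": "Deny",
        "Principal": "*",
        "Action": ["s3:GetObject", "s3:PutObject"],
        "Resource": "arn:aws:s3:::MY_BUCKET/*",
        "Condition": {"StringNotEquals": {"aws:SourceVpce": "vpce-123456"}},
    }],
}

boto3.client("s3").put_bucket_policy(Bucket="MY_BUCKET", Policy=json.dumps(policy))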

AWS ECS: VPC Endpoints and NAT Gateways

According to the AWS documentation on NAT Gateways, they cannot send traffic over VPC endpoints, unless it is setup in the following manner:
A NAT gateway cannot send traffic over VPC endpoints [...]. If your instances in the private subnet must access resources over a VPC endpoint [...], use the private subnet’s route table to route the traffic directly to these devices.
Following this example in the docs, I created the following configuration for my ECS app:
VPC (vpc-app) with CIDR 172.31.0.0/16.
App subnet (subnet-app) with the following route table:
Destination | Target
----------------|-----------
172.31.0.0/16 | local
0.0.0.0/0 | nat-main
NAT Gateway (nat-main) in vpc-app in subnet default-1 with the following Route Table:
Destination | Target
----------------|--------------
172.31.0.0/16 | local
0.0.0.0/0 | igw-xxxxxxxx
Security Group (sg-app) with port 443 open for subnet-app.
VPC Endpoints (Interface type) with vpc-app, subnet-app and sg-app for the following services:
com.amazonaws.eu-west-1.ecr.api
com.amazonaws.eu-west-1.ecr.dkr
com.amazonaws.eu-west-1.ecs
com.amazonaws.eu-west-1.ecs-agent
com.amazonaws.eu-west-1.ecs-telemetry
com.amazonaws.eu-west-1.s3 (Gateway)
It's also important to mention that I've enabled DNS Resolution and DNS Hostnames for vpc-app, as well as the Enable Private DNS Name option for the ecr-dkr and ecr-api VPC endpoints.
I've also tried working only with Fargate containers since they don't have the added complication of the ECS Agent, and because according to the docs:
Tasks using the Fargate launch type only require the com.amazonaws.region.ecr.dkr Amazon ECR VPC endpoint and the Amazon S3 gateway endpoint to take advantage of this feature.
This also doesn't work and every time my Fargate tasks run I see a spike in Bytes out to source under nat-main's Monitoring.
No matter what I try, the EC2 instances (and Fargate tasks) in the subnet-app are still pulling images using nat-main and not going to the local address of the ECR service.
I've restarted the ECS Agent and made sure to check all the boxes in the ECS Interface VPC Endpoints guide AND the ECR Interface Endpoints guide.
What am I missing here?
Any help would be appreciated.
After many hours of trial and error, and with lots of help from #jogold, the missing piece was found in this blog post:
The next step is to create a gateway VPC endpoint for S3. This is necessary because ECR uses S3 to store Docker image layers. When your instances download Docker images from ECR, they must access ECR to get the image manifest and S3 to download the actual image layers.
After I created the S3 Gateway VPCE, I forgot to add its address to subnet-app's routing table, so although the initial request to my ECR URI was made using the internal address, the downloading of the image from S3 still used the NAT Gateway.
After adding the entry, the network usage of the NAT Gateway dropped dramatically.
More information on how to set up a gateway VPCE can be found here.
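For reference, a boto3 sketch of creating that S3 gateway endpoint with the subnet's route table attached up front (VPC and route table IDs are placeholders):
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-1")

# Passing RouteTableIds at creation adds the S3 prefix-list route to
# subnet-app's route table, so image layers download privately instead of
# going through the NAT gateway
ec2.create_vpc_endpoint(
    VpcEndpointType="Gateway",
    VpcId="vpc-0123456789abcdef0",            # vpc-app (placeholder)
    ServiceName="com.amazonaws.eu-west-1.s3",
    RouteTableIds=["rtb-0123456789abcdef0"],  # subnet-app route table (placeholder)
)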
Interface VPC endpoints work with DNS resolution, not routing.
In order for your configuration to work, you need to ensure that you checked Enable Private DNS Name when you created the endpoint. This enables you to make requests to the service using its default DNS hostname instead of the endpoint-specific DNS hostnames.
From the documentation:
When you create an interface endpoint, we generate endpoint-specific DNS hostnames that you can use to communicate with the service. For AWS services and AWS Marketplace partner services, you can optionally enable private DNS for the endpoint. This option associates a private hosted zone with your VPC. The hosted zone contains a record set for the default DNS name for the service (for example, ec2.us-east-1.amazonaws.com) that resolves to the private IP addresses of the endpoint network interfaces in your VPC. This enables you to make requests to the service using its default DNS hostname instead of the endpoint-specific DNS hostnames. For example, if your existing applications make requests to an AWS service, they can continue to make requests through the interface endpoint without requiring any configuration changes.
The alternative is to update your application to use your endpoint-specific DNS hostnames.
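As a sketch of that alternative, you can point a client at the endpoint-specific hostname explicitly (the hostname below is only an example of the form these names take):
import boto3

# Use the endpoint-specific DNS name of the interface endpoint instead of
# relying on private DNS for the default service hostname
ecr = boto3.client(
    "ecr",
    region_name="eu-west-1",
    endpoint_url="https://vpce-0123456789abcdef0-abcdefgh.api.ecr.eu-west-1.vpce.amazonaws.com",
)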
Note that to use private DNS names, DNS resolution and DNS hostnames must be enabled for your VPC.
Also note that in order to use ECR/ECS without a NAT gateway, you need to configure an S3 endpoint (gateway type, which requires a route table update) to allow instances to download the image layers from the underlying private Amazon S3 buckets that host them. More information in Setting up AWS PrivateLink for Amazon ECS, and Amazon ECR.

Does AWS File Gateway use the S3 endpoint if within a VPC?

I am planning to use AWS File Gateway in a hybrid environment where I will mount the File Gateway to an EC2 instance from within a private subnet. As per AWS documentation, all data transfer is done through HTTPS when using File Gateway.
But since my File Gateway, EC2 instance, and S3 are all inside the AWS environment, will my File Gateway still transfer files over the internet to the S3 service endpoint (s3.amazonaws.com), or will it leverage the VPC endpoint for S3?
Note: I cannot use EFS for this purpose as it's not HIPAA compliant.
A VPC Endpoint for S3 uses a predefined IP prefix list in your subnet route tables, which hijacks all of the traffic bound for all of the IP addresses assigned to S3 in your region... so from a subnet associated with an S3 VPC endpoint, all traffic bound for any S3 address in the region is routed through the endpoint.
To state it another way, when correctly configured, an S3 VPC endpoint becomes the only way S3 can be accessed from the associated subnets, and because it's done at the IP routing layer, anything accessing S3 from those subnets will automatically and transparently use the endpoint.
The prefix list ID logically represents the range of public IP addresses used by the service. All instances in subnets associated with the specified route tables automatically use the endpoint to access the service; subnets that are not associated with the specified route tables do not use the endpoint.
http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/vpc-endpoints.html
In theory, if you configure your VPC route table to use the VPC endpoint, then any traffic destined for S3 will be sent via the VPC endpoint. (Note that this only works when connecting to S3 in the same region.)
Regardless, even if the traffic is routed through your Internet Gateway to the Amazon S3 endpoint, the traffic will not traverse the real "Internet" -- it will simply pass through the AWS edge of the Internet, never leaving the AWS data center (as long as it is in the same Region).