Can ECS services communicate with an AWS AutoScalingGroup without a NAT Gateway?

I have an architecture similar to https://github.com/aws-samples/ecs-refarch-cloudformation
I would like to know whether I can have an AutoScalingGroup with instances in private subnets without using a NAT Gateway.
I was experimenting with removing the NATs and adding VPC endpoints, but I always end up with a problem like this:
2022-08-21 10:55:07 UTC+1000 <MY_ECS_TEMPLATE> CREATE_FAILED The following resource(s) failed to create: [ECSAutoScalingGroup].
2022-08-21 10:55:07 UTC+1000 ECSAutoScalingGroup CREATE_FAILED Received 0 SUCCESS signal(s) out of 1. Unable to satisfy 100% MinSuccessfulInstancesPercent requirement

if I can have an AutoScalingGroup with instances in private subnets without using NAT Gateway?
Sadly, no. But you can create VPC interface endpoints for ECS, which enable communication with ECS from your private subnets without needing internet access or a NAT.
The error can also come from CloudFormation (CFN), which fails the stack when it gets no signal back that the EC2 instances launched correctly. So you may need to add a VPC endpoint for CFN as well.
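As a rough sketch of which endpoints that means: as far as I know, ECS container instances talk to three interface endpoints (ecs-agent, ecs-telemetry, ecs), and cfn-signal reports back through the cloudformation endpoint. The helper below only builds the service names. The function name and the example region are my own assumptions, and your instances may need more endpoints than these (for example ECR and S3 if they pull images from ECR).

```python
# Hypothetical helper: builds VPC endpoint service names for one region.
# It does not call AWS; it only shows which names to look for.
ECS_INSTANCE_SERVICES = ["ecs-agent", "ecs-telemetry", "ecs"]
CFN_SERVICES = ["cloudformation"]  # used by cfn-signal to report success


def endpoint_service_names(region, services):
    """Return names in the form com.amazonaws.<region>.<service>."""
    return [f"com.amazonaws.{region}.{service}" for service in services]


if __name__ == "__main__":
    # "us-east-1" is only an example region.
    for name in endpoint_service_names(
        "us-east-1", ECS_INSTANCE_SERVICES + CFN_SERVICES
    ):
        print(name)
```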

I had a similar issue. We use AWS Batch, which is based on Auto Scaling groups, and a Squid proxy for internet access from private subnets. After I created a few endpoints as mentioned above, the spot VMs started. Next I had to update the AMI for these VMs and set the proper proxy/no-proxy settings. Now it works.
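For the proxy/no-proxy part, here is a minimal sketch of the values involved. The three addresses that bypass the proxy are the ones I know from the ECS HTTP proxy documentation (instance metadata, the task credentials endpoint, the Docker socket). The function name, the proxy address, and the idea of passing extra entries are assumptions for illustration, and where you write these values in your AMI depends on your setup.

```python
# Addresses that ECS container instances should reach directly,
# not through the proxy.
DIRECT = ["169.254.169.254", "169.254.170.2", "/var/run/docker.sock"]


def proxy_env(proxy_address, extra_no_proxy=()):
    """Build proxy variables; extra_no_proxy is a hypothetical extension point."""
    no_proxy = ",".join(DIRECT + list(extra_no_proxy))
    return {
        "HTTP_PROXY": proxy_address,
        "HTTPS_PROXY": proxy_address,
        "NO_PROXY": no_proxy,
    }


if __name__ == "__main__":
    # "squid.internal.example:3128" is a made-up proxy address.
    for key, value in proxy_env("squid.internal.example:3128").items():
        print(f"{key}={value}")
```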

Related

Unable to provision Pod in EKS cluster configured with Fargate

I have recently set up an EKS cluster with Fargate.
When I tried to deploy a Redis service on k8s following a guide, I got the following errors:
Pod provisioning timed out (will retry) for pod: default/redis-operator-79d6d769dc-j246j
Disabled logging because aws-logging configmap was not found. configmap "aws-logging" not found
To solve the above errors, I tried the following solutions, but none of them worked:
Created a NAT gateway for granting internet connection to the instances in the private subnets.
Updated CoreDNS to run pods on Fargate. Reference
The NAT gateway that I created was in a private subnet. The private subnets themselves don't have any access to the internet, so I was stuck in a loop.
By creating a NAT gateway in a public subnet and then adding it to the route table of the private subnets used by the EKS cluster, I was able to schedule the pods.
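To make the loop concrete, here is a small sketch that checks the two conditions: the private subnet's default route points at the NAT gateway, and the NAT gateway's own subnet has a default route to an internet gateway. The dictionary layout and all IDs are made up for illustration; this is not how any AWS API returns route tables.

```python
def nat_has_internet_path(nat_subnet, route_tables):
    """True if the NAT gateway's subnet routes 0.0.0.0/0 to an internet gateway.

    route_tables is a hypothetical model: subnet id -> {destination: target}.
    """
    target = route_tables.get(nat_subnet, {}).get("0.0.0.0/0", "")
    return target.startswith("igw-")


def private_subnet_ok(subnet, nat_id, nat_subnet, route_tables):
    """True if the subnet's default route is the NAT and the NAT can get out."""
    default = route_tables.get(subnet, {}).get("0.0.0.0/0")
    return default == nat_id and nat_has_internet_path(nat_subnet, route_tables)


# Broken setup: the NAT gateway sits in the private subnet it is meant to serve.
broken = {"subnet-private": {"0.0.0.0/0": "nat-example"}}
# Fixed setup: the NAT gateway sits in a public subnet with an IGW route.
fixed = {
    "subnet-private": {"0.0.0.0/0": "nat-example"},
    "subnet-public": {"0.0.0.0/0": "igw-example"},
}

if __name__ == "__main__":
    print(private_subnet_ok("subnet-private", "nat-example", "subnet-private", broken))
    print(private_subnet_ok("subnet-private", "nat-example", "subnet-public", fixed))
```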

AWS ECS: ResourceInitializationError: unable to pull secrets or registry auth

Background
Testing VPC:
2 private subnets with NACLs that allow all inbound traffic from IPs within the VPC and all outbound traffic. The subnets have a route table with a route to a NAT gateway in a public subnet.
2 public subnets that allow all inbound/outbound traffic. One of the subnets contains the NAT gateway and both subnets have a route table pointing to the Internet Gateway.
Problem
When running an ECS Fargate task (platform: 1.4) within one of the private subnets, the following error arises:
ResourceInitializationError: unable to pull secrets or registry auth: execution resource retrieval failed: unable to retrieve ecr registry auth: service call has been retried 3 time(s): RequestError: send request failed caused by: Post https://api.ecr.us-west-2.amazonaws.com/: dial tcp <IP>:443: i/o timeout
The ECS task contains one container that uses a private ECR image hosted within the same AWS account. The security group associated with the task allows all inbound traffic from IPs within the VPC and allows all outbound traffic.
ECS task execution role contains the following policies:
"arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy",
"arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role"
Attempts:
When the private subnets' NACL inbound rules were changed to allow all types of traffic, the ECS task was, strangely, able to pull the ECR image.
Created the VPC endpoints mentioned in this article with the correct security groups but got the same error.
I'm tempted to try following this guide although it specifically says:
If your task definition references an image that's stored in Amazon ECR, this topic doesn't apply.
I assume you are pulling an image from ECR?
If you are launching via ecs-cli, add this to the ecs-params.yml
(read about that file here: https://github.com/aws/amazon-ecs-cli)
Firstly, auto-assign public IP addresses:
run_params:
  network_configuration:
    awsvpc_configuration:
      subnets:
        - "subnet-06786b976xx"
        - "subnet-0b9dxxxxxxx"
      security_groups:
        - "sg-08157xxxxxxxxx4"
      assign_public_ip: "ENABLED"
Secondly, make sure that the security groups you use for your VPC allow 80/443 traffic.
Thirdly, make sure in AWS IAM that the execution role (ecsTaskExecutionRole) has policies including:
AWSAppRunnerServicePolicyForECRAccess

ECS Fargate Task in EventBridge fails with ResourceInitializationError

I have created an ECS Fargate task, which I can run manually. It updates a DynamoDB table and I get logs.
Now I want this to run on a schedule. I have set up a scheduled ECS task through EventBridge. However, it does not run.
By looking at the EventBridge logs I can see that the container was stopped for the following reason:
ResourceInitializationError: unable to pull secrets or registry auth: execution resource
retrieval failed: unable to retrieve ecr registry auth: service call has been retried 3
time(s): RequestError: send request failed caused by: Post https://api.ecr....
I thought this might be a problem with permissions. However, I tested giving the Task Execution Role full power user permissions and I still get the same error. Could the problem be something else?
This is due to a connectivity issue.
The docs say the following:
For tasks on Fargate, in order for the task to pull the container image it must either use a public subnet and be assigned a public IP address or a private subnet that has a route to the internet or a NAT gateway that can route requests to the internet.
So you need to make sure your task has a route to an internet gateway (i.e. it's in a Public subnet) or a NAT gateway.
Alternatively, if your service is in an isolated subnet, you need to create VPC endpoints for ECR and other services you need to call, as described in the docs:
To allow your tasks to pull private images from Amazon ECR, you must create the interface VPC endpoints for Amazon ECR.
When you create a scheduled task, you also specify the networking options. The docs mention this step:
(Optional) Expand Configure network configuration to specify a network configuration. This is required for tasks hosted on Fargate and for tasks using the awsvpc network mode.
For Subnets, specify one or more subnet IDs.
For Security groups, specify one or more security group IDs.
For Auto-assign public IP, specify whether to assign a public IP address from your subnet to the task.
So the networking configuration changed between the manually run task and the scheduled task. Refer to the above to figure out the needed settings for your case.
I fixed this by enabling auto-assign public IP.
However, to do this, I had to first change from "Capacity provider strategy" -
"Use cluster default", to "Launch type" - "FARGATE". Then the option to enable auto-assign public IP became available in the dropdown in the EventBridge UI.
This seems odd to me, because my default capacity provider strategy for my cluster is Fargate. But it is working now.
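If you set the schedule up through the API instead of the console, the same settings live in the target's EcsParameters. Below is a sketch of the target structure as I understand the EventBridge PutTargets API. The function name, the target Id, and every ARN and ID in the usage are placeholders, and the snippet only builds the dictionary without calling AWS.

```python
def scheduled_fargate_target(cluster_arn, role_arn, task_def_arn,
                             subnets, security_groups):
    """Build one EventBridge target that runs a Fargate task with a public IP."""
    return {
        "Id": "scheduled-fargate-task",  # hypothetical target id
        "Arn": cluster_arn,              # the target ARN is the ECS cluster
        "RoleArn": role_arn,             # role EventBridge assumes to run the task
        "EcsParameters": {
            "TaskDefinitionArn": task_def_arn,
            "TaskCount": 1,
            # Explicit launch type, matching the console change described above.
            "LaunchType": "FARGATE",
            "NetworkConfiguration": {
                "awsvpcConfiguration": {
                    "Subnets": list(subnets),
                    "SecurityGroups": list(security_groups),
                    "AssignPublicIp": "ENABLED",
                }
            },
        },
    }


if __name__ == "__main__":
    target = scheduled_fargate_target(
        "arn:aws:ecs:REGION:ACCOUNT:cluster/EXAMPLE",
        "arn:aws:iam::ACCOUNT:role/EXAMPLE",
        "arn:aws:ecs:REGION:ACCOUNT:task-definition/EXAMPLE:1",
        ["subnet-EXAMPLE"],
        ["sg-EXAMPLE"],
    )
    print(target["EcsParameters"]["NetworkConfiguration"])
```

With boto3 this would be passed as `Targets=[target]` to the events client's `put_targets`, together with the name of your rule.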
A gateway is needed to carry the traffic from ECS to ECR. It can be either an Internet Gateway or a NAT Gateway, which affects cost.
Alternatively, you can resolve this by creating VPC endpoints, which keep the traffic within AWS.
The endpoints required for this would be:
S3 Gateway
ECR
ECS
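The S3 endpoint is a gateway endpoint (attached to route tables), while the ECR ones are interface endpoints (placed in subnets, behind a security group). A sketch of the request parameters, using the argument names I know from boto3's `create_vpc_endpoint`; the function name and all IDs are placeholders. I split "ECR" into ecr.api and ecr.dkr, since as far as I know both are needed to pull images. The ECS endpoints would follow the same interface pattern.

```python
def endpoint_requests(region, vpc_id, route_table_ids, subnet_ids,
                      security_group_ids):
    """Build keyword arguments for one gateway and two interface endpoints."""
    prefix = f"com.amazonaws.{region}"
    requests = [{
        # S3 gateway endpoint: ECR stores image layers in S3.
        "VpcEndpointType": "Gateway",
        "VpcId": vpc_id,
        "ServiceName": f"{prefix}.s3",
        "RouteTableIds": list(route_table_ids),
    }]
    for service in ("ecr.api", "ecr.dkr"):
        requests.append({
            "VpcEndpointType": "Interface",
            "VpcId": vpc_id,
            "ServiceName": f"{prefix}.{service}",
            "SubnetIds": list(subnet_ids),
            # This group must allow inbound 443 from the tasks.
            "SecurityGroupIds": list(security_group_ids),
            "PrivateDnsEnabled": True,
        })
    return requests


if __name__ == "__main__":
    for req in endpoint_requests("us-west-2", "vpc-EXAMPLE", ["rtb-EXAMPLE"],
                                 ["subnet-EXAMPLE"], ["sg-EXAMPLE"]):
        print(req["VpcEndpointType"], req["ServiceName"])
```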

How to launch EKS node group into a private subnet without a NAT gateway?

I am using EKS and I want to enhance security by keeping one of my two node groups in a private subnet. However, I have read a few AWS documents that say a node group launched in a private subnet needs a NAT gateway so that it can connect to the AWS control plane VPC and communicate with the master. Setting up a NAT would be too much because of its charges. If there is a workaround, I would be happy to know about it. I know that with eksctl we can launch a node group into a private subnet without a NAT, but I need something that can be done without eksctl. If my understanding is wrong, please let me know.
AWS provides an explanation and a VPC template (amazon-eks-fully-private-vpc.yaml) for EKS without NAT in a post titled:
How do I create an Amazon EKS cluster and node groups that don't require access to the internet?
Instead of NAT, VPC interface endpoints are used for:
ec2
logs
ecr.api
ecr.dkr
sts
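If you build the VPC yourself rather than from the template, a quick way to see what is still missing is to compare the list above with the endpoints that already exist. This is only a sketch: the function name is made up, and the existing names are assumed to come from something like the `ServiceName` field of EC2's `describe_vpc_endpoints` output.

```python
# Interface endpoints listed above for node groups without NAT.
REQUIRED = ("ec2", "logs", "ecr.api", "ecr.dkr", "sts")


def missing_endpoints(region, existing_service_names):
    """Return the required service names that are not present yet, sorted."""
    required = {f"com.amazonaws.{region}.{service}" for service in REQUIRED}
    return sorted(required - set(existing_service_names))


if __name__ == "__main__":
    # Example: only ec2 and sts exist so far in an example region.
    existing = ["com.amazonaws.us-west-2.ec2", "com.amazonaws.us-west-2.sts"]
    for name in missing_endpoints("us-west-2", existing):
        print(name)
```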

CodeDeploy with VPC endpoint on private subnet instances

I am trying to use CodeDeploy to deploy my revisions to instances in a private subnet using a VPC endpoint.
The VPC endpoint has the required subnet configured, and the security group applied to the endpoint allows all inbound and outbound traffic. My deployment still fails, and I have no way to check the logs on the EC2 instance since it is private. Any help or guidance?
I am trying to follow the below link:
https://aws.amazon.com/about-aws/whats-new/2020/08/aws-codedeploy-now-supports-deployments-to-virtual-private-cloud-endpoints/