I want to run an ECS Task on EC2 instance, and I want that task/container to be able to call other AWS services via Boto3.
When I run the same task on Fargate, it works as expected and I am able to call other AWS services from the task/container. When I run the ECS Task on EC2, it given me connection timeout errors when attempting to call other AWS services. (The specific errors depend on the service.)
In an attempt to rule out any permission issues, I am running in a public subnet and using a single IAM role (with the AdministratorAccess policy) for the EC2 instance, ECS task role, and ECS task execution role.
The ECS Task on EC2 IS able to access the internet (which I confirmed by having it ping google.com).
What are any other conditions that need to be satisfied in order to call other AWS services from a container on ECS + EC2?
The cause of my issue was using a public subnet and the awsvpc network mode.
Using Amazon EC2 — You can launch EC2 instances on a public subnet.
Amazon ECS uses these EC2 instances as cluster capacity, and any
containers that are running on the instances can use the underlying
public IP address of the host for outbound networking. This applies to
both the host and bridge network modes. However, the awsvpc network
mode doesn't provide task ENIs with public IP addresses. Therefore,
they can’t make direct use of an internet gateway.
-- Amazon Elastic Container Service Best Practices Guide
Related
Using Jenkins, I am configuring CI/CD.
We push the docker image to aws ECR, and then update the aws ECS service with the aws cli command.
The network interface and IP change when updating. Is there a way to fix this?
aws ecs update-service --cluster ${CLUSTER_NAME} --service ${SERVICE_NAME} --force-new-deployment
Note that the Network Interface and corresponding IP are on the individual Task(s) running in the ECS service, not the Service itself. There's no way to preserve the network interface/IP of the individual tasks within the service. Those IP addresses are subject to change any time new ECS Tasks are started, which may be due to an update to the service, or it may be due to auto-scaling events, or due to ECS replacing a failed task.
If you have something outside the VPC that needs to connect to the ECS service via a static IP then you need to place a Network Load Balancer in front of the ECS service, and assign an Elastic IP to the load balancer. All incoming requests would then be sent to the Elastic IP.
If you have something outside the VPC that the ECS service is connecting to, and it is restricting those connections by IP address, then you need to place the ECS service in private subnets, with routes to a NAT Gateway, and assign an Elastic IP to the NAT Gateway. All outbound requests would then appear to be coming from the Elastic IP assigned to the NAT Gateway.
I have created an ECS Fargate Task, which I can manually run. It updates a Dynomodb and I get logs.
Now I want this to run on a schedule. I have setup a scheduled ECS task through EventBridge. However, this does not run.
My looking at the EventBridge logs I can see that the container has been stopped for the following stopped reason:
ResourceInitializationError: unable to pull secrets or registry auth: execution resource
retrieval failed: unable to retrieve ecr registry auth: service call has been retried 3
time(s): RequestError: send request failed caused by: Post https://api.ecr....
I thought this might be a problem with permissions. However, I tested giving the Task Execution Role full power user permissions and I still get the same error. Could the problem be something else?
This is due to a connectivity issue.
The docs say the following:
For tasks on Fargate, in order for the task to pull the container image it must either use a public subnet and be assigned a public IP address or a private subnet that has a route to the internet or a NAT gateway that can route requests to the internet.
So you need to make sure your task has a route to an internet gateway (i.e. it's in a Public subnet) or a NAT gateway.
Alternatively, if your service is in an isolated subnet, you need to create VPC endpoints for ECR and other services you need to call, as described in the docs:
To allow your tasks to pull private images from Amazon ECR, you must create the interface VPC endpoints for Amazon ECR.
When you create a scheduled task, you also specify the networking options. The docs mention this step:
(Optional) Expand Configure network configuration to specify a network configuration. This is required for tasks hosted on Fargate and for tasks using the awsvpc network mode.
For Subnets, specify one or more subnet IDs.
For Security groups, specify one or more security group IDs.
For Auto-assign public IP, specify whether to assign a public IP address from your subnet to the task.
So the networking configuration changed between the manually run task and the scheduled task. Refer to the above to figure out the needed settings for your case.
I fixed this by enabling auto-assign public IP.
However, to do this, I had to first change from "Capacity provider strategy" -
"Use cluster default", to "Launch type" - "FARGATE". Then the option to enable auto-assign public IP became available in the dropdown in the EventBridge UI.
This seems odd to me, because my default capacity provider strategy for my cluster is Fargate. But it is working now.
Need to use a gateway to follow the traffic from ECS to ECR. It can either Internet Gateway or NAT Gateway eventually which would be effecting cost factor.
But where we can resolve this scenario, by creating VPC Endpoints. Which maintains the traffic within the AWS Resources.
Endpoints Required for this would be :
S3 Gateway
ECR
ECS
I did not quite understand the configuring of VPC "CIDR block" while creating fargate cluster. Based on the link https://www.datadoghq.com/blog/aws-fargate-metrics/, there is a fleet that runs outside my VPC that has the infrastructure to run my fargate tasks.
What I dont understand if I configure a dedicated VPC for my fargate cluster. How does it connect with dedicated AWS managed infrastructure for fargate.
I did not find any documentation with some explaination.
After googling for sometime, found this https://thenewstack.io/aws-fargate-through-the-lens-of-kubernetes/
The author states the VPC configured during fargate cluster creation acts as proxy and requests are fwded to EC2 instance running in VPC owned/managed by AWS. Configuring VPC serves the purpose of controlling the IP range of ENI attached to containers. This is based on my observation, need something more to back it up.
I have two VPC's in the same account. VPC-A(has RDS installed), VPC-B has services installed through ECS EC2 deployment.
VPC-B has multiple subnets. Services deployed through ECS EC2 service couldn't integrate with RDS. It keeps getting the following error message("Is the server running on host "....")
Where as telnet on RDS database port from Ec2instance(E1) inc VPC-B subnet can connect to the database.
But, it couldn't start the server if the same services are installed through ECS. When manually trying to start the container it works(able to connect to the database).
I also set up a Peering connection between two VPC's but the connection problem exists only when the container is started through ECS EC2 deployment.
The dropdown for public IP has "Disabled" and no other options. Subnet's are public subnets.
Any help/thoughts will be highly helpful.
As per aws docs "awsvpc" launches in a private IP and to interact with external services nat gateway needs to be attached to subnet.
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-networking.html#task-networking-considerations
The awsvpc network mode does not provide task ENIs with public IP addresses for tasks that use the EC2 launch type. To access the internet, tasks that use the EC2 launch type should be launched in a private subnet that is configured to use a NAT gateway.
"Auto assign public IP" mode is "Enabled" with "bridge" netowrking mode on on ECS EC2 launch.
The error message is:
Stack named 'awseb-e-r3uhxvhyz7-stack' aborted operation. Current state: 'CREATE_FAILED' Reason: The following resource(s) failed to create: [AWSEBInstanceLaunchWaitCondition].
I am trying to use Multi-Container Docker in AWS Elastic Beanstalk.
Can someone help me to get rid of this error.Is it necessary to use more than one EC2 instance for using Multi-Container Docker in AWS Elastic Beanstalk?
This sound kinda what your issue is:
If you use Amazon VPC with Elastic Beanstalk, Amazon EC2 instances deployed in a private subnet cannot communicate directly with the Internet. Amazon EC2 instances must have Internet connectivity to communicate to Elastic Beanstalk that they were successfully launched. To provide EC2 instances in a private subnet with Internet connectivity, you must add a load balancer and NAT to the public subnet. You must create the appropriate routing rules for inbound and outbound traffic through the load balancer and NAT. You must also configure the default Amazon VPC security group to allow traffic from the Amazon EC2 instances to the NAT instance.
Source: Amazon EC2 Instances Fail to Launch within the Wait Period
I've fixed this. It looks the like IAM role created by default for the single docker EB deployment didn't contain the necessary ECS Policy (unconfirmed).
I followed the instructions to create a policy to add the role and everyhing worked.