Why can't I run multiple tasks in AWS ECS? - amazon-web-services

I'm working on a file processing system where files can be uploaded to S3 and then processed in a container. I have been triggering ECS tasks from a Lambda function and passing a few environment variables.
S3 -> Lambda -> ECS
I'm running into a problem where I can't seem to run more than 1 task at once. If a task is already running then any subsequent tasks that get run are stuck in "PROVISIONING" and eventually disappear altogether.
Here is my lambda function that runs the ECS task:
const params: RunTaskRequest = {
  launchType: "FARGATE",
  cluster: "arn:aws:ecs:us-east-1:XXXXXXX:cluster/FileProcessingCluster",
  taskDefinition: "XXX",
  networkConfiguration: {
    awsvpcConfiguration: {
      subnets: [
        "subnet-XXX",
        "subnet-XXX"
      ],
      securityGroups: [
        "..."
      ],
      assignPublicIp: "DISABLED"
    }
  },
  overrides: {
    containerOverrides: [
      {
        name: "FileProcessingContainer",
        environment: [
          ...
        ]
      },
    ]
  },
};
try {
  await ecs.runTask(params).promise();
} catch (e) {
  console.error(e, e.stack)
}
I'm using AWS-CDK to create the ECS infrastructure:
const cluster = new ecs.Cluster(this, 'FileProcessingCluster', {
  clusterName: "FileProcessingCluster"
});
const taskDefinition = new ecs.FargateTaskDefinition(this, "FileProcessingTask", {
  memoryLimitMiB: 8192,
  cpu: 4096,
});
taskDefinition.addContainer("FileProcessingContainer", {
  image: ecs.ContainerImage.fromAsset("../local-image"),
  logging: new ecs.AwsLogDriver({
    streamPrefix: `${id}`
  }),
  memoryLimitMiB: 8192,
  cpu: 4096,
});
Is there something I'm missing here? Perhaps a setting related to concurrent tasks?

It turns out that I had misconfigured the subnets for the task, which prevented the image from being pulled from ECR.
You can read more about it here:
ECS task not starting - STOPPED (CannotPullContainerError: “Error response from daemon request canceled while waiting for connection”
And:
https://aws.amazon.com/premiumsupport/knowledge-center/ecs-pull-container-error/
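In the question, runTask did not throw, so the catch block never logged anything; the failure was only recorded on the stopped task itself. A minimal sketch of how those reasons could be gathered for logging, assuming you already have a task object shaped like an entry of a DescribeTasks response. The TaskInfo and ContainerInfo interfaces and the explainTask helper are hypothetical names, and only the fields used here are declared.

```typescript
// Local shapes mirroring a few fields of a task as returned by DescribeTasks.
// These interfaces are illustrative, not the SDK's own types.
interface ContainerInfo {
  name?: string;
  reason?: string;
  exitCode?: number;
}

interface TaskInfo {
  taskArn?: string;
  lastStatus?: string;
  stopCode?: string;
  stoppedReason?: string;
  containers?: ContainerInfo[];
}

// Hypothetical helper: collects every human-readable reason attached to a
// task so that it can be logged in one place.
function explainTask(task: TaskInfo): string[] {
  const lines: string[] = [];
  lines.push(`${task.taskArn ?? "unknown task"}: ${task.lastStatus ?? "UNKNOWN"}`);
  if (task.stopCode) {
    lines.push(`stopCode: ${task.stopCode}`);
  }
  if (task.stoppedReason) {
    lines.push(`stoppedReason: ${task.stoppedReason}`);
  }
  for (const container of task.containers ?? []) {
    if (container.reason) {
      lines.push(`container ${container.name ?? "?"}: ${container.reason}`);
    }
  }
  return lines;
}

// Example input shaped like a task that failed to pull its image.
const exampleTask: TaskInfo = {
  taskArn: "arn:aws:ecs:us-east-1:XXXXXXX:task/example",
  lastStatus: "STOPPED",
  stopCode: "TaskFailedToStart",
  stoppedReason: "CannotPullContainerError: ...",
  containers: [{ name: "FileProcessingContainer" }],
};

console.log(explainTask(exampleTask).join("\n"));
```

Logging the stopped reason this way would likely have pointed at the image pull problem directly, rather than at a concurrency limit.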

Related

How to create Logs for NetworkLoadBalancedFargateService in CDK

I am trying to create logs for the Network Load Balancer (not the task). Currently using the following code:
taskImageOptions: {
  containerPort: 8080,
  image: BrazilContainerImage.fromBrazil({
    brazilPackage: BrazilPackage.fromString('Service'),
    transformPackage: BrazilPackage.fromString('ServiceImageBuild'),
    componentName: 'service',
  }),
  containerName: 'Application',
  taskRole: this.taskRole,
  environment: {
    'STAGE': props.stage,
    'SERVICE_RUN': 'true'
  },
  logDriver: new AwsLogDriver({
    streamPrefix: 'NetworkLoadBalancer-',
    logGroup: new LogGroup(this, 'Service-NetworkLoadBalancer', {
      removalPolicy: RemovalPolicy.RETAIN,
      retention: RetentionDays.THREE_MONTHS,
    })
  }),
},
But this creates a new log group and deletes the existing ServiceTaskDefApplicationLogGroup. I guess this happens because logDriver sits inside taskImageOptions, but NetworkLoadBalancedFargateService has no logging options of its own. Any suggestions?
The logDriver setting is specifically for your ECS tasks. It configures the logging for the output of your docker container(s). It is not related to load balancer access logs in any way.
You would need to take the loadBalancer property from the NetworkLoadBalancedFargateService and then call logAccessLogs() on it, as documented here.

How do I enable deletion protection for load balancer using ApplicationLoadBalancedFargateService cdk construct

I have created a Fargate service running on an ECS cluster fronted by an application load balancer using the ApplicationLoadBalancedFargateService CDK construct.
cluster,
memoryLimitMiB: 1024,
desiredCount: 1,
cpu: 512,
taskImageOptions: {
image: ecs.ContainerImage.fromRegistry("amazon/amazon-ecs-sample"),
},
});
The construct has no prop for enabling deletion protection. Can anyone share how they handled this?
CDK offers escape hatches, which let you set CloudFormation properties directly when a high-level construct does not expose them.
// Create a load-balanced Fargate service and make it public
const loadBalancedService = new ecs_patterns.ApplicationLoadBalancedFargateService(this, `${ENV_NAME}-pgadmin4`, {
  cluster: cluster, // Required
  cpu: 512, // Default is 256
  desiredCount: 1, // Default is 1
  taskImageOptions: {
    image: ecs.ContainerImage.fromRegistry('image'),
    environment: {}
  },
  memoryLimitMiB: 1024, // Default is 512
  assignPublicIp: true
});
// Get the CloudFormation resource
const cfnLB = loadBalancedService.loadBalancer.node.defaultChild as elbv2.CfnLoadBalancer;
cfnLB.loadBalancerAttributes = [
  {
    key: 'deletion_protection.enabled',
    value: 'true',
  },
];

ECS task unable to pull secrets or registry auth

I have a CDK project that creates a CodePipeline which deploys an application on ECS. I had it all previously working, but the VPC was using a NAT gateway, which ended up being too expensive. So now I am trying to recreate the project without requiring a NAT gateway. I am almost there, but I have now run into issues when the ECS service is trying to start tasks. All tasks fail to start with the following error:
ResourceInitializationError: unable to pull secrets or registry auth: execution resource retrieval failed: unable to retrieve secret from asm: service call has been retried 5 time(s): failed to fetch secret
At this point I've kind of lost track of the different things I have tried, but I will post the relevant bits here as well as some of my attempts.
const repository = ECR.Repository.fromRepositoryAttributes(
  this,
  "ecr-repository",
  {
    repositoryArn: props.repository.arn,
    repositoryName: props.repository.name,
  }
);
// vpc
const vpc = new EC2.Vpc(this, this.resourceName(props, "vpc"), {
  maxAzs: 2,
  natGateways: 0,
  enableDnsSupport: true,
});
const vpcSecurityGroup = new SecurityGroup(this, "vpc-security-group", {
  vpc: vpc,
  allowAllOutbound: true,
});
// tried this to allow the task to access secrets manager
const vpcEndpoint = new EC2.InterfaceVpcEndpoint(this, "secrets-manager-task-vpc-endpoint", {
  vpc: vpc,
  service: EC2.InterfaceVpcEndpointAwsService.SSM,
});
const secrets = SecretsManager.Secret.fromSecretCompleteArn(
  this,
  "secrets",
  props.secrets.arn
);
const cluster = new ECS.Cluster(this, this.resourceName(props, "cluster"), {
  vpc: vpc,
  clusterName: `api-cluster`,
});
const ecsService = new EcsPatterns.ApplicationLoadBalancedFargateService(
  this,
  "ecs-service",
  {
    taskSubnets: {
      subnetType: SubnetType.PUBLIC,
    },
    securityGroups: [vpcSecurityGroup],
    serviceName: "api-service",
    cluster: cluster,
    cpu: 256,
    desiredCount: props.scaling.desiredCount,
    taskImageOptions: {
      image: ECS.ContainerImage.fromEcrRepository(
        repository,
        this.ecrTagNameParameter.stringValue
      ),
      secrets: getApplicationSecrets(secrets), // returns
      logDriver: LogDriver.awsLogs({
        streamPrefix: "api",
        logGroup: new LogGroup(this, "ecs-task-log-group", {
          logGroupName: `${props.environment}-api`,
        }),
        logRetention: RetentionDays.TWO_MONTHS,
      }),
    },
    memoryLimitMiB: 512,
    publicLoadBalancer: true,
    domainZone: this.hostedZone,
    certificate: this.certificate,
    redirectHTTP: true,
  }
);
const scalableTarget = ecsService.service.autoScaleTaskCount({
  minCapacity: props.scaling.desiredCount,
  maxCapacity: props.scaling.maxCount,
});
scalableTarget.scaleOnCpuUtilization("cpu-scaling", {
  targetUtilizationPercent: props.scaling.cpuPercentage,
});
scalableTarget.scaleOnMemoryUtilization("memory-scaling", {
  targetUtilizationPercent: props.scaling.memoryPercentage,
});
secrets.grantRead(ecsService.taskDefinition.taskRole);
repository.grantPull(ecsService.taskDefinition.taskRole);
I read somewhere that it probably has something to do with Fargate version 1.4.0 vs 1.3.0, but I'm not sure what I need to change to allow the tasks to access what they need to run.
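You need to create interface endpoints for Secrets Manager, ECR (two endpoints: the API and Docker), and CloudWatch Logs, as well as a gateway endpoint for S3. Note that the endpoint in the question uses the SSM service, which is Systems Manager, not Secrets Manager.
Refer to the documentation on the topic.
Here's an example in Python; it works the same way in TypeScript: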
vpc.add_interface_endpoint(
    "secretsmanager_endpoint",
    service=ec2.InterfaceVpcEndpointAwsService.SECRETS_MANAGER,
)
vpc.add_interface_endpoint(
    "ecr_docker_endpoint",
    service=ec2.InterfaceVpcEndpointAwsService.ECR_DOCKER,
)
vpc.add_interface_endpoint(
    "ecr_endpoint",
    service=ec2.InterfaceVpcEndpointAwsService.ECR,
)
vpc.add_interface_endpoint(
    "cloudwatch_logs_endpoint",
    service=ec2.InterfaceVpcEndpointAwsService.CLOUDWATCH_LOGS,
)
vpc.add_gateway_endpoint(
    "s3_endpoint",
    service=ec2.GatewayVpcEndpointAwsService.S3
)
Keep in mind that interface endpoints cost money as well, and may not be cheaper than a NAT.
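To make the cost remark concrete, here is a rough sketch of the fixed monthly charge for each option. Interface endpoints are billed per endpoint per Availability Zone per hour, while a NAT gateway is billed per gateway per hour. The hourly rates below are placeholder assumptions, not quoted prices, and data processing charges are ignored; check current pricing for your region before deciding. The monthlyFixedCost helper and its input shape are hypothetical.

```typescript
interface CostInput {
  interfaceEndpoints: number; // number of interface endpoint services
  availabilityZones: number;  // each interface endpoint is billed per AZ
  endpointHourlyUsd: number;  // ASSUMED rate per endpoint per AZ per hour
  natGateways: number;        // number of NAT gateways being compared
  natHourlyUsd: number;       // ASSUMED rate per NAT gateway per hour
  hoursPerMonth: number;
}

interface CostResult {
  endpointsUsd: number;
  natUsd: number;
}

// Fixed hourly charges only; per-GB data processing is not included.
function monthlyFixedCost(input: CostInput): CostResult {
  const endpointsUsd =
    input.interfaceEndpoints *
    input.availabilityZones *
    input.endpointHourlyUsd *
    input.hoursPerMonth;
  const natUsd = input.natGateways * input.natHourlyUsd * input.hoursPerMonth;
  return { endpointsUsd, natUsd };
}

// Four interface endpoints (Secrets Manager, ECR API, ECR Docker,
// CloudWatch Logs) across the two AZs from the question, versus one NAT.
const assumedInput: CostInput = {
  interfaceEndpoints: 4,
  availabilityZones: 2,
  endpointHourlyUsd: 0.01, // placeholder rate
  natGateways: 1,
  natHourlyUsd: 0.045,     // placeholder rate
  hoursPerMonth: 730,
};

const cost = monthlyFixedCost(assumedInput);
console.log(`endpoints: ${cost.endpointsUsd.toFixed(2)} USD/month`);
console.log(`NAT:       ${cost.natUsd.toFixed(2)} USD/month`);
```

With these placeholder rates the four endpoints cost more per month than a single NAT gateway, which is the trade-off the answer warns about.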

How to run a Fargate Task on an existing ecs cluster using aws cdk

I have an ECS cluster that will be created by my CDK stack. Before my ECS service stack is deployed, I have to run a Fargate task to generate the build files and configs for my application. I want to run a standalone task inside an existing ECS cluster.
There are two questions here; I will try to answer both.
First, to run the Fargate task via CDK, create a Rule that runs your ECS task on a schedule (or in response to some other event):
import { Rule, Schedule } from '@aws-cdk/aws-events';
import { EcsTask } from '@aws-cdk/aws-events-targets';
new Rule(this, 'ScheduleRule', {
  schedule: schedule,
  targets: [
    new EcsTask({
      cluster,
      taskDefinition: task,
    }),
  ],
});
Second, to use the existing cluster, look it up by its attributes:
import { Cluster } from '@aws-cdk/aws-ecs';
const cluster = Cluster.fromClusterAttributes(this, 'cluster_id', {
  clusterName: "CLUSTER_NAME", securityGroups: [], vpc: iVpc
});
Update: you can also trigger your task from a custom event:
new Rule(this, 'EventPatternRule', {
  // An event pattern lists the allowed values for each field as arrays.
  // Per-event fields such as id and time are left out so the rule can match.
  eventPattern: {
    source: ["aws.codepipeline"],
    detailType: ["CodePipeline Pipeline Execution State Change"],
    account: ["123456789012"],
    region: ["us-east-1"],
    resources: [
      "arn:aws:codepipeline:us-east-1:123456789012:pipeline:myPipeline"
    ],
    detail: {
      pipeline: ["myPipeline"],
      state: ["STARTED"]
    }
  },
  targets: [
    new EcsTask({
      cluster,
      taskDefinition: task,
    }),
  ],
});
See the EventBridge documentation for details on event patterns.
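An event pattern is a filter, not a copy of an event: each field in the pattern holds an array of accepted values, and an event matches when every field named in the pattern has one of those values. The sketch below is a simplified model of that exact-value matching, written as plain TypeScript with raw JSON field names such as "detail-type" (CDK spells it detailType). It is only an illustration, not EventBridge's implementation, and it ignores richer operators such as prefix or numeric matching. The Pattern and EventLike types and the matchesPattern function are hypothetical.

```typescript
type Pattern = { [key: string]: string[] | Pattern };
type EventValue = string | string[] | { [key: string]: EventValue };
type EventLike = { [key: string]: EventValue };

// Returns true when every field named in the pattern is present in the
// event and holds one of the accepted values. Fields absent from the
// pattern are ignored.
function matchesPattern(pattern: Pattern, event: EventLike): boolean {
  for (const key of Object.keys(pattern)) {
    const expected = pattern[key];
    if (expected === undefined) continue;
    const actual = event[key];
    if (actual === undefined) return false;
    if (Array.isArray(expected)) {
      // Leaf field: an array-valued event field matches if any element is accepted.
      const actualValues: EventValue[] = Array.isArray(actual) ? actual : [actual];
      const hit = actualValues.some(
        (value) => typeof value === "string" && expected.indexOf(value) !== -1
      );
      if (!hit) return false;
    } else {
      // Nested object such as "detail": recurse into it.
      if (typeof actual !== "object" || Array.isArray(actual)) return false;
      if (!matchesPattern(expected, actual)) return false;
    }
  }
  return true;
}

const pipelinePattern: Pattern = {
  source: ["aws.codepipeline"],
  "detail-type": ["CodePipeline Pipeline Execution State Change"],
  detail: { pipeline: ["myPipeline"], state: ["STARTED"] },
};

// Sample event using the values shown in the answer.
const startedEvent: EventLike = {
  version: "0",
  id: "CWE-event-id",
  "detail-type": "CodePipeline Pipeline Execution State Change",
  source: "aws.codepipeline",
  account: "123456789012",
  time: "2017-04-22T03:31:47Z",
  region: "us-east-1",
  resources: ["arn:aws:codepipeline:us-east-1:123456789012:pipeline:myPipeline"],
  detail: {
    pipeline: "myPipeline",
    version: "1",
    state: "STARTED",
    "execution-id": "01234567-0123-0123-0123-012345678901",
  },
};

console.log(matchesPattern(pipelinePattern, startedEvent));
```

Because fields missing from the pattern are ignored, leaving out id, time and execution-id lets the same rule match every run of the pipeline rather than one specific event.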

AWS CDK stuck while creating EcsService Cloud Formation stack using ApplicationLoadBalancedEc2Service

I have the following code to create an ECS ApplicationLoadBalancedEc2Service; however, it has been stuck in creation for 2 hours and I don't see any errors in the events.
Below is my code:
this.cluster = new Cluster(this, 'Cluster', {
  vpc: props.vpc
});
this.cluster.addCapacity('DefaultAutoScalingGroupCapacity', {
  instanceType: InstanceType.of(InstanceClass.R5D, InstanceSize.XLARGE24),
  minCapacity: 2,
  maxCapacity: 50,
});
this.service = new ApplicationLoadBalancedEc2Service(this, 'Service', {
  cluster: props.ecsCluster,
  memoryLimitMiB: 768000,
  taskImageOptions: {
    containerPort: 8080,
    image: new ContainerImage({
      package: Package.fromString('ECSMatching'),
      transformPackage: Package.fromString('ECSMatchingImage'),
      componentName: 'service',
    }),
    taskRole: getDefaultEcsTaskInstanceRole(this),
    environment: {'STAGE': props.stage}
  },
});
this.service.service.connections.allowFrom(
  Peer.ipv4(props.ecsCluster.vpc.vpcCidrBlock),
  Port.allTraffic(),
  'Local VPC Access'
);
this.service.targetGroup.setAttribute('deregistration_delay.timeout_seconds', '6000');