Deployment hangs during execution of *asgLifecycleHookDrainHookRole - amazon-web-services

I'm trying to deploy the following stack to AWS using aws-cdk:
/* imports omitted */
export class AwsEcsStack extends cdk.Stack {
constructor(app: cdk.App, id: string) {
super(app, id);
const vpc = new ec2.Vpc(this, 'main', { maxAzs: 2 });
const cluster = new ecs.Cluster(this, 'candy-workers', { vpc });
cluster.addCapacity('candy-workers-asg', {
instanceType: ec2.InstanceType.of(ec2.InstanceClass.T2, ec2.InstanceSize.MICRO),
associatePublicIpAddress: false
});
const logging = new ecs.AwsLogDriver({ streamPrefix: "candy-logs", logRetention: logs.RetentionDays.ONE_DAY })
const repository = new ecr.Repository(this, 'candy-builds');
repository.addLifecycleRule({ tagPrefixList: ['prod'], maxImageCount: 100 });
repository.addLifecycleRule({ maxImageAge: cdk.Duration.days(30) });
const taskDef = new ecs.Ec2TaskDefinition(this, "candy-task");
taskDef.addContainer("candy-container", {
image: ecs.ContainerImage.fromEcrRepository(repository),
memoryLimitMiB: 512,
logging
})
new ecs.Ec2Service(this, "candy-service", {
cluster,
taskDefinition: taskDef,
});
const candyTopic1 = new sns.Topic(this, 'candy1', {
topicName: 'candy1',
displayName: 'Produce some candy'
})
new sns.Topic(this, 'candy2', {
topicName: 'candy2',
displayName: 'Produce some more candy'
})
new sns.Topic(this, 'candy3', {
topicName: 'candy3',
displayName: 'Produce some more candy'
})
const rule = new events.Rule(this, 'candy-cron', {
schedule: events.Schedule.expression('cron(0 * * ? * *)')
});
rule.addTarget(new targets.SnsTopic(candyTopic1));
}
}
const app = new cdk.App();
new AwsEcsStack(app, 'candy-app');
app.synth();
But it hangs while executing step 50 of 53:
50/53 | 3:01:28 PM | CREATE_COMPLETE | AWS::Lambda::Permission | candy-workers/candy-workers-asg/DrainECSHook/Function/AllowInvoke:candyappcandyworkerscandyworkersasgLifecycleHookDrainHookTopic4AA69F1A (candyworkerscandyworkersasgDrainECSHookFunctionAllowInvokecandyappcandyworkerscandyworkersasgLifecycleHookDrainHookTopic4AA69F1AAFA44A3D)
50/53 Currently in progress: candyworkerscandyworkersasgLifecycleHookDrainHookRole4BCB2138, candyserviceServiceBB6CC91A
Sometimes it also hangs on the next step:
51/53 | 4:03:14 PM | CREATE_COMPLETE | AWS::Lambda::Permission | arbitrage-workers/arbitrage-workers-asg/DrainECSHook/Function/AllowInvoke:arbitrageapparbitrageworkersarbitrageworkersasgLifecycleHookDrainHookTopic4AA69F1A (arbitrageworkersarbitrageworkersasgDrainECSHookFunctionAllowInvokearbitrageapparbitrageworkersarbitrageworkersasgLifecycleHookDrainHookTopic4AA69F1AAFA44A3D)
51/53 Currently in progress: arbitrageserviceServiceBB6CC91A
I've been waiting for a long time now and it is not finishing the deployment. I can only imagine something went wrong.
Have you ever experienced this?
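The "Currently in progress" line names the logical IDs CloudFormation is still waiting on. In both outputs the ECS service is among them, and CloudFormation keeps an ECS service in CREATE_IN_PROGRESS until its tasks reach a steady state. A minimal sketch for pulling the pending IDs out of such a line, assuming the line format shown above (`parseProgress` and `Progress` are hypothetical names, not part of the CDK CLI):

```typescript
// Result of parsing one "N/M Currently in progress: a, b" line.
interface Progress {
  done: number;
  total: number;
  pending: string[];
}

// Returns null when the line is not a "Currently in progress" line.
function parseProgress(line: string): Progress | null {
  const m = /^\s*(\d+)\/(\d+)\s+Currently in progress:\s*(.*)$/.exec(line);
  if (m === null) {
    return null;
  }
  // Logical IDs are comma-separated; drop empty entries.
  const pending = m[3]
    .split(",")
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
  return { done: Number(m[1]), total: Number(m[2]), pending };
}

// Sample line copied from the output in the question.
const sampleLine =
  "50/53 Currently in progress: candyworkerscandyworkersasgLifecycleHookDrainHookRole4BCB2138, candyserviceServiceBB6CC91A";
const progress = parseProgress(sampleLine);
console.log(progress);
```

If the service is the last pending resource, the ECS console's service events and stopped-task reasons are usually the next place to look.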

Related

CDKpipeline - Cannot set lambda layer in a stack called from multiple stages in a pipeline

I want to use the same stack in multiple stages of a CDK pipeline, but I am getting the following error when bootstrapping my CDK project:
C:\dev\aws-cdk\node_modules\aws-cdk-lib\aws-lambda\lib\code.ts:185
throw new Error(`Asset is already associated with another stack '${cdk.Stack.of(this.asset).stackName}'. ` +
^
Error: Asset is already associated with another stack 'msm-customer'. Create a new Code instance for every stack.
at AssetCode.bind (C:\dev\aws-cdk\node_modules\aws-cdk-lib\aws-lambda\lib\code.ts:185:13)
at new LayerVersion (C:\dev\aws-cdk\node_modules\aws-cdk-lib\aws-lambda\lib\layers.ts:124:29)
at new CustomerStack (C:\dev\aws-cdk\lib\CustomerStack.ts:22:17)
After debugging the code, I found that the layer declaration in "CustomerStack" is causing the issue. If I comment out the layer section, or if I keep only one stage in my pipeline, the bootstrap command succeeds.
Pipelinestack.ts
// Creates a CodeCommit repository called 'CodeRepo'
const repo = new codecommit.Repository(this, 'CodeRepo', {
repositoryName: "CodeRepo"
});
const pipeline = new CodePipeline(this, 'Pipeline-dev', {
pipelineName: 'Pipeline-dev',
synth: new CodeBuildStep('SynthStep-dev', {
//role: role,
input: CodePipelineSource.codeCommit(repo, 'master'),
installCommands: [
'npm install -g aws-cdk'
],
commands: [
'npm ci',
'npm run build',
'npx cdk synth'
],
})
});
pipeline.addStage(new PipelineStage(this, 'dev'));
pipeline.addStage(new PipelineStage(this, 'uat'));
pipeline.addStage(new PipelineStage(this, 'prod'));
PipelineStage.ts
export class PipelineStage extends Stage {
constructor(scope: Construct, id: string, props?: StageProps) {
super(scope, id, props);
new CustomerStack(this, 'msm-customer-' + id, {
stackName: 'msm-customer',
env: {
account: process.env.ACCOUNT,
region: process.env.REGION,
},
});
}
}
CustomerStack.ts
import * as cdk from 'aws-cdk-lib'; // needed for cdk.RemovalPolicy below
import { Duration, Stack, StackProps } from 'aws-cdk-lib';
import { Construct } from 'constructs';
import * as apigateway from 'aws-cdk-lib/aws-apigateway';
import * as lambda from 'aws-cdk-lib/aws-lambda';
import * as iam from 'aws-cdk-lib/aws-iam';
import * as ec2 from 'aws-cdk-lib/aws-ec2';
import 'dotenv/config';
export class CustomerStack extends Stack {
constructor(scope: Construct, id: string, props?: StackProps) {
super(scope, id, props);
//define existing role
const role = iam.Role.fromRoleArn(this, 'Role',
`arn:aws:iam::${Stack.of(this).account}:role/` + process.env.IAM_ROLE,
{ mutable: false },
);
//define layer
const layer = new lambda.LayerVersion(this, 'msm-layer', {
code: lambda.Code.fromAsset('resources/layer/customer'),
description: 'Frontend common resources',
compatibleRuntimes: [lambda.Runtime.NODEJS_14_X],
removalPolicy: cdk.RemovalPolicy.DESTROY
});
const lambdaDefault = new lambda.Function(this, 'default', {
runtime: lambda.Runtime.NODEJS_14_X,
code: lambda.Code.fromAsset('resources/lambda/customer/default'),
handler: 'index.handler',
role: role,
timeout: Duration.seconds(20),
memorySize: 256,
layers: [layer],
allowPublicSubnet: true,
vpcSubnets: { subnetType: ec2.SubnetType.PUBLIC }
});
//rest of the code
}
}
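The error text itself states the rule: one Code instance may only be bound to one stack. The following is a simplified, library-free model of that check, meant only to illustrate the condition the message describes. `FakeStack` and `FakeAssetCode` are hypothetical stand-ins, not aws-cdk-lib classes, and the sketch does not pinpoint why the posted code trips the check.

```typescript
// HYPOTHETICAL stand-in for a CDK stack; only the name matters here.
class FakeStack {
  readonly stackName: string;
  constructor(stackName: string) {
    this.stackName = stackName;
  }
}

// HYPOTHETICAL stand-in for an asset-backed Code object.
class FakeAssetCode {
  readonly path: string;
  private owner: FakeStack | undefined;
  constructor(path: string) {
    this.path = path;
  }
  // The first bind remembers the stack; binding to a different stack throws.
  bind(stack: FakeStack): void {
    if (this.owner === undefined) {
      this.owner = stack;
      return;
    }
    if (this.owner !== stack) {
      throw new Error(
        `Asset is already associated with another stack '${this.owner.stackName}'. ` +
          "Create a new Code instance for every stack."
      );
    }
  }
}

// Two distinct stack objects that happen to share a stackName.
const devStack = new FakeStack("msm-customer");
const uatStack = new FakeStack("msm-customer");

// Sharing one Code instance across both stacks triggers the error.
const sharedCode = new FakeAssetCode("resources/layer/customer");
sharedCode.bind(devStack);
let sharedBindFailed = false;
try {
  sharedCode.bind(uatStack);
} catch (e) {
  sharedBindFailed = true;
}

// One Code instance per stack passes the check.
let perStackBindFailed = false;
try {
  new FakeAssetCode("resources/layer/customer").bind(devStack);
  new FakeAssetCode("resources/layer/customer").bind(uatStack);
} catch (e) {
  perStackBindFailed = true;
}
console.log({ sharedBindFailed, perStackBindFailed });
```

Note that the check compares stack objects, not stack names, so two stages with the same `stackName` are still different stacks for this purpose.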

AWS CDK VPC with 2 subnets running 2 Fargate services

I am having trouble setting up my AWS VPC via CDK.
I want to create a VPC with two subnets, one public and one private_isolated, with no NAT gateways and one internet gateway. The public subnet will host my Node web server and the private one my Java microservice. Since I don't know much about networking in general, I used a private DNS namespace and got the two services to communicate with each other via AWS Service Discovery*.
The thing is, when I deploy my stack, it gets stuck creating the Java Fargate service, and I think the problem is the private_isolated subnet. So I added interface endpoints for ecr_docker and ecr, and even an S3 gateway endpoint, but the service still remains stuck on create in progress.
Here's my code:
export class EcsServiceDiscoveryStack extends Stack {
readonly serviceBack = "back-service";
readonly servicePdf = "pdf-service";
readonly namespace = "env.test";
readonly vpc: Vpc;
readonly cluster: Cluster;
readonly dnsNamespace: PrivateDnsNamespace;
readonly ingressSecurityGroup: SecurityGroup;
readonly egressSecurityGroup: SecurityGroup;
constructor(scope: Construct, id: string, props?: StackProps) {
super(scope, id, props);
/* ----------------------------------- VPC ---------------------------------- */
const subnetConfig: ec2.SubnetConfiguration[] = [
{
name: "test-public-subnet",
subnetType: SubnetType.PUBLIC,
},
{
name: "test-private-subnet",
subnetType: SubnetType.PRIVATE_ISOLATED,
},
];
this.vpc = new ec2.Vpc(this, "test-vpc", {
maxAzs: 1,
subnetConfiguration: subnetConfig,
gatewayEndpoints: {
S3: { service: ec2.GatewayVpcEndpointAwsService.S3 },
},
});
/* ------------------------- INTERFACE VPC ENDPOINT ------------------------- */
this.vpc.addInterfaceEndpoint("aws-ecr_docker-endpoint", {
service: ec2.InterfaceVpcEndpointAwsService.ECR_DOCKER,
privateDnsEnabled: true,
subnets: { subnetType: ec2.SubnetType.PRIVATE_ISOLATED },
open: true,
});
this.vpc.addInterfaceEndpoint("aws-ecr-endpoint", {
service: ec2.InterfaceVpcEndpointAwsService.ECR,
privateDnsEnabled: true,
subnets: { subnetType: ec2.SubnetType.PRIVATE_ISOLATED },
open: true,
});
/* --------------------------------- CLUSTER -------------------------------- */
this.cluster = new ecs.Cluster(this, "test-cluster", {
clusterName: "test-cluster",
vpc: this.vpc,
});
/* ------------------------------ DNS NAMESPACE ----------------------------- */
this.dnsNamespace = new servicediscovery.PrivateDnsNamespace(
this,
"DnsNamespace",
{
name: this.namespace,
vpc: this.vpc,
description: "Private DnsNamespace for test environment",
}
);
/* -------------------------------- TASK ROLE ------------------------------- */
//NODE
const backTaskrole = new iam.Role(this, "backTaskExecutionRole", {
assumedBy: new iam.ServicePrincipal("ecs-tasks.amazonaws.com"),
});
backTaskrole.addManagedPolicy(
iam.ManagedPolicy.fromAwsManagedPolicyName(
"service-role/AmazonECSTaskExecutionRolePolicy"
)
);
//JAVA
const pdfTaskRole = new iam.Role(this, "pdfTaskExecutionRole", {
assumedBy: new iam.ServicePrincipal("ecs-tasks.amazonaws.com"),
});
pdfTaskRole.addManagedPolicy(
iam.ManagedPolicy.fromAwsManagedPolicyName(
"service-role/AmazonECSTaskExecutionRolePolicy"
)
);
/* ---------------------------- TASKS DEFINITIONS ---------------------------- */
//NODE
const backFargateTaskDefinition = new ecs.FargateTaskDefinition(
this,
`${this.serviceBack}ServiceTaskDef`,
{
cpu: 256,
memoryLimitMiB: 512,
taskRole: backTaskrole,
}
);
//JAVA
const pdfFargateTaskDefinition = new ecs.FargateTaskDefinition(
this,
`${this.servicePdf}ServiceTaskDef`,
{
cpu: 256,
memoryLimitMiB: 512,
taskRole: pdfTaskRole,
}
);
/* ---------------------------- SERVICE LOG GROUP --------------------------- */
//NODE
const backServiceLogGroup = new logs.LogGroup(
this,
`${this.serviceBack}ServiceLogGroup`,
{
logGroupName: `/ecs/${this.serviceBack}Service`,
removalPolicy: RemovalPolicy.DESTROY,
}
);
//JAVA
const pdfServiceLogGroup = new logs.LogGroup(
this,
`${this.servicePdf}ServiceLogGroup`,
{
logGroupName: `/ecs/${this.servicePdf}Service`,
removalPolicy: RemovalPolicy.DESTROY,
}
);
/* ------------------------------- LOG DRIVER ------------------------------- */
//NODE
const backServiceLogDriver = new ecs.AwsLogDriver({
logGroup: backServiceLogGroup,
streamPrefix: `${this.serviceBack}Service`,
});
//JAVA
const pdfServiceLogDriver = new ecs.AwsLogDriver({
logGroup: pdfServiceLogGroup,
streamPrefix: `${this.servicePdf}Service`,
});
/* ---------------------------- REPOSITORY (ECR) ---------------------------- */
//NODE
const repoBackNode = Repository.fromRepositoryName(
this,
"back-nodejs",
"task-nodejs-test"
);
//JAVA
const repoBackJava = Repository.fromRepositoryName(
this,
"pdf-java",
"task-nodejs-test"
);
/* -------------------------------- CONTAINER ------------------------------- */
//NODE
const backServiceContainer = backFargateTaskDefinition.addContainer(
`${this.serviceBack}ServiceContainer`,
{
containerName: `${this.serviceBack}ServiceContainer`,
essential: true,
image: ecs.ContainerImage.fromEcrRepository(repoBackNode),
logging: backServiceLogDriver,
portMappings: [{ containerPort: 80 }],
environment: {
PORT: "80",
NAME: "back-nodejs",
PDF_HOST: `http://${this.servicePdf}.${this.namespace}:80`,
},
}
);
//JAVA
const pdfServiceContainer = pdfFargateTaskDefinition.addContainer(
`${this.servicePdf}ServiceContainer`,
{
containerName: `${this.servicePdf}ServiceContainer`,
image: ecs.ContainerImage.fromEcrRepository(repoBackJava),
logging: pdfServiceLogDriver,
portMappings: [{ containerPort: 80 }],
environment: {
PORT: "80",
NAME: "pdf-java",
PDF_HOST: this.servicePdf,
},
}
);
/* ----------------------------- SECURITY GROUP ----------------------------- */
//VPC
this.ingressSecurityGroup = new ec2.SecurityGroup(
this,
`IngressSecurityGroup`,
{
allowAllOutbound: true,
securityGroupName: `IngressSecurityGroup`,
vpc: this.vpc,
description: "IngressSecurityGroup for the VPC",
}
);
this.ingressSecurityGroup.connections.allowFromAnyIpv4(ec2.Port.tcp(80));
//NODE
const backServiceSecGroup = new ec2.SecurityGroup(
this,
`${this.serviceBack}ServiceSecurityGroup`,
{
allowAllOutbound: true,
securityGroupName: `${this.serviceBack}ServiceSecurityGroup`,
vpc: this.vpc,
description: "Security Group for the nodejs server",
}
);
backServiceSecGroup.connections.allowFrom(
new ec2.Connections({
securityGroups: [this.ingressSecurityGroup],
}),
ec2.Port.tcp(80),
"Allow traffic on port 80 from the VPC ingress security group"
);
//JAVA
const pdfServiceSecGroup = new ec2.SecurityGroup(
this,
`${this.servicePdf}ServiceSecurityGroup`,
{
allowAllOutbound: true,
securityGroupName: `${this.servicePdf}ServiceSecurityGroup`,
vpc: this.vpc,
description: "Security Group for the java pdf service",
}
);
pdfServiceSecGroup.connections.allowFrom(
new ec2.Connections({
securityGroups: [backServiceSecGroup],
}),
ec2.Port.tcp(80),
"Allow traffic on port 80 from the backService security group"
);
/* ----------------------------- FARGATE SERVICE ---------------------------- */
//NODE
const backFargateService = new ecs.FargateService(
this,
`${this.serviceBack}Service`,
{
cluster: this.cluster,
taskDefinition: backFargateTaskDefinition,
assignPublicIp: true,
desiredCount: 1,
securityGroups: [backServiceSecGroup],
cloudMapOptions: {
name: this.serviceBack,
cloudMapNamespace: this.dnsNamespace,
dnsRecordType: servicediscovery.DnsRecordType.A,
},
}
);
//JAVA
const pdfFargateService = new ecs.FargateService(
this,
`${this.servicePdf}Service`,
{
cluster: this.cluster,
taskDefinition: pdfFargateTaskDefinition,
desiredCount: 1,
securityGroups: [pdfServiceSecGroup],
cloudMapOptions: {
name: this.servicePdf,
cloudMapNamespace: this.dnsNamespace,
dnsRecordType: servicediscovery.DnsRecordType.A,
},
}
);
}
}
Where did I go wrong? Thanks in advance for your time.
*I tried with only a public subnet first, to verify that service discovery was working.
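One candidate worth checking: a Fargate task in a subnet without internet access needs the ECR API and ECR Docker interface endpoints plus the S3 gateway endpoint to pull its image, and a task that uses the awslogs driver also needs the CloudWatch Logs interface endpoint. The stack above configures the first three but uses `AwsLogDriver` without a Logs endpoint. The sketch below lists the required endpoint service names and reports which are missing. The helper names and the region `eu-west-1` are assumptions made for illustration.

```typescript
type EndpointKind = "Interface" | "Gateway";

interface RequiredEndpoint {
  service: string; // full endpoint service name
  kind: EndpointKind;
  reason: string;
}

// Endpoints a Fargate task needs to start from a subnet with no internet route.
function requiredEndpoints(region: string, usesAwsLogs: boolean): RequiredEndpoint[] {
  const list: RequiredEndpoint[] = [
    { service: `com.amazonaws.${region}.ecr.api`, kind: "Interface", reason: "ECR API calls" },
    { service: `com.amazonaws.${region}.ecr.dkr`, kind: "Interface", reason: "Docker registry traffic" },
    { service: `com.amazonaws.${region}.s3`, kind: "Gateway", reason: "image layers are stored in S3" },
  ];
  if (usesAwsLogs) {
    list.push({
      service: `com.amazonaws.${region}.logs`,
      kind: "Interface",
      reason: "awslogs driver sends logs to CloudWatch Logs",
    });
  }
  return list;
}

// Returns the required endpoints that are not in the configured list.
function missingEndpoints(required: RequiredEndpoint[], configured: string[]): RequiredEndpoint[] {
  return required.filter((r) => !configured.includes(r.service));
}

// ASSUMPTION: region chosen for the example only.
const region = "eu-west-1";
// What the stack in the question sets up: ECR, ECR_DOCKER and the S3 gateway.
const configuredEndpoints = [
  `com.amazonaws.${region}.ecr.api`,
  `com.amazonaws.${region}.ecr.dkr`,
  `com.amazonaws.${region}.s3`,
];
const missing = missingEndpoints(requiredEndpoints(region, true), configuredEndpoints);
console.log(missing.map((m) => m.service));
```

In CDK terms, the missing entry would correspond to another `addInterfaceEndpoint` call using `ec2.InterfaceVpcEndpointAwsService.CLOUDWATCH_LOGS`.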

ECS Fargate Service with Multiple Target Groups. CDK

Using AWS CDK v2, I would like to create an ECS cluster, an ALB, and multiple Fargate services. Each Fargate service will have a task with two containers (beta, primary). Am I on the right track, or would you suggest I do this differently?
I have had a few errors while trying different things. This is the latest message.
AWS::ElasticLoadBalancingV2::ListenerRule Validation exception
I could use some advice on what I have built thus far.
export const createALBStack = ({
app_name,
app_props,
domain_name,
scope,
vpc,
}: {
app_name: string;
app_props: cdk.StackProps;
domain_name: string;
scope: cdk.App;
vpc: ec2.IVpc;
}) => {
const stack = new cdk.Stack(scope, app_name + '-LOADBALANCER', app_props);
// create a security group that allows all traffic from the same security group
const security_group = new ec2.SecurityGroup(stack, app_name + '-SHARED-SG', {
allowAllOutbound: true,
vpc,
});
security_group.connections.allowFrom(security_group, ec2.Port.allTraffic());
// security_group.addIngressRule(ec2.Peer.anyIpv4(), ec2.Port.tcp(443), 'Allow HTTPS Traffic');
// security group for HTTP public access
const public_http_security_group = new ec2.SecurityGroup(stack, app_name + '-PUBLIC-HTTP-SG', {
allowAllOutbound: true,
vpc,
});
public_http_security_group.connections.allowFromAnyIpv4(ec2.Port.tcp(80));
// DNS
const zone = route53.HostedZone.fromLookup(stack, app_name + '-ALB53-ZONE', {
domainName: domain_name,
});
const domain_certificate_arn = `arn:aws:acm:${app_props?.env?.region}:${app_props?.env?.account}:certificate/${certificate_identifier}`;
const certificate = acm.Certificate.fromCertificateArn(
stack,
app_name + '-CERTIFICATE',
domain_certificate_arn,
);
const alb = new loadBalancerV2.ApplicationLoadBalancer(stack, app_name + '-ALB', {
internetFacing: true,
loadBalancerName: app_name + '-ALB',
securityGroup: public_http_security_group,
vpc,
});
const https_listener = alb.addListener(app_name + '-ALB_LISTENER', {
certificates: [loadBalancerV2.ListenerCertificate.fromArn(certificate.certificateArn)],
// defaultTargetGroups: [],
open: true,
port: 443,
});
https_listener.addAction(app_name + '-ALB_DEFAULT_RESPONSE', {
action: loadBalancerV2.ListenerAction.fixedResponse(404, {
messageBody: 'SEON DEVELOPMENT 404',
}),
});
createHTTPSRedirect(app_name + '-ALB_HTTTPSRedirect', stack, alb);
// Add a Route 53 alias with the Load Balancer as the target
new route53.ARecord(stack, app_name + `-ALIAS_RECORD`, {
recordName: app_name + `-ALIAS_RECORD`,
target: route53.RecordTarget.fromAlias(new route53targets.LoadBalancerTarget(alb)),
ttl: cdk.Duration.seconds(60),
zone,
});
new cdk.CfnOutput(stack, app_name + 'HTTP-LISTENER-ARN', {
exportName: app_name + 'HTTP-LISTENER-ARN',
value: https_listener.listenerArn,
});
return {
alb,
https_listener,
security_group,
zone,
};
};
ECS Stack
export const createECSServiceStack = ({
alb,
app_props,
cluster,
containers,
https_listener,
scope,
security_group,
service_name,
service_params,
sub_domain,
task_params,
vpc,
zone,
}: {
alb: loadBalancerV2.ApplicationLoadBalancer;
app_props: cdk.StackProps;
cluster: ecs.Cluster;
containers: TaskDefContainer[];
https_listener: loadBalancerV2.ApplicationListener;
scope: cdk.App;
security_group: ec2.SecurityGroup;
service_name: string;
service_params: ServiceParams;
sub_domain: string;
task_params: FargateTaskDefinitionProps;
vpc: ec2.IVpc;
zone: route53.IHostedZone;
}) => {
const stack = new cdk.Stack(scope, service_name, app_props);
const task_role = new iam.Role(stack, service_name + '-taskrole', {
assumedBy: new iam.ServicePrincipal('ecs-tasks.amazonaws.com'),
});
const task_definition = new ecs.FargateTaskDefinition(stack, service_name + '-TASKDEF', {
cpu: task_params.cpu,
family: service_name + '-TASKDEF',
memoryLimitMiB: task_params.memoryLimitMiB,
taskRole: task_role,
// runtimePlatform
});
const execution_role_policy = new iam.PolicyStatement({
actions: [
'ecr:GetAuthorizationToken',
'ecr:BatchCheckLayerAvailability',
'ecr:GetDownloadUrlForLayer',
'ecr:BatchGetImage',
'logs:CreateLogStream',
'logs:PutLogEvents',
'dynamodb:GetItem',
'dynamodb:UpdateItem',
'xray:PutTraceSegments',
],
effect: iam.Effect.ALLOW,
resources: ['*'],
});
task_definition.addToExecutionRolePolicy(execution_role_policy);
// can add more than one container to the task
const sourced_containers = containers.map(container => {
const containerPort = parseInt(container.environment.HOST_PORT);
const ecr_repo = sourceECR({ ecr_name: container.name + '-ecr', stack });
task_definition
.addContainer(container.name, {
environment: container.environment,
image: ecs.ContainerImage.fromEcrRepository(ecr_repo),
logging: new ecs.AwsLogDriver({ streamPrefix: container.name }),
})
.addPortMappings({
containerPort,
protocol: ecs.Protocol.TCP,
});
return {
...container,
ecr: ecr_repo,
};
});
const ecs_service = new ecs.FargateService(stack, service_name, {
assignPublicIp: true,
capacityProviderStrategies: [
{
base: 1,
capacityProvider: 'FARGATE',
weight: 1,
},
{
capacityProvider: 'FARGATE_SPOT',
weight: 1,
},
],
circuitBreaker: { rollback: true },
cluster,
desiredCount: service_params.desiredCount,
maxHealthyPercent: service_params.maxHealthyPercent,
minHealthyPercent: service_params.minHealthyPercent,
securityGroups: [security_group],
serviceName: service_name,
taskDefinition: task_definition,
});
sourced_containers.map((sourced_container, index) => {
const target_group = new loadBalancerV2.ApplicationTargetGroup(
stack,
sourced_container.name + '-tg',
{
deregistrationDelay: cdk.Duration.seconds(30),
healthCheck: {
healthyHttpCodes: '200,301,302',
healthyThresholdCount: 5,
interval: cdk.Duration.seconds(300),
path: sourced_container.health_check_url,
port: sourced_container.environment.HOST_PORT,
timeout: cdk.Duration.seconds(20),
unhealthyThresholdCount: 2,
},
port: 443,
protocol: loadBalancerV2.ApplicationProtocol.HTTPS,
stickinessCookieDuration: cdk.Duration.hours(1), // todo ?
targets: [
ecs_service.loadBalancerTarget({
containerName: sourced_container.name,
containerPort: parseInt(sourced_container.environment.HOST_PORT),
}),
],
vpc,
},
);
const target_rule = new loadBalancerV2.CfnListenerRule(
stack,
sourced_container.name + '-target-rule',
{
actions: [
{
targetGroupArn: target_group.targetGroupArn,
type: 'forward',
},
],
conditions: [
{
field: 'host-header',
pathPatternConfig: {
values: [sub_domain],
},
},
{
field: 'path-pattern',
pathPatternConfig: {
values: [sourced_container.url_path],
},
},
],
listenerArn: https_listener.listenerArn,
priority: service_params.priority + index,
},
);
});
const scaling = ecs_service.autoScaleTaskCount({ maxCapacity: 6 });
const cpu_utilization = ecs_service.metricCpuUtilization();
/*
* scale out when CPU utilization exceeds 50%
* increase scale out speed if CPU utilization exceeds 70%
* scale in again when CPU utilization falls below 10%.
*/
scaling.scaleOnMetric(service_name + '-ASCALE_CPU', {
adjustmentType: aws_applicationautoscaling.AdjustmentType.CHANGE_IN_CAPACITY,
metric: cpu_utilization,
scalingSteps: [
{ change: -1, upper: 10 },
{ change: +1, lower: 50 },
{ change: +3, lower: 70 },
],
});
const pipeline = configurePipeline({
cluster,
service: ecs_service,
service_name,
sourced_containers,
stack,
});
new route53.ARecord(stack, service_name + `-ALIAS_RECORD_API`, {
recordName: sub_domain,
target: route53.RecordTarget.fromAlias(new route53targets.LoadBalancerTarget(alb)),
zone,
});
return {
ecs_service,
};
};
I suggest checking the CDK Construct Library's higher-level ECS constructs for working code samples. There are also many TypeScript examples. You might want to use the fargate-application-load-balanced-service sample code as a starting point.
Please point out the issue with a minimal reproducible example. As it stands, it is almost impossible to understand what is wrong with your code. And why do you need to write so much code instead of using ECS patterns?
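One detail in the listener rule that may explain the validation exception: the `host-header` condition carries its values in `pathPatternConfig`, whereas a `host-header` condition takes them in `hostHeaderConfig` (only `path-pattern` uses `pathPatternConfig`). The sketch below builds the two conditions as plain objects in the shape `CfnListenerRule` accepts. The helper name and the example host and path are hypothetical.

```typescript
// Each condition field pairs with its own config key.
type RuleCondition =
  | { field: "host-header"; hostHeaderConfig: { values: string[] } }
  | { field: "path-pattern"; pathPatternConfig: { values: string[] } };

// Builds a host + path condition pair for one listener rule.
function hostAndPathConditions(host: string, path: string): RuleCondition[] {
  return [
    { field: "host-header", hostHeaderConfig: { values: [host] } },
    { field: "path-pattern", pathPatternConfig: { values: [path] } },
  ];
}

// ASSUMPTION: example values only; substitute sub_domain and url_path.
const conditions = hostAndPathConditions("api.example.com", "/beta/*");
console.log(JSON.stringify(conditions, null, 2));
```

The returned array could be passed as the `conditions` property of the rule. Rule priorities must also be unique per listener, so `service_params.priority + index` needs to avoid collisions across services.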

Remove unnecessary AWS resources, VPC + NAT gateway

I recently set up an application on AWS via CDK. The application consists of a Dockerized nodejs application, which connects to an RDS instance, and has a Redis caching layer as well. After having the application deployed for a few days, the costs are much higher than I had anticipated, even with minimal traffic. After looking through the cost explorer, it looks like half of the cost is coming from the NAT gateways.
In my current setup, I have created two VPCs. One is used for the application stack, and the other is for the CodePipeline. I needed to add one for the pipeline because without it I was hitting rate limits when trying to pull Docker images during the CodeBuildAction steps.
I'm not very comfortable with the networking bits, but I feel like there are extra resources involved. The pipeline VPC has three NAT gateways and three EIPs. These end up just sitting there waiting for the next deployment, which seems like a huge waste. It seems like a new gateway + EIP is allocated for each construct the VPC is attached to in CDK. Can I just make it reuse the same one? Is there an alternative to adding a VPC at all and not getting rate limited by Docker?
I also find it very surprising (I might just be naive) that the NAT gateway is so far equally as expensive as my current Fargate task costs. Is there an alternative that would serve my purposes, but come at a little lower cost?
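On the count of three: the CDK `Vpc` construct defaults to up to three Availability Zones (`maxAzs`) and one NAT gateway per Availability Zone (`natGateways`), so the number of gateways follows from the VPC itself rather than from how many constructs use it. A small model of that default is sketched below. `natGatewayCount` is a hypothetical helper, not a CDK API, and it assumes `natGateways` is not set higher than the number of zones.

```typescript
// Subset of the Vpc props that influence the NAT gateway count.
interface VpcNatOptions {
  maxAzs?: number; // CDK default: 3
  natGateways?: number; // CDK default: one per Availability Zone
}

// Models how many NAT gateways a Vpc ends up with.
function natGatewayCount(availableAzs: number, opts: VpcNatOptions = {}): number {
  const azs = Math.min(availableAzs, opts.maxAzs ?? 3);
  return opts.natGateways ?? azs;
}

// A Vpc created with only vpcName and enableDnsSupport, as in the pipeline stack.
const defaultCount = natGatewayCount(3);
// The same Vpc with natGateways: 1.
const singleCount = natGatewayCount(3, { natGateways: 1 });
// No NAT gateways at all.
const noneCount = natGatewayCount(3, { natGateways: 0 });
console.log({ defaultCount, singleCount, noneCount });
```

Passing `natGateways: 1` to the `Vpc` props would therefore leave a single gateway and a single EIP, at the cost of routing all private-subnet egress through one Availability Zone.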
Anyways, here are my two stacks:
// pipeline-stack.ts
import { SecretValue, Stack, StackProps } from "aws-cdk-lib";
import { Construct } from "constructs";
import { Artifact, IStage, Pipeline } from "aws-cdk-lib/aws-codepipeline";
import {
CloudFormationCreateUpdateStackAction,
CodeBuildAction,
CodeBuildActionType,
GitHubSourceAction,
} from "aws-cdk-lib/aws-codepipeline-actions";
import {
BuildEnvironmentVariableType,
BuildSpec,
LinuxBuildImage,
PipelineProject,
} from "aws-cdk-lib/aws-codebuild";
import { SnsTopic } from "aws-cdk-lib/aws-events-targets";
import { Topic } from "aws-cdk-lib/aws-sns";
import { EventField, RuleTargetInput } from "aws-cdk-lib/aws-events";
import { EmailSubscription, SmsSubscription } from "aws-cdk-lib/aws-sns-subscriptions";
import ApiStack from "./stacks/api-stack";
import { ManagedPolicy, Role, ServicePrincipal } from "aws-cdk-lib/aws-iam";
import { SecurityGroup, SubnetType, Vpc } from "aws-cdk-lib/aws-ec2";
import { Secret } from "aws-cdk-lib/aws-ecs";
import { BuildEnvironmentVariable } from "aws-cdk-lib/aws-codebuild/lib/project";
import * as SecretsManager from "aws-cdk-lib/aws-secretsmanager";
import { getApplicationEnvironment, getApplicationSecrets } from "./secrets-helper";
const capFirst = (str: string): string => {
return str.charAt(0).toUpperCase() + str.slice(1);
};
interface PipelineStackProps extends StackProps {
environment: string;
emailAddress: string;
phoneNumber: string;
branch: string;
secrets: {
arn: string;
};
repo: {
uri: string;
name: string;
};
}
export class PipelineStack extends Stack {
private readonly envName: string;
private readonly pipeline: Pipeline;
// source outputs
private cdkSourceOutput: Artifact;
private applicationSourceOutput: Artifact;
// code source actions
private cdkSourceAction: GitHubSourceAction;
private applicationSourceAction: GitHubSourceAction;
// build outputs
private cdkBuildOutput: Artifact;
private applicationBuildOutput: Artifact;
// notifications
private pipelineNotificationsTopic: Topic;
private readonly codeBuildVpc: Vpc;
private readonly codeBuildSecurityGroup: SecurityGroup;
private readonly secrets: SecretsManager.ISecret;
private readonly ecrCodeBuildRole: Role;
// stages
private sourceStage: IStage;
private selfMutateStage: IStage;
private buildStage: IStage;
private apiTestsStage: IStage;
constructor(scope: Construct, id: string, props: PipelineStackProps) {
super(scope, id, props);
this.envName = props.environment;
this.addNotifications(props);
this.ecrCodeBuildRole = new Role(this, "application-build-project-role", {
assumedBy: new ServicePrincipal("codebuild.amazonaws.com"),
managedPolicies: [
ManagedPolicy.fromAwsManagedPolicyName("AmazonEC2ContainerRegistryPowerUser"),
],
});
this.codeBuildVpc = new Vpc(this, "codebuild-vpc", {
vpcName: "codebuild-vpc",
enableDnsSupport: true,
});
this.codeBuildSecurityGroup = new SecurityGroup(this, "codebuild-vpc-security-group", {
vpc: this.codeBuildVpc,
allowAllOutbound: true,
});
this.secrets = SecretsManager.Secret.fromSecretCompleteArn(this, "secrets", props.secrets.arn);
this.pipeline = new Pipeline(this, "pipeline", {
pipelineName: `${capFirst(this.envName)}Pipeline`,
crossAccountKeys: false,
restartExecutionOnUpdate: true,
});
// STAGE 1 - Source Stage
this.addSourceStage(props);
// STAGE 2 - Build Stage
this.addBuildStage(props);
// STAGE 3: SelfMutate Stage
this.addSelfMutateStage();
// STAGE 4: Testing
this.addTestStage();
}
addNotifications(props: PipelineStackProps) {
this.pipelineNotificationsTopic = new Topic(this, "pipeline-notifications-topic", {
topicName: `PipelineNotifications${capFirst(props.environment)}`,
});
this.pipelineNotificationsTopic.addSubscription(new EmailSubscription(props.emailAddress));
this.pipelineNotificationsTopic.addSubscription(new SmsSubscription(props.phoneNumber));
}
/**
* Stage 1
*/
addSourceStage(props: PipelineStackProps) {
this.cdkSourceOutput = new Artifact("cdk-source-output");
this.cdkSourceAction = new GitHubSourceAction({
actionName: "CdkSource",
owner: "my-org",
repo: "my-cdk-repo",
branch: "main",
oauthToken: SecretValue.secretsManager("/connections/github/access-token"),
output: this.cdkSourceOutput,
});
this.applicationSourceOutput = new Artifact("ApplicationSourceOutput");
this.applicationSourceAction = new GitHubSourceAction({
actionName: "ApplicationSource",
owner: "my-org",
repo: "my-application-repo",
branch: props.branch,
oauthToken: SecretValue.secretsManager("/connections/github/access-token"),
output: this.applicationSourceOutput,
});
this.sourceStage = this.pipeline.addStage({
stageName: "Source",
actions: [this.cdkSourceAction, this.applicationSourceAction],
});
}
/**
* stage 2
*/
addBuildStage(props: PipelineStackProps) {
const cdkBuildAction = this.createCdkBuildAction();
const applicationBuildAction = this.createApplicationBuildAction(props);
this.buildStage = this.pipeline.addStage({
stageName: "Build",
actions: [cdkBuildAction, applicationBuildAction],
});
}
/**
* stage 3
*/
addSelfMutateStage() {
this.selfMutateStage = this.pipeline.addStage({
stageName: "PipelineUpdate",
actions: [
new CloudFormationCreateUpdateStackAction({
actionName: "PipelineCreateUpdateStackAction",
stackName: this.stackName,
templatePath: this.cdkBuildOutput.atPath(`${this.stackName}.template.json`),
adminPermissions: true,
}),
],
});
}
/**
* stage 4
*/
addTestStage() {
const testAction = new CodeBuildAction({
actionName: "RunApiTests",
type: CodeBuildActionType.TEST,
input: this.applicationSourceOutput,
project: new PipelineProject(this, "api-tests-project", {
vpc: this.codeBuildVpc,
securityGroups: [this.codeBuildSecurityGroup],
environment: {
buildImage: LinuxBuildImage.STANDARD_5_0,
privileged: true,
},
buildSpec: BuildSpec.fromObject({
version: "0.2",
phases: {
install: {
commands: ["cp .env.testing .env"],
},
build: {
commands: [
"ls",
"docker-compose -f docker-compose.staging.yml run -e NODE_ENV=testing --rm api node ace test",
],
},
},
}),
}),
runOrder: 1,
});
this.apiTestsStage = this.pipeline.addStage({
stageName: "RunApiTests",
actions: [testAction],
});
}
createCdkBuildAction() {
this.cdkBuildOutput = new Artifact("CdkBuildOutput");
return new CodeBuildAction({
actionName: "CdkBuildAction",
input: this.cdkSourceOutput,
outputs: [this.cdkBuildOutput],
project: new PipelineProject(this, "cdk-build-project", {
environment: {
buildImage: LinuxBuildImage.STANDARD_5_0,
},
buildSpec: BuildSpec.fromSourceFilename("build-specs/cdk-build-spec.yml"),
}),
});
}
createApplicationBuildAction(props: PipelineStackProps) {
this.applicationBuildOutput = new Artifact("ApplicationBuildOutput");
const project = new PipelineProject(this, "application-build-project", {
vpc: this.codeBuildVpc,
securityGroups: [this.codeBuildSecurityGroup],
environment: {
buildImage: LinuxBuildImage.STANDARD_5_0,
privileged: true,
},
environmentVariables: {
ENV: {
value: this.envName,
},
ECR_REPO_URI: {
value: props.repo.uri,
},
ECR_REPO_NAME: {
value: props.repo.name,
},
AWS_REGION: {
value: props.env!.region,
},
},
buildSpec: BuildSpec.fromObject({
version: "0.2",
phases: {
pre_build: {
commands: [
"echo 'Logging into Amazon ECR...'",
"aws ecr get-login-password --region $AWS_REGION | docker login --username AWS --password-stdin $ECR_REPO_URI",
'COMMIT_HASH=$(echo "$CODEBUILD_RESOLVED_SOURCE_VERSION" | head -c 8)',
],
},
build: {
commands: ["docker build -t $ECR_REPO_NAME:latest ."],
},
post_build: {
commands: [
"docker tag $ECR_REPO_NAME:latest $ECR_REPO_URI/$ECR_REPO_NAME:latest",
"docker tag $ECR_REPO_NAME:latest $ECR_REPO_URI/$ECR_REPO_NAME:$ENV-$COMMIT_HASH",
"docker push $ECR_REPO_URI/$ECR_REPO_NAME:latest",
"docker push $ECR_REPO_URI/$ECR_REPO_NAME:$ENV-$COMMIT_HASH",
],
},
},
}),
role: this.ecrCodeBuildRole,
});
return new CodeBuildAction({
actionName: "ApplicationBuildAction",
input: this.applicationSourceOutput,
outputs: [this.applicationBuildOutput],
project: project,
});
}
public addDatabaseMigrationStage(apiStack: ApiStack, stageName: string): IStage {
let buildEnv: { [name: string]: BuildEnvironmentVariable } = {
ENV: {
value: this.envName,
},
ECR_REPO_URI: {
type: BuildEnvironmentVariableType.PLAINTEXT,
value: apiStack.repoUri,
},
ECR_REPO_NAME: {
type: BuildEnvironmentVariableType.PLAINTEXT,
value: apiStack.repoName,
},
AWS_REGION: {
type: BuildEnvironmentVariableType.PLAINTEXT,
value: this.region,
},
};
buildEnv = this.getBuildEnvAppSecrets(getApplicationSecrets(this.secrets), buildEnv);
buildEnv = this.getBuildEnvAppEnvVars(
getApplicationEnvironment({
REDIS_HOST: apiStack.redisHost.importValue,
REDIS_PORT: apiStack.redisPort.importValue,
}),
buildEnv,
);
let envVarNames = Object.keys(buildEnv);
const envFileCommand = `printenv | grep '${envVarNames.join("\\|")}' >> .env`;
return this.pipeline.addStage({
stageName: stageName,
actions: [
new CodeBuildAction({
actionName: "DatabaseMigrations",
input: this.applicationSourceOutput,
project: new PipelineProject(this, "database-migrations-project", {
description: "Run database migrations against RDS database",
environment: {
buildImage: LinuxBuildImage.STANDARD_5_0,
privileged: true,
},
environmentVariables: buildEnv,
buildSpec: BuildSpec.fromObject({
version: "0.2",
phases: {
pre_build: {
commands: [
"echo 'Logging into Amazon ECR...'",
"aws ecr get-login-password --region $AWS_REGION | docker login --username AWS --password-stdin $ECR_REPO_URI",
'COMMIT_HASH=$(echo "$CODEBUILD_RESOLVED_SOURCE_VERSION" | head -c 8)',
envFileCommand,
"cat .env",
],
},
build: {
commands: [
`docker run --env-file .env --name api $ECR_REPO_URI/$ECR_REPO_NAME:$ENV-$COMMIT_HASH node ace migration:run --force`,
": > .env",
],
},
},
}),
role: this.ecrCodeBuildRole,
}),
}),
],
});
}
private getBuildEnvAppSecrets(
secrets: { [key: string]: Secret },
buildEnv: { [name: string]: BuildEnvironmentVariable },
): { [name: string]: BuildEnvironmentVariable } {
for (let key in secrets) {
buildEnv[key] = {
type: BuildEnvironmentVariableType.SECRETS_MANAGER,
value: `${this.secrets.secretArn}:${key}`,
};
}
return buildEnv;
}
private getBuildEnvAppEnvVars(
vars: { [key: string]: string },
buildEnv: { [name: string]: BuildEnvironmentVariable },
): { [name: string]: BuildEnvironmentVariable } {
for (let key in vars) {
buildEnv[key] = {
value: vars[key],
};
}
return buildEnv;
}
public addApplicationStage(apiStack: ApiStack, stageName: string): IStage {
return this.pipeline.addStage({
stageName: stageName,
actions: [
new CloudFormationCreateUpdateStackAction({
actionName: "ApplicationUpdate",
stackName: apiStack.stackName,
templatePath: this.cdkBuildOutput.atPath(`${apiStack.stackName}.template.json`),
adminPermissions: true,
}),
],
});
}
}
// api-stack.ts
import { CfnOutput, CfnResource, Lazy, Stack, StackProps } from "aws-cdk-lib";
import * as EC2 from "aws-cdk-lib/aws-ec2";
import { ISubnet } from "aws-cdk-lib/aws-ec2";
import * as ECS from "aws-cdk-lib/aws-ecs";
import { DeploymentControllerType, ScalableTaskCount } from "aws-cdk-lib/aws-ecs";
import * as EcsPatterns from "aws-cdk-lib/aws-ecs-patterns";
import * as RDS from "aws-cdk-lib/aws-rds";
import { Credentials } from "aws-cdk-lib/aws-rds";
import * as Route53 from "aws-cdk-lib/aws-route53";
import * as Route53Targets from "aws-cdk-lib/aws-route53-targets";
import * as ECR from "aws-cdk-lib/aws-ecr";
import * as CertificateManager from "aws-cdk-lib/aws-certificatemanager";
import * as SecretsManager from "aws-cdk-lib/aws-secretsmanager";
import * as ElasticCache from "aws-cdk-lib/aws-elasticache";
import { Construct } from "constructs";
import { getApplicationEnvironment, getApplicationSecrets } from "../secrets-helper";
export type ApiStackProps = StackProps & {
environment: string;
hostedZone: {
id: string;
name: string;
};
domainName: string;
scaling: {
desiredCount: number;
maxCount: number;
cpuPercentage: number;
memoryPercentage: number;
};
repository: {
uri: string;
arn: string;
name: string;
};
secrets: { arn: string };
};
export default class ApiStack extends Stack {
vpc: EC2.Vpc;
cluster: ECS.Cluster;
ecsService: EcsPatterns.ApplicationLoadBalancedFargateService;
certificate: CertificateManager.ICertificate;
repository: ECR.IRepository;
database: RDS.IDatabaseInstance;
databaseCredentials: Credentials;
hostedZone: Route53.IHostedZone;
aliasRecord: Route53.ARecord;
redis: ElasticCache.CfnReplicationGroup;
repoUri: string;
repoName: string;
applicationEnvVariables: {
[key: string]: string;
};
redisHost: CfnOutput;
redisPort: CfnOutput;
gatewayUrl: CfnOutput;
constructor(scope: Construct, id: string, props: ApiStackProps) {
super(scope, id, props);
this.repoUri = props.repository.uri;
this.repoName = props.repository.name;
this.setUpVpc(props);
this.setUpRedisCluster(props);
this.setUpDatabase(props);
this.setUpCluster(props);
this.setUpHostedZone(props);
this.setUpCertificate(props);
this.setUpRepository(props);
this.setUpEcsService(props);
this.setUpAliasRecord(props);
}
private resourceName(props: ApiStackProps, resourceType: string): string {
return `twibs-api-${resourceType}-${props.environment}`;
}
private setUpVpc(props: ApiStackProps) {
this.vpc = new EC2.Vpc(this, this.resourceName(props, "vpc"), {
maxAzs: 3, // Default is all AZs in region
});
}
private setUpRedisCluster(props: ApiStackProps) {
const subnetGroup = new ElasticCache.CfnSubnetGroup(this, "cache-subnet-group", {
cacheSubnetGroupName: "redis-cache-subnet-group",
subnetIds: this.vpc.privateSubnets.map((subnet: ISubnet) => subnet.subnetId),
description: "Subnet group for Redis Cache cluster",
});
const securityGroup = new EC2.SecurityGroup(this, "redis-security-group", {
vpc: this.vpc,
description: `SecurityGroup associated with RedisDB Cluster - ${props.environment}`,
allowAllOutbound: false,
});
securityGroup.addIngressRule(
EC2.Peer.ipv4(this.vpc.vpcCidrBlock),
EC2.Port.tcp(6379),
"Allow from VPC on port 6379",
);
this.redis = new ElasticCache.CfnReplicationGroup(this, "redis", {
numNodeGroups: 1,
cacheNodeType: "cache.t2.small",
engine: "redis",
multiAzEnabled: false,
autoMinorVersionUpgrade: false,
cacheParameterGroupName: "default.redis6.x.cluster.on",
engineVersion: "6.x",
cacheSubnetGroupName: subnetGroup.ref,
securityGroupIds: [securityGroup.securityGroupId],
replicationGroupDescription: "RedisDB setup by CDK",
replicasPerNodeGroup: 0,
port: 6379,
});
}
private setUpDatabase(props: ApiStackProps) {
if (["production", "staging", "develop"].includes(props.environment)) {
return;
}
this.databaseCredentials = Credentials.fromUsername("my_db_username");
this.database = new RDS.DatabaseInstance(this, "database", {
vpc: this.vpc,
engine: RDS.DatabaseInstanceEngine.postgres({
version: RDS.PostgresEngineVersion.VER_13_4,
}),
credentials: this.databaseCredentials,
databaseName: `my_app_${props.environment}`,
deletionProtection: true,
});
}
private setUpCluster(props: ApiStackProps) {
this.cluster = new ECS.Cluster(this, this.resourceName(props, "cluster"), {
vpc: this.vpc,
capacity: {
instanceType: EC2.InstanceType.of(EC2.InstanceClass.T3, EC2.InstanceSize.SMALL),
},
});
}
private setUpHostedZone(props: ApiStackProps) {
this.hostedZone = Route53.HostedZone.fromHostedZoneAttributes(
this,
this.resourceName(props, "hosted-zone"),
{
hostedZoneId: props.hostedZone.id,
zoneName: props.hostedZone.name,
},
);
}
private setUpCertificate(props: ApiStackProps) {
this.certificate = new CertificateManager.Certificate(this, "certificate", {
domainName: props.domainName,
validation: CertificateManager.CertificateValidation.fromDns(this.hostedZone),
});
}
private setUpRepository(props: ApiStackProps) {
this.repository = ECR.Repository.fromRepositoryAttributes(
this,
this.resourceName(props, "repository"),
{
repositoryArn: props.repository.arn,
repositoryName: props.repository.name,
},
);
}
private setUpEcsService(props: ApiStackProps) {
const secrets = SecretsManager.Secret.fromSecretCompleteArn(this, "secrets", props.secrets.arn);
this.redisHost = new CfnOutput(this, "redis-host-output", {
value: this.redis.attrConfigurationEndPointAddress,
exportName: "redis-host-output",
});
this.redisPort = new CfnOutput(this, "redis-port-output", {
value: this.redis.attrConfigurationEndPointPort,
exportName: "redis-port-output",
});
// Create a load-balanced ecs-service service and make it public
this.ecsService = new EcsPatterns.ApplicationLoadBalancedFargateService(
this,
this.resourceName(props, "ecs-service"),
{
serviceName: `${props.environment}-api-service`,
cluster: this.cluster, // Required
cpu: 256, // Default is 256
desiredCount: props.scaling.desiredCount, // Default is 1
taskImageOptions: {
image: ECS.ContainerImage.fromEcrRepository(this.repository),
environment: getApplicationEnvironment({
REDIS_HOST: this.redis.attrConfigurationEndPointAddress,
REDIS_PORT: this.redis.attrConfigurationEndPointPort,
}),
secrets: getApplicationSecrets(secrets),
},
memoryLimitMiB: 512, // Default is 512
publicLoadBalancer: true, // Default is false
domainZone: this.hostedZone,
certificate: this.certificate,
},
);
const scalableTarget = this.ecsService.service.autoScaleTaskCount({
minCapacity: props.scaling.desiredCount,
maxCapacity: props.scaling.maxCount,
});
scalableTarget.scaleOnCpuUtilization("cpu-scaling", {
targetUtilizationPercent: props.scaling.cpuPercentage,
});
scalableTarget.scaleOnMemoryUtilization("memory-scaling", {
targetUtilizationPercent: props.scaling.memoryPercentage,
});
secrets.grantRead(this.ecsService.taskDefinition.taskRole);
}
private setUpAliasRecord(props: ApiStackProps) {
this.gatewayUrl = new CfnOutput(this, "gateway-url-output", {
value: this.ecsService.loadBalancer.loadBalancerDnsName,
});
this.aliasRecord = new Route53.ARecord(this, "alias-record", {
zone: this.hostedZone,
recordName: props.domainName,
target: Route53.RecordTarget.fromAlias(
new Route53Targets.LoadBalancerTarget(this.ecsService.loadBalancer),
),
});
const shouldCreateWWW = props.domainName.split(".").length === 2;
if (shouldCreateWWW) {
new Route53.ARecord(this, "alias-record-www", {
zone: this.hostedZone,
recordName: `www.${props.domainName}`,
target: Route53.RecordTarget.fromAlias(
new Route53Targets.LoadBalancerTarget(this.ecsService.loadBalancer),
),
});
}
}
}
Any advice is greatly appreciated.
I would strongly advise moving from Docker Hub to the ECR Public Gallery to avoid rate limit issues: https://gallery.ecr.aws/
That said, to answer the question about the number of NATs created: as the CDK docs show, what you're seeing is the default behavior (emphasis mine):
A VPC consists of one or more subnets that instances can be placed into. CDK distinguishes three different subnet types:
Public (SubnetType.PUBLIC) - public subnets connect directly to the Internet using an Internet Gateway. If you want your instances to have a public IP address and be directly reachable from the Internet, you must place them in a public subnet.
Private with Internet Access (SubnetType.PRIVATE_WITH_NAT) - instances in private subnets are not directly routable from the Internet, and connect out to the Internet via a NAT gateway. By default, a NAT gateway is created in every public subnet for maximum availability. Be aware that you will be charged for NAT gateways.
Isolated (SubnetType.PRIVATE_ISOLATED) - isolated subnets do not route from or to the Internet, and as such do not require NAT gateways. They can only connect to or be connected to from other instances in the same VPC. A default VPC configuration will not include isolated subnets.
A default VPC configuration will create public and private subnets. However, if natGateways:0 and subnetConfiguration is undefined, default VPC configuration will create public and isolated subnets.
So a separate NAT is created for every Public subnet.
The docs for the natGateways parameter mentioned above also describe the default behavior:
(default: One NAT gateway/instance per Availability Zone)
To limit the number of AZs used by the VPC, specify the maxAzs parameter. Set it to 1 to only have a single NAT per VPC.
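To make the counting rule concrete, here is a small TypeScript model of it. This is only a sketch: expectedNatGateways is a hypothetical helper, not a CDK API, and it assumes the default layout of one public subnet per AZ. The cap on an explicit natGateways value is my understanding of how the gateways are placed, and the region with six AZs is assumed purely for illustration.

```typescript
// Hypothetical helper: models the rule quoted above, it is not part of the CDK.
// Assumes the default layout of one public subnet per Availability Zone.
function expectedNatGateways(
  azsInRegion: number,
  maxAzs: number,
  natGateways?: number,
): number {
  // The VPC spans at most `maxAzs` zones, and never more than the region offers.
  const azsUsed = Math.min(azsInRegion, maxAzs);
  // Default: one NAT gateway per AZ in use.
  // Assumption: an explicit value cannot exceed the number of public subnets
  // available to host the gateways.
  return natGateways === undefined ? azsUsed : Math.min(natGateways, azsUsed);
}

console.log(expectedNatGateways(6, 3));    // maxAzs: 3, as in the question's VPC
console.log(expectedNatGateways(6, 1));    // maxAzs: 1
console.log(expectedNatGateways(6, 3, 1)); // three AZs sharing one NAT gateway
```

The last call corresponds to passing natGateways: 1 while leaving maxAzs alone, which keeps the subnets spread across several AZs but routes all private subnets through a single NAT gateway.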
If you're fine with making the resources in the VPC publicly reachable from the internet, you can place them in Public subnets and avoid the creation of NATs altogether.
this.vpc = new EC2.Vpc(this, this.resourceName(props, "vpc"), {
maxAzs: 1,
natGateways: 0,
});
If you do this, you have to tell your resources to use the public subnet instead of the isolated one.
However, CodeBuild projects do not support this.
They require a NAT to connect to the internet if placed into a VPC. See this question for details.
So if you want your build project to be in a VPC, you need to place it into a private subnet. This is done by default, so no additional configuration is needed. Just make sure you have at least one NAT gateway.
To sum up, the real solution to the Docker Hub rate limit issue is to switch over to the ECR Public Gallery.
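As a rough illustration of what switching over means for an image reference, the sketch below rewrites a Docker Hub official image name to its counterpart on the ECR Public Gallery. toEcrPublic is a hypothetical helper, not part of any AWS tooling, and it assumes the official images are mirrored under public.ecr.aws/docker/library; check the gallery for the exact image and tag you need.

```typescript
// Hypothetical helper, not part of any AWS SDK: rewrites a Docker Hub
// "official image" reference to its mirror on the ECR Public Gallery.
// Assumes the mirror lives under public.ecr.aws/docker/library.
const ECR_PUBLIC_OFFICIAL = "public.ecr.aws/docker/library";

function toEcrPublic(image: string): string {
  // Accept "node:16", "library/node:16" and "docker.io/library/node:16".
  const name = image.replace(/^docker\.io\//, "").replace(/^library\//, "");
  if (name.includes("/")) {
    // Images owned by a user or organisation are published under their own
    // alias on the gallery (if at all), so they cannot be mapped mechanically.
    throw new Error(`not a Docker official image: ${image}`);
  }
  return `${ECR_PUBLIC_OFFICIAL}/${name}`;
}

console.log(toEcrPublic("node:16-alpine"));
// prints "public.ecr.aws/docker/library/node:16-alpine"
```

The resulting reference is what would go on the FROM line of the Dockerfile, so the build pulls the base image from ECR Public instead of Docker Hub.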

How to handle failures in AWS CloudFormation custom resources?

I have a Lambda function created using CloudFormation that looks like this:
InitializeDynamoDBLambda:
  Type: AWS::Lambda::Function
  Properties:
    Code:
      ZipFile: |
        const AWS = require("aws-sdk");
        const response = require("cfn-response");
        const docClient = new AWS.DynamoDB.DocumentClient();
        exports.handler = function(event, context) {
          let DynamoTableName = event.ResourceProperties.DynamoTable;
          let KeyJSON = JSON.parse(event.ResourceProperties.KeyJSON);
          let ValueJSON = JSON.parse(event.ResourceProperties.ValueJSON);
          for(key in KeyJSON){
            var params = {
              TableName: DynamoTableName,
              Item: {
                'Key': KeyJSON[key],
                'Value': JSON.stringify(ValueJSON[key])
              }
            };
            docClient.put(params, function(err, data) {
              if (err) {
                console.log(err);
                response.send(event, context, response.FAILED, {});
              }
              else {
                response.send(event, context, response.SUCCESS, {});
              }
            });
          }
        };
    Handler: index.handler
    Role: !GetAtt 'LambdaExecutionRole.Arn'
    Runtime: nodejs12.x
    Timeout: 60
This Lambda is initialized using a custom resource like this:
InitializeDB:
  Type: Custom::InitializeDynamoDBLambda
  Properties:
    ServiceToken:
      Fn::GetAtt: [ InitializeDynamoDBLambda , "Arn" ]
    DynamoTable: !Ref TenantLevelDBname
    KeyJSON: !Ref KeyJSON
    ValueJSON: !Ref ValueJSON
The problem is that when there is an error, the CloudFormation stack gets stuck in a state such as UPDATE_IN_PROGRESS.
How do I handle failures in such scenarios?