ECS task unable to pull secrets or registry auth - amazon-web-services

I have a CDK project that creates a CodePipeline which deploys an application on ECS. I had it all previously working, but the VPC was using a NAT gateway, which ended up being too expensive. So now I am trying to recreate the project without requiring a NAT gateway. I am almost there, but I have now run into issues when the ECS service is trying to start tasks. All tasks fail to start with the following error:
ResourceInitializationError: unable to pull secrets or registry auth: execution resource retrieval failed: unable to retrieve secret from asm: service call has been retried 5 time(s): failed to fetch secret
At this point I've kind of lost track of the different things I have tried, but I will post the relevant bits here as well as some of my attempts.
const repository = ECR.Repository.fromRepositoryAttributes(
this,
"ecr-repository",
{
repositoryArn: props.repository.arn,
repositoryName: props.repository.name,
}
);
// vpc
const vpc = new EC2.Vpc(this, this.resourceName(props, "vpc"), {
maxAzs: 2,
natGateways: 0,
enableDnsSupport: true,
});
const vpcSecurityGroup = new SecurityGroup(this, "vpc-security-group", {
vpc: vpc,
allowAllOutbound: true,
});
// tried this to allow the task to access secrets manager
const vpcEndpoint = new EC2.InterfaceVpcEndpoint(this, "secrets-manager-task-vpc-endpoint", {
vpc: vpc,
service: EC2.InterfaceVpcEndpointAwsService.SSM,
});
const secrets = SecretsManager.Secret.fromSecretCompleteArn(
this,
"secrets",
props.secrets.arn
);
const cluster = new ECS.Cluster(this, this.resourceName(props, "cluster"), {
vpc: vpc,
clusterName: `api-cluster`,
});
const ecsService = new EcsPatterns.ApplicationLoadBalancedFargateService(
this,
"ecs-service",
{
taskSubnets: {
subnetType: SubnetType.PUBLIC,
},
securityGroups: [vpcSecurityGroup],
serviceName: "api-service",
cluster: cluster,
cpu: 256,
desiredCount: props.scaling.desiredCount,
taskImageOptions: {
image: ECS.ContainerImage.fromEcrRepository(
repository,
this.ecrTagNameParameter.stringValue
),
secrets: getApplicationSecrets(secrets), // returns
logDriver: LogDriver.awsLogs({
streamPrefix: "api",
logGroup: new LogGroup(this, "ecs-task-log-group", {
logGroupName: `${props.environment}-api`,
}),
logRetention: RetentionDays.TWO_MONTHS,
}),
},
memoryLimitMiB: 512,
publicLoadBalancer: true,
domainZone: this.hostedZone,
certificate: this.certificate,
redirectHTTP: true,
}
);
const scalableTarget = ecsService.service.autoScaleTaskCount({
minCapacity: props.scaling.desiredCount,
maxCapacity: props.scaling.maxCount,
});
scalableTarget.scaleOnCpuUtilization("cpu-scaling", {
targetUtilizationPercent: props.scaling.cpuPercentage,
});
scalableTarget.scaleOnMemoryUtilization("memory-scaling", {
targetUtilizationPercent: props.scaling.memoryPercentage,
});
secrets.grantRead(ecsService.taskDefinition.taskRole);
repository.grantPull(ecsService.taskDefinition.taskRole);
I read somewhere that it probably has something to do with Fargate version 1.4.0 vs 1.3.0, but I'm not sure what I need to change to allow the tasks to access what they need to run.

You need to create an interface endpoints for Secrets Manager, ECR (two types of endpoints), CloudWatch, as well as a gateway endpoint for S3.
Refer to the documentation on the topic.
Here's an example in Python, it'd work the same in TS:
vpc.add_interface_endpoint(
"secretsmanager_endpoint",
service=ec2.InterfaceVpcEndpointAwsService.SECRETS_MANAGER,
)
vpc.add_interface_endpoint(
"ecr_docker_endpoint",
service=ec2.InterfaceVpcEndpointAwsService.ECR_DOCKER,
)
vpc.add_interface_endpoint(
"ecr_endpoint",
service=ec2.InterfaceVpcEndpointAwsService.ECR,
)
vpc.add_interface_endpoint(
"cloudwatch_logs_endpoint",
service=ec2.InterfaceVpcEndpointAwsService.CLOUDWATCH_LOGS,
)
vpc.add_gateway_endpoint(
"s3_endpoint",
service=ec2.GatewayVpcEndpointAwsService.S3
)
Keep in mind that interface endpoints cost money as well, and may not be cheaper than a NAT.

Related

Fargate Service Cannot Write to DynamoDB Table (CDK)

I'm working through setting up a new infrastructure with the AWS CDK and I'm trying to get a TypeScript app running in Fargate to be able to read/write from/to a DynamoDB table, but am hitting IAM issues.
I have both my Fargate service and my DynamoDB Table defined, and both are running as they should be in AWS, but whenever I attempt to write to the table from my app, I am getting an access denied error.
I've tried the solutions defined in this post, as well as the ones it links to, but nothing seems to be allowing my container to write to the table. I've tried everything from setting table.grantReadWriteData(fargateService.taskDefinition.taskRole) to the more complex solutions described in the linked articles of defining my own IAM policies and setting the effects and actions, but I always just get the same access denied error when attempting to do a putItem:
AccessDeniedException: User: {fargate-service-arn} is not authorized to perform: dynamodb:PutItem on resource: {dynamodb-table} because no identity-based policy allows the dynamodb:PutItem action
Am I missing something, or a crucial step to make this possible?
Any help is greatly appreciated.
Thanks!
Edit (2022-09-19):
Here is the boiled down code for how I'm defining my Vpc, Cluster, Container Image, FargateService, and Table.
export class FooCdkStack extends cdk.Stack {
constructor(scope: Construct, id: string, props?: cdk.StackProps) {
super(scope, id, props);
const vpc = new Vpc(this, 'FooVpc', {
maxAzs: 2,
natGateways: 1
});
const cluster = new Cluster(this, 'FooCluster', { vpc });
const containerImage = ContainerImage.fromAsset(
path.join(__dirname, '/../app'),
{
platform: Platform.LINUX_AMD64 // I'm on an M1 Mac and images weren't working appropriately without this
}
);
const fargateService = new ApplicationLoadBalancedFargateService(
this,
'FooFargateService',
{
assignPublicIp: true,
cluster,
memoryLimitMiB: 1024,
cpu: 512,
desiredCount: 1,
taskImageOptions: {
containerPort: PORT,
image: containerImage
}
}
);
fargateService.targetGroup.configureHealthCheck({ path: '/health' });
const serverTable = new Table(this, 'FooTable', {
billingMode: BillingMode.PAY_PER_REQUEST,
removalPolicy: cdk.RemovalPolicy.DESTROY,
partitionKey: { name: 'id', type: AttributeType.STRING },
pointInTimeRecovery: true
});
serverTable.grantReadWriteData(fargateService.taskDefinition.taskRole);
}
}
Apparently either order in which the resources are defined matters, or the inclusion of a property from the table in the Fargate service is what did the trick. I moved the table definition up above the Fargate service and included an environment variable holding the table name and it's working as intended now.
export class FooCdkStack extends cdk.Stack {
constructor(scope: Construct, id: string, props?: cdk.StackProps) {
super(scope, id, props);
const vpc = new Vpc(this, 'FooVpc', {
maxAzs: 2,
natGateways: 1
});
const cluster = new Cluster(this, 'FooCluster', { vpc });
const containerImage = ContainerImage.fromAsset(
path.join(__dirname, '/../app'),
{
platform: Platform.LINUX_AMD64 // I'm on an M1 Mac and images weren't working appropriately without this
}
);
const serverTable = new Table(this, 'FooTable', {
billingMode: BillingMode.PAY_PER_REQUEST,
removalPolicy: cdk.RemovalPolicy.DESTROY,
partitionKey: { name: 'id', type: AttributeType.STRING },
pointInTimeRecovery: true
});
const fargateService = new ApplicationLoadBalancedFargateService(
this,
'FooFargateService',
{
assignPublicIp: true,
cluster,
memoryLimitMiB: 1024,
cpu: 512,
desiredCount: 1,
taskImageOptions: {
containerPort: PORT,
image: containerImage,
environment: {
SERVER_TABLE_NAME: serverTable.tableName
}
}
}
);
fargateService.targetGroup.configureHealthCheck({ path: '/health' });
serverTable.grantReadWriteData(fargateService.taskDefinition.taskRole);
}
}
Hopefully this helps someone in the future who may come across the same issue.

How do I enable deletion protection for load balancer using ApplicationLoadBalancedFargateService cdk construct

I have created a Fargate service running on an ECS cluster fronted by an application load balancer using the ApplicationLoadBalancedFargateService CDK construct.
cluster,
memoryLimitMiB: 1024,
desiredCount: 1,
cpu: 512,
taskImageOptions: {
image: ecs.ContainerImage.fromRegistry("amazon/amazon-ecs-sample"),
},
});
There are no Props for enabling deletion protection. Can anyone tell from his experience?
CDK offers the Escape Hatches feature to use Clouformation Props if any High-level construct does not have parameters.
// Create a load-balanced Fargate service and make it public
var loadBalancedService = new ecs_patterns.ApplicationLoadBalancedFargateService(this, `${ENV_NAME}-pgadmin4`, {
cluster: cluster, // Required
cpu: 512, // Default is 256
desiredCount: 1, // Default is 1
taskImageOptions: {
image: ecs.ContainerImage.fromRegistry('image'),
environment: {}
},
memoryLimitMiB: 1024, // Default is 512
assignPublicIp: true
});
// Get the CloudFormation resource
const cfnLB = loadBalancedService.loadBalancer.node.defaultChild as elbv2.CfnLoadBalancer;
cfnLB.loadBalancerAttributes = [{
key: 'deletion_protection.enabled',
value: 'true',
},
];

Could not find a value associated with JSONKey in SecretString

I want to make RDS and proxy with credential
However, I bumped into this error.
14:32:32 | CREATE_FAILED | AWS::RDS::DBCluster | DatabaseB269D8BB
Could not find a value associated with JSONKey in SecretString
My script is like this below.
const rdsCredentials: rds.Credentials = rds.Credentials.fromGeneratedSecret(dbInfos['user'],{secretName:`cdk-st-${targetEnv}-db-secret`});
const dbCluster = new rds.DatabaseCluster(this, 'Database', {
parameterGroup,
engine: rds.DatabaseClusterEngine.auroraMysql({ version: rds.AuroraMysqlEngineVersion.VER_2_08_1 }),
credentials: rdsCredentials,
cloudwatchLogsExports:['slowquery','general','error','audit'],
backup: backupProps,
instances:2,
removalPolicy: cdk.RemovalPolicy.DESTROY,
clusterIdentifier: dbInfos['cluster'], //clusterIdentifier,
defaultDatabaseName :dbInfos['database'], //defaultDatabaseName,
instanceProps: {
// optional , defaults to t3.medium
instanceType: ec2.InstanceType.of(ec2.InstanceClass.BURSTABLE2, ec2.InstanceSize.SMALL),
vpcSubnets: {
subnetType: ec2.SubnetType.PRIVATE_ISOLATED,
},
vpc,
securityGroups:[dbSecurityGroup],
},
});
const proxy = new rds.DatabaseProxy(this, 'Proxy', {
proxyTarget: rds.ProxyTarget.fromCluster(dbCluster),
secrets: [dbCluster.secret!],
vpc,
});
I guess this error is related to secrets: [dbCluster.secret!] maybe.
I googled around and found this error happens when secrets are deleted.
However I want to use credential which is just generated for RDS
Is it impossible?
how can I fix this?
More Test
I tried another way but this comes the error below
/node_modules/aws-cdk-lib/aws-rds/lib/proxy.ts:239
secret.grantRead(role);
my code is here
dbCluster.addProxy('testProxy',{
secrets: [rdsCredentials.secret!],
vpc
});

Create an AWS RDS instance without NAT Gateway using CDK

Is it possible to create a serverless RDS cluster via CDK without NAT Gateway? The NAT Gateway base charge is pretty expensive for a development environment. I'm also not interested in setting up a NAT instance. I'm attaching a Lambda in the VPC with the RDS instance like this.
// VPC
const vpc = new ec2.Vpc(this, 'MyVPC');
// RDS
const dbCluster = new rds.ServerlessCluster(this, 'MyAuroraCluster', {
engine: rds.DatabaseClusterEngine.AURORA_MYSQL,
defaultDatabaseName: 'DbName',
vpc,
});
Yes, you can. You may have to add some VPC endpoints like Secrets Manager so password rotation can be done, but it is possible. You will need to create a VPC with subnets that have no NAT gateway too.
// VPC
const vpc = new ec2.Vpc(this, 'MyVPC', {
natGateways: 0,
subnetConfiguration: [
{
cidrMask: 24,
name: 'public',
subnetType: ec2.SubnetType.PUBLIC,
},
{
cidrMask: 28,
name: 'rds',
subnetType: ec2.SubnetType.ISOLATED,
}
]
});
// RDS
const dbCluster = new rds.ServerlessCluster(this, 'MyAuroraCluster', {
engine: rds.DatabaseClusterEngine.AURORA_MYSQL,
defaultDatabaseName: 'DbName',
vpcSubnets: {
subnetType: ec2.SubnetType.ISOLATED,
},
vpc,
});
If you want Secrets Manager controlled password, use:
vpc.addInterfaceEndpoint('SecretsManagerEndpoint', {
service: ec2.InterfaceVpcEndpointAwsService.SECRETS_MANAGER,
});
dbCluster.addRotationSingleUser();
Serverless clusters cannot be placed into a public subnet.
This is a hard and documented limitation of RDS Serverless.

RDS issues with custom VPC Subnets

I am trying to use CDK to define a Serverless Postgres Aurora cluster but keep running into issues with regards to the VPC subnets either being "invalid" or "not existing", depending on which db cluster construct I attempt to use. In my setup, I have 2 Stacks: 1 for the VPC, and 1 for the RDS.
This is the contents of my Vpc Stack:
const vpc = new Vpc(this, 'Vpc');
const privateSubnetIds = vpc.selectSubnets({
subnetType: SubnetType.PRIVATE
}).subnetIds;
const rdsSecurityGroup = new SecurityGroup(this, 'RdsSecurityGroup', {
securityGroupName: 'rds-security-group',
allowAllOutbound: true,
description: `RDS cluster security group`,
vpc: vpc
});
...
// The rest of the file defines exports.
Case 1:
Initially, I tried using the CfnDBCluster as the DatabaseCluster does not allow you to directly define engineMode: 'serverless' and enableHttpEndpoint: true. Below is the contents of the RDS Stack using the CfnDBCluster construct:
// The beginning of the file imports all the VPC exports from the VPC Stack:
// subnetIds (for the private subnet), securityGroupId
...
const databaseSecret = new DatabaseSecret(this, 'secret', {
username: 'admin'
});
const secretArn = databaseSecret.secretArn;
const dbSubnetGroup = new CfnDBSubnetGroup(this, "DbSubnetGroup", {
dbSubnetGroupDescription: `Database cluster subnet group`,
subnetIds: subnetIds
});
const dbCluster = new CfnDBCluster(this, 'DbCluster', {
dbClusterIdentifier: 'aurora-cluster',
engine: 'aurora-postgresql',
engineMode: 'serverless',
databaseName: DB_NAME,
masterUsername: databaseSecret.secretValueFromJson('username').toString(),
masterUserPassword: databaseSecret.secretValueFromJson('password').toString(),
enableHttpEndpoint: true,
scalingConfiguration: {
autoPause: true,
minCapacity: 1,
maxCapacity: 16,
secondsUntilAutoPause: 300
},
vpcSecurityGroupIds: [securityGroupId],
dbSubnetGroupName: dbSubnetGroup.dbSubnetGroupName
});
Using the CfnDBCluster construct, I get the following error:
Some input subnets in :[subnet-044631b3e615d752c,subnet-05c2881d9b13195ef,subnet-03c63ec89ae49a748] are invalid. (Service: AmazonRDS; Status Code: 400; Error Code: InvalidParameterValue; Request ID: 5c4e6237-6527-46a6-9ed4-1bc46c38dce0)
I am able to verify that those Subnets do exist before the RDS Stack is run.
Case 2:
After failing to get the CfnDBCluster example above working, I tried using the DatabaseCluster construct with raw overrides. Below is the contents of the RDS Stack using the DatabaseCluster construct:
// The beginning of the file imports all the VPC exports from the VPC Stack:
// subnetIds (for the private subnet), securityGroupId, vpcId, AZs, vpc (using Vpc.fromAttributes)
...
const dbCluster = new DatabaseCluster(this, 'DbCluster', {
engine: DatabaseClusterEngine.auroraPostgres({
version: AuroraPostgresEngineVersion.VER_10_7
}),
masterUser: {
username: databaseSecret.secretValueFromJson('username').toString(),
password: databaseSecret.secretValueFromJson('password')
},
instanceProps: {
vpc: vpc,
vpcSubnets: {
subnetType: SubnetType.PRIVATE
}
},
});
const cfnDbCluster = dbCluster.node.defaultChild as CfnDBCluster;
cfnDbCluster.addPropertyOverride('DbClusterIdentifier', 'rds-cluster');
cfnDbCluster.addPropertyOverride('EngineMode', 'serverless');
cfnDbCluster.addPropertyOverride('DatabaseName', DB_NAME);
cfnDbCluster.addPropertyOverride('EnableHttpEndpoint', true);
cfnDbCluster.addPropertyOverride('ScalingConfiguration.AutoPause', true);
cfnDbCluster.addPropertyOverride('ScalingConfiguration.MinCapacity', 1);
cfnDbCluster.addPropertyOverride('ScalingConfiguration.MaxCapacity', 16);
cfnDbCluster.addPropertyOverride('ScalingConfiguration.SecondsUntilAutoPause', 300);
cfnDbCluster.addPropertyOverride('VpcSecurityGroupIds', subnetIds);
Using the DatabaseCluster construct, I get the following error:
There are no 'Private' subnet groups in this VPC. Available types:
I am able to verify that the VPC does have a Private subnet, I also verified that it was properly imported and that the Subnets all have the expected tags i.e. key: 'aws-cdk:subnet-type' value: 'Private'
This issue has me blocked and confused, I cannot figure out why either of these issues are manifesting and would appreciate any guidance offered on helping resolve this issue.
References:
DatabaseCluster Construct
CfnDBCluster Construct
Database Cluster CloudFormation Properties
Escape Hatches
Notes:
I am using CDK version 1.56.0 with Typescript
In case you visiting this page after getting-
Some input subnets in :[subnet-XXXX,subnet-YYYY,subnet-ZZZZ] are invalid.
You probably checked and confirmed that these subnets do not exist and knock your head struggling to find where the hell these subnets are coming from.
The reason CDK still point to these subnets is since cdk.context.json is still contains values from last deployments.
From the docs-
Context values are key-value pairs that can be associated with a stack
or construct. The AWS CDK uses context to cache information from your
AWS account, such as the Availability Zones in your account or the
Amazon Machine Image (AMI) IDs used to start your instances.
Replace all the JSON content to a valid one ( {} ) and re-deploy the stack.