ECS Cluster with Capacity Provider not getting deleted during ROLLBACK

I have a CloudFormation template as below.
Resources:
  EcsCluster:
    Type: 'AWS::ECS::Cluster'
    Properties:
      ClusterName: !Sub 'EcsCluster-${EnvName}'
      CapacityProviders:
        - !Ref OnDemandCapacityProvider
        - !Ref SpotFleetCapacityProvider
      DefaultCapacityProviderStrategy:
        - CapacityProvider: !Ref OnDemandCapacityProvider
          Base: 2
          Weight: 0
        - CapacityProvider: !Ref SpotFleetCapacityProvider
          Base: 0
          Weight: 1
  OnDemandCapacityProvider:
    Type: AWS::ECS::CapacityProvider
    Properties:
      AutoScalingGroupProvider:
        AutoScalingGroupArn: !Ref OnDemandEcsInstanceAsg
        ManagedScaling:
          MaximumScalingStepSize: 2
          MinimumScalingStepSize: 1
          Status: ENABLED
          TargetCapacity: 100
        ManagedTerminationProtection: ENABLED
  SpotFleetCapacityProvider:
    Type: AWS::ECS::CapacityProvider
    Properties:
      AutoScalingGroupProvider:
        AutoScalingGroupArn: !Ref SpotEcsInstanceAsg
        ManagedScaling:
          MaximumScalingStepSize: 2
          MinimumScalingStepSize: 1
          Status: ENABLED
          TargetCapacity: 100
        ManagedTerminationProtection: DISABLED
...
When stack creation failed, CloudFormation tried to roll back, and the rollback itself failed with the message below.
The following resource(s) failed to delete: [SpotFleetCapacityProvider, OnDemandCapacityProvider, EcsCluster].
I checked the capacity providers and saw the following message:
The capacity provider cannot be deleted because it is associated with cluster: EcsCluster-test-217. Remove the capacity provider from the cluster and try again.
According to this message, the capacity providers cannot be deleted until they are removed from the cluster. How can I express that removal in the CloudFormation template?
I'd appreciate any clue!
Thanks

Related

ECS Failed to create service due to assume role

I got the following error when attempting to create an ECS service (Fargate) using CloudFormation.
Invalid request provided: CreateService error: Unable to assume role and validate the specified targetGroupArn. Please verify that the ECS service role being passed has the proper permissions. (Service: Ecs, Status Code: 400, Request ID: 32dc55bc-3b69-46dd-bf95-f3fff77c2508, Extended Request ID: null)
Things I tried, and related facts:
Updating the role to include even AdministratorAccess (just for troubleshooting).
Allowing several services (ecs, elb, ec2, cloudformation) to assume the role (it was only ecs-tasks originally).
Creating the ECS service in the web console with the same config succeeds; only CloudFormation fails.
The ECS role has not been updated since the last successful ECS service creation with CloudFormation, on 21 Nov 2020.
Below are the ECS role and the CloudTrail event for the above error. Has anyone faced a similar issue, or does anyone know what is happening?
Edit 1:
The ECS template is included. The IAM role and the ECS service belong to different root stacks, so it is not possible to use the DependsOn attribute. Our CI/CD ensures the IAM stack is updated before the ECS stack.
ECS Task role used:
  EcsTaskRole:
    Type: 'AWS::IAM::Role'
    Properties:
      ManagedPolicyArns:
        - 'arn:aws:iam::aws:policy/AdministratorAccess'
        - 'arn:aws:iam::aws:policy/AmazonSQSFullAccess'
        - 'arn:aws:iam::aws:policy/AmazonS3FullAccess'
        - 'arn:aws:iam::aws:policy/AmazonSNSFullAccess'
        - 'arn:aws:iam::aws:policy/AmazonDynamoDBFullAccess'
        - 'arn:aws:iam::aws:policy/AmazonRDSFullAccess'
        - 'arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy'
        - 'arn:aws:iam::aws:policy/AmazonSSMReadOnlyAccess'
        - 'arn:aws:iam::aws:policy/AWSXrayFullAccess'
        - 'arn:aws:iam::aws:policy/AWSBatchFullAccess'
      AssumeRolePolicyDocument:
        Version: 2012-10-17
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - ecs-tasks.amazonaws.com
                - ecs.amazonaws.com
                - cloudformation.amazonaws.com
                - elasticloadbalancing.amazonaws.com
                - ec2.amazonaws.com
            Action:
              - 'sts:AssumeRole'
Outputs:
  EcsTaskRoleArn:
    Description: EcsTaskRoleArn
    Value: !GetAtt EcsTaskRole.Arn
    Export:
      Name: !Sub "${AWS::StackName}-EcsTaskRoleArn"
Event from CloudTrail (some info masked):
{
"eventVersion":"1.08",
"userIdentity":{
"type":"IAMUser",
"principalId":"********",
"arn":"arn:aws:iam::*****:user/****",
"accountId":"*********",
"accessKeyId":"********",
"userName":"********",
"sessionContext":{
"sessionIssuer":{
},
"webIdFederationData":{
},
"attributes":{
"mfaAuthenticated":"false",
"creationDate":"2021-01-01T20:48:02Z"
}
},
"invokedBy":"cloudformation.amazonaws.com"
},
"eventTime":"2021-01-01T20:48:14Z",
"eventSource":"ecs.amazonaws.com",
"eventName":"CreateService",
"awsRegion":"ap-east-1",
"sourceIPAddress":"cloudformation.amazonaws.com",
"userAgent":"cloudformation.amazonaws.com",
"errorCode":"InvalidParameterException",
"errorMessage":"Unable to assume role and validate the specified targetGroupArn. Please verify that the ECS service role being passed has the proper permissions.",
"requestParameters":{
"clientToken":"75e4c412-a82c-b01a-1909-cfdbe788f1f1",
"cluster":"********",
"desiredCount":1,
"enableECSManagedTags":true,
"enableExecuteCommand":false,
"healthCheckGracePeriodSeconds":300,
"launchType":"FARGATE",
"loadBalancers":[
{
"targetGroupArn":"arn:aws:elasticloadbalancing:ap-east-1:********:listener-rule/app/********/e6a62b4cc4d13aaa/098a6759b6062f3f/f374eba8a4fb66e5",
"containerName":"********",
"containerPort":8080
}
],
"networkConfiguration":{
"awsvpcConfiguration":{
"assignPublicIp":"ENABLED",
"securityGroups":[
"sg-025cd908f664b25fe"
],
"subnets":[
"subnet-067502309b0359486",
"subnet-018893d9e397ecac5",
"subnet-0bfb736aefb90f05a"
]
}
},
"propagateTags":"SERVICE",
"serviceName":"********",
"taskDefinition":"arn:aws:ecs:ap-east-1:********:task-definition/********"
},
"responseElements":null,
"requestID":"32dc55bc-3b69-46dd-bf95-f3fff77c2508",
"eventID":"3f872d94-72a7-4ced-96a6-028a6ceeacba",
"readOnly":false,
"eventType":"AwsApiCall",
"managementEvent":true,
"eventCategory":"Management",
"recipientAccountId":"904822583864"
}
CloudFormation template of the ECS service:
MyServiceLogGroup:
  Type: AWS::Logs::LogGroup
  Properties:
    LogGroupName: my-service-log
    RetentionInDays: 365
MyServiceTargetGroup:
  Type: 'AWS::ElasticLoadBalancingV2::TargetGroup'
  Properties:
    HealthCheckPath: /my-service/health
    HealthCheckIntervalSeconds: 300
    HealthCheckTimeoutSeconds: 10
    Name: my-service-target-group
    TargetType: ip
    Port: 8080
    Protocol: HTTP
    VpcId: !Ref VpcId
MyServiceListenerRule:
  Type: 'AWS::ElasticLoadBalancingV2::ListenerRule'
  Properties:
    Actions:
      - Type: forward
        TargetGroupArn: !Ref MyServiceTargetGroup
    Conditions:
      - Field: path-pattern
        Values:
          - /my-service/*
    ListenerArn: !Ref AppAlbListenerArn
    Priority: 164
MyServiceTaskDef:
  Type: 'AWS::ECS::TaskDefinition'
  Properties:
    ContainerDefinitions:
      - Name: my-service-container
        Image: !Join
          - ''
          - - !Ref 'AWS::AccountId'
            - .dkr.ecr.
            - !Ref 'AWS::Region'
            - .amazonaws.com/
            - 'Fn::ImportValue': !Sub '${RepositoryStackName}-MyServiceECR'
            - ':'
            - !Ref MyServiceVersion
        Essential: true
        PortMappings:
          - ContainerPort: 8080
            Protocol: tcp
        LogConfiguration:
          LogDriver: awslogs
          Options:
            awslogs-group: !Ref MyServiceLogGroup
            awslogs-region: !Ref AWS::Region
            awslogs-stream-prefix: my-service
    RequiresCompatibilities:
      - FARGATE
    Cpu: 256
    Memory: 512
    Family: my-service-taskdef
    NetworkMode: awsvpc
    ExecutionRoleArn:
      'Fn::ImportValue': !Sub '${IamStackName}-EcsTaskRoleArn'
    TaskRoleArn:
      'Fn::ImportValue': !Sub '${IamStackName}-EcsTaskRoleArn'
    Volumes: []
MyServiceECS:
  Type: 'AWS::ECS::Service'
  Properties:
    DesiredCount: 1
    Cluster: !Ref EcsCluster
    TaskDefinition: !Ref MyServiceTaskDef
    LaunchType: FARGATE
    NetworkConfiguration:
      AwsvpcConfiguration:
        AssignPublicIp: ENABLED
        SecurityGroups:
          - !Ref SecurityGroupECS
        Subnets:
          - !Ref DmzSubnet1
          - !Ref DmzSubnet2
          - !Ref DmzSubnet3
    LoadBalancers:
      - ContainerName: my-service-container
        ContainerPort: '8080'
        TargetGroupArn: !Ref MyServiceListenerRule
    EnableECSManagedTags: true
    PropagateTags: SERVICE
    HealthCheckGracePeriodSeconds: 300
  DependsOn:
    - MyServiceListenerRule
Use the DependsOn attribute to specify the dependency of the AWS::ECS::Service resource on AWS::IAM::Policy.
There are mistakes in your templates. The first apparent one is:
TargetGroupArn: !Ref MyServiceListenerRule
This should be:
TargetGroupArn: !Ref MyServiceTargetGroup
Large chunks of your templates are missing (ALB definition, listener), so I can't comment on them.
P.S.
The IAM role is fine, in the sense that it is not the source of the issue. But giving full privileges to a number of services in one role is not good practice.
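
One way to see the mix-up is to look at the ARN that was actually sent to CreateService. The following is a minimal sketch, not part of the original answer: the helper names and the sample ARNs (account ID, load balancer and target group names) are made up for illustration. It splits an Elastic Load Balancing ARN and reports its resource type, which is listener-rule for the value in the CloudTrail event above, not the targetgroup that the service expects.

```python
def elbv2_resource_type(arn: str) -> str:
    """Return the resource type (text before the first '/') of an ELB ARN."""
    # ARN layout: arn:partition:service:region:account-id:resource
    fields = arn.split(":", 5)
    if len(fields) != 6 or fields[0] != "arn" or fields[2] != "elasticloadbalancing":
        raise ValueError(f"not an Elastic Load Balancing ARN: {arn!r}")
    return fields[5].split("/", 1)[0]


def is_target_group_arn(arn: str) -> bool:
    """True only when the ARN names a target group."""
    return elbv2_resource_type(arn) == "targetgroup"


# Hypothetical sample values, shaped like the masked ARN in the CloudTrail event.
LISTENER_RULE_ARN = (
    "arn:aws:elasticloadbalancing:ap-east-1:111122223333:"
    "listener-rule/app/my-alb/e6a62b4cc4d13aaa/098a6759b6062f3f/f374eba8a4fb66e5"
)
TARGET_GROUP_ARN = (
    "arn:aws:elasticloadbalancing:ap-east-1:111122223333:"
    "targetgroup/my-service-target-group/73e2d6bc24d8a067"
)

if __name__ == "__main__":
    print(elbv2_resource_type(LISTENER_RULE_ARN), is_target_group_arn(LISTENER_RULE_ARN))
    print(elbv2_resource_type(TARGET_GROUP_ARN), is_target_group_arn(TARGET_GROUP_ARN))
```

Running a check like this against the targetGroupArn field of the failed event is a quick way to confirm that the template referenced the listener rule instead of the target group.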

How to add TaskInstanceGroup to AWS EMR for autoscaling using CloudFormation?

I want to add an auto scaling group for task nodes, but I am unable to get it to work with CloudFormation.
The same thing works fine for CoreInstanceGroup, as below.
Instances:
  CoreInstanceGroup:
    InstanceCount: 1
    InstanceType: !Ref CoreInstanceType
    Market: ON_DEMAND
    Name: Core Instance
    AutoScalingPolicy:
      Constraints:
        MinCapacity: !Ref CoreMinCapacity
        MaxCapacity: !Ref CoreMaxCapacity
When I replace CoreInstanceGroup with TaskInstanceGroup, the linter gives a warning, and running the script fails with the error Property Not found.
I came across a Terraform script that refers to TaskInstanceGroup. Has anyone figured out a way to do this?
TIA.
A task instance group is not part of AWS::EMR::Cluster; that's why you are getting the error.
You have to attach the task instance group as a separate resource,
of type AWS::EMR::InstanceGroupConfig.
JobFlowId: !Ref myEMRCluster determines which cluster the resource is attached to; myEMRCluster is the logical name of the EMR cluster resource.
You can attach multiple task instance groups with different autoscaling policies.
You can also keep the task group in a separate CloudFormation template. In that case, pass the cluster ID explicitly, e.g. JobFlowId: 'j-ABCD123456789'.
AWSTemplateFormatVersion: 2010-09-09
Resources:
  myEMRCluster:
    Type: 'AWS::EMR::Cluster'
    Properties: <... Your existing config ...>
  TaskInstanceGroup:
    Type: 'AWS::EMR::InstanceGroupConfig'
    Properties:
      InstanceRole: TASK
      InstanceCount: 0
      InstanceType: 'r5.8xlarge'
      Market: SPOT
      BidPrice: '1.110'
      Name: cfnTask
      JobFlowId: !Ref myEMRCluster
      AutoScalingPolicy:
        Constraints:
          MinCapacity: 0
          MaxCapacity: 40
        Rules:
          - Name: container-pending-ratio-scale-out
            Description: >-
              Replicates the default scale-out rule in the console for YARN
              memory.
            Action:
              SimpleScalingPolicyConfiguration:
                AdjustmentType: CHANGE_IN_CAPACITY
                ScalingAdjustment: 10
                CoolDown: 300
            Trigger:
              CloudWatchAlarmDefinition:
                ComparisonOperator: GREATER_THAN
                EvaluationPeriods: 2
                MetricName: ContainerPendingRatio
                Namespace: AWS/ElasticMapReduce
                Period: 300
                Threshold: 2
                Statistic: AVERAGE
                Unit: COUNT
                Dimensions:
                  - Key: JobFlowId
                    Value: '${emr.clusterId}'
          - Name: idle-scale-in
            Description: Replicates the default scale-in rule in the console for idle.
            Action:
              SimpleScalingPolicyConfiguration:
                AdjustmentType: CHANGE_IN_CAPACITY
                ScalingAdjustment: -40
                CoolDown: 300
            Trigger:
              CloudWatchAlarmDefinition:
                ComparisonOperator: LESS_THAN_OR_EQUAL
                EvaluationPeriods: 2
                MetricName: ContainerAllocated
                Namespace: AWS/ElasticMapReduce
                Period: 300
                Threshold: 0
                Statistic: AVERAGE
                Unit: COUNT
                Dimensions:
                  - Key: JobFlowId
                    Value: '${emr.clusterId}'
  myEMRStep:
    Type: 'AWS::EMR::Step'
    Properties: <... If you have any ...>
Hope this helps.

CloudFormation: Provided Arn is not in correct format

I am trying to trigger an AWS scheduled task from CloudWatch every 2 hours, which will perform some operations.
Below is my CloudFormation template:
TaskSchedule:
  Type: "AWS::Events::Rule"
  DeletionPolicy: Delete
  Properties:
    Description: >
      Run every two hours.
    ScheduleExpression: !Ref TaskRate #rate(1 day) #cron (15 10 * * ? *) #(0 0 * * *) #!Ref LambdaRate
    State: ENABLED
    #Targets:
    #  - Arn: !Ref ecsCluster.Arn #!Sub ${TaskDefinitionDaily.Arn}
    #    Id: TaskSchedule
    #    EcsParameters:
    #      TaskDefinitionArn: !Ref TaskDefinitionDaily
    #      TaskCount: 1
    #      LaunchType: 'FARGATE'
    #      PlatformVersion: 'LATEST'
    Targets:
      - Id: 'ECSTarget'
        Arn: !Ref ecsCluster.Arn #!Sub ${TaskDefinitionDaily.Arn}
        EcsParameters:
          TaskCount: 1
          TaskDefinitionArn: !Ref 'TaskDefinitionDaily'
When I try to run the above CloudFormation template, I get the error below. I am new to CloudFormation templates and don't know what is causing this.
Provided Arn is not in correct format. (Service: AmazonCloudWatchEvents; Status Code: 400; Error Code: ValidationException;
Please let me know what I might be doing wrong here.
You are trying to access the Arn attribute of ecsCluster, but you are using !Ref to do so. This doesn't work. You have to use !GetAtt to retrieve an attribute.
Try the following:
TaskSchedule:
  Type: AWS::Events::Rule
  DeletionPolicy: Delete
  Properties:
    Description: >
      Run every two hours.
    ScheduleExpression: !Ref TaskRate
    State: ENABLED
    Targets:
      - Id: ECSTarget
        Arn: !GetAtt ecsCluster.Arn
        EcsParameters:
          TaskCount: 1
          TaskDefinitionArn: !Ref TaskDefinitionDaily

AWS ECS: Invalid service in ARN (Service: AmazonECS; ...)

Trying to create an ECS service (on Fargate) with CloudFormation, I got this error:
Invalid service in ARN (Service: AmazonECS; Status Code: 400; Error Code: InvalidParameterException; Request ID: xxx).
According to the error message, some ARN is wrong, but I could not find which one. I checked the ARNs of the IAM roles and they are OK. The other ARNs are passed with the !Ref function (so it is not a typo).
All resources (including those from all the other nested templates: VPC, cluster, ALB, etc.) are created, except the "Service" resource (the ECS service).
Below is the template used (a nested template). All parameters are OK (passed from the root template). The parameters TaskExecutionRole and ServiceRole are ARNs of IAM roles created by the ECS wizard:
Description: >
  Deploys xxx ECS service, with load balancer listener rule,
  target group, task definition, service definition and auto scaling
Parameters:
  EnvironmentName:
    Description: An environment name that will be prefixed to resource names
    Type: String
  EnvironmentType:
    Description: See master template
    Type: String
  VpcId:
    Type: String
  PublicSubnet1:
    Type: String
  PublicSubnet2:
    Type: String
  ALBListener:
    Description: ALB listener
    Type: String
  Cluster:
    Description: ECS Cluster
    Type: String
  TaskExecutionRole:
    Description: See master template
    Type: String
  ServiceRole:
    Description: See master template
    Type: String
  ServiceName:
    Description: Service name (used as a variable)
    Type: String
    Default: xxx
  Cpu:
    Description: Task size (CPU)
    Type: String
  Memory:
    Description: Task size (memory)
    Type: String
Conditions:
  HasHttps: !Equals [!Ref EnvironmentType, production]
  HasNotHttps: !Not [!Equals [!Ref EnvironmentType, production]]
Resources:
  ServiceTargetGroup:
    Type: AWS::ElasticLoadBalancingV2::TargetGroup
    Properties:
      Name: !Sub '${EnvironmentName}-${ServiceName}'
      VpcId: !Ref VpcId
      TargetType: ip
      Port: 80
      Protocol: HTTP
  AlbListenerRule:
    Type: AWS::ElasticLoadBalancingV2::ListenerRule
    Properties:
      Actions:
        - Type: forward
          TargetGroupArn: !Ref ServiceTargetGroup
      Conditions:
        - Field: host-header
          Values: [www.mydomain.com] # test
      ListenerArn: !Ref ALBListener
      Priority: 1
  TaskDefinition:
    Type: AWS::ECS::TaskDefinition
    Properties:
      Family: !Sub '${EnvironmentName}-${ServiceName}-Task'
      ContainerDefinitions:
        - Name: !Ref ServiceName
          Image: nginx
          PortMappings:
            - ContainerPort: 80
          LogConfiguration:
            LogDriver: awslogs
            Options:
              awslogs-group: !Ref EnvironmentName
              awslogs-region: !Ref AWS::Region
              awslogs-stream-prefix: !Ref ServiceName
      NetworkMode: awsvpc
      RequiresCompatibilities: [FARGATE]
      Cpu: !Ref Cpu
      Memory: !Ref Memory
      ExecutionRoleArn: !Ref TaskExecutionRole
  Service:
    Type: AWS::ECS::Service
    DependsOn: TaskDefinition
    Properties:
      Cluster: !Ref Cluster
      ServiceName: !Ref ServiceName
      TaskDefinition: !Ref TaskDefinition
      LaunchType: FARGATE
      DesiredCount: 1
      LoadBalancers:
        - ContainerName: !Ref ServiceName
          ContainerPort: 80
          TargetGroupArn: !Ref ServiceTargetGroup
      NetworkConfiguration:
        AwsvpcConfiguration:
          AssignPublicIp: ENABLED
          Subnets:
            - !Ref PublicSubnet1
            - !Ref PublicSubnet2
      Role: !Ref ServiceRole
I lost a few hours on this and could not solve it. I reviewed the documentation at length but found nothing. Any help would be appreciated.
Thanks!
The error message is confusing because it does not explain which parameter is wrong. The Amazon API expects resource ARNs in several parameters, including Cluster, TaskDefinition and TargetGroup. The error happens when one of these parameters is wrong. Please check these parameters carefully and make sure they are valid ARNs.
I had exactly the same error; in my case I had mistakenly provided the wrong Cluster value.
I am posting an answer here because this was the first search result for this error message and it had no answer.
The problem for me was that the default AWS region was set to the wrong one. To fix that, run the following command (using the correct region).
$ aws configure set default.region us-west-2
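
Since the error does not say which parameter is at fault, it can help to check each ARN mechanically before deploying. The following is only a sketch under stated assumptions: the function name, the EXPECTED_SERVICE table and the sample values are hypothetical, and the checks cover just the two causes mentioned in the answers above (an ARN belonging to the wrong service, and an ARN from a region other than the one being deployed to).

```python
# Hypothetical mapping of parameter name -> service expected in its ARN.
EXPECTED_SERVICE = {
    "Cluster": "ecs",
    "TaskDefinition": "ecs",
    "TargetGroupArn": "elasticloadbalancing",
}


def find_arn_problems(params: dict, region: str) -> list:
    """Return human-readable problems found in the ARN-valued parameters."""
    problems = []
    for name, expected in EXPECTED_SERVICE.items():
        value = params.get(name, "")
        if not value.startswith("arn:"):
            continue  # short names (e.g. a bare cluster name) are not checked here
        # ARN layout: arn:partition:service:region:account-id:resource
        fields = value.split(":", 5)
        if len(fields) != 6:
            problems.append(f"{name}: malformed ARN")
            continue
        service, arn_region = fields[2], fields[3]
        if service != expected:
            problems.append(f"{name}: service is {service!r}, expected {expected!r}")
        if arn_region != region:
            problems.append(f"{name}: region is {arn_region!r}, expected {region!r}")
    return problems


if __name__ == "__main__":
    # Made-up values: the cluster value is really a target group ARN,
    # and the target group lives in a different region.
    sample = {
        "Cluster": "arn:aws:elasticloadbalancing:us-west-2:111122223333:targetgroup/web/0123456789abcdef",
        "TaskDefinition": "arn:aws:ecs:us-west-2:111122223333:task-definition/web:1",
        "TargetGroupArn": "arn:aws:elasticloadbalancing:us-east-1:111122223333:targetgroup/web/0123456789abcdef",
    }
    for problem in find_arn_problems(sample, "us-west-2"):
        print(problem)
```

A check of this kind would flag both mistakes described above: a Cluster value that is not an ECS ARN, and resources resolved in the wrong region.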

How to perform AWS CloudFormation autoscaling for ECS instance when cluster has insufficient memory available

I have created a CloudFormation template that creates an ECS service and task and has autoscaling for tasks. It is pretty basic: if MemoryUtilization for tasks reaches a certain value, add 1 task, and vice versa. Here are the most relevant parts of the template.
EcsTd:
  Type: AWS::ECS::TaskDefinition
  DependsOn: LogGroup
  Properties:
    Family: !Sub ${EnvironmentName}-${PlatformName}-${Type}
    ContainerDefinitions:
      - Name: !Sub ${EnvironmentName}-${PlatformName}-${Type}
        Image: !Sub ${AWS::AccountId}.dkr.ecr.${AWS::Region}.amazonaws.com/${PlatformName}:${ImageVersion}
        Environment:
          - Name: APP_ENV
            Value: !If [isProd, "production", "staging"]
          - Name: APP_DEBUG
            Value: "false"
          ...
        PortMappings:
          - ContainerPort: 80
            HostPort: 0
        Memory: !Ref Memory
        Essential: true
EcsService:
  Type: AWS::ECS::Service
  DependsOn: WaitForLoadBalancerListenerRulesCondition
  Properties:
    ServiceName: !Sub ${EnvironmentName}-${PlatformName}-${Type}
    Cluster:
      Fn::ImportValue: !Sub ${EnvironmentName}-ECS-${Type}
    DesiredCount: !Sub ${DesiredCount}
    TaskDefinition: !Ref EcsTd
    Role: "learningEcsServiceRole"
    LoadBalancers:
      - !If
        - isWeb
        - ContainerPort: 80
          ContainerName: !Sub ${EnvironmentName}-${PlatformName}-${Type}
          TargetGroupArn: !Ref AlbTargetGroup
        - !Ref AWS::NoValue
ServiceScalableTarget:
  Type: "AWS::ApplicationAutoScaling::ScalableTarget"
  Properties:
    MaxCapacity: !Sub ${MaxCount}
    MinCapacity: !Sub ${MinCount}
    ResourceId: !Join
      - /
      - - service
        - !Sub ${EnvironmentName}-${Type}
        - !GetAtt EcsService.Name
    RoleARN: arn:aws:iam::645618565575:role/learningEcsServiceRole
    ScalableDimension: ecs:service:DesiredCount
    ServiceNamespace: ecs
ServiceScaleOutPolicy:
  Type: "AWS::ApplicationAutoScaling::ScalingPolicy"
  Properties:
    PolicyName: !Sub ${EnvironmentName}-${PlatformName}-${Type}-ScaleOutPolicy
    PolicyType: StepScaling
    ScalingTargetId: !Ref ServiceScalableTarget
    StepScalingPolicyConfiguration:
      AdjustmentType: ChangeInCapacity
      Cooldown: 1800
      MetricAggregationType: Average
      StepAdjustments:
        - MetricIntervalLowerBound: 0
          ScalingAdjustment: 1
MemoryScaleOutAlarm:
  Type: AWS::CloudWatch::Alarm
  Properties:
    AlarmName: !Sub ${EnvironmentName}-${PlatformName}-${Type}-MemoryOver70PercentAlarm
    AlarmDescription: Alarm if memory utilization greater than 70% of reserved memory
    Namespace: AWS/ECS
    MetricName: MemoryUtilization
    Dimensions:
      - Name: ClusterName
        Value: !Sub ${EnvironmentName}-${Type}
      - Name: ServiceName
        Value: !GetAtt EcsService.Name
    Statistic: Maximum
    Period: '60'
    EvaluationPeriods: '1'
    Threshold: '70'
    ComparisonOperator: GreaterThanThreshold
    AlarmActions:
      - !Ref ServiceScaleOutPolicy
      - !Ref EmailNotification
...
So whenever tasks start to run out of memory, we add a new task. However, at some point we will reach the limit of the memory available in our cluster.
For example, if the cluster consists of one t2.small instance, we have 2 GB of RAM. A small amount of that is used by the ECS agent running on the instance, so we have less than 2 GB available. If we set the task's memory to 512 MB, we can fit only 3 tasks in that cluster unless we scale the cluster up.
By default, ECS provides a MemoryReservation metric that can be used for autoscaling the cluster. We would say that when MemoryReservation is more than 75%, add 1 instance to the cluster. That's relatively easy.
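
The arithmetic behind that example can be sketched as follows. This is only an illustration: the function name is hypothetical, and the 100 MiB reserved for the agent and operating system is an assumed figure, since the question only says that "a small amount" is used.

```python
def tasks_that_fit(instance_memory_mib: int, reserved_mib: int, task_memory_mib: int) -> int:
    """Number of tasks of a given memory size that fit on one instance."""
    if task_memory_mib <= 0:
        raise ValueError("task_memory_mib must be positive")
    usable_mib = max(instance_memory_mib - reserved_mib, 0)
    return usable_mib // task_memory_mib  # whole tasks only


if __name__ == "__main__":
    # t2.small: 2 GiB of RAM; 100 MiB overhead is an assumption, not a measured value.
    print(tasks_that_fit(2048, 100, 512))  # 3 tasks fit
    # With no overhead at all a 4th task would fit, which is why the
    # small amount taken by the agent matters.
    print(tasks_that_fit(2048, 0, 512))
```

With any non-zero overhead below 512 MiB the result is 3, matching the example above.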
EcsCluster:
  Type: AWS::ECS::Cluster
  Properties:
    ClusterName: !Sub ${EnvironmentName}-${Type}
SgEcsHost:
  ...
ECSLaunchConfiguration:
  Type: AWS::AutoScaling::LaunchConfiguration
  Properties:
    ImageId: !FindInMap [AWSRegionToAMI, !Ref 'AWS::Region', AMIID]
    InstanceType: !Ref InstanceType
    SecurityGroups: [ !Ref SgEcsHost ]
    AssociatePublicIpAddress: true
    IamInstanceProfile: "ecsInstanceRole"
    KeyName: !Ref KeyName
    UserData:
      Fn::Base64: !Sub |
        #!/bin/bash
        echo ECS_CLUSTER=${EnvironmentName}-${Type} >> /etc/ecs/ecs.config
ECSAutoScalingGroup:
  Type: AWS::AutoScaling::AutoScalingGroup
  Properties:
    VPCZoneIdentifier:
      - Fn::ImportValue: !Sub ${EnvironmentName}-SubnetEC2AZ1
      - Fn::ImportValue: !Sub ${EnvironmentName}-SubnetEC2AZ2
    LaunchConfigurationName: !Ref ECSLaunchConfiguration
    MinSize: !Ref AsgMinSize
    MaxSize: !Ref AsgMaxSize
    DesiredCapacity: !Ref AsgDesiredSize
    Tags:
      - Key: Name
        Value: !Sub ${EnvironmentName}-ECS
        PropagateAtLaunch: true
ScalePolicyUp:
  Type: AWS::AutoScaling::ScalingPolicy
  Properties:
    AdjustmentType: ChangeInCapacity
    AutoScalingGroupName:
      Ref: ECSAutoScalingGroup
    Cooldown: '1'
    ScalingAdjustment: '1'
MemoryReservationAlarm:
  Type: AWS::CloudWatch::Alarm
  Properties:
    EvaluationPeriods: '1'
    Statistic: Average
    Threshold: '75'
    AlarmDescription: Alarm if MemoryReservation is more then 75%
    Period: '60'
    AlarmActions:
      - Ref: ScalePolicyUp
      - Ref: EmailNotification
    Namespace: AWS/EC2
    Dimensions:
      - Name: AutoScalingGroupName
        Value:
          Ref: ECSAutoScalingGroup
    ComparisonOperator: GreaterThanThreshold
    MetricName: MemoryReservation
However, that does not make sense, because the alarm would fire when the third task is added, so the new instance would sit empty until a 4th task is scheduled. That means we'll be paying for an instance that we don't use.
I have noticed that when the ECS service tries to add a task to a cluster that does not have enough free memory, I get:
service Production-admin-worker was unable to place a task because no container instance met all of its requirements. The closest matching container-instance ################### has insufficient memory available.
In this example the template's parameters are:
EnvironmentName=Production
PlatformName=Admin
Type=worker
Is it possible to create an AWS::CloudWatch::Alarm that watches ECS cluster events for that particular pattern? The idea would be to scale up the instance count in the cluster using AWS::AutoScaling::AutoScalingGroup only when AWS::ApplicationAutoScaling::ScalingPolicy adds tasks that do not fit in the cluster, and to scale the cluster down when MemoryReservation is less than 25% (meaning that no tasks are running there because AWS::ApplicationAutoScaling::ScalingPolicy has removed them).
"That means we'll be paying for instance that we don't use."
Either you pay for the extra/backup capacity in advance, or you implement logic to retry the tasks that failed due to low capacity.
A couple of ways I can think of:
You could create a custom script/Lambda (https://forums.aws.amazon.com/thread.jspa?threadID=94984) that reports a metric, say load_factor, calculated as number of tasks / number of instances, and then base your auto scaling policy on that. The Lambda can be triggered by a CloudWatch rule.
You could also report this metric from your task implementation instead of a new custom Lambda/script.
Create a metric filter that looks for a specific pattern in a log file/group and reports a metric. Then, of course, use this metric for scaling.
From the docs:
When a metric filter finds one of the terms, phrases, or values in your log events, you can increment the value of a CloudWatch metric.
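
Both ideas can be prototyped locally before wiring them to CloudWatch. The following is a rough sketch, not a definitive implementation: the function names are hypothetical, the pattern is taken from the service event message quoted in the question, and actually publishing the values as custom metrics is deliberately left out.

```python
import re

# Phrase taken from the ECS service event message quoted in the question.
PLACEMENT_FAILURE = re.compile(
    r"was unable to place a task because no container instance met all of its requirements"
)


def count_placement_failures(messages) -> int:
    """Count event messages that report a failed task placement."""
    return sum(1 for message in messages if PLACEMENT_FAILURE.search(message))


def load_factor(running_tasks: int, instances: int) -> float:
    """Tasks per instance, as suggested in the answer above."""
    if instances <= 0:
        # No instances: report infinity if anything is waiting to run.
        return float("inf") if running_tasks > 0 else 0.0
    return running_tasks / instances


if __name__ == "__main__":
    events = [
        "service Production-admin-worker has reached a steady state.",
        "service Production-admin-worker was unable to place a task because no "
        "container instance met all of its requirements. The closest matching "
        "container-instance has insufficient memory available.",
    ]
    print(count_placement_failures(events))
    print(load_factor(running_tasks=3, instances=1))
```

A scale-out alarm could then be based on the failure count being greater than zero, or on load_factor exceeding the number of tasks that fit on one instance.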