ECS/EC2 ASG not scaling out for new service tasks - amazon-web-services

I'm relatively new to ECS; for the most part it's been fine, but lately I'm facing an issue that I can't seem to find an intuitive solution for.
I'm running an ECS cluster with an EC2 capacity provider. The EC2 capacity is backed by an Auto Scaling Group with min_capacity: 1 & max_capacity: 5.
Each ECS service has task auto scaling enabled based on CPU/memory utilisation.
The issue I'm seeing is that when new tasks are being deployed as part of our CI/CD, ECS returns "unable to place a task because no container instance met all of its requirements".
I'm wondering how I get ECS to trigger a scale-out event for the ASG when this happens. Do I need a particular scaling policy for the ASG? (I feel it's related to this.)
I attempted to set up an EventBridge/CloudWatch alarm to trigger a scale-out event whenever this happens, but it seems hacky. It worked, not ideally, but it worked. Surely there is a nicer/simpler way of doing this?
Any advice or points from experience would be greatly appreciated!
(PS - let me know if you need any more information/screenshots/code examples etc.)
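For reference, a minimal sketch (with placeholder cluster/service names) of the kind of service-level auto scaling described above; note that a target tracking policy like this only adjusts the service's DesiredCount and does not, on its own, add instances to the ASG:

```python
import boto3

# Sketch of service-level (task) auto scaling on CPU utilisation.
# "my-cluster"/"my-service" are placeholders for the real names.
aas = boto3.client("application-autoscaling")

aas.register_scalable_target(
    ServiceNamespace="ecs",
    ResourceId="service/my-cluster/my-service",
    ScalableDimension="ecs:service:DesiredCount",
    MinCapacity=1,
    MaxCapacity=10,
)

aas.put_scaling_policy(
    PolicyName="cpu-target-tracking",
    ServiceNamespace="ecs",
    ResourceId="service/my-cluster/my-service",
    ScalableDimension="ecs:service:DesiredCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 70.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
        },
        "ScaleOutCooldown": 60,
        "ScaleInCooldown": 120,
    },
)
```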

Related

Maintaining desired instances when using AWS CodeDeploy with ASG as a capacity provider

I have an application running in ECS with an ASG as the capacity provider, using CodeDeploy for rolling out new images.
The scaling policy configured on the ECS service triggers auto scaling based on CPU/memory metrics through CloudWatch alarms (target tracking policy).
When I trigger a blue/green rollout in CodeDeploy, exactly double the instances are needed in the ASG in order to accommodate the replacement version before routing the traffic.
However, at this point the ASG won't trigger auto scaling, and hence CodeDeploy will not get enough instances ready to start the replacement version.
I think there are a few ways to achieve this (although I have not tried them), but I am looking for a simpler, out-of-the-box solution where I don't need to maintain a lot of configuration.

ECS+EC2 auto scale instance on deployment

I have an ECS cluster running a service and task that sit on an EC2 machine. The task is big and takes up the whole machine it's running on. Is there any way to set up ECS & EC2 to scale temporarily on deployments: create a new instance, run the new task, then stop the old pre-deployment task?
So far I've tried to play with auto scaling on both EC2 & ECS separately, but it seems to me that the conditions for scaling come down to CPU or memory utilization. However, as my task takes up the whole instance, there are no alarms that could be triggered on deployment, as there's simply no suitable instance to start the new task on.
Right now I have the service running as DAEMON, so it's one task per instance, and auto scaling seems to be disabled for it. During deployments, it drains the old task before starting the new one, creating downtime. If I set the service type to REPLICA, scaling can be enabled; however, it's based on resource utilization and I can't seem to figure out how I would create a new instance on deployments.
Am I missing something, or am I interpreting how these auto scalers work incorrectly? I can't seem to see a nice way to deploy without affecting the current task, short of having an unused instance sitting there running in the background all the time (wasting money).

AWS Instance Scheduler with auto scaling groups

I've configured AWS Instance Scheduler, and everything is working as expected.
The issue I'm having is that each instance has an auto scaling group in my dev environment, and I'm unable to shut down instances without them being terminated by the auto scaling group when it does a health check and notices they're down.
Has anyone figured out an automated solution to this without me having to manually suspend the ASG? Since the whole purpose of this is to stop the instances after hours, I'm not able to intervene to suspend/resume the ASG.
Thanks in advance!
"Auto Scaling" and "AWS Instance Scheduler" don't really fit together nicely. Do you really need ELB for Dev environments? I feel this is overkill.
Anyway, if you still want to use ELB + Auto Scaling and would like to shut down the boxes during off hours, you can set the Auto Scaling group to ZERO for the hours you want using the Scheduled Scaling approach.
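A minimal sketch of that Scheduled Scaling setup with boto3, assuming a hypothetical ASG name ("dev-asg") and UTC cron times:

```python
import boto3

# One scheduled action scales the dev ASG to zero after hours,
# another restores it in the morning. Recurrence is standard cron, in UTC.
asg = boto3.client("autoscaling")

asg.put_scheduled_update_group_action(
    AutoScalingGroupName="dev-asg",
    ScheduledActionName="stop-after-hours",
    Recurrence="0 19 * * 1-5",   # 19:00 UTC every weekday
    MinSize=0,
    MaxSize=0,
    DesiredCapacity=0,
)

asg.put_scheduled_update_group_action(
    AutoScalingGroupName="dev-asg",
    ScheduledActionName="start-work-hours",
    Recurrence="0 7 * * 1-5",    # 07:00 UTC every weekday
    MinSize=1,
    MaxSize=1,
    DesiredCapacity=1,
)
```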

AWS ECS Periodical Job - Automatically Scale in instance

Amazon ECS provides a really good feature for scheduled jobs: ECS Scheduled Tasks, which works pretty well.
However, this requires always keeping at least one container instance in the ECS cluster.
What is the best way to:
Launch/scale out an ECS instance for the periodic job (just before task execution);
Run the ECS tasks on the newly created instance;
Terminate/scale in the instance after completion.
One possible workaround is to write a Lambda that does something like that (launch EC2), but it looks like too much pain.
Finally, I found an easy solution to this problem. Everything was quite simple:
Go to Auto Scaling Groups (you can find this on the EC2 dashboard, under the Auto Scaling section);
Create a scheduled action (there, the necessary frequency can be specified for your container instance);
Save your configuration. The instance will be added at the specified time.
In my case I also needed to scale this instance back down after a one-hour period.
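The same scheduled-action approach can be scripted; a sketch with hypothetical names and times, bringing up one container instance shortly before the periodic job and scaling back to zero an hour later:

```python
import boto3

# Scheduled actions around a periodic ECS job (all names/times are examples).
asg = boto3.client("autoscaling")

asg.put_scheduled_update_group_action(
    AutoScalingGroupName="ecs-jobs-asg",
    ScheduledActionName="scale-out-before-job",
    Recurrence="50 2 * * *",   # 02:50 UTC, shortly before the scheduled task
    MinSize=1, MaxSize=1, DesiredCapacity=1,
)

asg.put_scheduled_update_group_action(
    AutoScalingGroupName="ecs-jobs-asg",
    ScheduledActionName="scale-in-after-job",
    Recurrence="0 4 * * *",    # roughly one hour after the job window
    MinSize=0, MaxSize=0, DesiredCapacity=0,
)
```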

Scaling ECS EC2 instances when a task cannot be placed

I am using an ECS cluster for Jenkins agents/slaves with the Jenkins ECS plugin.
The plugin places an ECS task when a job requests a build node. Now I want to scale the EC2 instances in an Auto Scaling Group associated with the ECS cluster according to demand.
Jenkins is often idle. In this case, I do not want there to be any instances in the Auto Scaling Group.
If a node (and therefore an ECS task) is requested and cannot be placed, I want to add an EC2 instance to the Auto Scaling Group.
If an instance is idle and shortly before a billing hour, I want that instance to be removed.
The third point can be accomplished by a cron job on the EC2 instances that regularly checks if the conditions are met and removes the EC2 instance.
But how can I accomplish the second point? I am unable to create a CloudWatch alarm that triggers if a task cannot be placed.
How can I accomplish this?
A rather hacky way to achieve this: You could use a Lambda function to detect when a service has runningCount + pendingCount < desiredCount for more than X seconds. (I have not tested this yet.)
Similar solutions are proposed here.
There does not seem to be a proper solution to scale only when tasks cannot be placed. Maybe AWS wants us to over-provision our clusters, which might be good practice for high availability, but not always the best or cheapest solution.
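An untested, minimal sketch of that Lambda check, assuming hypothetical cluster and ASG names and an EventBridge schedule (e.g. every minute) as the trigger:

```python
import boto3

# If any service in the cluster has runningCount + pendingCount < desiredCount,
# bump the ASG by one instance (names below are placeholders).
ecs = boto3.client("ecs")
asg = boto3.client("autoscaling")

CLUSTER = "jenkins-agents"
ASG_NAME = "jenkins-agents-asg"


def handler(event, context):
    service_arns = ecs.list_services(cluster=CLUSTER)["serviceArns"]
    if not service_arns:
        return
    services = ecs.describe_services(cluster=CLUSTER, services=service_arns)["services"]
    starved = any(
        s["runningCount"] + s["pendingCount"] < s["desiredCount"] for s in services
    )
    if not starved:
        return
    group = asg.describe_auto_scaling_groups(
        AutoScalingGroupNames=[ASG_NAME]
    )["AutoScalingGroups"][0]
    if group["DesiredCapacity"] < group["MaxSize"]:
        asg.set_desired_capacity(
            AutoScalingGroupName=ASG_NAME,
            DesiredCapacity=group["DesiredCapacity"] + 1,
            HonorCooldown=True,
        )
```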
When a task cannot be placed, it means that placing that task in your ECS cluster would exceed either your MemoryReservation or CPUReservation. You could set up CloudWatch alarms on one or both of these ECS metrics and an auto scaling policy that will add and remove EC2 instances in your ECS cluster.
This, in combination with an auto scaling policy that scales your ECS services on the ecs:service:DesiredCount dimension, should be enough to get you adding the underlying EC2 instances your ECS cluster requires.
For example, your scaling policy for an ECS service might be "when we're using 70% of our allotted memory for this service, add 2 to the DesiredCount". After adding one service task, your ECS cluster MemoryReservation metric might bump past an 80% threshold, at which point a CloudWatch alarm would trigger on that threshold, with an auto scaling policy adding another EC2 node on which the second task could then be placed.
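A hedged sketch of that alarm-driven approach, with hypothetical cluster and ASG names: a step scaling policy on the ASG adds one instance, and a CloudWatch alarm on the cluster's MemoryReservation metric triggers it.

```python
import boto3

# Step scaling policy on the ASG plus a MemoryReservation alarm that fires it.
asg = boto3.client("autoscaling")
cw = boto3.client("cloudwatch")

policy = asg.put_scaling_policy(
    AutoScalingGroupName="ecs-cluster-asg",
    PolicyName="scale-out-on-memory-reservation",
    PolicyType="StepScaling",
    AdjustmentType="ChangeInCapacity",
    StepAdjustments=[{"MetricIntervalLowerBound": 0.0, "ScalingAdjustment": 1}],
)

cw.put_metric_alarm(
    AlarmName="ecs-memory-reservation-high",
    Namespace="AWS/ECS",
    MetricName="MemoryReservation",
    Dimensions=[{"Name": "ClusterName", "Value": "my-cluster"}],
    Statistic="Average",
    Period=60,
    EvaluationPeriods=2,
    Threshold=80.0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=[policy["PolicyARN"]],
)
```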
For those arriving after January 2020, the way to handle this now is probably Cluster Auto Scaling, as documented here: "Amazon ECS cluster auto scaling", with more info here: "Deep Dive on Amazon ECS Cluster Auto Scaling".
Essentially, ECS now handles most of the heavy lifting. Not all of it, or I wouldn't be here looking for an answer ;)
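A sketch of what enabling cluster auto scaling looks like, with hypothetical names and a placeholder ASG ARN: a capacity provider with managed scaling enabled lets ECS grow and shrink the ASG itself whenever tasks cannot otherwise be placed.

```python
import boto3

# Capacity provider with managed scaling, attached as the cluster default.
ecs = boto3.client("ecs")

ecs.create_capacity_provider(
    name="my-capacity-provider",
    autoScalingGroupProvider={
        "autoScalingGroupArn": "arn:aws:autoscaling:...:autoScalingGroup:...",  # placeholder ARN
        "managedScaling": {
            "status": "ENABLED",
            "targetCapacity": 100,        # aim to keep the ASG fully utilised
            "minimumScalingStepSize": 1,
            "maximumScalingStepSize": 2,
        },
        "managedTerminationProtection": "DISABLED",
    },
)

ecs.put_cluster_capacity_providers(
    cluster="my-cluster",
    capacityProviders=["my-capacity-provider"],
    defaultCapacityProviderStrategy=[
        {"capacityProvider": "my-capacity-provider", "weight": 1}
    ],
)
```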
For point 2, one way to solve this would be to auto scale when there are not enough CPU units available to place a new Jenkins slave.
You should use the CPUReservation metric on the cluster to scale.
http://docs.aws.amazon.com/AmazonECS/latest/developerguide/cloudwatch-metrics.html#cluster_reservation