Setup CloudWatch Alarms for EC2 instances in Autoscaling Group(CF) - amazon-web-services

I have an AWS::AutoScaling::AutoScalingGroup configuration that runs two EC2 instances. Is it possible to attach CloudWatch alarms to both instances? For example, I want to watch the StatusCheckFailed_Instance metric for each EC2 instance in the group.
Usually you attach alarms through the EC2 instance ID, but how do I find out each instance ID in the Auto Scaling group in order to attach the alarms? Or is there another way to attach them? I can't find anything useful and workable on the internet.

Option 1)
Create your own automation that is triggered by instance launch/terminate events.
Each event triggers a Lambda function that reads the instance ID and creates or deletes an alarm for it.
Option 2)
If you're not trying to use the auto-recover action (which you shouldn't need in an ASG, since the ASG will just replace the instances), you can make one aggregate alarm for the ASG.
Create the alarm on the StatusCheckFailed_Instance metric with the AutoScalingGroupName=<> dimension.
Set it to trigger if the Maximum statistic is >= 1. Each instance pushes its own datapoints (0 or 1) to the ASG-level versions of the EC2 metrics, so a maximum of 1 means at least one instance is failing.
Since you only have 2 instances, you can just manually check both if it ever triggers. For larger ASGs, the SEARCH() math expression in the CloudWatch metrics console (or on a dashboard) is a good way to look through all the ASG instances and see which one is failing.
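As a rough illustration of Option 2, here is a minimal Python sketch that builds the arguments for a single aggregate alarm. It assumes boto3 is used for the actual call; the function names, alarm name, period and evaluation periods are illustrative choices, not values from the answer. Because StatusCheckFailed_Instance reports 0 or 1 per instance, the sketch alarms when the maximum is >= 1.

```python
def build_asg_status_alarm(asg_name, sns_topic_arn=None):
    """Build keyword arguments for CloudWatch put_metric_alarm.

    One alarm covers the whole Auto Scaling group: the metric is read
    with the AutoScalingGroupName dimension instead of an InstanceId.
    """
    params = {
        "AlarmName": f"{asg_name}-StatusCheckFailed_Instance",  # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed_Instance",
        "Dimensions": [{"Name": "AutoScalingGroupName", "Value": asg_name}],
        "Statistic": "Maximum",   # 1 as soon as any instance fails its check
        "Period": 60,             # assumed period, in seconds
        "EvaluationPeriods": 2,   # assumed; tune to taste
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
    }
    if sns_topic_arn:
        # Optional notification target (an SNS topic ARN supplied by the caller).
        params["AlarmActions"] = [sns_topic_arn]
    return params


def create_asg_status_alarm(cloudwatch, asg_name, sns_topic_arn=None):
    """Create the alarm using a CloudWatch client passed in by the caller."""
    params = build_asg_status_alarm(asg_name, sns_topic_arn)
    cloudwatch.put_metric_alarm(**params)
    return params["AlarmName"]
```

With boto3 installed and credentials configured, this could be called as create_asg_status_alarm(boto3.client("cloudwatch"), "my-asg"), where "my-asg" is a placeholder group name.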

Related

Autoscaling policy not launching new instances

I have set up several types of dynamic scaling policies in my Auto Scaling group, which uses a launch template, to see Auto Scaling launch new instances when the policies are triggered, but it doesn't. I just set up the following tracking policy, and here are the results:
The tracking policy
Alarm is in the ALARM state, and the action was "Successfully executed"
Autoscaling activity is not reporting anything
The alarm always logs that an autoscaling action was triggered, however autoscaling does not log any activity.
This autoscaling group is set up with min: 1, max:6, currently there is only 1 instance running.
Where or how can I find the error that is causing this? Is it perhaps something related to permissions? All instances are healthy/in-service
Spent the past couple of days going through threads on stackoverflow and other forums and haven't found anything that can help me find where the issue is.
Also, the timestamps differ between the screenshots because I had just done a re-deployment, but the behavior was the same before that: the action launched by the alarm never produces any activity in the Auto Scaling group.
I tried to replicate your scaling policy and environment:
Created an ASG
Assigned a tracking policy with a target of 0.0001
Set max capacity to 2, desired to 1 and minimum to 1
The key point here is to wait for the CloudWatch alarms to collect enough data points. For my scale-out activity it took around 5 minutes (most of that time is also spent on instance warm-up).
I just found the cause of this issue. Under the Auto Scaling group, go to Details, then Advanced configurations, then Suspended processes. Nothing should be selected in this field. In my case alarm notifications were suspended there, along with a few other processes, which is why Auto Scaling wasn't acting on CloudWatch alarms.
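The same check can be done from code instead of the console. The sketch below is an assumption-laden illustration: it inspects the dictionary returned by the Auto Scaling describe_auto_scaling_groups call and reports groups where the AlarmNotification or Launch process is suspended, since either one would stop an alarm-driven scale-out. The function name is hypothetical.

```python
# Suspended processes that would prevent an alarm from launching instances.
BLOCKING_PROCESSES = {"AlarmNotification", "Launch"}


def alarm_blocking_processes(describe_response):
    """Map each group name to the suspended processes that block alarm-driven scale-out.

    describe_response is the dict returned by describe_auto_scaling_groups.
    Groups with nothing relevant suspended are left out of the result.
    """
    blocked = {}
    for group in describe_response.get("AutoScalingGroups", []):
        suspended = {p["ProcessName"] for p in group.get("SuspendedProcesses", [])}
        hits = sorted(suspended & BLOCKING_PROCESSES)
        if hits:
            blocked[group["AutoScalingGroupName"]] = hits
    return blocked
```

With boto3, the input would come from boto3.client("autoscaling").describe_auto_scaling_groups(AutoScalingGroupNames=[...]), and anything reported could be resumed with resume_processes(AutoScalingGroupName=name, ScalingProcesses=hits).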
Autoscaling group advanced configurations section

Scaling ECS EC2 instances when a task cannot be placed

I am using an ECS cluster for Jenkins agents/slaves with the Jenkins ECS plugin.
The plugin places an ECS task when a job requests a build node. Now I want to scale the EC2 instances in an Auto Scaling group associated with the ECS cluster according to demand.
1. Jenkins is often idle. In this case, I do not want there to be any instances in the Auto Scaling group.
2. If a node (and therefore an ECS task) is requested and cannot be placed, I want to add an EC2 instance to the Auto Scaling group.
3. If an instance is idle and shortly before the end of a billing hour, I want that instance to be removed.
Point 3 can be accomplished by a cron job on the EC2 instances that regularly checks whether the conditions are met and removes the EC2 instance.
But how can I accomplish point 2? I am unable to create a CloudWatch alarm that triggers if a task cannot be placed.
How can I accomplish this?
A rather hacky way to achieve this: You could use a Lambda function to detect when a service has runningCount + pendingCount < desiredCount for more than X seconds. (I have not tested this yet.)
Similar solutions are proposed here.
There does not seem to be a proper solution to scale only when tasks cannot be placed. Maybe AWS wants us to over-provision our clusters, which might be good practice for high availability, but not always the best or cheapest solution.
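To make the Lambda idea concrete, here is an untested-against-AWS sketch of the detection logic only. It assumes the service dictionaries come from the ECS describe_services call (which reports runningCount, pendingCount and desiredCount), and that the caller persists the first_seen state between invocations, since a Lambda function keeps no memory of its own. The function name and the grace period are hypothetical.

```python
def stalled_services(services, first_seen, now, grace_seconds=120):
    """Return names of services short of desiredCount for longer than grace_seconds.

    services:   list of dicts with serviceName, runningCount, pendingCount, desiredCount
    first_seen: dict mapping service name -> time the shortfall was first observed;
                it is updated in place and must be stored by the caller between runs
    now:        current time in seconds (for example time.time())
    """
    stalled = []
    for svc in services:
        name = svc["serviceName"]
        short = svc["runningCount"] + svc["pendingCount"] < svc["desiredCount"]
        if not short:
            # Service caught up: forget any earlier shortfall.
            first_seen.pop(name, None)
            continue
        started = first_seen.setdefault(name, now)
        if now - started > grace_seconds:
            stalled.append(name)
    return stalled
```

A caller could raise the desired capacity of the Auto Scaling group whenever this returns a non-empty list.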
When a task cannot be placed, it typically means that placing that task in your ECS cluster would exceed either your MemoryReservation or CPUReservation. You could set up CloudWatch alarms for one or both of these ECS metrics and an auto scaling policy that will add and remove EC2 instances in your ECS cluster.
This, in combination with an auto scaling policy that scales your ECS services on the ecs:service:DesiredCount dimension should be enough to get you adding the underlying EC2 instances your ECS cluster requires.
For example, your scaling policy for an ECS service might be "when we're using 70% of our allotted memory for this service, add 2 to the DesiredCount". After adding the first service task, your ECS cluster's MemoryReservation metric might rise past an "80" threshold, at which point a CloudWatch alarm on ECS MemoryReservation would trigger, with an auto scaling policy adding another EC2 node, on which the second task could now be placed.
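A small worked example may help here. The cluster MemoryReservation metric is the memory reserved by running tasks as a percentage of the memory registered by the container instances. The sketch below computes that percentage and builds arguments for an alarm at the "80" threshold mentioned above; the function names, the alarm name, the period and the scale-out policy ARN parameter are assumptions for illustration.

```python
def reservation_percent(reserved_mib, registered_mib):
    """Memory reserved by tasks as a percentage of memory registered by instances."""
    if registered_mib == 0:
        return 0.0  # illustrative choice to avoid division by zero on an empty cluster
    return 100.0 * reserved_mib / registered_mib


def build_memory_reservation_alarm(cluster_name, scale_out_policy_arn, threshold=80.0):
    """Build put_metric_alarm arguments that trigger a scale-out policy."""
    return {
        "AlarmName": f"{cluster_name}-MemoryReservation-high",  # hypothetical name
        "Namespace": "AWS/ECS",
        "MetricName": "MemoryReservation",
        "Dimensions": [{"Name": "ClusterName", "Value": cluster_name}],
        "Statistic": "Average",
        "Period": 60,            # assumed period, in seconds
        "EvaluationPeriods": 1,  # assumed
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [scale_out_policy_arn],  # ARN of the ASG scaling policy
    }


# Two instances registering 4096 MiB each, three tasks reserving 2048 MiB each:
before = reservation_percent(3 * 2048, 2 * 4096)
# One more 512 MiB task pushes the cluster past the 80% threshold:
after = reservation_percent(3 * 2048 + 512, 2 * 4096)
```

Here before is 75.0 and after is 81.25, so the alarm would fire only after the extra task is placed.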
For those arriving after January 2020, the way to handle this now is probably cluster auto scaling, as documented in "Amazon ECS cluster auto scaling", with more detail in "Deep Dive on Amazon ECS Cluster Auto Scaling".
Essentially, ECS now handles most of the heavy lifting. Not all, or I wouldn't be here looking for an answer ;)
For point 2, one way to solve this would be to autoscale when there are not enough CPU units to place a new Jenkins slave.
You should use the CPU reservation metric on the cluster to scale.
http://docs.aws.amazon.com/AmazonECS/latest/developerguide/cloudwatch-metrics.html#cluster_reservation

AWS Cloudwatch alarm for each single instance of an auto scaling group

We have configured an Auto Scaling group in AWS, and it works fine. We have configured some alarms for the group using the AWS CLI, such as: send an alarm if the average CPUUtilization > 60 for 2 minutes.
The only problem is that if we want to monitor each instance in the group, we have to configure the alarms manually. Is there any way to do it automatically, for example with a config or template?
Amazon CloudWatch alarms can be created on the Auto Scaling group as a whole, such as Average CPUUtilization. This is because alarms are used to tell Auto Scaling when to add/remove instances and such decisions would be based upon the group as a whole. For example, if one machine is 100% busy but another is 0% busy, then on average the group is only 50% busy.
There should be no reason for placing an alarm on the individual instances in an auto-scaling group, at least as far as triggering a scaling action.
There is no in-built capability to specify an alarm that will be applied against each auto-scaled instance individually. You could do it programmatically by responding to an Amazon SNS notification whenever an instance is added/removed by Auto Scaling, but this would require your own code to be written.
You can accomplish this with lifecycle hooks and a little lambda glue. When you have lifecycle events for adding or terminating an instance, you can create an alarm on that individual instance or remove it (depending on the event) via a lambda function.
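A minimal sketch of that Lambda glue follows. It assumes the function is invoked with an Auto Scaling lifecycle event delivered through EventBridge, where the detail section carries EC2InstanceId and LifecycleTransition; verify that shape against your own events. The handler name, alarm naming scheme, period and threshold are hypothetical, and the CloudWatch client is passed in by the caller.

```python
def instance_alarm_name(instance_id):
    """Hypothetical naming scheme so the alarm can be found again on termination."""
    return f"StatusCheckFailed_Instance-{instance_id}"


def handle_lifecycle_event(event, cloudwatch):
    """Create or delete a per-instance alarm depending on the lifecycle transition.

    Returns "created", "deleted" or "ignored" so the caller can log the outcome.
    """
    detail = event["detail"]
    instance_id = detail["EC2InstanceId"]
    transition = detail["LifecycleTransition"]

    if transition == "autoscaling:EC2_INSTANCE_LAUNCHING":
        cloudwatch.put_metric_alarm(
            AlarmName=instance_alarm_name(instance_id),
            Namespace="AWS/EC2",
            MetricName="StatusCheckFailed_Instance",
            Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
            Statistic="Maximum",
            Period=60,             # assumed period, in seconds
            EvaluationPeriods=2,   # assumed
            Threshold=1.0,
            ComparisonOperator="GreaterThanOrEqualToThreshold",
        )
        return "created"

    if transition == "autoscaling:EC2_INSTANCE_TERMINATING":
        cloudwatch.delete_alarms(AlarmNames=[instance_alarm_name(instance_id)])
        return "deleted"

    return "ignored"
```

In a real hook the function would also need to call complete_lifecycle_action on the Auto Scaling client; otherwise the instance waits in its pending state until the hook's timeout expires.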
To John's point, this is a little bit of an anti-pattern with horizontal scaling and load balancing. However, theory and practice sometimes diverge.

AWS custom autoscaling policy

I am trying to figure out how to create a custom scaling policy for Auto Scaling in AWS using boto. I saw that the scale-out and scale-in policies are defined using system-dependent resources like CPU utilization.
But I want the scale-out/in policy to be defined so that it calls a REST API and compares the response with some values. How can I make that possible?
I am using CircleCI as my CI tool. I have 2 EC2 instances running as CircleCI builders. During the weekends we generally don't require 2 instances, so I need to autoscale with min 1 and max 3 EC2 instances. If there are builds in the queue, I want to spin up a new EC2 instance, and if the queue is empty for more than, say, 2 hours, I want to scale down to just 1 EC2 VM as a CircleCI builder.
I can get the builds in queue information using CircleCi REST API.
Auto Scaling doesn't do that for you. The reverse works, though: your own code can call the API and then execute a scaling policy.
What you could also do is send your custom metric to CloudWatch, configure an alarm on it, and add an Auto Scaling action to the alarm.
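The custom-metric route could look roughly like this. The sketch assumes some other code has already fetched the queue length from the CI system's REST API; the namespace, metric name and function names are made up for illustration, and the CloudWatch client is passed in by the caller.

```python
def build_queue_metric(queued_builds, namespace="Custom/CI"):
    """Build put_metric_data arguments for the current number of queued builds."""
    return {
        "Namespace": namespace,  # hypothetical custom namespace
        "MetricData": [
            {
                "MetricName": "QueuedBuilds",  # hypothetical metric name
                "Value": float(queued_builds),
                "Unit": "Count",
            }
        ],
    }


def publish_queue_length(cloudwatch, queued_builds):
    """Send one datapoint; run this on a schedule, for example every minute."""
    cloudwatch.put_metric_data(**build_queue_metric(queued_builds))
```

One alarm on QueuedBuilds > 0 could then drive a scale-out policy, and a second alarm with a long evaluation window (such as the 2 hours of empty queue mentioned in the question) could drive scale-in. For the other route, the Auto Scaling client's execute_policy call triggers a named policy directly.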

cloudwatch alarm for Autoscaling

How can I set up a CloudWatch alarm, using a CloudFormation template, for an Auto Scaling group when it is scaling down its MinCapacity instances?
I mean I need an alarm when all the instances are "OutOfService", which basically happens when the instances fail the ELB health check.
Why don't you add an alarm based on the CloudWatch metric HealthyHostCount? If you set a low threshold, you will get warned when there are no healthy instances.
You can see the metrics documentation here
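As a hedged sketch of such an alarm, the Python below builds put_metric_alarm arguments for a Classic Load Balancer, which publishes HealthyHostCount in the AWS/ELB namespace under the LoadBalancerName dimension. The alarm name, period, evaluation periods and the choice to treat missing data as breaching are assumptions. The same property names (Namespace, MetricName, Dimensions, Statistic, Period, EvaluationPeriods, Threshold, ComparisonOperator, TreatMissingData, AlarmActions) appear on an AWS::CloudWatch::Alarm resource in CloudFormation.

```python
def build_no_healthy_hosts_alarm(load_balancer_name, sns_topic_arn):
    """Build alarm arguments that fire when no instance behind the ELB is healthy."""
    return {
        "AlarmName": f"{load_balancer_name}-no-healthy-hosts",  # hypothetical name
        "Namespace": "AWS/ELB",  # Classic Load Balancer metrics
        "MetricName": "HealthyHostCount",
        "Dimensions": [{"Name": "LoadBalancerName", "Value": load_balancer_name}],
        "Statistic": "Minimum",
        "Period": 60,            # assumed period, in seconds
        "EvaluationPeriods": 2,  # assumed
        "Threshold": 1.0,
        "ComparisonOperator": "LessThanThreshold",  # fires when the count drops to 0
        "TreatMissingData": "breaching",            # assumption: no data counts as a problem
        "AlarmActions": [sns_topic_arn],
    }
```

For an Application Load Balancer the namespace and dimensions differ (AWS/ApplicationELB with TargetGroup and LoadBalancer), so the dictionary would need adjusting.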