I'm setting up an AWS EC2 Auto Scaling group (ASG) that uses TargetTrackingScaling to track a custom metric. The metric is published by each instance in the ASG every 30 seconds.
It's working fine, but I'd like the scale-out action to happen more quickly. The delay appears to come from the alarm that the ASG auto-generates: from what I can see, it waits for at least 3 datapoints over 3 minutes before going into ALARM.
Is there a way to configure the ASG/scaling policy so that the alarm needs only 1 datapoint (or less time) before it fires? Or, if that's not possible, can I create a custom alarm and use it instead of the one the ASG auto-generated?
Related
In my AWS auto scaling setup, I have configured four alarms:
Alarm 1: Add an instance when CPUUtilization > 20
Alarm 2: Add an instance when TargetResponseTime > 0.9
Alarm 3: Remove an instance when CPUUtilization < 20
Alarm 4: Remove an instance when TargetResponseTime < 0.9
What will happen if two or more alarms are triggered together?
For example:
If alarms 1 and 2 are triggered together, will it add two instances?
If alarms 1 and 4 are triggered together, will it remove an instance and add one, or will it stay neutral?
The alarms are working fine, but I want to understand the mechanism behind alarm action execution.
Any ideas?
Your Auto Scaling group has a cooldown period, so multiple scaling actions cannot occur at the same time; the next action occurs only after the cooldown period has passed.
This functionality exists to prevent exactly what you're describing: multiple scaling actions firing at once.
Personally, for what you're doing, I'd make use of composite CloudWatch alarms. With an OR condition, these 4 alarms could become 2, which reduces the number of alarms that can trigger an auto scaling action.
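For what it's worth, here is a minimal sketch of that idea using boto3. The child alarm names and the SNS topic ARNs are placeholders, and note that composite alarms take notification actions (such as SNS) rather than invoking scaling policies directly, so the notification would still need to drive the scaling step:

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Composite alarm for scale-out: fires if EITHER child alarm is in ALARM.
cloudwatch.put_composite_alarm(
    AlarmName="scale-out-composite",
    AlarmRule='ALARM("high-cpu") OR ALARM("high-target-response-time")',
    # Placeholder SNS topic; composite alarms support notification actions.
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:scale-out-topic"],
)

# Mirror-image composite alarm for scale-in.
cloudwatch.put_composite_alarm(
    AlarmName="scale-in-composite",
    AlarmRule='ALARM("low-cpu") OR ALARM("low-target-response-time")',
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:scale-in-topic"],
)
```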
I'm using an Auto Scaling group to launch a group of EC2 instances. These instances act as workers that continuously poll SQS for new requests.
Requirement:
Scale out based on something like throughput (i.e., the total number of messages in SQS divided by the total number of instances).
And I want to scale in whenever any instance in the ASG has been sitting idle (CPUIdle) for, say, more than 15 minutes.
Note: I am not looking for a metric that applies to the ASG as a whole (e.g., average CPU).
One way of doing this could be to define a custom metric and let it trigger a CloudWatch alarm.
Is there a better way to accomplish this?
If you define the scaling policy at the instance level, you're defeating the entire purpose of an ASG. If you need to scale based on changing conditions, such as the queue size, you can configure the ASG based on the conditions described here:
https://docs.aws.amazon.com/autoscaling/ec2/userguide/as-using-sqs-queue.html
A custom metric to send to Amazon CloudWatch that measures the number of messages in the queue per EC2 instance in the Auto Scaling group.
A target tracking policy that configures your Auto Scaling group to scale based on the custom metric and a set target value. CloudWatch alarms invoke the scaling policy.
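As a rough sketch (not the doc's exact code), the custom metric could be published with boto3 like this, run on a schedule such as a once-a-minute Lambda; the queue URL, ASG name, and metric namespace are placeholders:

```python
import boto3

sqs = boto3.client("sqs")
autoscaling = boto3.client("autoscaling")
cloudwatch = boto3.client("cloudwatch")

QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/worker-queue"
ASG_NAME = "worker-asg"

def publish_backlog_per_instance():
    # Messages currently waiting in the queue.
    attrs = sqs.get_queue_attributes(
        QueueUrl=QUEUE_URL, AttributeNames=["ApproximateNumberOfMessages"]
    )
    backlog = int(attrs["Attributes"]["ApproximateNumberOfMessages"])

    # Instances in the ASG (floor of 1 to avoid dividing by zero).
    group = autoscaling.describe_auto_scaling_groups(
        AutoScalingGroupNames=[ASG_NAME]
    )["AutoScalingGroups"][0]
    instances = max(len(group["Instances"]), 1)

    # Publish the ratio; a target tracking policy can then hold it at a target.
    cloudwatch.put_metric_data(
        Namespace="Custom/Workers",
        MetricData=[{
            "MetricName": "BacklogPerInstance",
            "Value": backlog / instances,
            "Unit": "Count",
        }],
    )
```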
If you know a specific time window when the queue size goes up or down, you can also scale based on schedule.
You can always start the ASG with a very low instance count, set the desired capacity accordingly (say, 1), and scale out based on the queue, so you can keep using ASG policies.
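And if you do know the busy window, a scheduled action is just a couple of API calls; the ASG name, times, and sizes below are placeholders:

```python
import boto3

autoscaling = boto3.client("autoscaling")

# Scale out ahead of the known busy window (cron expressions are in UTC)...
autoscaling.put_scheduled_update_group_action(
    AutoScalingGroupName="worker-asg",
    ScheduledActionName="morning-scale-out",
    Recurrence="0 9 * * 1-5",
    MinSize=2,
    MaxSize=10,
    DesiredCapacity=5,
)

# ...and back down again afterwards.
autoscaling.put_scheduled_update_group_action(
    AutoScalingGroupName="worker-asg",
    ScheduledActionName="evening-scale-in",
    Recurrence="0 21 * * 1-5",
    MinSize=1,
    MaxSize=10,
    DesiredCapacity=1,
)
```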
Is it possible to set a CloudWatch alarm for when we are approaching the limit of EC2 instances currently allowed on our account?
For instance, if limit for EC2 instances is currently 250, when instance number 240 is provisioned, I want an alarm to trigger.
If you have an Auto Scaling group that launches new instances and you want to monitor it, you can use the GroupInServiceInstances metric, which gives you the number of instances running as part of the ASG (see the Auto Scaling group metrics documentation).
Yes, you could do this with a Lambda function, a CloudWatch Metric and a CloudWatch alarm.
Your alarm would be configured to fire on the metric if it exceeds some threshold (the threshold being your instance limit).
Your Lambda function would run on a schedule, e.g. every 5 minutes, and would do the following:
Use the ec2:DescribeAccountAttributes API to get the account instance limit and cloudwatch:DescribeAlarms to get the alarm's current threshold. If they differ, update the alarm threshold to the instance limit via the cloudwatch:PutMetricAlarm API.
Use the ec2:DescribeInstances API and count the number of instances that are running and publish the value to a custom CloudWatch metric with the cloudwatch:PutMetricData API.
If the value published to the metric exceeds the alarm's threshold, the alarm will fire. The Lambda function keeps the alarm threshold in sync with the instance limit and publishes datapoints to the metric based on the number of instances currently running.
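A hedged sketch of that Lambda in Python/boto3. The metric and alarm names are placeholders, and rather than diffing with DescribeAlarms, this version simply re-puts the alarm on every run, which is idempotent; the 10-instance headroom mirrors the 240-of-250 example in the question:

```python
import boto3

ec2 = boto3.client("ec2")
cloudwatch = boto3.client("cloudwatch")

def handler(event, context):
    # 1. Look up the account's EC2 instance limit.
    attrs = ec2.describe_account_attributes(AttributeNames=["max-instances"])
    limit = int(attrs["AccountAttributes"][0]["AttributeValues"][0]["AttributeValue"])

    # 2. Count running instances (paginated, in case there are many).
    running = 0
    for page in ec2.get_paginator("describe_instances").paginate(
        Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
    ):
        for reservation in page["Reservations"]:
            running += len(reservation["Instances"])

    # 3. Publish the count as a custom metric.
    cloudwatch.put_metric_data(
        Namespace="Custom/EC2",  # placeholder namespace
        MetricData=[{"MetricName": "RunningInstanceCount",
                     "Value": running, "Unit": "Count"}],
    )

    # 4. Keep the alarm threshold tracking the limit. PutMetricAlarm
    #    overwrites the existing alarm, so re-putting it each run is fine.
    cloudwatch.put_metric_alarm(
        AlarmName="NearInstanceLimit",  # placeholder name
        Namespace="Custom/EC2",
        MetricName="RunningInstanceCount",
        Statistic="Maximum",
        Period=300,
        EvaluationPeriods=1,
        Threshold=limit - 10,  # fire a little before the hard limit
        ComparisonOperator="GreaterThanOrEqualToThreshold",
    )
```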
We have an OpsWorks stack with two 24/7 instances, four time-based instances, and two load-based instances.
Our issue is with the load-based instances. We've spent a great deal of time creating CloudWatch alarms that are meaningful to our service, so we want the load-based instances in our stack to come UP when a particular CloudWatch latency alarm is in an ALARM state. I see that in the load-based instance configuration, you can define a CloudWatch alarm for bringing the instance(s) UP and a CloudWatch alarm for bringing the instance(s) DOWN.
The thing is, when I select the specific CloudWatch alarm I want to use to trigger UP, it can no longer be selected as the trigger for DOWN. Why?
Specifically, we want our latency alarm (we'll call it the "oh crap things are slowing down" CloudWatch alarm) to trigger the load-based instances to START when it is in an ALARM state. Then, we want the same "oh crap things are slowing down" alarm to trigger the load-based instances to SHUT DOWN when it is in an OK state. It would be rad if the load-based instances waited 15 minutes after the alarm's OK state before shutting down.
The "oh crap things are slowing down" threshold is Latency > 2 for 3 minutes
Do I just need to create a new "oh nice things are ok" alarm with a threshold of Latency < 2 for 3 minutes to use as the DOWN alarm in the load-based instance configuration?
Sorry for the newbie question, just feel stuck.
From what I can tell, you have to add a second alarm that triggers only when the latency is below 2 for three minutes. If someone comes up with a cleaner solution than this, I'd love to hear about it. As it is, one of the two alarms will always be in a continuous state of alarm.
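If you'd rather script it than click through the console, something like this boto3 call should wire the two alarms into the layer's load-based scaling configuration. The layer ID and alarm names are placeholders, and whether ThresholdsWaitTime delays alarm-driven scale-down the way it delays threshold-driven scale-down is an assumption worth verifying:

```python
import boto3

opsworks = boto3.client("opsworks")

opsworks.set_load_based_auto_scaling(
    LayerId="11111111-2222-3333-4444-555555555555",  # placeholder layer ID
    Enable=True,
    UpScaling={
        "Alarms": ["oh crap things are slowing down"],  # Latency > 2 for 3 min
        "InstanceCount": 2,
    },
    DownScaling={
        "Alarms": ["oh nice things are ok"],  # Latency < 2 for 3 min
        "InstanceCount": 2,
        # Assumption: wait 15 minutes before acting on the scale-down signal.
        "ThresholdsWaitTime": 15,
    },
)
```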
I would like to create a CloudWatch alarm that sends an email when I forget to delete my RDS instance after use. So I only want the alarm to trigger while the RDS instance is available. My initial approach is the following:
Create an alarm based on CPUUtilization and have it trigger when utilization has averaged between 0 and 1 percent for about 1 or 2 hours.
However, so far I can only express one constraint: I can have the alarm trigger when utilization is below 1 percent for about 1 or 2 hours, but that means it will also trigger after the instance has been deleted.
Can anyone help me figure out how to tackle this problem?
If you stop your RDS instance, it will stop publishing metrics. Your alarm will go into INSUFFICIENT_DATA state, so your ALARM actions won't be executed.
More about CloudWatch Alarms here: http://docs.aws.amazon.com/AmazonCloudWatch/latest/DeveloperGuide/AlarmThatSendsEmail.html
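For completeness, here's a sketch of the alarm itself; the DB identifier and SNS topic ARN are placeholders. The key detail is TreatMissingData: with the default value of "missing", a stopped or deleted instance leaves the alarm in INSUFFICIENT_DATA rather than ALARM, so the email is only sent while the instance is up and idle:

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="rds-idle-but-running",          # placeholder name
    Namespace="AWS/RDS",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "DBInstanceIdentifier", "Value": "my-test-db"}],
    Statistic="Average",
    Period=300,                                # 5-minute periods
    EvaluationPeriods=12,                      # 12 x 5 min = 1 hour of idling
    Threshold=1.0,
    ComparisonOperator="LessThanThreshold",
    TreatMissingData="missing",                # default: no data => INSUFFICIENT_DATA
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:notify-me"],  # placeholder
)
```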