AWS EC2 autoscaling without constant alarms? - amazon-web-services

I have created the following two alerts for an autoscaling group:
Scale up 1 instance if "CPUUtilization >= 75%" changes to state ALARM
Scale down 1 instance if "CPUUtilization >= 30%" changes to state OK
I chose to trigger the scale-down event on OK so that there is no constant ALARM in CloudWatch while load is below 30%. But that is exactly the issue: when a scale-up leaves the group with an average load between 30% and 75%, the second alarm's state is set to ALARM.
Is there any way to configure CloudWatch to trigger scale-up and scale-down events properly without being left in an ALARM state after the scaling has happened?

The "Scale down" operation should be set to "CPUUtilization < 75%" (state OK). Don't worry about leaving the group without machines: the group has a minimum number of instances, and scaling down never goes below that.

Related

Auto scale rule based on custom Cloudwatch alarm

I have an auto-scaling group of EC2 servers that run a number of processes.
This number of processes changes with the load and I'd like to trigger a scaling (up/down) based on the number of processes.
I've successfully set up a script that sends the number of processes on every server to CloudWatch every minute, and I can see these values in CloudWatch. (I haven't set a dimension, so that I can get the value across all the servers.)
Then I created an alarm that uses the average of the values sent. If it reaches a certain limit, it triggers "Add a new server" on the auto scaling group, and when it stops being in alarm, it triggers "Remove a server".
My issue is that when I add the new server, the average drops, since there is now one more server. That moves the alarm to the OK state, which removes the server, which raises the average again, which triggers the alarm again, and so on.
For instance, the limit is set to 10 processes on average. With 3 servers, if the average becomes 11, the alarm state is triggered and a server is added. Now, with the new server, I have 33 processes (3 x 11) on 4 servers: 8.25 processes on average, which moves the alarm to "OK".
My question is: is it possible to set up an alarm based on the number of processes without the new server causing an up-down-up-down cycle?
Instead of the average, I could use something else to trigger the alarm, such as min/max/I-don't-know.
Thank you for your help. Happy to provide any other details if needed.
You should not create an alarm that adds instances when True and removes instances when False. This will cause a continual 'flip-flop' situation rather than trying to find a steady-state.
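The flip-flop can be reproduced with the numbers from the question. This is only a rough sketch, assuming the total of 33 processes stays constant and that one alarm with a threshold of 10 adds a server on ALARM and removes one on OK. The function name `simulate_flip_flop` is made up for illustration.

```python
def simulate_flip_flop(total_processes, servers, threshold, rounds):
    """Replay a single alarm that adds a server on ALARM and removes one on OK."""
    history = []
    for _ in range(rounds):
        average = total_processes / servers
        state = "ALARM" if average > threshold else "OK"
        history.append((servers, round(average, 2), state))
        # One alarm drives both directions, so every evaluation changes the fleet.
        # Keep at least one server so the average stays defined.
        servers = max(1, servers + (1 if state == "ALARM" else -1))
    return history


for servers, average, state in simulate_flip_flop(33, 3, 10, 4):
    print(servers, average, state)
```

The fleet alternates between 3 and 4 servers forever, because neither size is "stable" under a rule that must always act.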
You could have each server regularly send a custom metric to Amazon CloudWatch. You could then use this with a target tracking scaling policy for Amazon EC2 Auto Scaling, which will calculate the average value of the metric and automatically launch/terminate instances to keep the value around the target of 10.
This would work well with long-running processes (perhaps 5+ minutes with several processes running concurrently), but would not be good with short sub-minute processes because it takes time to launch new instances.
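A target tracking policy on a custom metric might be set up roughly like this. It is a sketch, not a tested deployment: the group name `worker-asg`, the metric name `ProcessCount` and the namespace `Custom/Workers` are placeholders, and the helper only builds the arguments so it can be inspected without AWS credentials.

```python
def target_tracking_policy_kwargs(group_name, metric_name, namespace, target_value):
    """Build keyword arguments for the Auto Scaling PutScalingPolicy API."""
    return {
        "AutoScalingGroupName": group_name,
        "PolicyName": f"{metric_name}-target-tracking",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            # No "Dimensions" key: the question publishes the metric without dimensions.
            "CustomizedMetricSpecification": {
                "MetricName": metric_name,
                "Namespace": namespace,
                "Statistic": "Average",
            },
            "TargetValue": float(target_value),
        },
    }


kwargs = target_tracking_policy_kwargs("worker-asg", "ProcessCount", "Custom/Workers", 10)
# With boto3 installed and credentials configured, this could be applied as:
#   import boto3
#   boto3.client("autoscaling").put_scaling_policy(**kwargs)
print(kwargs["TargetTrackingConfiguration"]["TargetValue"])
```

With target tracking there is no separate "add" and "remove" alarm to wire up; the service creates and manages the alarms it needs.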
I think you could look at metric math. Instead of triggering your alarm directly on your process-count metric, you could perhaps calculate the average count yourself using metric math. You could use the GroupTotalInstances metric from your ASG, or just publish a second custom metric holding the number of instances.
In both cases, your metric for the alarm would use metric math to divide number of processes by size of ASG for each evaluation period.
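As a sketch of what such a metric math alarm could look like: the helper below builds arguments for CloudWatch's PutMetricAlarm. The names `worker-asg`, `Custom/Workers` and `ProcessCount` are placeholders, the three evaluation periods are an arbitrary choice, and it assumes group metrics are enabled on the ASG so that GroupTotalInstances is published.

```python
def per_instance_alarm_kwargs(group_name, namespace, metric_name, threshold):
    """Build keyword arguments for CloudWatch PutMetricAlarm using metric math."""
    return {
        "AlarmName": f"{group_name}-processes-per-instance",
        "ComparisonOperator": "GreaterThanThreshold",
        "EvaluationPeriods": 3,
        "Threshold": float(threshold),
        "Metrics": [
            {   # total processes reported by all servers in the period
                "Id": "processes",
                "MetricStat": {
                    "Metric": {"Namespace": namespace, "MetricName": metric_name},
                    "Period": 60,
                    "Stat": "Sum",
                },
                "ReturnData": False,
            },
            {   # size of the Auto Scaling group
                "Id": "instances",
                "MetricStat": {
                    "Metric": {
                        "Namespace": "AWS/AutoScaling",
                        "MetricName": "GroupTotalInstances",
                        "Dimensions": [
                            {"Name": "AutoScalingGroupName", "Value": group_name}
                        ],
                    },
                    "Period": 60,
                    "Stat": "Average",
                },
                "ReturnData": False,
            },
            {   # the expression the alarm actually evaluates
                "Id": "per_instance",
                "Expression": "processes / instances",
                "Label": "Processes per instance",
                "ReturnData": True,
            },
        ],
    }


alarm = per_instance_alarm_kwargs("worker-asg", "Custom/Workers", "ProcessCount", 10)
# Could be applied with: boto3.client("cloudwatch").put_metric_alarm(**alarm)
print([m["Id"] for m in alarm["Metrics"] if m["ReturnData"]])
```

Only one entry may return data; that entry is the value compared against the threshold.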

AWS Application Scaling - Step Scaling policies

I was reading through this AWS doc https://docs.aws.amazon.com/autoscaling/application/userguide/application-auto-scaling-step-scaling-policies.html about Application Auto Scaling step scaling policies, as target tracking scaling policies won't work for my use case.
Something that's not clear to me: suppose I define a step that adds 1 to the capacity of, say, the number of tasks of an ECS service, driven by an alarm on threshold X (measured as a percentage). If X stays roughly where it is even after the scale-out action, will the policy keep increasing the number of tasks (after the cooldown period)?
Eg.:
T0
number of tasks = 10
metric X = 60%
with a step scaling policy that scales-out when X >= 70%
T1
metric X goes up to 80%
scale out action is triggered
T2
number of tasks is now 11
the new task reduces the burden and metric X is reduced to 75%
So here comes the question: will that step scaling policy trigger another scale-out (given that metric X is still > 70%)?
Yes, every minute that a CloudWatch alarm stays in the ALARM state it will trigger the Auto Scaling action. So as long as the alarm triggers the policy after the cooldown ends, it will scale again (remembering that the cooldown doesn't start until the first scaling action /finishes/).
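To make the timeline concrete, here is a toy simulation of that behaviour. It is a simplification under stated assumptions: the metric is evaluated once per minute, each trigger adds exactly one task, the scaling action completes instantly (so the cooldown starts at the trigger), and the cooldown is expressed in minutes. `simulate_scale_out` is an illustrative name, not an AWS API.

```python
def simulate_scale_out(metric_by_minute, threshold, tasks, cooldown_minutes):
    """Add one task whenever the metric breaches the threshold outside the cooldown."""
    cooldown_until = 0
    timeline = []
    for minute, value in enumerate(metric_by_minute):
        in_alarm = value >= threshold
        if in_alarm and minute >= cooldown_until:
            tasks += 1
            # Simplification: the action finishes immediately, so the cooldown starts now.
            cooldown_until = minute + cooldown_minutes
        timeline.append((minute, value, tasks))
    return timeline


for row in simulate_scale_out([60, 80, 75, 75, 75, 65], threshold=70,
                              tasks=10, cooldown_minutes=3):
    print(row)
```

The metric staying at 75% after the first scale-out leads to a second scale-out once the cooldown has passed, and scaling stops only when the metric drops below the threshold.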
You can also define multiple steps, and if a larger one is triggered while the first is still in progress, the difference will happen to allow another scale out. For example:
T0 step policy triggers +1
T1 (1 minute later while the +1 is in progress): step policy is triggered at the +3 step. 2 more tasks would be added now.
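The "difference" behaviour can be sketched as follows. The step boundaries (70% and 90%) are hypothetical, and real step adjustments are defined as bounds relative to the alarm threshold rather than as absolute values, so treat this as an illustration of the arithmetic only.

```python
def step_adjustment(metric, steps):
    """Return the adjustment of the highest step whose lower bound the metric reaches."""
    adjustment = 0
    for lower_bound, step in sorted(steps):
        if metric >= lower_bound:
            adjustment = step
    return adjustment


def additional_tasks(metric, steps, in_progress):
    """Tasks to add now, given that `in_progress` tasks are already being launched."""
    return max(0, step_adjustment(metric, steps) - in_progress)


steps = [(70, 1), (90, 3)]  # hypothetical: +1 from 70%, +3 from 90%
print(additional_tasks(80, steps, in_progress=0))  # first trigger: 1
print(additional_tasks(95, steps, in_progress=1))  # larger step while +1 is in progress: 2
```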
As a side note, why doesn't target tracking work for you? What metric are you using? You can define a custom metric specification if creating the policy outside the CLI to use non-predefined metrics.

Repeated AWS EC2 Autoscaling

I was looking at the AWS EC2 Auto-Scaling with Cloud Watch feature.
What I cannot glean is if the auto scaling can be applied only once or repeatedly? I think only once though.
E.g. scale from 2 to 4 max if, say, 60% cpu reached.
Then, what if, having gone from 2 to 4, and then 60% cpu reached again according to Cloud Watch target rules, can we reapply autoscaling again?
I saw a "step" option, but that seems to work with different target values. Could the step have repeated 60% configuration or does it need 60, 70% and so on? This aspect is not entirely clear.
Or is such an iterative approach simply not possible? Implying one needs to re-baseline the initial allocations? I think not based on the step approach.
Auto Scaling is triggered by CloudWatch alarms when your alarm breaches its threshold.
So if your policy is to add 50% capacity when average CPU across the fleet is above 60% for more than 15 minutes, the alarm will trigger a change in your scaling group, increasing the value of your "desired capacity" property, let's say from 2 to 3 instances. Auto Scaling will then react and bring the fleet to the "desired capacity".
If, later, the same alarm is still on, a new change will be triggered in your scaling group, bringing the desired capacity to 4. Auto Scaling will then create EC2 instances to bring your fleet to the new desired capacity.
And so on.
Of course, you do not want to increase the fleet size beyond your budget, so you can define a MAX fleet size that Auto Scaling will never exceed. If your max fleet size is 3, the second alarm in my example will leave the desired capacity as it is (3), because the new desired capacity would be greater than the max capacity. This ends the scaling process.
To go back to normal, you must also create scale-in policies in addition to your scale-out policies, e.g. when the average CPU on my fleet is below 15% for 1 hour, remove an instance. Your CloudWatch alarm will trigger, causing a change in the desired capacity of your fleet, and Auto Scaling will adjust (terminate instances) to reach the new desired capacity value.
Of course, you do not want Auto Scaling to terminate the last instance in your fleet (going from a desired capacity of 1 to a desired capacity of 0). So, similarly to the MAX capacity value, you also define a MIN capacity, i.e. the minimum number of instances to keep in your fleet, whatever alarms are triggering and trying to change the desired capacity.
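The interplay of repeated alarms with the MIN and MAX bounds can be sketched like this. The rounding rule (fractions rounded toward zero, but a change of at least one instance) is my reading of how percentage adjustments are documented, so treat it as an assumption; the function names are made up.

```python
import math


def percent_change(current, percent):
    """Capacity change for a percentage adjustment, rounded toward zero
    but never smaller than one instance in magnitude."""
    change = current * percent / 100
    if 0 < change < 1:
        return 1
    if -1 < change < 0:
        return -1
    return math.trunc(change)


def next_desired(current, percent, min_size, max_size):
    """Apply a percentage adjustment and clamp the result to the group's bounds."""
    return max(min_size, min(max_size, current + percent_change(current, percent)))


capacity = 2
for _ in range(3):  # the same scale-out alarm fires three times
    capacity = next_desired(capacity, 50, min_size=1, max_size=4)
    print(capacity)
```

Starting from 2 instances with +50% per alarm, the desired capacity goes to 3, then 4, and then stays at 4 because of the MAX bound. The same clamp keeps a scale-in from dropping below MIN.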

Is there a way to scale in "instance" (part of ASG ) on certain custom metric?

I'm using the AutoScalingGroup to launch a group of EC2 instances. These instances are acting as workers which are continuously listening to SQS for any new request.
Requirement:
Scale up on something like throughput (i.e. the total number of messages present in SQS divided by the total number of instances).
Scale down whenever any instance that is part of the ASG has been sitting idle (CPUIdle) for, let's say, more than 15 minutes.
Note: I am not looking for any metric which applies as whole to a particular ASG (eg: Average CPU).
One way of doing that could be defining the custom metric and allowing it to trigger a cloudwatch alarm to do that.
Is there a better way to accomplish this?
If you are defining the scaling policy at the instance level, then you are defeating the entire purpose of an ASG. If you need to scale based on changing conditions, such as the queue size, then you can configure the ASG based on the conditions specified here:
https://docs.aws.amazon.com/autoscaling/ec2/userguide/as-using-sqs-queue.html
- A custom metric to send to Amazon CloudWatch that measures the number of messages in the queue per EC2 instance in the Auto Scaling group.
- A target tracking policy that configures your Auto Scaling group to scale based on the custom metric and a set target value. CloudWatch alarms invoke the scaling policy.
If you know a specific time window when the queue size goes up or down, you can also scale based on schedule.
You can always start with a very low instance count in ASG and set the desired capacity as such (say 1) and scale up based on queue, so you can continue using ASG policies.
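The custom metric from the linked guide boils down to a division. A minimal sketch, assuming you already have the number of visible messages in the queue and the number of in-service instances; the namespace `Custom/Workers` and metric name `BacklogPerInstance` are placeholders:

```python
def backlog_per_instance(visible_messages, in_service_instances):
    """Messages waiting in the queue divided by the instances available to process them."""
    # Treat an empty group as one instance to avoid dividing by zero
    # (an assumption of this sketch).
    return visible_messages / max(1, in_service_instances)


def backlog_metric_kwargs(namespace, value):
    """Build keyword arguments for CloudWatch PutMetricData."""
    return {
        "Namespace": namespace,
        "MetricData": [
            {"MetricName": "BacklogPerInstance", "Value": float(value), "Unit": "None"}
        ],
    }


value = backlog_per_instance(1500, 10)
payload = backlog_metric_kwargs("Custom/Workers", value)
# Could be published with: boto3.client("cloudwatch").put_metric_data(**payload)
print(value)
```

A target tracking policy can then hold this per-instance backlog near a chosen target, which handles both scale-out and scale-in at the group level.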

set aws autoscale alarm for a percentage increase of CPU

I have an AWS autoscale group. Is it possible to set an alarm for a percentage increase of CPU? For example, if CPU increases 40% over 1min, trigger the alarm? Thus if CPU is at 0% at 12:51 and 40% at 12:52 the alarm will be triggered.
You can set alarms to check whether the CPU average is greater than a predefined value for a period of time.
But to check whether it increased by a given percentage, you might need to implement a custom metric, as described in the AWS blog.
Yes, you can set a custom alarm on CPU load for an AWS Auto Scaling group. Just follow the steps below.
In Auto Scaling Groups, select the group for which you want to set the alarm, go to Scaling Policies and click Add policy. Next to "Execute policy when:", click Create new alarm and define the alarm you need for your Auto Scaling group, then select that alarm in "Execute policy when:". In "Take the action:" you can select Add, Remove or Set to, according to your needs, and you are done. When the Auto Scaling group reaches the threshold given in the scaling policy, it will execute the policy you set. For reference, see the AWS docs.
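The value such a custom metric would carry is just the change between consecutive samples. A small sketch, assuming one CPU sample per minute and treating "increases 40%" as a rise of 40 percentage points, as in the question's example; the function names are illustrative:

```python
def cpu_increases(samples):
    """Percentage-point change between each pair of consecutive CPU samples."""
    return [later - earlier for earlier, later in zip(samples, samples[1:])]


def should_alarm(samples, threshold_points):
    """True if any one-interval increase reaches the threshold."""
    return any(delta >= threshold_points for delta in cpu_increases(samples))


print(cpu_increases([0, 40, 45]))     # [40, 5]
print(should_alarm([0, 40, 45], 40))  # True: 0% at 12:51, 40% at 12:52
print(should_alarm([50, 60, 70], 40)) # False: high CPU, but no sudden jump
```

Each delta could be published as its own metric, and an ordinary threshold alarm on that metric would then drive the scaling policy.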