I have observed that it takes 15 data points within 15 minutes to trigger the alarm and start scaling in, and I could not find an option to change or adjust this behavior.
Can we change this behavior, or will we have to define another scaling policy to scale in?
It seems like these are the default settings of the Target Tracking scaling policy for CPU utilization:
CPUUtilization > 50 for 3 datapoints within 3 minutes [will trigger high alarm]
CPUUtilization < 35 for 15 datapoints within 15 minutes [will trigger low alarm]
Is this true?
Yes, you are correct. These are the default settings for the target tracking policy. However, you shouldn't edit target tracking alarms outside of the Auto Scaling settings.
AWS specifically writes:
DO NOT EDIT OR DELETE.
If you are not happy with how the target tracking policy works, you can use step or simple scaling policies instead.
For more advanced tuning of your scaling, you can use multiple scaling policies.
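As a quick way to confirm those defaults yourself, here is a minimal boto3 sketch that lists the alarms a target tracking policy created; it assumes the default "TargetTracking-" alarm name prefix that Auto Scaling uses.

    import boto3

    # List the alarms created by target tracking policies and show how many
    # periods of what length each one evaluates.
    cloudwatch = boto3.client("cloudwatch")

    for page in cloudwatch.get_paginator("describe_alarms").paginate(
            AlarmNamePrefix="TargetTracking-"):      # assumed default prefix
        for alarm in page["MetricAlarms"]:
            print(alarm["AlarmName"],
                  alarm["ComparisonOperator"],
                  alarm["Threshold"],
                  alarm["EvaluationPeriods"], "x", alarm["Period"], "s")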
Related
I was reading through this AWS doc https://docs.aws.amazon.com/autoscaling/application/userguide/application-auto-scaling-step-scaling-policies.html on Application Auto Scaling step policies, as target tracking scaling policies won't work for my use case.
Something that's not clear to me: if I define a step that adds 1 to the capacity of, say, the number of tasks of an ECS service, tracking an alarm threshold X (measured as a percentage), and even after the scale-out action the metric X stays relatively still, will it keep increasing the number of tasks (after the cooldown period)?
E.g.:
T0
number of tasks = 10
metric X = 60%
with a step scaling policy that scales out when X >= 70%
T1
metric X goes up to 80%
scale out action is triggered
T2
number of tasks is now 11
the new task reduces the burden and metric X is reduced to 75%
Then here comes the question: will that step scaling policy trigger another scale-out (given that metric X is still > 70%)?
Yes. Every minute that a CloudWatch alarm stays in the ALARM state, it will trigger the Auto Scaling action. So as long as the alarm triggers the policy after the cooldown ends, it will scale again (remembering that the cooldown doesn't start until the first scaling action *finishes*).
You can also define multiple steps, and if a larger step is triggered while the first is still in progress, the difference is applied as another scale-out. For example (a policy sketch follows the timeline below):
T0: the step policy triggers +1
T1 (1 minute later, while the +1 is in progress): the policy is triggered at the +3 step, so 2 more tasks are added now.
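A minimal boto3 sketch of such a multi-step Application Auto Scaling policy, assuming a hypothetical cluster "my-cluster" and service "my-service"; the bounds are offsets from your alarm threshold of 70:

    import boto3

    appscaling = boto3.client("application-autoscaling")

    # Assumes the ECS service is already registered as a scalable target
    # (register_scalable_target) for ecs:service:DesiredCount.
    response = appscaling.put_scaling_policy(
        PolicyName="scale-out-on-x",
        ServiceNamespace="ecs",
        ResourceId="service/my-cluster/my-service",   # assumed names
        ScalableDimension="ecs:service:DesiredCount",
        PolicyType="StepScaling",
        StepScalingPolicyConfiguration={
            "AdjustmentType": "ChangeInCapacity",
            "MetricAggregationType": "Average",
            "Cooldown": 60,
            "StepAdjustments": [
                # 70% <= X < 80%  ->  add 1 task
                {"MetricIntervalLowerBound": 0.0,
                 "MetricIntervalUpperBound": 10.0,
                 "ScalingAdjustment": 1},
                # X >= 80%        ->  add 3 tasks
                {"MetricIntervalLowerBound": 10.0,
                 "ScalingAdjustment": 3},
            ],
        },
    )
    # Attach response["PolicyARN"] as an AlarmAction on a CloudWatch alarm with
    # threshold 70, so the alarm keeps re-triggering the policy while in ALARM.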
As a side note, why doesn't target tracking work for you? What metric are you using? You can define a custom metric specification (when creating the policy via the CLI or SDK rather than the console) to use non-predefined metrics.
I enabled a scaling policy and I'm trying to figure out what period I should set my threshold for. The metric I'm scaling on is request count per target.
If I navigate to
Target Groups > MyTargetGroup > Monitoring > Request Count Per Target
The period is set to 5 minutes by default. I thought this would be the period I should set my target for, but it doesn't seem right.
What should be the correct period?
The screen you're on is a monitoring screen. When you set the period there, it only changes the graph view for that screen.
When you create a target tracking policy, it automatically creates 2 CloudWatch alarms for you, a high and a low. These alarms are what trigger the scaling policy. Currently the values for these alarms on EC2 Auto Scaling default to:
Scale out (high alarm): 3 consecutive 60 second periods
Scale in (low alarm): 15 consecutive 60 second periods
When using target tracking you cannot control the alarm settings other than the target value; it's designed to be managed for you for ease of setup. The target tracking settings work very well for most use cases. If you don't want these settings, you can configure your own alarms by using step scaling instead.
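For reference, a minimal boto3 sketch of such a target tracking policy on the request-count-per-target metric; the group name and the ALB/target group resource label are assumptions, and the target value is the only setting you choose:

    import boto3

    autoscaling = boto3.client("autoscaling")

    autoscaling.put_scaling_policy(
        AutoScalingGroupName="my-asg",            # assumed name
        PolicyName="request-count-per-target-tracking",
        PolicyType="TargetTrackingScaling",
        TargetTrackingConfiguration={
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ALBRequestCountPerTarget",
                # Format: app/<lb-name>/<lb-id>/targetgroup/<tg-name>/<tg-id>
                "ResourceLabel": "app/my-alb/0123456789abcdef/targetgroup/MyTargetGroup/fedcba9876543210",
            },
            "TargetValue": 1000.0,                # the only knob you set
        },
    )

The 5-minute period on the monitoring graph has no effect on scaling; the auto-created alarms use the 60-second periods described above.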
I have one simple scale-in policy in my Auto Scaling group, which is based on CPU utilization.
The policy looks like:
Execute:
When CPUUtilization < 50 for 5 consecutive periods of 60 seconds
Action:
Remove 10 percent of group
Cooldown time:
600 seconds before allowing another scaling activity
Now I would like to add a more aggressive simple policy, saying if CPUUtilization is less than 35 for 5 minutes, remove 20% of the group.
The goal is:
When 35 < CPU Utilization < 50 for 5 minutes, remove 10% of the group
When CPU Utilization < 35 for 5 minutes, remove 20% of the group
The problem is that I cannot use a step scaling policy, since the cooldown time is not supported, which could make my ASG keep scaling in until it reaches the minimum number of instances.
And if I have both simple policies, they obviously conflict. I don't really know which policy will be triggered first if CPUUtilization drops below 35.
Does anyone have a workaround of this one?
Thanks.
You would certainly need to use a step scaling policy to be able to specify multiple rules within one scaling policy. While it doesn't allow you to specify a cooldown period, it should work fine. I recommend you try it and monitor/test the system.
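A minimal boto3 sketch of what that could look like, assuming an Auto Scaling group named "my-asg"; note that the step bounds are offsets from the low alarm's threshold of 50:

    import boto3

    autoscaling = boto3.client("autoscaling")
    cloudwatch = boto3.client("cloudwatch")

    # Step scaling scale-in policy; bounds are relative to the alarm threshold (50).
    policy = autoscaling.put_scaling_policy(
        AutoScalingGroupName="my-asg",            # assumed name
        PolicyName="cpu-scale-in-steps",
        PolicyType="StepScaling",
        AdjustmentType="PercentChangeInCapacity",
        MetricAggregationType="Average",
        StepAdjustments=[
            # 35 < CPUUtilization <= 50  ->  remove 10% of the group
            {"MetricIntervalLowerBound": -15.0,
             "MetricIntervalUpperBound": 0.0,
             "ScalingAdjustment": -10},
            # CPUUtilization <= 35       ->  remove 20% of the group
            {"MetricIntervalUpperBound": -15.0,
             "ScalingAdjustment": -20},
        ],
    )

    # The low alarm that fires the policy: CPUUtilization < 50 for 5 x 60s periods.
    cloudwatch.put_metric_alarm(
        AlarmName="my-asg-cpu-low",
        Namespace="AWS/EC2",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "AutoScalingGroupName", "Value": "my-asg"}],
        Statistic="Average",
        Period=60,
        EvaluationPeriods=5,
        Threshold=50.0,
        ComparisonOperator="LessThanThreshold",
        AlarmActions=[policy["PolicyARN"]],
    )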
By the way, you have a very aggressive policy. It is not typically a good idea to scale-in based upon only 5 minutes of data. Amazon EC2 is charged in hourly increments, so you might be thrashing (adding and removing instances very quickly), which is not economical. It is typically recommended to scale-out quickly (to respond to user demand) but scale-in slowly (since there's really no rush).
I am planning on using AWS Auto Scaling to scale my EC2 services. I have 4 policies that need to control my instance behavior, 2 for scale-out and 2 for scale-in. My question is: what order will they be evaluated in? Scale-out first, then scale-in? Vice versa? Random? Or something else?
Thank you,
Policies are not evaluated in an order. Each policy is compared against the metrics that policy is set up to measure, and takes actions based on the results.
For example, perhaps you have the following four policies:
Add 1 instance when an SQS queue depth is > 1000 messages
Remove 1 instance when the same SQS queue depth is < 200 messages
Add 1 instance when the average CPU of all instances in the autoscaling group is > 80%
Remove 1 instance when the average CPU of all instances in the autoscaling group is < 30%
As you can see, ordering doesn't make sense in this context. The appropriate action(s) will be executed whenever the conditions are met.
Note that without planning and testing you can encounter loops of instances that constantly cycle up and down. Drawing from the previous example, imagine that a new instance is launched because there are > 1000 messages in the queue. But the CPU usage is only 20% for all the instances, so then the 4th policy fires to remove an instance. Thus all the policies should be considered in concert.
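To make the independence concrete, here is a minimal boto3 sketch of those four alarm/policy pairs under assumed names ("my-asg", "my-queue"); there is no ordering anywhere, each alarm simply invokes its own policy whenever its own condition is met:

    import boto3

    autoscaling = boto3.client("autoscaling")
    cloudwatch = boto3.client("cloudwatch")
    ASG = "my-asg"  # assumed group name

    def simple_policy(name, adjustment):
        """Create a simple scaling policy on the group and return its ARN."""
        resp = autoscaling.put_scaling_policy(
            AutoScalingGroupName=ASG,
            PolicyName=name,
            PolicyType="SimpleScaling",
            AdjustmentType="ChangeInCapacity",
            ScalingAdjustment=adjustment,
            Cooldown=300,
        )
        return resp["PolicyARN"]

    def alarm(name, namespace, metric, dimensions, operator, threshold, policy_arn):
        """Each alarm fires its own policy when its own condition is breached."""
        cloudwatch.put_metric_alarm(
            AlarmName=name,
            Namespace=namespace,
            MetricName=metric,
            Dimensions=dimensions,
            Statistic="Average",
            Period=60,
            EvaluationPeriods=5,
            ComparisonOperator=operator,
            Threshold=threshold,
            AlarmActions=[policy_arn],
        )

    queue_dim = [{"Name": "QueueName", "Value": "my-queue"}]   # assumed queue
    asg_dim = [{"Name": "AutoScalingGroupName", "Value": ASG}]

    alarm("queue-deep", "AWS/SQS", "ApproximateNumberOfMessagesVisible",
          queue_dim, "GreaterThanThreshold", 1000, simple_policy("sqs-add-1", 1))
    alarm("queue-shallow", "AWS/SQS", "ApproximateNumberOfMessagesVisible",
          queue_dim, "LessThanThreshold", 200, simple_policy("sqs-remove-1", -1))
    alarm("cpu-high", "AWS/EC2", "CPUUtilization",
          asg_dim, "GreaterThanThreshold", 80, simple_policy("cpu-add-1", 1))
    alarm("cpu-low", "AWS/EC2", "CPUUtilization",
          asg_dim, "LessThanThreshold", 30, simple_policy("cpu-remove-1", -1))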
I have an AWS Auto Scaling group. Is it possible to set an alarm for a percentage increase of CPU? For example, if CPU increases by 40% over 1 minute, trigger the alarm? Thus if CPU is at 0% at 12:51 and 40% at 12:52, the alarm will be triggered.
You can set alarms to check whether the CPU average is greater than a predefined value for a period of time.
But to check whether it increased by a given percentage, you might need to implement a custom metric, as mentioned in the AWS Blog below.
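A minimal sketch of that custom-metric approach with boto3, assuming detailed 1-minute monitoring is enabled and the script runs once a minute (e.g. from cron or Lambda); the custom namespace and metric name are made up for illustration:

    import boto3
    from datetime import datetime, timedelta, timezone

    cloudwatch = boto3.client("cloudwatch")
    dims = [{"Name": "AutoScalingGroupName", "Value": "my-asg"}]  # assumed name
    now = datetime.now(timezone.utc)

    # Fetch the two most recent 1-minute CPU averages for the group.
    stats = cloudwatch.get_metric_statistics(
        Namespace="AWS/EC2",
        MetricName="CPUUtilization",
        Dimensions=dims,
        StartTime=now - timedelta(minutes=3),
        EndTime=now,
        Period=60,
        Statistics=["Average"],
    )
    points = sorted(stats["Datapoints"], key=lambda p: p["Timestamp"])[-2:]

    if len(points) == 2:
        increase = points[1]["Average"] - points[0]["Average"]
        # Publish the minute-over-minute change; an alarm on this custom metric
        # with threshold 40 then catches a 40-point jump within one minute.
        cloudwatch.put_metric_data(
            Namespace="Custom/AutoScaling",            # made-up namespace
            MetricData=[{
                "MetricName": "CPUUtilizationDelta",   # made-up metric name
                "Dimensions": dims,
                "Value": increase,
                "Unit": "Percent",
            }],
        )

Alternatively, CloudWatch metric math (for example the DIFF function) should be able to compute the same minute-over-minute change inside the alarm itself, without publishing a custom metric.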
Yes, you can set a custom alarm for a CPU load average increase for AWS Auto Scaling. Just follow the steps below.
You can find this under Auto Scaling Groups: select the group you want to set the alarm for, go to Scaling Policies and click Add policy. Next to Execute policy when: click Create new alarm and configure the alarm for your Auto Scaling instances according to your needs, then select that alarm under Execute policy when:. Under Take the action: you can choose the Add, Remove or Set to options according to your needs, and you are done. When the Auto Scaling group reaches the threshold given in the scaling policy, it will execute the policy you set. For reference you can check the AWS Docs.