I have one scale-in simple policy in my Auto Scaling group, which is based on CPU utilization.
The policy looks like:
Execute:
When CPUUtilization < 50 for 5 consecutive periods of 60 seconds
Action:
Remove 10 percent of group
Cooldown time:
600 seconds before allowing another scaling activity
Now I would like to add a more aggressive simple policy, saying if CPUUtilization is less than 35 for 5 minutes, remove 20% of the group.
The goal is:
When 35 < CPU Utilization < 50 for 5 minutes, remove 10% of the group
When CPU Utilization < 35 for 5 minutes, remove 20% of the group
The problem is that I cannot use a step scaling policy, since it does not support a cooldown time, which could cause my ASG to keep scaling in until it reaches the minimum number of instances.
And if I have both simple policies, they obviously conflict. I don't really know which policy will be triggered first when CPUUtilization drops below 35.
Does anyone have a workaround for this?
Thanks.
You would certainly need to use a step scaling policy to be able to specify multiple rules within one policy. While it doesn't allow you to specify a cooldown period, it should work fine. I recommend you try it and monitor/test the system.
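For reference, here is a minimal sketch of what such a step scaling policy could look like with boto3. The group name "my-asg" and the alarm name are hypothetical placeholders; treat this as a starting point, not a drop-in configuration.

    import boto3

    autoscaling = boto3.client("autoscaling")
    cloudwatch = boto3.client("cloudwatch")

    # Step scaling policy: step bounds are offsets from the alarm threshold (50 here).
    #   -15 .. 0    -> CPU between 35 and 50 -> remove 10% of the group
    #   -inf .. -15 -> CPU below 35          -> remove 20% of the group
    response = autoscaling.put_scaling_policy(
        AutoScalingGroupName="my-asg",                 # hypothetical group name
        PolicyName="scale-in-on-low-cpu",
        PolicyType="StepScaling",
        AdjustmentType="PercentChangeInCapacity",
        StepAdjustments=[
            {"MetricIntervalLowerBound": -15.0, "MetricIntervalUpperBound": 0.0,
             "ScalingAdjustment": -10},
            {"MetricIntervalUpperBound": -15.0, "ScalingAdjustment": -20},
        ],
    )

    # Alarm that drives the policy: CPU < 50 for 5 consecutive periods of 60 seconds.
    cloudwatch.put_metric_alarm(
        AlarmName="my-asg-low-cpu",                    # hypothetical alarm name
        Namespace="AWS/EC2",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "AutoScalingGroupName", "Value": "my-asg"}],
        Statistic="Average",
        Period=60,
        EvaluationPeriods=5,
        Threshold=50.0,
        ComparisonOperator="LessThanThreshold",
        AlarmActions=[response["PolicyARN"]],
    )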
By the way, you have a very aggressive policy. It is not typically a good idea to scale-in based upon only 5 minutes of data. Amazon EC2 is charged in hourly increments, so you might be thrashing (adding and removing instances very quickly), which is not economical. It is typically recommended to scale-out quickly (to respond to user demand) but scale-in slowly (since there's really no rush).
Related
I have observed that it takes 15 data points in 15 minutes to trigger an alarm and start scaling in, and I could not find an option to change or adjust this behavior.
Can we change this behavior, or will we have to define another scaling policy to scale in?
It seems like these are the default settings of the target tracking scaling policy for CPU utilization:
CPUUtilization > 50 for 3 datapoints within 3 minutes [will trigger high alarm]
CPUUtilization < 35 for 15 datapoints within 15 minutes [will trigger low alarm]
Is this true?
Yes, you are correct. These are the default settings for the target tracking policy. However, you shouldn't be editing target tracking alarms outside of the Auto Scaling settings.
AWS specifically writes:
DO NOT EDIT OR DELETE.
If you are not happy with how the target tracking policy works, you can use step or simple scaling policies instead.
For more advanced tuning of your scaling, you can use multiple scaling policies.
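For context, a target tracking policy is configured with only a target value; the high/low alarms (including the 15-datapoint scale-in alarm) are created and managed by the service itself, which is why they carry the "DO NOT EDIT OR DELETE" note. A minimal sketch with boto3, using a hypothetical group name:

    import boto3

    autoscaling = boto3.client("autoscaling")

    # Target tracking: you set only the target value; the service creates and
    # manages the high/low CloudWatch alarms on your behalf.
    autoscaling.put_scaling_policy(
        AutoScalingGroupName="my-asg",            # hypothetical group name
        PolicyName="cpu-target-tracking",
        PolicyType="TargetTrackingScaling",
        TargetTrackingConfiguration={
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ASGAverageCPUUtilization"
            },
            "TargetValue": 50.0,
            "DisableScaleIn": False,
        },
    )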
I have two Scale-out rules:
Scale-out-rule-1: Add 1 instance if YARNMemoryAvailablePercentage is less than 15 for 1 five-minute period with a cooldown of 300 seconds.
Scale-out-rule-2: Add 5 instances if ContainerPendingRatio is greater than 0.75 for 1 five-minute period with a cooldown of 300 seconds.
Here, if both conditions match:
does it process both rules? In any order?
if only one rule is processed, then which one and why?
I'd appreciate comments on a similar scenario for scale-in (cluster scale-down).
Q 1) Does it process both rules? Any order?
Only one rule will be processed when both rules are triggered at the same time; EC2 Auto Scaling chooses the policy that provides the largest capacity.
In your case, "Scale-out-rule-2" will be processed, as it adds 5 instances, and "Scale-out-rule-1" will be suspended.
Reference: https://docs.aws.amazon.com/autoscaling/ec2/userguide/as-scale-based-on-demand.html#multiple-scaling-policy-resolution
Q 2) If only one rule is processed, then which one and why?
Explained above.
I'd like to share what I have learned.
=== Two rules ===
Scaling-out Rule1:
Add 1 instance if YARNMemoryAvailablePercentage is < 15 for 1 five-minute period with a cooldown of 300 seconds.
Scaling-out Rule2:
Add 5 instances if ContainerPendingRatio is > 0.75 for 1 five-minute period with a cooldown of 300 seconds.
An EMR cluster internally uses "Amazon EC2 Auto Scaling", because an EMR instance group is also a group of EC2 instances.
[1] https://docs.aws.amazon.com/autoscaling/ec2/userguide/what-is-amazon-ec2-auto-scaling.html
Consequently, its scale-out/scale-in behavior follows that of "Amazon EC2 Auto Scaling". According to the doc [2], when these situations occur, Amazon EC2 Auto Scaling (attached to the EMR instance group) chooses the policy that provides the largest capacity, for both scale-out and scale-in. In this case, the "ContainerPendingRatio" rule will be triggered because it adds 5 instances. You can find more details and reasoning in the doc [2].
[2] Multiple Scaling Policies
https://docs.aws.amazon.com/autoscaling/ec2/userguide/as-scale-based-on-demand.html#multiple-scaling-policy-resolution
I ran experiments after creating an EMR cluster in my account, and I saw the same result as expected.
I hope this helps you.
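For reference, the two rules above could be attached to an instance group roughly like this with boto3's EMR API. This is only a sketch: the cluster and instance-group IDs are placeholders, and the thresholds simply mirror the rules described above.

    import boto3

    emr = boto3.client("emr")

    def rule(name, adjustment, metric, comparison, threshold, unit):
        """Build one EMR auto scaling rule (simple scaling action + CloudWatch trigger)."""
        return {
            "Name": name,
            "Action": {
                "SimpleScalingPolicyConfiguration": {
                    "AdjustmentType": "CHANGE_IN_CAPACITY",
                    "ScalingAdjustment": adjustment,
                    "CoolDown": 300,
                }
            },
            "Trigger": {
                "CloudWatchAlarmDefinition": {
                    "ComparisonOperator": comparison,
                    "EvaluationPeriods": 1,
                    "MetricName": metric,
                    "Namespace": "AWS/ElasticMapReduce",
                    "Period": 300,
                    "Statistic": "AVERAGE",
                    "Threshold": threshold,
                    "Unit": unit,
                }
            },
        }

    emr.put_auto_scaling_policy(
        ClusterId="j-XXXXXXXXXXXXX",          # placeholder cluster ID
        InstanceGroupId="ig-XXXXXXXXXXXXX",   # placeholder instance group ID
        AutoScalingPolicy={
            "Constraints": {"MinCapacity": 2, "MaxCapacity": 20},
            "Rules": [
                rule("Scale-out-rule-1", 1, "YARNMemoryAvailablePercentage",
                     "LESS_THAN", 15.0, "PERCENT"),
                rule("Scale-out-rule-2", 5, "ContainerPendingRatio",
                     "GREATER_THAN", 0.75, "COUNT"),
            ],
        },
    )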
I don't think it is possible; there is no such thing available in the Amazon docs.
Our website is hosted on AWS on a t2.small instance. User-facing sign-up is currently timing out.
Initially, I was getting a load balancer latency alarm notification for this instance, so I increased the limit, which seemed to work as a temporary solution.
However, once I increased the limit, I started getting 2 other alarm notifications, which were as follows:
1) production-remove-capacity-alarm
Description: None
Threshold: CPUUtilization <= 40 for 3 datapoints within 15 minutes
2) AWSEBCloudwatchAlarmLow
Description: ElasticBeanstalk Default Scale Down alarm
Threshold: NetworkOut < 2,000,000 for 1 datapoints within 5 minutes
It seems to me that I should simply change the alarm notifications so that I'm no longer alerted to #2, as I don't see how this is interfering with anything, but please correct me if I seem to be missing something.
Regarding #1, does it seem likely that somehow adjusting CPU Utilization in AWS will solve the timeout issue with website sign-up?
And if so, what specifically ought to be done?
Everything is okay. Don't panic.
The first priority is that your application operates correctly. Hopefully your adjustment to the instance type satisfactorily fixed this (but it is still worth watching).
The above two alarms are basically saying:
CPU is under 40%
There's not a lot of network traffic
These alarms can be used to scale-in instances (reduce the number of instances) so that you are not paying for excess capacity. There would be similar alarms that let you scale-out (add additional instances).
ALARM simply means the check is True. That is, the condition has been satisfied. It does not necessarily indicate a problem.
I'm going to presume that you currently have only one instance running. If so, you can ignore those alarms (and Auto Scaling will ignore them too) because you are already at the minimum capacity.
If Auto Scaling has been configured to scale-out to more instances, these alarms would later scale-in to save you money. They're probably a bit trigger-happy, only looking at 15 minutes of CPU and 5 minutes of network traffic; it would normally be better to wait for a longer period before deciding to remove capacity.
Bottom line: If your application is running correctly and you are only operating a single instance, there's nothing to worry about. It's all working as expected.
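If you do later want the scale-in alarm to be less trigger-happy, one option is to recreate it with a longer evaluation window. A rough sketch with boto3, assuming you first copy the namespace, dimensions and actions from the existing alarm (the values shown for those here are placeholders):

    import boto3

    cloudwatch = boto3.client("cloudwatch")

    # put_metric_alarm replaces an alarm with the same name, so carry over the
    # existing alarm's dimensions and actions and only widen the window.
    cloudwatch.put_metric_alarm(
        AlarmName="production-remove-capacity-alarm",
        Namespace="AWS/EC2",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "AutoScalingGroupName", "Value": "my-asg"}],  # placeholder
        Statistic="Average",
        Period=300,
        EvaluationPeriods=12,   # 12 x 5 minutes = 1 hour below 40% before scaling in
        Threshold=40.0,
        ComparisonOperator="LessThanOrEqualToThreshold",
        AlarmActions=["arn:aws:autoscaling:..."],  # placeholder: existing scale-in policy ARN
    )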
My primary requirement is as follows:
When CPU consumption on an instance exceeds 50%, adjust the capacity of the Auto Scaling group to 5 instances; when CPU consumption exceeds 80%, adjust the capacity to 10 instances.
However, if I use CloudWatch alarms to set capacity, I can imagine the following race condition:
5 instances exist
CPU consumption exceeds 80%
Alarm is triggered
Capacity is changed to 10 instances
CPU consumption drops below 50%
Eventually CPU consumption again exceeds 50%, but now capacity will be changed to 5 instances (which is something I don't want to happen)
So, ideally, in response to alarm triggers, I would like to ensure that capacity is at least the corresponding threshold.
I am aware that this can be done by manually setting the capacity through the AWS SDK, triggered in response to lifecycle events monitored by a supervisor, but is there a better approach, preferably one that does not require setting up additional supervisors or webhooks for alarms?
A general approach is to fine-grain the scaling actions:
Do not jump that big:
if the ASG avg CPU is over 70% > Add an instance
if the ASG avg CPU is over 90% > Add "n" instances
if the ASG avg CPU is under 40% > remove an instance
if the ASG avg CPU is under 10% > remove "n" instances
All of these values are averages over the last 5 minutes. So if you have a really fast spike, you need more aggressive scaling; with these steps, in half an hour you can easily add 6 servers or even more.
Also, scaling works better with higher numbers. So if your system needs only 1-3 instances, it may make sense to decrease the instance size so you can have 2-6 instances. It gives some extra flexibility to your system.
But again, the question is: what is your expected load? Big spikes, or an expected up and down during the day?
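As a sketch, the scale-out side of such a banded approach can be expressed as a single step scaling policy; the group name and the exact adjustments below are assumptions, not a recommendation.

    import boto3

    autoscaling = boto3.client("autoscaling")

    # One step scaling policy covers both scale-out bands. The driving alarm
    # would have a 70% average-CPU threshold; step bounds are offsets from it.
    autoscaling.put_scaling_policy(
        AutoScalingGroupName="my-asg",        # hypothetical group name
        PolicyName="scale-out-on-high-cpu",
        PolicyType="StepScaling",
        AdjustmentType="ChangeInCapacity",
        EstimatedInstanceWarmup=300,
        StepAdjustments=[
            # 70% <= CPU < 90% -> add 1 instance
            {"MetricIntervalLowerBound": 0.0, "MetricIntervalUpperBound": 20.0,
             "ScalingAdjustment": 1},
            # CPU >= 90%       -> add "n" instances (3 here as an example)
            {"MetricIntervalLowerBound": 20.0, "ScalingAdjustment": 3},
        ],
    )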
I would suggest looking into an AWS Lambda function, triggered by an SNS message from CloudWatch; it should give you free rein to put as much logic into the scaling decision as you want.
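A minimal sketch of such a Lambda handler, assuming the alarm names and the alarm-to-capacity mapping are your own (the ones here are hypothetical), and that the function only ever raises the desired capacity so the 50% alarm can never undo the 80% one:

    import json
    import boto3

    autoscaling = boto3.client("autoscaling")

    ASG_NAME = "my-asg"                     # hypothetical group name
    ALARM_TO_CAPACITY = {                   # hypothetical alarm names -> minimum capacity
        "cpu-over-50": 5,
        "cpu-over-80": 10,
    }

    def handler(event, context):
        """Triggered by SNS from a CloudWatch alarm; raise capacity, never lower it."""
        message = json.loads(event["Records"][0]["Sns"]["Message"])
        target = ALARM_TO_CAPACITY.get(message["AlarmName"])
        if target is None:
            return

        group = autoscaling.describe_auto_scaling_groups(
            AutoScalingGroupNames=[ASG_NAME]
        )["AutoScalingGroups"][0]

        # Only scale up: this avoids the race where a later 50% alarm
        # would drop capacity back down from 10 to 5.
        if group["DesiredCapacity"] < target:
            autoscaling.set_desired_capacity(
                AutoScalingGroupName=ASG_NAME,
                DesiredCapacity=target,
                HonorCooldown=False,
            )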
Good Luck!
I have an Auto Scaling group with triggers as follows:
Average CPU Utilization > 90% scale up 1 instance
Average CPU Utilization < 25% scale down 1 instance
The metric is being calculated every 2 minutes and the breach limit is 10 minutes.
The problem I am experiencing is that the triggers seem to be firing constantly. Instances are being created and destroyed every 10 minutes. I have been monitoring the CPU utilization and it never surpasses the scale-up threshold. The maximum it hits is around 80%, and this only happened once; most of the time it is in the 20 to 25% range. I normally only have 1 instance running, but every 10 minutes ELB will create a new instance, and soon after it will terminate it.
Is there anything I am doing wrong here? Am I not understanding how average CPU utilization works?
The new EC2 instances are being created by Auto Scaling (not the Load Balancer).
There is a "Scaling History" tab in the Auto Scaling group that might provide some hints as to what is triggering the scale-out policy.
Check whether "Detailed Monitoring" is enabled on the Auto Scaling group and/or Launch Configuration -- this will cause metrics (eg CPU) to be collected every 1 minute instead of the default 5 minutes.
Check the setting on your CloudWatch chart to match the metric collection interval -- if metrics are being collected every minute, set the CloudWatch chart to 1-minute also. Otherwise, you might be viewing metrics at a lower "resolution" than the alarm itself.
Worst case, increase the timing settings for the Alarm, such as "Above 90% for 2 consecutive periods" rather than just one period.
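The Scaling History mentioned above can also be pulled programmatically, which makes it easier to see exactly which alarm or policy caused each launch and termination. A small sketch with boto3 (the group name is a placeholder):

    import boto3

    autoscaling = boto3.client("autoscaling")

    # Same data as the "Scaling History" tab: each activity includes a Cause
    # field naming the alarm/policy that triggered it.
    activities = autoscaling.describe_scaling_activities(
        AutoScalingGroupName="my-asg",        # placeholder group name
        MaxRecords=20,
    )["Activities"]

    for activity in activities:
        print(activity["StartTime"], activity["StatusCode"])
        print("  ", activity["Description"])
        print("  ", activity["Cause"])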