What order are AWS AutoScaling policies applied in?

What order are AWS AutoScaling policies applied in? - amazon-web-services

I am planning on using AWS Autoscaling to scale my EC2 services, I have 4 policies that need to control my instance behavior, 2 for scale out and 2 for scale in. My question is what order will they be evaluated in? Scale out first then scale in? or vice-versa? Random? or something else?
Thank you,

Policies are not evaluated in an order. Each policy is compared against the metrics that policy is set up to measure, and takes actions based on the results.
For example, perhaps you have the following four policies:
Add 1 instance when an SQS queue depth is > 1000 messages
Remove 1 instance when the same SQS queue depth is < 200 messages
Add 1 instance when the average CPU of all instances in the autoscaling group is > 80%
Remove 1 instance when the average CPU of all instances in the autoscaling group is < 30%
As you can see, ordering doesn't make sense in this context. The appropriate action(s) will be executed whenever the conditions are met.
Note that without planning and testing you can encounter loops of instances that constantly cycle up and down. Drawing from the previous example, imagine that a new instance is launched because there are > 1000 messages in the queue. But the CPU usage is only 20% for all the instances, so then the 4th policy fires to remove an instance. Thus all the policies should be considered in concert.

Related

AWS auto-scaling desired capacity changed automatically and is not coming down

I have an auto-scaling group with two target scaling policies, one is based on SQS backlog and another on CPU-Utilization. I have initially set the capacities at minimum-1,desired-1,maximum-4. During the scale-out process, my desired capacity changed to 2 automatically. Now all the messages in the queues were successfully processed and the two scaling processes are requesting the ASG to scale in. But, the desired capacity is still at 2 and is not coming down. Is there a logic that I am missing? or do I need to do it manually?

How does AWS Cloudwatch alarms work when triggered together?

In my AWS elastic server setup, I have configured 4 Alarms
Add instance when CPU utilization > 20
Add instance when TargetResponceTime > 0.9
Remove instance when CPU utilization < 20
Remove instance when TargetResponceTime < 0.9
What will happen if two or more alarms triggered together?
For Example
If alarm 1 and 2 triggered together will it add two instances?
If alarm 1 and 4 triggered together will it remove an instance and add one or will it stay neutral?
The alarms are working fine, but I want to understand the mechanism behind alarm action execution.
Any Idea?

Your auto scaling group has a cooldown period, so technically multiple actions cannot occur at the same time. The next action would occur after the cooldown period has passed.
This functionality is to stop exactly what you're talking about, with multiple instances scaling at once.
I think personally for what you're doing you should be making use of a composite CloudWatch alarm. By having an OR condition these 4 alarms could become 2, which would reduce the number of alarms you have to trigger an autoscaling action.

Is there a way to scale in "instance" (part of ASG ) on certain custom metric?

I'm using the AutoScalingGroup to launch a group of EC2 instances. These instances are acting as workers which are continuously listening to SQS for any new request.
Requirement:
Do upscale on something like throughput (i.e Total number of messages present in SQS by total number instances).
And I want to downscale whenever any instance which is part of ASG is sitting idle (CPUIdle) for let's say more than 15 mins.
Note: I am not looking for any metric which applies as whole to a particular ASG (eg: Average CPU).
One way of doing that could be defining the custom metric and allowing it to trigger a cloudwatch alarm to do that.
Is there a better way to accomplish this?

If you are defining the scaling policy at instance level, then you defeating the entire purposes of ASG. If you need to scale based on changing conditions, such as the queue size, then you can configure ASG based on the conditions specified here
https://docs.aws.amazon.com/autoscaling/ec2/userguide/as-using-sqs-queue.html
A custom metric to send to Amazon CloudWatch that measures the number of messages in the queue per EC2 instance in the Auto Scaling group.
A target tracking policy that configures your Auto Scaling group to scale based on the >custom metric and a set target value. CloudWatch alarms invoke the scaling policy.
If you know a specific time window when the queue size goes up or down, you can also scale based on schedule.
You can always start with a very low instance count in ASG and set the desired capacity as such (say 1) and scale up based on queue, so you can continue using ASG policies.

AutoScaling Based on Comparing Query Metrics

I have a not-so-complicated situation but it can be complicated on AWS cloudformations:
I would like to autoscale up and down based on the number of messages on SQS.
But I am not sure what I need to specify on AWS cloudformation, I would imagine that I would need:
some sort of lambda/cloudformation that perform query on the current number of instances on AutoScalingGroup
some sort of lambda/cloudformation that perform query on the current number of messages on SQS.
some comparison operations that compares #1 and #2.
create scale up policy when #1 < #2
create scale down policy when #1 > #2
Not sure where I should get started... can someone kind enough to show some examples?

You have several different concepts all mixed together (CloudFormation, Auto Scaling, Lambda). It is best to keep things simple, at least for an initial deployment. You can then automate it with CloudFormation later.
The most difficult part of Auto Scaling is actually determining the best Scaling Policies to use. A general rule is to quickly add capacity when it is needed, and then slowly remove capacity when it is no longer needed. This way, you can avoid churn, where instances are added and removed within short spaces of time.
The simplest setup would be:
Scale-out when the queue size is larger than X (To be determined by testing)
Scale-in when the queue is empty (You can later tweak this to be more efficient)
Use the ApproximateNumberOfMessagesVisible metric for your scaling policies. (See Amazon SQS Metrics and Dimensions). It provides a count of messages waiting to be processed. However, I have seen situations where a zero count is not actually sent as a metric, so also trigger your scale-in policy on an alarm status of INSUFFICIENT_DATA, which also means that the queue is empty.
There is no need to use AWS Lambda functions unless you have very complex requirements for when to scale.
If your requests come on a regular basis throughout the day, set the minimum to one instance to always have capacity available.
If your requests are infrequent (and there could be several hours with no requests coming in), then set the minimum to zero instances so you save money.
You will need to experiment to determine the best queue size that should trigger a scale-out event. This depends upon how frequently the messages arrive and how long they take to process. You can also experiment with the Instance Type -- figure out whether it is better to have many smaller (eg T2) instances, or fewer larger instances (eg M4 or C4, depending upon need).
If you do not need to process the requests within a short time period (that is, you can be a little late sometimes), you could consider using spot pricing that will dramatically lower your costs, with the potential to occasionally have no instances running due to a high spot price. (Or, just bid high and accept that occasionally you'll pay more than on-demand prices but in general you will save considerable costs.)
Create all of the above manually in the console, then experiment and measure results. Once it is finalized, you can then implement it as a CloudFormation stack if desired.
Update:
The Auto Scaling screens will only create an alarm based on EC2. To create an alarm on a different metric, first create the alarm, then put it in the policy.
To add a rule based on an Amazon SQS queue:
Create an SQS queue
Put a message in the queue (otherwise the metrics will not flow through to CloudWatch)
Create an alarm in CloudWatch based on the ApproximateNumberOfMessagesVisible metric (which will appear after a few minutes)
Edit your Auto Scaling policies to use the above alarm

AWS EC2 Auto Scaling Groups: I get Min and Max, but what's Desired instances limit for?

When you setup an Auto Scaling groups in AWS EC2 Min and Max bounds seem to make sense:
The minimum number of instances to scale down to based on policies
The maximum number of instances to scale up to based on policies
However, I've never been able to wrap my head around what the heck Desired is intended to affect.
I've always just set Desired equal to Min, because generally, I want to pay Amazon the minimum tithe possible, and unless you need an instance to handle load it should be at the Min number of instances.
I know if you use ElasticBeanstalk and set a Min to 1 and Max to 2 it sets a Desired to 2 (of course!)--you can't choose a value for Desired.
What would be the use case for a different Desired number of instances and how does it differ? When you expect AWS to scale lower than your Desired if desired is larger than Min?

Here are the explanations for the "min, desired and max" values from AWS support:
MIN: This will be the minimum number of instances that can run in your
auto scale group. If your scale down CloudWatch alarm is triggered,
your auto scale group will never terminate instances below this number
DESIRED: If you trip a CloudWatch alarm for a scale up event, then it
will notify the auto scaler to change it's desired to a specified
higher amount and the auto scaler will start an instance/s to meet
that number. If you trip a CloudWatch alarm to scale down, then it
will change the auto scaler desired to a specified lower number and
the auto scaler will terminate instance/s to get to that number.
MAX: This will be the maximum number of instances that you can run in
your auto scale group. If your scale up CloudWatch alarm stays
triggered, your auto scale group will never create instances more than
the maximum amount specified.

Think about it like a sliding range UI element.
With min and max, you are setting the lower bound of your instance scaling. Withe desired capacity, you are setting what you'd currently like the instance count to hover.
Example:
You know your application will have heavy load due to a marketing email or product launch...simply scale up your desired capacity beforehand:
aws autoscaling set-desired-capacity --auto-scaling-group-name my-auto-scaling-group --desired-capacity 2 --honor-cooldown
Source

"Desired" is (necessarily) ambiguous.
It means the "initial" number of instances. Why not just "initial" then? Because the number may change by autoscaling events.
So it means "current" number of instance. Why not just "current" then? Because during an autoscaling event, instances will start / terminate. Those instances do not count towards "current" number of instances. By "current", a user expects instances that are operate-able.
So it means "target" number of instance. Why not just "target" then? I guess "target" is just as good (ambiguous) as "desired"...

When you expect AWS to scale lower than your Desired if desired is
larger than Min?
This happens when you set a CloudWatch alarm based on some AutoScaling policy. Whenever that alarm is triggered it will update the DesiredCount to whatever is mentioned in config.
e.g., If an AutoScalingGroup config has Min=1, Desired=3, Max=5 and there is an Alarm set on an AutoScalingPolicy which says if CPU usage is <50% for consecutive 10 mins then Remove 1 instances then it will keep reducing the instance count by 1 whenever the alarm is triggered until the DesiredCount = MinCount.
Lessons Learnt: Set the MinCount to be > 0 or = DesiredCount. This will make sure that the application is not brought down when the mincount=0 and CPU usage goes down.

In layman's terms, DesiredCapacity value is automatically updated on scale-in and scale-out events.
In other words,
Scale-in or Scale-out are done by decreasing or increasing the DesiredCapacity value.

Desired capacity simply means the number of instances that will come up / fired up when you launch the autoscaling. That means if desired capacity = 4, then 4 instances will keep on running until and unless any scale up or scale down event triggers. If scale up event occurs, the number of instances will go up till maximum capacity and if scale down event occurs it will go down till the minimum capacity.
Correct me if wrong, thanks.

I noticed that desired capacity went down and no new instance came up when
I set one of the instances to standby. It kept on running but was detached from ELB ( requests were not forwarded to that particular instance when accessed via ELB DNS ). No new instance has been initiated by AWS. Rather desired capacity was decreased by 1.
When I changed the state of instance ( from standby ) the instance was again attached to ELB ( the instance started to get requests when accessed via ELB DNS ). The desired capacity was increased by 1 and became 2.
Hence it seems no of instances attached to ELB can't cross the threshold limit set by min and max but the desired capacity is adjusted or changed automatically based on the occurrence of scale in or scale out event. It was definitely something unknown to me.
It might be a way to let AWS know that this is the desired capacity required for the respective ELB at a given point in time.

Min and max is self explanatory but desired was confusing until i have attached Target Tracking Auto scaling policy with the ASG where CPU utilization was the target metric. Here, desired instances were scaled out and scaled in based on target CPU utilization. If any desired count are placed through cloudformation/manual, for time being ASG will create same number of instances as desired count. But later ASG policy will automatically adjust the desire instances based on target CPU utilization.

Desired is what we start initially. It will go to min or max depending on the scale-in / scale-out.

I liked the analogy with a slider to understand this - https://stackoverflow.com/a/36272945/10779109
Think of min and max as the maximum allowed brightness on a screen. You probably don't want to min to be 0 in that case (sidenote). The desired quantity keeps changing based on the env (in the case of ASG, it depends on the scaling policies).
For instance, if the following check runs every hour, this is where desired quantity is required.
if low_load(<CPU or Mem etc>) and desired_capacity>= min_capacity:
desired_capacity = desired_capacity-1
Max capacity can also be understood in the same way where you'd want to keep increasing the desired quantity based on a cloudwatch_alarm (or any scaling policy) up to the max capacity.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js