Repeated AWS EC2 Autoscaling - amazon-web-services

I was looking at the AWS EC2 Auto-Scaling with Cloud Watch feature.
What I cannot glean is if the auto scaling can be applied only once or repeatedly? I think only once though.
E.g. scale from 2 to 4 max if, say, 60% cpu reached.
Then, what if, having gone from 2 to 4, and then 60% cpu reached again according to Cloud Watch target rules, can we reapply autoscaling again?
I saw a "step" option, but that seems to work with different target values. Could the step have repeated 60% configuration or does it need 60, 70% and so on? This aspect is not entirely clear.
Or is such an iterative approach simply not possible? Implying one needs to re-baseline the initial allocations? I think not based on the step approach.

Auto Scaling is triggered by CloudWatch alarms: a scaling action fires when the metric behind an alarm breaches your threshold.
So if your policy is to add 50% capacity when average CPU across the fleet is above 60% for more than 15 minutes, the alarm will trigger a change in your scaling group, increasing the value of its "desired capacity" property, let's say from 2 to 3 instances. Auto Scaling will then react and bring the fleet to the "desired capacity".
If, later, the same alarm is still on, a new change will be triggered to your scaling group, bringing desired capacity to 4. Then AutoScaling will create EC2 instances to bring your fleet to the new desired capacity.
And so on.
Of course, you do not want the fleet to grow beyond your budget, so you can define a MAX fleet size that Auto Scaling will never exceed. If your max fleet size is 3, the second alarm in my example leaves desired capacity as it is (3), because the new desired capacity would exceed the max capacity. That ends the scaling process.
To go back to normal, you must also create scale-in policies in addition to your scale-out policies, e.g. when the average CPU on my fleet is below 15% for 1h, remove an instance. The CloudWatch alarm will trigger, causing a change in the desired capacity of your fleet, and Auto Scaling will adjust (terminate instances) to reach the new desired capacity value.
Of course, you do not want Auto Scaling to kill the last instance in your fleet (going from a desired capacity of 1 to a desired capacity of 0), so, similarly to the MAX capacity value, you also define a MIN capacity, i.e. the minimum number of instances to keep in your fleet, whichever alarms trigger and try to change the desired capacity.
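To make the "and so on" concrete, here is a toy Python model of repeated alarm breaches. It is only a sketch, not how AWS implements it: the names `next_desired_capacity` and `simulate` are made up for illustration, and the rounding follows my reading of the Auto Scaling documentation for percentage adjustments (a fractional change above 1 is rounded down, a change between 0 and 1 counts as 1).

```python
import math
from typing import List


def next_desired_capacity(desired: int, percent: float,
                          min_size: int, max_size: int) -> int:
    """Apply one percentage-based adjustment, then clamp to [min_size, max_size]."""
    delta = desired * percent / 100.0
    if 0 < delta < 1:
        delta = 1                  # a small positive change still adds one instance
    elif -1 < delta < 0:
        delta = -1                 # a small negative change still removes one instance
    else:
        delta = math.trunc(delta)  # 1.5 -> 1, -1.5 -> -1
    return max(min_size, min(max_size, desired + delta))


def simulate(desired: int, percent: float, min_size: int,
             max_size: int, alarm_breaches: int) -> List[int]:
    """Return the desired capacity after each consecutive alarm breach."""
    history = []
    for _ in range(alarm_breaches):
        desired = next_desired_capacity(desired, percent, min_size, max_size)
        history.append(desired)
    return history


print(simulate(2, 50, 1, 10, 3))  # same policy applied three times in a row
print(simulate(2, 50, 1, 3, 3))   # same policy, but capped by a max of 3
```

With a max of 10 the fleet keeps growing on every breach (2 to 3, then 4, then 6); with a max of 3 the later breaches change nothing, which is the behaviour described above.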

Related

Is there a way to scale in "instance" (part of ASG ) on certain custom metric?

I'm using the AutoScalingGroup to launch a group of EC2 instances. These instances are acting as workers which are continuously listening to SQS for any new request.
Requirement:
Scale out on something like throughput (i.e. the total number of messages present in SQS divided by the total number of instances).
And I want to downscale whenever any instance which is part of ASG is sitting idle (CPUIdle) for let's say more than 15 mins.
Note: I am not looking for any metric which applies as whole to a particular ASG (eg: Average CPU).
One way of doing that could be defining the custom metric and allowing it to trigger a cloudwatch alarm to do that.
Is there a better way to accomplish this?
If you define the scaling policy at the instance level, you defeat the entire purpose of an ASG. If you need to scale based on changing conditions, such as the queue size, you can configure the ASG based on the conditions specified here:
https://docs.aws.amazon.com/autoscaling/ec2/userguide/as-using-sqs-queue.html
A custom metric to send to Amazon CloudWatch that measures the number of messages in the queue per EC2 instance in the Auto Scaling group.
A target tracking policy that configures your Auto Scaling group to scale based on the custom metric and a set target value. CloudWatch alarms invoke the scaling policy.
If you know a specific time window when the queue size goes up or down, you can also scale based on schedule.
You can always start with a very low instance count in ASG and set the desired capacity as such (say 1) and scale up based on queue, so you can continue using ASG policies.
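As a rough sketch of the custom metric mentioned above (backlog per instance), something like the following could run on a schedule. The function names, the namespace `Custom/Workers` and the metric name `BacklogPerInstance` are my own placeholders, not anything AWS prescribes; the only real API assumed is boto3's CloudWatch `put_metric_data`, and the message count would come from the queue's `ApproximateNumberOfMessages` attribute.

```python
def backlog_per_instance(visible_messages: int, in_service_instances: int) -> float:
    """Messages waiting in the queue divided by the number of running workers."""
    # Treat an empty group as one instance so we never divide by zero.
    return visible_messages / max(in_service_instances, 1)


def backlog_metric_payload(value: float,
                           namespace: str = "Custom/Workers",
                           metric_name: str = "BacklogPerInstance") -> dict:
    """Build keyword arguments for CloudWatch put_metric_data (names are placeholders)."""
    return {
        "Namespace": namespace,
        "MetricData": [
            {"MetricName": metric_name, "Value": value, "Unit": "None"},
        ],
    }


value = backlog_per_instance(visible_messages=1500, in_service_instances=3)
payload = backlog_metric_payload(value)
print(payload)

# Publishing needs boto3 and AWS credentials, so it is left commented out:
# import boto3
# boto3.client("cloudwatch").put_metric_data(**payload)
```

A target tracking policy on that metric then scales the whole group, so no per-instance policy is needed.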

Testing AWS spot instance provisioning

My company is looking to switch to using Spot pricing when provisioning EC2 instances. I've been tasked with writing some unit tests that test things such as:
Our Spot instance count is at a certain threshold
When that threshold isn't met, on-demand replacements are brought up to replace them
I'm not an adept tester and haven't had much exposure to AWS on the whole. So my question is what approach, tools, software could I use to begin implementing this? My initial thinking is to write a bash script with AWS CLI commands and go from there.
Any pointers or recommendations would be greatly appreciated!
I thought about this a little and I would recommend you have two auto scaling groups, one for spot instances and one for on-demand instances. For the spot instances auto scaling group you would essentially set your desired capacity. For the on demand auto scaling group you would simply set the min and max to 0.
Next you would set up two CloudWatch alarms. One would be for GroupInServiceInstances less than whatever maximum you declared. This would be set to on by default. Another would be for GroupInServiceInstances equal to the maximum you declared. This would be set to off by default.
Now when the GroupInServiceInstances alarm for instances less than your desired maximum goes off, it would invoke a Lambda function. This Lambda function would do the following:
Enable the GroupInServiceInstances equal to your maximum capacity alarm
Disable the GroupInServiceInstances less than your desired capacity alarm
Call the auto scaling group API to get ( max instances - currently running instances )
Set the min and max instances in the on demand auto scaling group to whatever that value is
It would also be a good idea to set up a Simple Notification Service (SNS) topic that emails someone when the spot instance auto scaling group has an insufficient number of instances after X amount of time. That lets you decide if you need to rework the spot prices.
Now when the GroupInServiceInstances equal to your maximum desired capacity alarm goes off, it will invoke a lambda function to do the following:
Enable the GroupInServiceInstances less than desired alarm
Disable the GroupInServiceInstances equal to desired alarm
Set the min and max of the on-demand auto scaling group to 0
This will essentially terminate all the instances in the on demand auto scaling group so you can revert back to using the (hopefully) lower cost spot instances
This solution does require knowledge of Lambda, but I think it ends up a lot more automated and reduces the additional logic a CLI script would require.
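Since the question was about testing, here is a sketch of how the core of the first Lambda could be written so that it is unit testable without touching AWS. Everything named here (`on_demand_size`, `handle_spot_shortfall`, `FakeAutoScaling`, the group names) is hypothetical. The two client methods used, `describe_auto_scaling_groups` and `update_auto_scaling_group`, do exist on the boto3 Auto Scaling client. The alarm enable/disable steps are left out to keep it short.

```python
def on_demand_size(target: int, spot_in_service: int) -> int:
    """How many on-demand instances are needed to cover the spot shortfall."""
    return max(0, target - spot_in_service)


def count_in_service(autoscaling, group_name: str) -> int:
    """Count instances of one group that are in the InService state."""
    resp = autoscaling.describe_auto_scaling_groups(
        AutoScalingGroupNames=[group_name])
    instances = resp["AutoScalingGroups"][0]["Instances"]
    return sum(1 for i in instances if i["LifecycleState"] == "InService")


def handle_spot_shortfall(autoscaling, spot_group: str,
                          on_demand_group: str, target: int) -> int:
    """Resize the on-demand group to (target - running spot instances)."""
    size = on_demand_size(target, count_in_service(autoscaling, spot_group))
    autoscaling.update_auto_scaling_group(
        AutoScalingGroupName=on_demand_group, MinSize=size, MaxSize=size)
    return size


class FakeAutoScaling:
    """Stand-in for the boto3 client so the logic can be tested offline."""

    def __init__(self, in_service: int, pending: int = 0):
        self._instances = ([{"LifecycleState": "InService"}] * in_service
                           + [{"LifecycleState": "Pending"}] * pending)
        self.updates = []

    def describe_auto_scaling_groups(self, AutoScalingGroupNames):
        return {"AutoScalingGroups": [{"Instances": self._instances}]}

    def update_auto_scaling_group(self, **kwargs):
        self.updates.append(kwargs)


fake = FakeAutoScaling(in_service=3, pending=1)
print(handle_spot_shortfall(fake, "spot-asg", "on-demand-asg", target=5))
print(fake.updates)
```

In the real Lambda you would pass `boto3.client("autoscaling")` instead of the fake. Injecting the client is a design choice that keeps the arithmetic testable with plain assertions.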

AWS EC2 Auto Scaling Groups: I get Min and Max, but what's Desired instances limit for?

When you set up an Auto Scaling group in AWS EC2, the Min and Max bounds seem to make sense:
The minimum number of instances to scale down to based on policies
The maximum number of instances to scale up to based on policies
However, I've never been able to wrap my head around what the heck Desired is intended to affect.
I've always just set Desired equal to Min, because generally, I want to pay Amazon the minimum tithe possible, and unless you need an instance to handle load it should be at the Min number of instances.
I know if you use ElasticBeanstalk and set a Min to 1 and Max to 2 it sets a Desired to 2 (of course!)--you can't choose a value for Desired.
What would be the use case for a different Desired number of instances, and how does it differ? When would you expect AWS to scale lower than your Desired if Desired is larger than Min?
Here are the explanations for the "min, desired and max" values from AWS support:
MIN: This will be the minimum number of instances that can run in your
auto scale group. If your scale down CloudWatch alarm is triggered,
your auto scale group will never terminate instances below this number
DESIRED: If you trip a CloudWatch alarm for a scale up event, then it
will notify the auto scaler to change its desired to a specified
higher amount and the auto scaler will start an instance/s to meet
that number. If you trip a CloudWatch alarm to scale down, then it
will change the auto scaler desired to a specified lower number and
the auto scaler will terminate instance/s to get to that number.
MAX: This will be the maximum number of instances that you can run in
your auto scale group. If your scale up CloudWatch alarm stays
triggered, your auto scale group will never create instances more than
the maximum amount specified.
Think about it like a sliding range UI element.
With min and max, you are setting the lower and upper bounds of your instance scaling. With desired capacity, you are setting where you'd currently like the instance count to hover.
Example:
You know your application will have heavy load due to a marketing email or product launch...simply scale up your desired capacity beforehand:
aws autoscaling set-desired-capacity --auto-scaling-group-name my-auto-scaling-group --desired-capacity 2 --honor-cooldown
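For anyone scripting this instead of using the CLI, a rough boto3 equivalent might look like the sketch below. The helper `desired_capacity_request` and the min/max values of 1 and 4 are assumptions for illustration. The local range check is there because, as far as I know, the API rejects a desired capacity outside the group's min and max.

```python
def desired_capacity_request(group_name: str, desired: int,
                             min_size: int, max_size: int,
                             honor_cooldown: bool = True) -> dict:
    """Build keyword arguments for boto3's set_desired_capacity call."""
    if not min_size <= desired <= max_size:
        raise ValueError(
            f"desired={desired} must be between min={min_size} and max={max_size}")
    return {
        "AutoScalingGroupName": group_name,
        "DesiredCapacity": desired,
        "HonorCooldown": honor_cooldown,
    }


request = desired_capacity_request("my-auto-scaling-group", 2,
                                   min_size=1, max_size=4)
print(request)

# The actual call needs boto3 and AWS credentials, so it is left commented out:
# import boto3
# boto3.client("autoscaling").set_desired_capacity(**request)
```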
"Desired" is (necessarily) ambiguous.
It means the "initial" number of instances. Why not just "initial" then? Because the number may change by autoscaling events.
So it means the "current" number of instances. Why not just "current" then? Because during an autoscaling event, instances will start / terminate. Those instances do not count towards the "current" number of instances. By "current", a user expects instances that are operational.
So it means the "target" number of instances. Why not just "target" then? I guess "target" is just as good (ambiguous) as "desired"...
"When you expect AWS to scale lower than your Desired if desired is larger than Min?"
This happens when you set a CloudWatch alarm based on some AutoScaling policy. Whenever that alarm is triggered it will update the DesiredCount to whatever is mentioned in config.
e.g., if an AutoScalingGroup config has Min=1, Desired=3, Max=5 and there is an alarm attached to a scaling policy that says "if CPU usage is <50% for 10 consecutive minutes, remove 1 instance", then it will keep reducing the instance count by 1 each time the alarm is triggered until DesiredCount = MinCount.
Lessons Learnt: Set the MinCount to be > 0 or = DesiredCount. This will make sure that the application is not brought down when the mincount=0 and CPU usage goes down.
In layman's terms, DesiredCapacity value is automatically updated on scale-in and scale-out events.
In other words,
Scale-in or Scale-out are done by decreasing or increasing the DesiredCapacity value.
Desired capacity simply means the number of instances that will come up / fired up when you launch the autoscaling. That means if desired capacity = 4, then 4 instances will keep on running until and unless any scale up or scale down event triggers. If scale up event occurs, the number of instances will go up till maximum capacity and if scale down event occurs it will go down till the minimum capacity.
Correct me if wrong, thanks.
I noticed that desired capacity went down and no new instance came up when I set one of the instances to standby. It kept on running but was detached from the ELB (requests were not forwarded to that particular instance when accessed via the ELB DNS). No new instance was launched by AWS; rather, desired capacity was decreased by 1.
When I changed the state of the instance back (from standby), the instance was attached to the ELB again (it started to get requests when accessed via the ELB DNS). The desired capacity was increased by 1 and became 2.
Hence it seems the number of instances attached to the ELB can't cross the limits set by min and max, but the desired capacity is adjusted automatically when a scale-in or scale-out event occurs. It was definitely something unknown to me.
It might be a way to let AWS know that this is the desired capacity required for the respective ELB at a given point in time.
Min and max are self-explanatory, but desired was confusing until I attached a target tracking scaling policy to the ASG with CPU utilization as the target metric. Then the desired instance count was scaled out and in based on the target CPU utilization. If a desired count is set through CloudFormation or manually, the ASG will initially create that number of instances. Later, the scaling policy automatically adjusts the desired count based on the target CPU utilization.
Desired is what we start initially. It will go to min or max depending on the scale-in / scale-out.
I liked the analogy with a slider to understand this - https://stackoverflow.com/a/36272945/10779109
Think of min and max as the minimum and maximum allowed brightness on a screen. You probably don't want the min to be 0 in that case (sidenote). The desired quantity keeps changing based on the environment (in the case of an ASG, it depends on the scaling policies).
For instance, if the following check runs every hour, this is where desired quantity is required.
if low_load(metric) and desired_capacity > min_capacity:  # metric: CPU, memory, etc.
    desired_capacity = desired_capacity - 1
Max capacity can also be understood in the same way where you'd want to keep increasing the desired quantity based on a cloudwatch_alarm (or any scaling policy) up to the max capacity.

How to configure EC2 autoscaling based on multiple limits on same metric?

My primary requirement is as follows:
When CPU consumption on an instance exceeds 50 % then adjust capacity of autoscaling group to 5 instances, when CPU consumption exceeds 80% then adjust capacity to 10 instances.
However if I use cloudwatch alarms to set capacity I can imagine the following race condition:
5 instances exist
CPU consumption exceeds 80 %
Alarm is triggered
Capacity is changed to 10 instances
CPU consumption drops below 50 %
Eventually CPU consumption again exceeds 50% but now capacity will be changed to 5 instances (which is something I don't want to happen)
So what I would ideally like is that, in response to an alarm trigger, capacity is raised to at least the level that corresponds to the threshold.
I am aware that this can be done by manually setting the capacity through AWS SDK - which could be triggered in response to lifecycle events monitored by a supervisor, but is there a better approach, preferably one that does not require setting up additional supervisors or webhooks for alarms ?
A general approach is to fine grain the scaling actions:
Do not jump that big:
if the ASG avg CPU is over 70% > Add an instance
if the ASG avg CPU is over 90% > Add "n" instances
if the ASG avg CPU is under 40% > remove an instance
if the ASG avg CPU is under 10% > remove "n" instances
All of these values are averages over the last 5 minutes. So if you have a really fast spike, you need more aggressive scaling. Even so, in half an hour you can easily add 6 servers or more.
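The four rules above boil down to a small lookup from average CPU to a capacity change. A minimal Python sketch, assuming the thresholds from the list and a made-up default of 3 for "n":

```python
def step_adjustment(avg_cpu: float, n: int = 3) -> int:
    """Map the 5-minute average CPU (percent) to a change in instance count."""
    if avg_cpu > 90:
        return n       # heavy load: add several instances at once
    if avg_cpu > 70:
        return 1       # moderate load: add one instance
    if avg_cpu < 10:
        return -n      # almost idle: remove several instances
    if avg_cpu < 40:
        return -1      # light load: remove one instance
    return 0           # between 40% and 70%: leave the fleet alone


for cpu in (95, 75, 50, 30, 5):
    print(cpu, step_adjustment(cpu))
```

The band between 40% and 70% where nothing happens is what keeps the group from flapping between scale-out and scale-in.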
Also, scaling works better with higher numbers. So if your system needs only 1-3 instances, it may make sense to decrease the instance size so you can have 2-6 instances. It gives some extra flexibility to your system.
But again, the question is, what is your expected load? Big spikes, or an expected up and down during the day?
I would suggest looking into an AWS Lambda function, triggered by an SNS message from CloudWatch. It should give you free rein to put as much logic into the scaling decision as you want.
Good Luck!
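If you go the Lambda route, the "at least N instances" rule from the question is essentially a `max()` over the current capacity. A sketch of just the decision logic, with the thresholds taken from the question and all names (`THRESHOLDS`, `capacity_floor`, `next_desired`) invented for illustration:

```python
# (CPU percent threshold, minimum capacity once exceeded), highest threshold first
THRESHOLDS = [(80.0, 10), (50.0, 5)]


def capacity_floor(cpu: float) -> int:
    """Smallest capacity the group should have at this CPU level."""
    for threshold, capacity in THRESHOLDS:
        if cpu > threshold:
            return capacity
    return 0


def next_desired(current: int, cpu: float, max_size: int) -> int:
    """Raise capacity to the floor if needed, but never lower it here."""
    return min(max_size, max(current, capacity_floor(cpu)))


print(next_desired(5, 85, 20))   # above 80%: raised to 10
print(next_desired(10, 55, 20))  # above 50% only: stays at 10 instead of dropping to 5
```

Because the function never returns less than the current capacity, the 50% alarm firing again after a scale-out to 10 cannot pull the group back to 5. Scaling in would be handled by a separate policy.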

What order are AWS AutoScaling policies applied in?

I am planning on using AWS Auto Scaling to scale my EC2 services. I have 4 policies that need to control my instance behavior, 2 for scale out and 2 for scale in. My question is: in what order will they be evaluated? Scale out first, then scale in? Or vice versa? Random? Or something else?
Thank you,
Policies are not evaluated in an order. Each policy is compared against the metrics that policy is set up to measure, and takes actions based on the results.
For example, perhaps you have the following four policies:
Add 1 instance when an SQS queue depth is > 1000 messages
Remove 1 instance when the same SQS queue depth is < 200 messages
Add 1 instance when the average CPU of all instances in the autoscaling group is > 80%
Remove 1 instance when the average CPU of all instances in the autoscaling group is < 30%
As you can see, ordering doesn't make sense in this context. The appropriate action(s) will be executed whenever the conditions are met.
Note that without planning and testing you can encounter loops of instances that constantly cycle up and down. Drawing from the previous example, imagine that a new instance is launched because there are > 1000 messages in the queue. But the CPU usage is only 20% for all the instances, so then the 4th policy fires to remove an instance. Thus all the policies should be considered in concert.
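To see the cycling problem on paper, you can model the four example policies as independent checks and look at what fires for a given pair of metrics. This is a simplified sketch (the policy names are made up, and real Auto Scaling has cooldowns and its own conflict handling), but it shows why the policies need to be designed together:

```python
from typing import List, Tuple


def fired_adjustments(queue_depth: int, avg_cpu: float) -> List[Tuple[str, int]]:
    """Evaluate each policy on its own metric; no ordering is involved."""
    fired = []
    if queue_depth > 1000:
        fired.append(("queue-scale-out", 1))
    if queue_depth < 200:
        fired.append(("queue-scale-in", -1))
    if avg_cpu > 80:
        fired.append(("cpu-scale-out", 1))
    if avg_cpu < 30:
        fired.append(("cpu-scale-in", -1))
    return fired


# Deep queue but idle CPUs: one policy wants to add, another wants to remove.
actions = fired_adjustments(queue_depth=1500, avg_cpu=20)
print(actions)
print(sum(change for _, change in actions))
```

When a scale-out and a scale-in policy can both be true for the same state, as here, the group can keep adding and removing instances without ever settling.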