My company is looking to switch to using Spot pricing when provisioning EC2 instances. I've been tasked with writing some unit tests that test things such as:
Our Spot instance count is at a certain threshold
When that threshold isn't met, on-demand instances are brought up to replace the missing Spot instances
I'm not an adept tester and haven't had much exposure to AWS on the whole. So my question is what approach, tools, software could I use to begin implementing this? My initial thinking is to write a bash script with AWS CLI commands and go from there.
Any pointers or recommendations would be greatly appreciated!
I thought about this a little and I would recommend you have two auto scaling groups, one for spot instances and one for on-demand instances. For the spot instances auto scaling group you would essentially set your desired capacity. For the on demand auto scaling group you would simply set the min and max to 0.
Next you would set up two CloudWatch alarms. One would fire when GroupInServiceInstances is less than whatever maximum you declared. This would be enabled by default. The other would fire when GroupInServiceInstances is equal to the maximum you declared. This would be disabled by default.
Now when the GroupInServiceInstances alarm for instances less than your desired maximum goes off, it would invoke a Lambda function. This Lambda function would do the following:
Enable the GroupInServiceInstances equal to your maximum capacity alarm
Disable the GroupInServiceInstances less than your desired capacity alarm
Call the auto scaling group API to get ( max instances - currently running instances )
Set the min and max instances in the on demand auto scaling group to whatever that value is
It would also be a good idea to set up a Simple Notification Service (SNS) topic that emails someone when the spot instance auto scaling group has an insufficient number of instances after X amount of time. That lets you decide if you need to rework the spot prices.
Now when the GroupInServiceInstances equal to your maximum desired capacity alarm goes off, it will invoke a lambda function to do the following:
Enable the GroupInServiceInstances less than desired alarm
Disable the GroupInServiceInstances equal to desired alarm
Set the min and max of the on demand auto scaling group to 0
This will essentially terminate all the instances in the on demand auto scaling group so you can revert back to using the (hopefully) lower cost spot instances.
This solution does require knowledge of Lambda, but I think it ends up a lot more automated and reduces the additional logic a CLI script would require.
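The two Lambda functions described above come down to one subtraction plus a handful of API calls. Here is a minimal Python sketch, assuming boto3-style `autoscaling` and `cloudwatch` clients are passed in. The function names, the `cfg` keys and the group/alarm names are hypothetical, and turning an alarm "on" or "off" is modelled with CloudWatch's `enable_alarm_actions` / `disable_alarm_actions`, which is only one possible reading of the answer.

```python
def on_demand_shortfall(target_capacity, spot_in_service):
    """On-demand instances needed to cover missing Spot capacity (never negative)."""
    return max(target_capacity - spot_in_service, 0)


def count_in_service(autoscaling, group_name):
    """Count instances whose lifecycle state is InService in one group."""
    response = autoscaling.describe_auto_scaling_groups(
        AutoScalingGroupNames=[group_name]
    )
    instances = response["AutoScalingGroups"][0]["Instances"]
    return sum(1 for inst in instances if inst["LifecycleState"] == "InService")


def resize_on_demand(autoscaling, group_name, size):
    """Pin the on-demand group to exactly `size` instances."""
    autoscaling.update_auto_scaling_group(
        AutoScalingGroupName=group_name,
        MinSize=size,
        MaxSize=size,
        DesiredCapacity=size,
    )


def handle_spot_shortfall(autoscaling, cloudwatch, cfg):
    """Runs when the 'less than target' alarm fires."""
    cloudwatch.enable_alarm_actions(AlarmNames=[cfg["full_alarm"]])
    cloudwatch.disable_alarm_actions(AlarmNames=[cfg["short_alarm"]])
    running = count_in_service(autoscaling, cfg["spot_group"])
    needed = on_demand_shortfall(cfg["target"], running)
    resize_on_demand(autoscaling, cfg["on_demand_group"], needed)
    return needed


def handle_spot_recovered(autoscaling, cloudwatch, cfg):
    """Runs when the 'equal to target' alarm fires."""
    cloudwatch.enable_alarm_actions(AlarmNames=[cfg["short_alarm"]])
    cloudwatch.disable_alarm_actions(AlarmNames=[cfg["full_alarm"]])
    resize_on_demand(autoscaling, cfg["on_demand_group"], 0)
```

Because the clients are parameters, the same functions can be unit tested with small fake objects before being wired to `boto3.client("autoscaling")` and `boto3.client("cloudwatch")` inside a Lambda handler.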
I am creating a disaster recovery solution in AWS. For the second (fallback) region I want to have only 1 EC2 instance to minimise cost. In case of disaster, I would like to know if it's possible to write a Lambda function in the second region that increases the desired capacity of the Auto Scaling group to some number.
To achieve this I can subscribe the function to the health check alarm SNS topic.
I would like to know if there is an API to scale an EC2 Auto Scaling group from Lambda, and what sort of roles/permissions are needed?
Yes this is entirely possible.
In Boto3 you can use the update_auto_scaling_group function and specify the MinSize, MaxSize and DesiredCapacity. By doing this you would be able to adjust the values to match what you expect them to be.
Alternatively you could set the minimum capacity to 1 and the maximum capacity to whatever it should be. If the alarms never trigger, the group would never scale. You could then simply call set_desired_capacity to set the number of instances to a specific count.
The permissions for these options are as follows:
autoscaling:SetDesiredCapacity
autoscaling:UpdateAutoScalingGroup
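As a rough illustration of how the pieces above fit together, here is a Python sketch of such a Lambda function. The group name `dr-fallback-asg`, the capacity of 4, and the helper name `scale_for_failover` are placeholders rather than anything from the question, and the `"Resource": "*"` in the policy is a simplification that you would normally narrow to the group's ARN.

```python
# IAM policy document granting only the two actions listed above.
SCALING_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "autoscaling:SetDesiredCapacity",
                "autoscaling:UpdateAutoScalingGroup",
            ],
            "Resource": "*",  # simplification: scope this down in practice
        }
    ],
}


def scale_for_failover(autoscaling, group_name, capacity, min_size=1):
    """Raise MaxSize and DesiredCapacity so the fallback region can take load."""
    autoscaling.update_auto_scaling_group(
        AutoScalingGroupName=group_name,
        MinSize=min_size,
        MaxSize=capacity,
        DesiredCapacity=capacity,
    )


def lambda_handler(event, context):
    """Entry point for the SNS-triggered Lambda (names below are placeholders)."""
    import boto3  # imported lazily so the helper above can be tested offline

    autoscaling = boto3.client("autoscaling")
    scale_for_failover(autoscaling, "dr-fallback-asg", 4)
    return {"scaled_to": 4}
```

If the group's maximum is already large enough, a single `set_desired_capacity(AutoScalingGroupName=..., DesiredCapacity=...)` call would do the same job with only the first permission.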
I was looking at the AWS EC2 Auto-Scaling with Cloud Watch feature.
What I cannot glean is if the auto scaling can be applied only once or repeatedly? I think only once though.
E.g. scale from 2 to 4 max if, say, 60% cpu reached.
Then, what if, having gone from 2 to 4, and then 60% cpu reached again according to Cloud Watch target rules, can we reapply autoscaling again?
I saw a "step" option, but that seems to work with different target values. Could the step have repeated 60% configuration or does it need 60, 70% and so on? This aspect is not entirely clear.
Or is such an iterative approach simply not possible? Implying one needs to re-baseline the initial allocations? I think not based on the step approach.
Auto Scaling is triggered by CloudWatch alarms when the metric breaches your alarm's threshold.
So if your policy is to add 50% capacity when average CPU across the fleet is above 60% for more than 15 minutes, the alarm will trigger a change in your scaling group, increasing the value of your "desired capacity" property, let's say going from 2 to 3 instances. Auto Scaling will then react and bring the fleet to "desired capacity".
If, later, the same alarm is still on, a new change will be triggered to your scaling group, bringing desired capacity to 4. Then AutoScaling will create EC2 instances to bring your fleet to the new desired capacity.
And so on.
Of course, you do not want to increase the fleet size above your budget. So you can define a MAX fleet size that Auto Scaling will never go above. If your max fleet size is 3, the second alarm in my example will leave desired capacity as it is (3) because the new desired capacity would exceed the max capacity. This ends the scale-out process.
To go back to normal, you must also create scale-in policies, in addition to your scale-out policies, e.g. when the average CPU on my fleet is below 15% for 1h, let's remove an instance. Your CloudWatch alarm will trigger, causing a change in the desired capacity of your fleet, and Auto Scaling will adjust (kill instances) to reach the new desired capacity value.
Of course, you do not want Auto Scaling to kill the last instance in your fleet (going from a desired capacity of 1 to a desired capacity of 0), so, similarly to the MAX capacity value, you also define a MIN capacity, i.e. the minimum number of instances to keep in your fleet, whatever alarms are triggering and trying to change the desired capacity.
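The repeated-trigger behaviour described above can be simulated in a few lines of Python. This is only a toy model, not AWS code: the function names are made up, and the rounding rule (truncate toward zero, but always move by at least one instance) is an assumption about how a percentage adjustment is applied.

```python
import math


def percent_step(desired, percent):
    """Instances to add or remove for a percentage adjustment.

    Assumed rounding: truncate toward zero, but move by at least one
    instance whenever the percentage is non-zero.
    """
    raw = desired * percent / 100
    step = math.trunc(raw)
    if step == 0 and raw != 0:
        step = 1 if raw > 0 else -1
    return step


def apply_alarm(desired, percent, min_size, max_size):
    """New desired capacity after one alarm-triggered adjustment, clamped."""
    return max(min_size, min(max_size, desired + percent_step(desired, percent)))


def simulate(desired, percent, min_size, max_size, triggers):
    """Desired capacity after each of `triggers` consecutive alarm breaches."""
    history = [desired]
    for _ in range(triggers):
        desired = apply_alarm(desired, percent, min_size, max_size)
        history.append(desired)
    return history


print(simulate(2, 50, 1, 10, 2))  # [2, 3, 4]
print(simulate(2, 50, 1, 3, 2))   # [2, 3, 3]
```

The first call reproduces the 2, 3, 4 progression from the example; the second shows the same alarm being capped once the maximum of 3 is reached.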
I'm using the javascript aws-sdk. I have a large file I am about to process and I want to ensure that the normally low number of instances in my beanstalk are boosted up before I begin processing.
How can I trigger a scaling event to guarantee that there are at least 8 instances and wait for that to complete before I begin the next step of my process?
I would accept any non-JavaScript examples that have analogous APIs in the JavaScript SDK (I can translate it).
You could call updateAutoScalingGroup to modify the minimum number of instances in the Auto Scaling group. When finished, set the value back to the previous minimum value. The Auto Scaling group will eventually adjust itself to the appropriate level.
Alternatively, you could simply setDesiredCapacity to tell Auto Scaling exactly how many instances it should have running. However, the scaling policies might then reduce this quantity if there is insufficient work taking place.
In conjunction with setDesiredCapacity, you could call suspendProcesses to prevent termination of instances during your extra-workload period. Then, when you have finished the extra workload, simply call resumeProcesses and the levels will automatically adjust.
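Since a non-JavaScript example was invited, here is a Python (boto3-style) sketch of the first approach: raise the minimum, poll until enough instances are InService, and hand back the old minimum so it can be restored afterwards. The helper names, timeout and polling interval are assumptions, and the sketch assumes the group's MaxSize is already at least the requested minimum. The JavaScript SDK exposes the same operations as `describeAutoScalingGroups` and `updateAutoScalingGroup`.

```python
import time


def in_service_count(autoscaling, group_name):
    """Number of instances currently InService in the group."""
    group = autoscaling.describe_auto_scaling_groups(
        AutoScalingGroupNames=[group_name]
    )["AutoScalingGroups"][0]
    return sum(1 for i in group["Instances"] if i["LifecycleState"] == "InService")


def ensure_min_instances(autoscaling, group_name, minimum,
                         timeout=900, poll=15, sleep=time.sleep):
    """Raise MinSize to `minimum` and block until that many instances are InService.

    Returns the previous MinSize so the caller can restore it later.
    """
    group = autoscaling.describe_auto_scaling_groups(
        AutoScalingGroupNames=[group_name]
    )["AutoScalingGroups"][0]
    previous_min = group["MinSize"]
    autoscaling.update_auto_scaling_group(
        AutoScalingGroupName=group_name, MinSize=minimum
    )
    waited = 0
    while in_service_count(autoscaling, group_name) < minimum:
        if waited >= timeout:
            raise TimeoutError(f"{group_name} did not reach {minimum} instances")
        sleep(poll)
        waited += poll
    return previous_min
```

A caller would run `old_min = ensure_min_instances(client, group_name, 8)`, process the file, and then call `update_auto_scaling_group` again with `MinSize=old_min`. For Elastic Beanstalk, the Auto Scaling group name is the one the environment created, which you would need to look up first.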
I have a not-so-complicated situation, but it can be complicated in AWS CloudFormation:
I would like to autoscale up and down based on the number of messages on SQS.
But I am not sure what I need to specify in AWS CloudFormation. I would imagine that I would need:
some sort of Lambda/CloudFormation that performs a query on the current number of instances in the Auto Scaling group
some sort of Lambda/CloudFormation that performs a query on the current number of messages on SQS.
some comparison operations that compares #1 and #2.
create scale up policy when #1 < #2
create scale down policy when #1 > #2
Not sure where I should get started... could someone be kind enough to show some examples?
You have several different concepts all mixed together (CloudFormation, Auto Scaling, Lambda). It is best to keep things simple, at least for an initial deployment. You can then automate it with CloudFormation later.
The most difficult part of Auto Scaling is actually determining the best Scaling Policies to use. A general rule is to quickly add capacity when it is needed, and then slowly remove capacity when it is no longer needed. This way, you can avoid churn, where instances are added and removed within short spaces of time.
The simplest setup would be:
Scale-out when the queue size is larger than X (To be determined by testing)
Scale-in when the queue is empty (You can later tweak this to be more efficient)
Use the ApproximateNumberOfMessagesVisible metric for your scaling policies. (See Amazon SQS Metrics and Dimensions). It provides a count of messages waiting to be processed. However, I have seen situations where a zero count is not actually sent as a metric, so also trigger your scale-in policy on an alarm status of INSUFFICIENT_DATA, which also means that the queue is empty.
There is no need to use AWS Lambda functions unless you have very complex requirements for when to scale.
If your requests come on a regular basis throughout the day, set the minimum to one instance to always have capacity available.
If your requests are infrequent (and there could be several hours with no requests coming in), then set the minimum to zero instances so you save money.
You will need to experiment to determine the best queue size that should trigger a scale-out event. This depends upon how frequently the messages arrive and how long they take to process. You can also experiment with the Instance Type -- figure out whether it is better to have many smaller (eg T2) instances, or fewer larger instances (eg M4 or C4, depending upon need).
If you do not need to process the requests within a short time period (that is, you can be a little late sometimes), you could consider using spot pricing that will dramatically lower your costs, with the potential to occasionally have no instances running due to a high spot price. (Or, just bid high and accept that occasionally you'll pay more than on-demand prices but in general you will save considerable costs.)
Create all of the above manually in the console, then experiment and measure results. Once it is finalized, you can then implement it as a CloudFormation stack if desired.
Update:
The Auto Scaling screens will only create an alarm based on EC2. To create an alarm on a different metric, first create the alarm, then put it in the policy.
To add a rule based on an Amazon SQS queue:
Create an SQS queue
Put a message in the queue (otherwise the metrics will not flow through to CloudWatch)
Create an alarm in CloudWatch based on the ApproximateNumberOfMessagesVisible metric (which will appear after a few minutes)
Edit your Auto Scaling policies to use the above alarm
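Once the manual setup works, the same steps can be scripted. The following Python sketch assumes boto3-style `autoscaling` and `cloudwatch` clients; the policy names, alarm names, cooldowns and the 5-minute period are illustrative choices, not values from the answer. It follows the rule of scaling out quickly and scaling in slowly, and it also triggers the scale-in policy on INSUFFICIENT_DATA as suggested above.

```python
def create_sqs_scaling(autoscaling, cloudwatch, group_name, queue_name,
                       scale_out_threshold):
    """Create scale-out/scale-in policies and the SQS alarms that drive them."""
    dimensions = [{"Name": "QueueName", "Value": queue_name}]

    # Simple scaling policies: add one instance quickly, remove one slowly.
    out_arn = autoscaling.put_scaling_policy(
        AutoScalingGroupName=group_name,
        PolicyName="sqs-scale-out",
        AdjustmentType="ChangeInCapacity",
        ScalingAdjustment=1,
        Cooldown=60,
    )["PolicyARN"]
    in_arn = autoscaling.put_scaling_policy(
        AutoScalingGroupName=group_name,
        PolicyName="sqs-scale-in",
        AdjustmentType="ChangeInCapacity",
        ScalingAdjustment=-1,
        Cooldown=300,
    )["PolicyARN"]

    common = {
        "Namespace": "AWS/SQS",
        "MetricName": "ApproximateNumberOfMessagesVisible",
        "Dimensions": dimensions,
        "Statistic": "Average",
        "Period": 300,
        "EvaluationPeriods": 1,
    }
    # Backlog above the threshold: scale out.
    cloudwatch.put_metric_alarm(
        AlarmName="sqs-backlog-high",
        Threshold=scale_out_threshold,
        ComparisonOperator="GreaterThanThreshold",
        AlarmActions=[out_arn],
        **common,
    )
    # Queue empty, or no data reported at all: scale in.
    cloudwatch.put_metric_alarm(
        AlarmName="sqs-queue-empty",
        Threshold=0,
        ComparisonOperator="LessThanOrEqualToThreshold",
        AlarmActions=[in_arn],
        InsufficientDataActions=[in_arn],
        **common,
    )
    return out_arn, in_arn
```

The scale-out threshold is left as a parameter because, as noted above, the right queue size has to be found by experiment.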
When you set up an Auto Scaling group in AWS EC2, the Min and Max bounds seem to make sense:
The minimum number of instances to scale down to based on policies
The maximum number of instances to scale up to based on policies
However, I've never been able to wrap my head around what the heck Desired is intended to affect.
I've always just set Desired equal to Min, because generally, I want to pay Amazon the minimum tithe possible, and unless you need an instance to handle load it should be at the Min number of instances.
I know if you use ElasticBeanstalk and set a Min to 1 and Max to 2 it sets a Desired to 2 (of course!)--you can't choose a value for Desired.
What would be the use case for a different Desired number of instances and how does it differ? When you expect AWS to scale lower than your Desired if desired is larger than Min?
Here are the explanations for the "min, desired and max" values from AWS support:
MIN: This will be the minimum number of instances that can run in your auto scale group. If your scale down CloudWatch alarm is triggered, your auto scale group will never terminate instances below this number.
DESIRED: If you trip a CloudWatch alarm for a scale up event, then it will notify the auto scaler to change its desired to a specified higher amount and the auto scaler will start an instance/s to meet that number. If you trip a CloudWatch alarm to scale down, then it will change the auto scaler desired to a specified lower number and the auto scaler will terminate instance/s to get to that number.
MAX: This will be the maximum number of instances that you can run in your auto scale group. If your scale up CloudWatch alarm stays triggered, your auto scale group will never create instances more than the maximum amount specified.
Think about it like a sliding range UI element.
With min and max, you are setting the lower and upper bounds of your instance scaling. With desired capacity, you are setting where you'd currently like the instance count to hover.
Example:
You know your application will have heavy load due to a marketing email or product launch...simply scale up your desired capacity beforehand:
aws autoscaling set-desired-capacity --auto-scaling-group-name my-auto-scaling-group --desired-capacity 2 --honor-cooldown
"Desired" is (necessarily) ambiguous.
It means the "initial" number of instances. Why not just "initial" then? Because the number may change by autoscaling events.
So it means the "current" number of instances. Why not just "current" then? Because during an autoscaling event, instances will start / terminate. Those instances do not count towards the "current" number of instances. By "current", a user expects instances that are operational.
So it means the "target" number of instances. Why not just "target" then? I guess "target" is just as good (ambiguous) as "desired"...
When you expect AWS to scale lower than your Desired if desired is larger than Min?
This happens when you set a CloudWatch alarm based on some AutoScaling policy. Whenever that alarm is triggered it will update the DesiredCount to whatever is mentioned in config.
e.g., if an Auto Scaling group config has Min=1, Desired=3, Max=5 and there is an alarm attached to a scaling policy which says "if CPU usage is <50% for 10 consecutive minutes then remove 1 instance", then it will keep reducing the instance count by 1 whenever the alarm is triggered until DesiredCount = MinCount.
Lessons learnt: set the MinCount to be > 0 or = DesiredCount. This makes sure the application is not brought down when CPU usage drops, as could happen with MinCount=0.
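To see why the minimum matters, the Min=1, Desired=3 example can be traced with a tiny Python model. The function name is made up, and the model simply assumes the scale-in alarm keeps firing.

```python
def scale_in_until_min(desired, min_size, step=1):
    """Desired capacity after each scale-in alarm until the floor is reached."""
    history = [desired]
    while desired > min_size:
        desired = max(min_size, desired - step)
        history.append(desired)
    return history


print(scale_in_until_min(3, 1))  # [3, 2, 1]
print(scale_in_until_min(3, 0))  # [3, 2, 1, 0]
```

With a minimum of 1 the group settles at one instance; with a minimum of 0 the same alarm eventually removes every instance.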
In layman's terms, DesiredCapacity value is automatically updated on scale-in and scale-out events.
In other words,
Scale-in or Scale-out are done by decreasing or increasing the DesiredCapacity value.
Desired capacity simply means the number of instances that will come up / fired up when you launch the autoscaling. That means if desired capacity = 4, then 4 instances will keep on running until and unless any scale up or scale down event triggers. If scale up event occurs, the number of instances will go up till maximum capacity and if scale down event occurs it will go down till the minimum capacity.
Correct me if wrong, thanks.
I noticed that desired capacity went down and no new instance came up when I set one of the instances to standby. It kept on running but was detached from the ELB (requests were not forwarded to that particular instance when accessed via the ELB DNS). No new instance was launched by AWS. Rather, desired capacity was decreased by 1.
When I changed the state of the instance (from standby), the instance was again attached to the ELB (it started to get requests when accessed via the ELB DNS). The desired capacity was increased by 1 and became 2.
Hence it seems the number of instances attached to the ELB can't go outside the limits set by min and max, but the desired capacity is adjusted automatically based on the occurrence of scale-in or scale-out events. It was definitely something unknown to me.
It might be a way to let AWS know that this is the desired capacity required for the respective ELB at a given point in time.
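The standby behaviour observed above corresponds to a flag on the API call. As a hedged sketch in Python (boto3-style client passed in, helper names hypothetical): `enter_standby` takes `ShouldDecrementDesiredCapacity`, and to my understanding, when it is true the desired capacity drops by one as described, while when it is false Auto Scaling launches a replacement instance instead.

```python
def move_to_standby(autoscaling, group_name, instance_id, keep_capacity=False):
    """Put one instance into Standby.

    keep_capacity=False decrements desired capacity (the behaviour seen above);
    keep_capacity=True asks Auto Scaling to launch a replacement instead.
    """
    autoscaling.enter_standby(
        InstanceIds=[instance_id],
        AutoScalingGroupName=group_name,
        ShouldDecrementDesiredCapacity=not keep_capacity,
    )


def return_to_service(autoscaling, group_name, instance_id):
    """Bring the instance back into service in the group."""
    autoscaling.exit_standby(
        InstanceIds=[instance_id],
        AutoScalingGroupName=group_name,
    )
```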
Min and max are self-explanatory, but desired was confusing until I attached a Target Tracking scaling policy to the ASG, with CPU utilization as the target metric. Here, the desired instance count was scaled out and in based on target CPU utilization. If a desired count is set through CloudFormation or manually, the ASG will initially create that number of instances. Later, the scaling policy automatically adjusts the desired count based on target CPU utilization.
Desired is what we start initially. It will go to min or max depending on the scale-in / scale-out.
I liked the analogy with a slider to understand this - https://stackoverflow.com/a/36272945/10779109
Think of min and max as the minimum and maximum allowed brightness on a screen. You probably don't want the min to be 0 in that case (sidenote). The desired quantity keeps changing based on the environment (in the case of an ASG, it depends on the scaling policies).
For instance, if the following check runs every hour, this is where desired quantity is required.
if low_load(<CPU or Mem etc>) and desired_capacity > min_capacity:
    desired_capacity = desired_capacity - 1
Max capacity can also be understood in the same way where you'd want to keep increasing the desired quantity based on a cloudwatch_alarm (or any scaling policy) up to the max capacity.