AWS controlling which instance is scaled-in when autoscaling - amazon-web-services

I want to have a scale-in policy where nodes are scaled-in when the CPU is under 30%.
How can I control that the individual instance selected for scaling-in is the one with lowest CPU or at least a CPU under 30%?

By default, Auto Scaling first identifies which Availability Zone has the most instances (or, if they are equal, it picks a random AZ). Then, it uses the Termination Policy to determine which instance to control which instance Auto Scaling terminates during scale-in.
However, Auto Scaling does not select an instance based upon how 'busy' that particular is -- after all, that instance might be more busy, or less busy, a few seconds later.
Fortunately, Auto Scaling uses connection draining to allow in-flight requests to complete before the instance is terminated. Therefore, in theory, it doesn't matter that the instance is temporarily busy.
If you have long-running tasks on an instance that you don't want interrupted, you can configure Auto Scaling Lifecycle Hooks to move instances into a Terminating:Wait state. The instances will not receive any new traffic. Your application can then signal when the long-running task has completed (eg copying log files to S3, or finishing video rendering) and Auto Scaling will then terminate the instance.
Finally, if you want more fine-grained control over which instance will be terminated, you (or your application) can specifically Detach EC2 Instances From Your Auto Scaling Group via the management console or a describe-auto-scaling-instances API call.

Related

How does AWS autoscaling groups recognize that EC2 is idle and it should be terminated?

I am running a flask python program on EC2 which is under Load Balancer and autoscaling. In a scenario where is load increases on one Ec2 it creates another and if newly scaled Ec2 has been idle or not utilized it scales in or terminates it. The problem here is if a single user is accessing newly scaled instance which hardly takes any CPU utilization how autoscaling group will realize that it idle and if it doesn't it will terminate it leaving downtime for that user.
I have two scenarios in mind that it checks for a particular program for a amount of time in EC2 if it is running then don't, otherwise terminate it.
I see Step scaling policy but there option is only for CPU utilization that is hardly consumed if there is a single user, not even 0.1 %.
Can someone please tell me whats the best option for me and if these two options are possible then how to do it? I have been trying to ask developers since many days but could not get reliable answers in my case.
Amazon EC2 Auto-scaling does not know which of your instances are 'in use'.
Also, the decision to terminate an instance is typically made on a metric across all instances (eg CPU Utilization), rather than a metric on a specific instance.
When Auto Scaling decides to remove an instance from the Auto Scaling group, it picks an instance as follows:
It picks an Availability Zone with the most instances (to keep them balanced)
It then selects an instance based on the Termination Policy
See also: Control which Auto Scaling instances terminate during scale in - Amazon EC2 Auto Scaling
When using a Load Balancer with Auto Scaling, traffic going to the instance that will be terminated is 'drained', allowing a chance for the instance to complete existing requests.
You can further prevent an instance from terminating while it is still "in use"by implementing Amazon EC2 Auto Scaling lifecycle hooks that allow your own code to delay the Termination.
Or, if all of this is unsatisfactory, you can disable the automatic selection of an instance to terminate and instance have your own code call TerminateInstanceInAutoScalingGroup - Amazon EC2 Auto Scaling to terminate a specific instance of your choosing.
For an overview of Auto Scaling, I recommend this video from the AWS Reinvent conference: AWS re:Invent 2019: Capacity management made easy with Amazon EC2 Auto Scaling (CMP326-R1) - YouTube

AWS Spot/OnDemand Instance Management

Is there a way to elegantly Script/Configure Spot instances request, if Spot not available in some specified duration, just use OnDemand. And if Spot instance gets terminated just shift to OnDemand.
Spot Fleet does not do this (it just manages only Spot), EMR fleets have some logic around this. You can have auto scaling with Spot or on Demand not both (even though you can have 2 separate ASGs simulate this behavior).
This should be some kind of a base line use case.
Also does an Event get triggered when a Spot instance is launched or when it is Terminated. I am only seeing CLIs to check Spot status, not any CloudWatch metric/event.
Cloudwatch Instance State events can fire when any event changes states.
They can fire for any event in the lifecycle of an instance:
pending (launching), running (launch complete), shutting-down, stopped, stopping, and terminated, for any instance (or for all instances, which is probably what you want -- just disregard any instance that isn't of interest), and this includes both on-demand and spot.
http://docs.aws.amazon.com/AmazonCloudWatch/latest/events/EventTypes.html#ec2_event_type
http://docs.aws.amazon.com/AmazonCloudWatch/latest/events/LogEC2InstanceState.html
You could use this to roll your own solution -- there's not a built in mechanism for marshaling mixed fleets.
I used to do this from the ELB with health checks. You can make two groups, one with spot instances and one with reserved or on demand. Create a CW alarm when spot group contains zero healthy hosts, and scale up the other group when it fires. And the other way around, when it has enough healthy hosts scale down the other group. Use 30 sec health checks on alarm you use to scale up and 30-60 minute cooldown on scale down.
There is also Spotml which allows you to always keep a spotInstance or an onDemand instance up and running.
In addition to simply spawning the instance it also allows you to
Preserve data via persistent storage
And configure a startup script each time a new instance is spawned.
Disclosure: I'm also the creator of SpotML, it's primarily useful for ML/DataScience workflows that can largely just run on spot instances.

Automatically terminate Auto Scaling instances after a time period

We use Amazon EC2 Auto Scaling groups for a bunch of apps - as everyone knows, while you try your hardest not to have memory leaks and other "been up for a while problems" - it's still possible.
We'd like to protect against such possibilities by just bouncing the servers - ie make sure an instance only lasts, say, 24 hours before killing it. However, we want the killing to be "safe" - eg - even if there's only one instance in the group, we want to start up another instance to a working state, then kill the old box.
Is there any support for this? eg a time-to-live property on an instance?
There is no such property in Amazon EC2 nor in Auto Scaling.
You could manually set the instance health to Unhealthy, which would cause Auto Scaling to terminate and replace the instance. However, if you have only one instance then there will likely be a period where there are no instances.
You could set the Auto Scaling termination policy to OldestInstance, which means that when Auto scaling needs to terminate an instance, it will terminate the oldest instance within the AZ that has the most instances. This gets rid of old instances, but only when the group is scaled-in.
Therefore, you could supplement the Termination Policy with a script that scales-out the group and then causes it to scale-in again. For example, double the number of instances, wait for them to launch, and then halve the number of instances. This should cause them all to refresh (with a few edge conditions if your instances are spread across multiple AZs, causing non-even counts).
Another option is to restart the instance(s). This will not cause them to appear unhealthy to Auto Scaling, but they will appear unhealthy to a Load Balancer. (If you have activated ELB Health Checks within Auto Scaling, then Auto Scaling would actually terminate instances the fail the health check.) You can use Scheduled Events for Your Instances to have Amazon CloudWatch Events restart your instance(s) at certain intervals, or even have a script on the instance tell the Operating System to restart at certain intervals.
However, there is no automatic option to do exactly what you asked.
Since 2019, there has been a Maximum Instance Lifetime parameter, that almost does what you wanted.
Unfortunately, though, it isn’t possible to set the maximum instance lifetime to 24 hours (86400 seconds): the minimum is a week.
Maximum instance lifetime must be equal to 0, between 604800 and 31536000 seconds (inclusive), or not specified.

AWS - how to prevent load balancer from terminating instances under load?

I'm writing a web-service that packs up customer data into zip-files, then uploads them to S3 for download. It is an on-demand process, and the amount of data can range from a few Megabytes to multiple Gigabytes, depending on what data the customer orders.
Needless to say, scalability is essential for such a service. But I'm having trouble with it. Packaging the data into zip-files has to be done on the local harddrive of a server instance.
But the load balancer is prone to terminating instances that are still working. I have taken a look at scaling policies:
http://docs.aws.amazon.com/autoscaling/latest/userguide/as-instance-termination.html
But what I need doesn't seem to be there. The issue shouldn't be so difficult: I set the scale metric to CPU load, and scale down when it goes under 1%. But I need a guarantee that the exact instance will be terminated that breached the threshold, not another one that's still hard at work, and the available policies don't seem to present me with that option. Right now, I am at a loss how to achieve this. Can anybody give me some advice?
You can use Auto Scaling Lifecycle Hooks to perform actions before an instance is terminated. You could use this to wait for the processing to finish before proceeding with the instance termination.
It appears that you have configured an Auto Scaling group with scaling policies based upon CPU Utilization.
Please note that an Elastic Load Balancer will never terminate an Amazon EC2 instance -- if a Load Balancer health check fails, it will merely stop serving traffic to that EC2 instance until it again passes the health checks. It is possible to configure Auto Scaling to use ELB health checks, in which case Auto Scaling will terminate any instances that ELB marks as unhealthy.
Therefore, it would appear that Auto Scaling is responsible for terminating your instances, as a result of your scaling policies. You say that you wish to terminate specific instances that are unused. However, this is not the general intention of Auto Scaling. Rather, Auto Scaling is used to provide a pool of resources that can be scaled by launching new instances and terminating unwanted instances. Metrics that trigger Auto Scaling are typically based upon aggregate metrics across the whole Auto Scaling group (eg average CPU Utilization).
Given that Amazon EC2 instances are charged by the hour, it is often a good idea to keep instance running longer -- "Scale Out quickly, Scale In slowly".
Once Auto Scaling decides to terminate an instance (which it selects via a termination policy), use an Auto Scaling lifecycle hook to delay the termination until ready (eg, copying log files to S3, or waiting for a long process to complete).
If you do wish to terminate an instance after it has completed a particular workload, there is no need to use Auto Scaling -- just have the instance Shutdown when it is finished, and set the Shutdown Behavior to terminate to automatically terminate the instance upon shutdown. (This assumes that you have a process to launch new instances when you have work you wish to perform.)
Stepping back and looking at your total architecture, it would appear that you have a Load Balancer in front of web servers, and you are performing the Zip operations on the web servers? This is not a scalable solution. It would be better if your web servers pushed a message into an Amazon Simple Queue Service (SQS) queue, and then your fleet of back-end servers processed messages from the queue. This way, your front-end can continue receiving requests regardless of the amount of processing underway.
It sounds like what you need is Instance Protection, which is actually mentioned a bit more towards the bottom of the document that you linked to. As long as you have work being performed on a particular instance, it should not be automatically terminated by the Auto-Scaling Group (ASG).
Check out this blog post, on the official AWS blog, that conceptually talks about how you can use Instance Protection to prevent work from being prematurely terminated.

How can I prevent EC2 instance termination by Auto Scaling?

I would like to prevent EC2 instance termination by Auto Scaling feature if that instance is in the middle of some sort of processing.
Background:
Suppose I have an Auto Scaling group that currently has 5 instances running.
I create an alarm on average CPU usage...
Suppose 4 of the instances are idle and one is doing some heavy processing...
The average CPU load will trigger the alarm and as a result the scale-down policy will execute.
How do I get Auto Scaling to terminate one of the idle instances and not the one that is in the middle of the processing?
Update
As noted by Ryan Walls (+1), AWS meanwhile provides Instance Protection to control whether Auto Scaling can terminate a particular instance when scaling in (see the introductory blog post Instance Protection for Auto Scaling for a walk through):
You can enable the instance protection setting on an Auto Scaling
group or an individual Auto Scaling instance. When Auto Scaling
launches an instance, the instance inherits the instance protection
setting of the Auto Scaling group. [...]
It's worth noting that this instance protection only applies to regular Auto Scaling scale in events:
Instance protection does not protect Auto Scaling instances from
manual termination through the Amazon EC2 console, the
terminate-instances command, or the TerminateInstances API. Instance
protection does not protect an Auto Scaling instance from termination
if it fails health checks and must be replaced. Also, instance
protection does not protect Spot instances in an Auto Scaling group
from interruption.
As usual, the feature is available via the AWS Management Console (menu Actions->Instance Protection->Set Scale In Protection)), the AWS CLI (set-instance-protection command), and the API (SetInstanceProtection API action).
The latter two options allow automation of the scenario at hand, i.e. one would need to enable instance protection before running 'heavy processing' jobs, and disable instance protection once they are finished so that the instance is eligible for termination again.
Initial Answer
This functionality is currently not available for Auto Scaling of Amazon EC2 instances - while you are indeed able to Configure [an] Instance Termination Policy for Your Auto Scaling Group, the available policies do not include such a (fairly advanced) concept:
Auto Scaling provides the following termination policy options for you
to choose from. You can specify one or more of these options in your
termination policy.
OldestInstance — Specify this if you want the oldest instance in your Auto Scaling group to be terminated. [...]
NewestInstance — Specify this if you want the last launched instance to be terminated. [...]
OldestLaunchConfiguration — Specify this if you want the instance launched using the oldest launch configuration to be
terminated. [...]
ClosestToNextInstanceHour — Specify this if you want the instance that is closest to completing the billing hour to be
terminated. [...]
Default — Specify this if you want Auto Scaling to use the default termination policy to select instances for termination.
I just successfully dealt with the problem of long-running jobs in an auto scaling group using the relatively recent lifecycle hook feature.
The problem with trying to choose an idle node to terminate, in my case, was that the process that chooses the idle node will race against processes that submit work to the nodes. In this case it's better to use a strategy where any node can be terminated, but termination happens gracefully so that no work is lost. You can then use all of the standard auto scaling policy stuff to manage scale-in and scale-out.
The termination lifecycle hook allows the user (or a process) to perform actions on the node after it has been placed into an intermediate state (labeled Terminating:Wait) by the auto scaling group. The user (or process) is then responsible for completing the lifecycle action via an AWS API call, resulting in the shutdown of the terminated EC2 instance.
The way I set this up, in short, is:
Create a role that allows auto scaling to post a message to an SQS queue.
Create an SQS queue for the termination messages.
Create a monitor script that runs as a service in each node. My script is a simple event-driven state machine that transitions in sequence from MONITORING (polling SQS for a termination message for the node) to DRAINING (polling a job queue until no work is being performed on the node) to TERMINATED (making the complete-lifecycle call).
Standard configuration for event-driven AWS auto-scaling; that is, creating CloudWatch alarms, and the auto-scaling policies for scale-in and scale-out.
One hinderance to this approach is that the lifecycle hook management isn't supported yet in the SDKs (boto, at least, doesn't support it AFAIK), nor are there Cloud Formation resources for the hooks.
The relevant AWS documentation is here:
http://docs.aws.amazon.com/AutoScaling/latest/DeveloperGuide/AutoScalingGroupLifecycle.html
Amazon has finally addressed this issue in a simpler way. There is now "instance protection" where you can mark your instance as protected and it will not be terminated during a "scale in".
See https://aws.amazon.com/blogs/aws/new-instance-protection-for-auto-scaling
aws-cli is your best friend..
Disable your scale down policy on your autoscaling group.
Create a cron job or scheduled task using aws-cli to:
2a. Get the EC2 instances associated with the autoscaling group
http://docs.aws.amazon.com/cli/latest/reference/autoscaling/describe-auto-scaling-instances.html
2b. Next monitor the cloudwatch statistics on the EC2 instances
http://docs.aws.amazon.com/AmazonCloudWatch/latest/DeveloperGuide/US_SingleMetricPerInstance.html
http://docs.aws.amazon.com/cli/latest/reference/cloudwatch/get-metric-statistics.html
2c. Terminate the idle EC2 instance(s) from your auto-scaling group
http://docs.aws.amazon.com/cli/latest/reference/autoscaling/terminate-instance-in-auto-scaling-group.html
You can use Amazon CloudWatch to achieve this:
http://aws.typepad.com/aws/2013/01/amazon-cloudwatch-alarm-actions.html. From the article:
You can use a similar strategy to get rid of instances that are tasked with handling compute-intensive batch processes. Once the CPU goes idle and the work is done, terminate the instance and save some money!
In this case, since you will be handling the termination, you will need to remove the scale-down policy. Also see another option: https://stackoverflow.com/a/19628453/432849.