AWS autoscaling group where one instance is unique - amazon-web-services

I have an auto-scaling group of 2-5 instances to handle web traffic. I'm using the rpush gem for push notifications, which requires a single daemon running to execute all the awaiting jobs. I'm already paying for the 2-5 instances, which have sufficient extra computing power to handle running the daemon, and I'd like to run the daemon on one of these instances.
The problem is that I can only use one AMI per auto-scaling group, so I'm having trouble finding a way to run the daemon on only one of the instances in the auto-scale group.
Is there a way to do this?

You could start your daemon manually on one of the instances and mark that instance as protected from scale-in. This way it won't be terminated during scale-in, while during scale-out new instances will by default be created without the daemon. A sketch of this setup follows the quoted caveats below.
Keep in mind that while protected from termination in the auto-scaling group, it may still be terminated by:
Manual termination through the Amazon EC2 console, the terminate-instances command, or the TerminateInstances action. To protect Auto Scaling instances from manual termination, enable termination protection. For more information, see Enabling Termination Protection in the Amazon EC2 User Guide for Linux Instances.
Health check replacement if the instance fails health checks.
Spot Instance interruption.
(source: AWS docs)
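A minimal sketch of this approach, assuming boto3 and hypothetical instance and group names: protect the daemon instance from scale-in and, per the caveat above, also from manual termination.

```python
import boto3

INSTANCE_ID = "i-0123456789abcdef0"  # hypothetical: the instance running the rpush daemon
ASG_NAME = "web-asg"                 # hypothetical group name

autoscaling = boto3.client("autoscaling")
ec2 = boto3.client("ec2")

# Keep the auto-scaling group from picking this instance during scale-in.
autoscaling.set_instance_protection(
    InstanceIds=[INSTANCE_ID],
    AutoScalingGroupName=ASG_NAME,
    ProtectedFromScaleIn=True,
)

# Additionally guard against manual TerminateInstances calls
# (the "termination protection" the quoted docs refer to).
ec2.modify_instance_attribute(
    InstanceId=INSTANCE_ID,
    DisableApiTermination={"Value": True},
)
```

Note that neither setting protects against health-check replacement or Spot interruption, as the quote points out.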

Related

When does AWS deregister an EC2 from the auto-scaling group during scale-in?

I am trying to figure out when and how AWS deregisters an EC2 from an auto-scaling group during scale-in. I am especially worried about a case where an EC2 that is about to be terminated receives an incoming request shortly before being terminated. This would naturally cause processing of the request to fail. The desired behavior would be for AWS to deregister the about-to-be-terminated instance from the group some configurable time before actually terminating it. I have found no documentation about this specific issue. Am I missing something?
There's no configuration that guarantees that a specific time elapses between deregistering an instance from the group and terminating it.
You can use Elastic Load Balancer health checks to remove instances that are not responding from the load balancer before they are terminated (a configuration sketch follows the quoted policy below).
From AWS Documentation https://docs.aws.amazon.com/autoscaling/ec2/userguide/ec2-auto-scaling-termination-policies.html
Default termination policy
The default termination policy applies multiple termination criteria before selecting an instance to terminate. When Amazon EC2 Auto Scaling terminates instances, it first determines which Availability Zones have the most instances, and it finds at least one instance that is not protected from scale in. Within the selected Availability Zone, the following default termination policy behavior applies:
Determine whether any of the instances eligible for termination use the oldest launch template or launch configuration:
[For Auto Scaling groups that use a launch template]
Determine whether any of the instances use the oldest launch template, unless there are instances that use a launch configuration. Amazon EC2 Auto Scaling terminates instances that use a launch configuration before it terminates instances that use a launch template.
[For Auto Scaling groups that use a launch configuration]
Determine whether any of the instances use the oldest launch configuration.
After applying the preceding criteria, if there are multiple unprotected instances to terminate, determine which instances are closest to the next billing hour. If there are multiple unprotected instances closest to the next billing hour, terminate one of these instances at random.
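To have unhealthy instances taken out of rotation, the group can be switched from EC2 status checks to the load balancer's health check. A minimal boto3 sketch, with a hypothetical group name:

```python
import boto3

autoscaling = boto3.client("autoscaling")

# Use the ELB health check so an instance failing it is marked unhealthy,
# deregistered from the load balancer, and replaced. The grace period gives
# freshly launched instances time to boot before checks count against them.
autoscaling.update_auto_scaling_group(
    AutoScalingGroupName="my-asg",  # hypothetical
    HealthCheckType="ELB",
    HealthCheckGracePeriod=300,     # seconds
)
```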

How to keep AWS ECS from shutting down during a critical moment?

Is there a way to ensure an AWS ECS container instance doesn't shut down in the middle of running a critical task?
I have an auto-scaling AWS ECS service that scales the number of instances based on CPU usage. These instances process long-running batch jobs that may take anywhere from 5 to 30 minutes.
The problem is that sometimes, during a scale-down, an instance that's actively running a critical job gets shut down which ultimately causes the job to fail.
You can use a feature called managed termination protection.
When the scaling policy reduces the number of instances, it has no control over which instances actually terminate. The default behavior of the auto-scaling group may well terminate instances that are running tasks, even though other instances are running none. This is where managed termination protection comes into the picture. With this option enabled, ECS dynamically manages instance termination protection on your behalf.
Please have a look at Controlling which Auto Scaling instances terminate during scale in and specifically the section Instance scale-in protection in the AWS documentation.
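A minimal sketch of enabling managed termination protection when creating an ECS capacity provider with boto3; the names and ARN are hypothetical, and the feature also requires managed scaling plus scale-in protection on the group's new instances:

```python
import boto3

ecs = boto3.client("ecs")

ecs.create_capacity_provider(
    name="batch-capacity",  # hypothetical
    autoScalingGroupProvider={
        # Hypothetical ARN; the underlying ASG must have
        # NewInstancesProtectedFromScaleIn enabled.
        "autoScalingGroupArn": (
            "arn:aws:autoscaling:us-east-1:123456789012:autoScalingGroup:"
            "example-uuid:autoScalingGroupName/batch-asg"
        ),
        # Managed termination protection only works together with managed scaling.
        "managedScaling": {"status": "ENABLED", "targetCapacity": 100},
        # ECS then removes scale-in protection only from instances
        # that are not running non-daemon tasks.
        "managedTerminationProtection": "ENABLED",
    },
)
```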

How can I control which EC2 instances get removed by an AutoScalingGroup using Amazon Web Services?

I have foreseen a problem that could happen with my application but I am unsure if it is possible to solve, and perhaps the architecture needs to be redesigned.
I am using an AutoScalingGroup (ASG) on AWS to create EC2 instances that host game servers that players can join. At the moment, the ASG is scaled manually via a matchmaking API which changes the desired capacity based on its needs. The problem occurs when a game server is finished.
When a game finishes, it signals to the matchmaker that it is finished and needs terminating, and the matchmaker then scales down the ASG accordingly. However, the ASG doesn't know exactly which instance to remove, and it won't necessarily be the one that needs terminating.
I can terminate the instance, but then as the ASG desired capacity is never changed when the instance is terminated, another server is created.
Is there a way I can scale down the ASG, as well as specifying which servers to remove from the group?
In a nutshell, the default termination policy during scale in is designed to remove instances that use the oldest launch configuration.
Currently, Amazon EC2 Auto Scaling supports the following termination policies:
OldestInstance: Terminate the oldest instance in the group. This option is useful when you're upgrading the instances in the Auto Scaling group to a new EC2 instance type. You can gradually replace instances of the old type with instances of the new type.
NewestInstance: Terminate the newest instance in the group. This policy is useful when you're testing a new launch configuration but don't want to keep it in production.
OldestLaunchConfiguration: Terminate instances that have the oldest launch configuration. This policy is useful when you're updating a group and phasing out the instances from a previous configuration.
ClosestToNextInstanceHour: Terminate instances that are closest to the next billing hour. This policy helps you maximize the use of your instances and manage your Amazon EC2 usage costs.
Default: Terminate instances according to the default termination policy. This policy is useful when you have more than one scaling policy for the group.
Instance protection
One of the possible solutions could be to use instance protection. Auto Scaling provides instance protection to control whether an instance can be terminated when scaling in.
Therefore, enable instance protection for the ASG so that instances are protected from scale-in by default. Once you are done with your server, decrease the desired number of instances and remove instance protection from that particular instance (either using the CLI or SDK; note that the protection remains enabled for the rest of the instances); auto-scaling will then terminate that exact instance. A sketch of this workflow follows below.
For more information about instance protection, see Instance Protection
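A minimal boto3 sketch of that workflow, with hypothetical names (note that terminate-instance-in-auto-scaling-group with --should-decrement-desired-capacity is a one-call alternative):

```python
import boto3

autoscaling = boto3.client("autoscaling")
ASG_NAME = "game-servers"         # hypothetical group name
FINISHED = "i-0123456789abcdef0"  # hypothetical: server whose game ended

# Lower desired capacity first. Scale-in is blocked while every instance
# is still protected; Auto Scaling retries until one becomes eligible.
group = autoscaling.describe_auto_scaling_groups(
    AutoScalingGroupNames=[ASG_NAME]
)["AutoScalingGroups"][0]
autoscaling.set_desired_capacity(
    AutoScalingGroupName=ASG_NAME,
    DesiredCapacity=group["DesiredCapacity"] - 1,
)

# Now make exactly one instance eligible for scale-in: the finished server.
autoscaling.set_instance_protection(
    InstanceIds=[FINISHED],
    AutoScalingGroupName=ASG_NAME,
    ProtectedFromScaleIn=False,
)
```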
The oldest server is removed. If you want to scale down a specific server, you will have to kill that server before changing desired capacity.

AWS instance scheduler and autoscaling group

I configured for my AWS account the new AWS instance scheduler https://aws.amazon.com/answers/infrastructure-management/instance-scheduler/
The problem is that when tagging EC2 instances through a scaling group, the instances are correctly stopped, but since my scaling group has its Min size set to 2, the scaling group restarts them anyway.
I would not like to set the Min size to 0, because keeping it at 2 is useful during application redeploys.
How to make the 2 services work fine?
When you stop your EC2 instances that are controlled by Auto Scaling, then Auto Scaling will see them as "unhealthy" and it will proceed to terminate and replace them.
You have 2 options.
Option 1: Suspend Auto Scaling processes while your EC2 instances are stopped. By doing this, Auto Scaling won't care that your EC2 instances are stopped and won't terminate them. Just remember to resume the processes after you restart your EC2 instances.
However, AWS Instance Scheduler will not manage this for you, so you'll need to find another way to schedule your EC2 instances to stop & restart.
Option 2: Scale your Auto Scaling group to 0 and back to 2. This will result in terminating your EC2 instances (when you don't need them) and re-creating them (when you want them). This will only work if your EC2 instances are ephemeral.
Again, AWS Instance Scheduler will not manage this for you. Auto Scaling scheduled actions may be able to help you with this.
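For option 2, a minimal sketch using boto3 and Auto Scaling scheduled actions (group name and times are hypothetical): the group is scaled to 0 in the evening and back to 2 in the morning, replacing the Instance Scheduler for these instances.

```python
import boto3

autoscaling = boto3.client("autoscaling")
ASG_NAME = "my-asg"  # hypothetical

# Scale the group to zero every evening (20:00 UTC)...
autoscaling.put_scheduled_update_group_action(
    AutoScalingGroupName=ASG_NAME,
    ScheduledActionName="nightly-scale-down",
    Recurrence="0 20 * * *",  # cron syntax, evaluated in UTC
    MinSize=0,
    DesiredCapacity=0,
)

# ...and back to two every morning (06:00 UTC).
autoscaling.put_scheduled_update_group_action(
    AutoScalingGroupName=ASG_NAME,
    ScheduledActionName="morning-scale-up",
    Recurrence="0 6 * * *",
    MinSize=2,
    DesiredCapacity=2,
)
```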
Another option is to use the ASG standby feature before and after the AWS Instance Scheduler actions. This also lets you keep working with the same instances (and the same AMI) across the shutdown.
So the high-level solution is below:
Define the EC2 instance schedule using AWS Instance Scheduler.
Define a Lambda function that fetches the shutdown schedule and puts the EC2 instances into standby mode before the planned shutdown.
Define a Lambda function that fetches the startup schedule and takes the EC2 instances out of standby after the planned restart (a sketch of both functions follows).
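A minimal sketch of those two Lambda handlers, assuming boto3 and hypothetical group and instance names. Entering standby with ShouldDecrementDesiredCapacity=True keeps the group from launching replacements while the instances are stopped.

```python
import boto3

autoscaling = boto3.client("autoscaling")

ASG_NAME = "my-asg"                                            # hypothetical
INSTANCE_IDS = ["i-0123456789abcdef0", "i-0fedcba9876543210"]  # hypothetical

def shutdown_handler(event, context):
    # Before the Instance Scheduler stops the instances: move them to Standby
    # so the ASG neither health-checks them nor launches replacements.
    autoscaling.enter_standby(
        InstanceIds=INSTANCE_IDS,
        AutoScalingGroupName=ASG_NAME,
        ShouldDecrementDesiredCapacity=True,
    )

def startup_handler(event, context):
    # After the Instance Scheduler restarts the instances: put them back
    # in service, which restores the group's desired capacity.
    autoscaling.exit_standby(
        InstanceIds=INSTANCE_IDS,
        AutoScalingGroupName=ASG_NAME,
    )
```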

How can I prevent EC2 instance termination by Auto Scaling?

I would like to prevent EC2 instance termination by Auto Scaling feature if that instance is in the middle of some sort of processing.
Background:
Suppose I have an Auto Scaling group that currently has 5 instances running.
I create an alarm on average CPU usage...
Suppose 4 of the instances are idle and one is doing some heavy processing...
The average CPU load will trigger the alarm and as a result the scale-down policy will execute.
How do I get Auto Scaling to terminate one of the idle instances and not the one that is in the middle of the processing?
Update
As noted by Ryan Walls (+1), AWS meanwhile provides Instance Protection to control whether Auto Scaling can terminate a particular instance when scaling in (see the introductory blog post Instance Protection for Auto Scaling for a walk through):
You can enable the instance protection setting on an Auto Scaling group or an individual Auto Scaling instance. When Auto Scaling launches an instance, the instance inherits the instance protection setting of the Auto Scaling group. [...]
It's worth noting that this instance protection only applies to regular Auto Scaling scale in events:
Instance protection does not protect Auto Scaling instances from manual termination through the Amazon EC2 console, the terminate-instances command, or the TerminateInstances API. Instance protection does not protect an Auto Scaling instance from termination if it fails health checks and must be replaced. Also, instance protection does not protect Spot instances in an Auto Scaling group from interruption.
As usual, the feature is available via the AWS Management Console (menu Actions -> Instance Protection -> Set Scale In Protection), the AWS CLI (set-instance-protection command), and the API (SetInstanceProtection API action).
The latter two options allow automating the scenario at hand: enable instance protection before running 'heavy processing' jobs, and disable it once they are finished so that the instance is eligible for termination again (a sketch of this pattern follows).
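A minimal sketch of that automation as a boto3 context manager (names are hypothetical); the try/finally ensures protection is released even if the job fails:

```python
from contextlib import contextmanager

import boto3

autoscaling = boto3.client("autoscaling")

@contextmanager
def scale_in_protection(instance_id, group_name):
    """Hold scale-in protection for the duration of a heavy job."""
    autoscaling.set_instance_protection(
        InstanceIds=[instance_id],
        AutoScalingGroupName=group_name,
        ProtectedFromScaleIn=True,
    )
    try:
        yield
    finally:
        # Released even if the job raises, so the instance becomes
        # eligible for termination again.
        autoscaling.set_instance_protection(
            InstanceIds=[instance_id],
            AutoScalingGroupName=group_name,
            ProtectedFromScaleIn=False,
        )

# Usage (hypothetical names):
# with scale_in_protection("i-0123456789abcdef0", "my-asg"):
#     run_heavy_job()
```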
Initial Answer
This functionality is currently not available for Auto Scaling of Amazon EC2 instances - while you are indeed able to Configure [an] Instance Termination Policy for Your Auto Scaling Group, the available policies do not include such a (fairly advanced) concept:
Auto Scaling provides the following termination policy options for you to choose from. You can specify one or more of these options in your termination policy.
OldestInstance — Specify this if you want the oldest instance in your Auto Scaling group to be terminated. [...]
NewestInstance — Specify this if you want the last launched instance to be terminated. [...]
OldestLaunchConfiguration — Specify this if you want the instance launched using the oldest launch configuration to be terminated. [...]
ClosestToNextInstanceHour — Specify this if you want the instance that is closest to completing the billing hour to be terminated. [...]
Default — Specify this if you want Auto Scaling to use the default termination policy to select instances for termination.
I just successfully dealt with the problem of long-running jobs in an auto scaling group using the relatively recent lifecycle hook feature.
The problem with trying to choose an idle node to terminate, in my case, was that the process that chooses the idle node will race against processes that submit work to the nodes. In this case it's better to use a strategy where any node can be terminated, but termination happens gracefully so that no work is lost. You can then use all of the standard auto scaling policy stuff to manage scale-in and scale-out.
The termination lifecycle hook allows the user (or a process) to perform actions on the node after it has been placed into an intermediate state (labeled Terminating:Wait) by the auto scaling group. The user (or process) is then responsible for completing the lifecycle action via an AWS API call, resulting in the shutdown of the terminated EC2 instance.
The way I set this up, in short, is:
Create a role that allows auto scaling to post a message to an SQS queue.
Create an SQS queue for the termination messages.
Create a monitor script that runs as a service in each node. My script is a simple event-driven state machine that transitions in sequence from MONITORING (polling SQS for a termination message for the node) to DRAINING (polling a job queue until no work is being performed on the node) to TERMINATED (making the complete-lifecycle call). A sketch of this loop appears below, after the documentation link.
Standard configuration for event-driven AWS auto-scaling; that is, creating CloudWatch alarms, and the auto-scaling policies for scale-in and scale-out.
One hindrance to this approach is that lifecycle hook management isn't supported yet in the SDKs (boto, at least, doesn't support it AFAIK), nor are there CloudFormation resources for the hooks.
The relevant AWS documentation is here:
http://docs.aws.amazon.com/AutoScaling/latest/DeveloperGuide/AutoScalingGroupLifecycle.html
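Although the answer above notes that SDK support was missing at the time, current boto3 does expose the lifecycle-hook calls. A minimal sketch of the monitor loop, assuming a hypothetical SQS queue wired to the termination hook:

```python
import json
import urllib.request

import boto3

# Hypothetical queue the termination lifecycle hook publishes to.
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/asg-termination-hook"

sqs = boto3.client("sqs")
autoscaling = boto3.client("autoscaling")

# This node's instance id (IMDSv1 shown for brevity; IMDSv2 needs a session token).
MY_INSTANCE_ID = urllib.request.urlopen(
    "http://169.254.169.254/latest/meta-data/instance-id"
).read().decode()

def drain():
    """Placeholder: block until no work is being performed on this node."""

# MONITORING: poll for a termination message addressed to this instance.
while True:
    resp = sqs.receive_message(QueueUrl=QUEUE_URL, WaitTimeSeconds=20)
    for msg in resp.get("Messages", []):
        body = json.loads(msg["Body"])
        if (body.get("LifecycleTransition") == "autoscaling:EC2_INSTANCE_TERMINATING"
                and body.get("EC2InstanceId") == MY_INSTANCE_ID):
            drain()  # DRAINING: finish or hand off in-flight work
            # TERMINATED: allow the instance shutdown to proceed.
            autoscaling.complete_lifecycle_action(
                LifecycleHookName=body["LifecycleHookName"],
                AutoScalingGroupName=body["AutoScalingGroupName"],
                LifecycleActionToken=body["LifecycleActionToken"],
                LifecycleActionResult="CONTINUE",
            )
            sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
        # Messages for other instances are left alone; they become visible
        # again after the visibility timeout so the right node can act on them.
```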
Amazon has finally addressed this issue in a simpler way. There is now "instance protection" where you can mark your instance as protected and it will not be terminated during a "scale in".
See https://aws.amazon.com/blogs/aws/new-instance-protection-for-auto-scaling
aws-cli is your best friend.
Disable your scale down policy on your autoscaling group.
Create a cron job or scheduled task using aws-cli to (see the sketch after the links below):
2a. Get the EC2 instances associated with the autoscaling group
http://docs.aws.amazon.com/cli/latest/reference/autoscaling/describe-auto-scaling-instances.html
2b. Next monitor the cloudwatch statistics on the EC2 instances
http://docs.aws.amazon.com/AmazonCloudWatch/latest/DeveloperGuide/US_SingleMetricPerInstance.html
http://docs.aws.amazon.com/cli/latest/reference/cloudwatch/get-metric-statistics.html
2c. Terminate the idle EC2 instance(s) from your auto-scaling group
http://docs.aws.amazon.com/cli/latest/reference/autoscaling/terminate-instance-in-auto-scaling-group.html
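A minimal sketch of that cron job in boto3 rather than raw aws-cli, with a hypothetical group name and idle threshold:

```python
from datetime import datetime, timedelta, timezone

import boto3

ASG_NAME = "batch-workers"  # hypothetical
IDLE_CPU_PERCENT = 5.0      # hypothetical idle threshold

autoscaling = boto3.client("autoscaling")
cloudwatch = boto3.client("cloudwatch")

# 2a. Instances currently in the group.
group = autoscaling.describe_auto_scaling_groups(
    AutoScalingGroupNames=[ASG_NAME]
)["AutoScalingGroups"][0]

end = datetime.now(timezone.utc)
start = end - timedelta(minutes=15)

for instance in group["Instances"]:
    if instance["LifecycleState"] != "InService":
        continue
    # 2b. Recent CPU statistics for this instance.
    stats = cloudwatch.get_metric_statistics(
        Namespace="AWS/EC2",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "InstanceId", "Value": instance["InstanceId"]}],
        StartTime=start,
        EndTime=end,
        Period=300,
        Statistics=["Average"],
    )
    points = stats["Datapoints"]
    if points and all(p["Average"] < IDLE_CPU_PERCENT for p in points):
        # 2c. Terminate the idle instance and shrink the group with it.
        autoscaling.terminate_instance_in_auto_scaling_group(
            InstanceId=instance["InstanceId"],
            ShouldDecrementDesiredCapacity=True,
        )
        break  # remove at most one instance per run
```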
You can use Amazon CloudWatch to achieve this:
http://aws.typepad.com/aws/2013/01/amazon-cloudwatch-alarm-actions.html. From the article:
You can use a similar strategy to get rid of instances that are tasked with handling compute-intensive batch processes. Once the CPU goes idle and the work is done, terminate the instance and save some money!
In this case, since you will be handling the termination, you will need to remove the scale-down policy. Also see another option: https://stackoverflow.com/a/19628453/432849.