AWS auto-scaling desired capacity changed automatically and is not coming down

AWS auto-scaling desired capacity changed automatically and is not coming down - amazon-web-services

I have an auto-scaling group with two target scaling policies, one is based on SQS backlog and another on CPU-Utilization. I have initially set the capacities at minimum-1,desired-1,maximum-4. During the scale-out process, my desired capacity changed to 2 automatically. Now all the messages in the queues were successfully processed and the two scaling processes are requesting the ASG to scale in. But, the desired capacity is still at 2 and is not coming down. Is there a logic that I am missing? or do I need to do it manually?

Related

Auto scale rule based on custom Cloudwatch alarm

I have an auto-scaling group of EC2 servers that run a number of processes.
This number of processes changes with the load and I'd like to trigger a scaling (up/down) based on the number of processes.
I've successfully set up a script that sends to Cloudwatch the number of processes on every servers, for every minutes, and I can see these on Cloudwatch. (I haven't set a dimension, to be able to get the value for all the servers).
Then, I created an Alarm, that uses the average for the values sent, and if it reach a certain limit, it triggers the "Add a new server" to the auto scaling group, and when it stop being on alarm, it triggers a "Remove a server".
My issue is that when I add the new server, the average drops, since there is one more server now, which move the alarm to the ok state, removing the server, and increasing again the average, triggering again the alarm, etc.
For instance, the limit is set to 10 processes on average. With 3 servers, if the average becomes 11, I trigger the alarm state, adding a server. Now with the new server, I'm at 33 processes (3 x 11) for 4 servers : 8,25 processes on average, thus triggering the "OK" alarm.
My question is: Is it possible to set up an alarm based on the number of processes without having the new trigger causes a up-down-up-down issue?
Instead of average, I can use something else to trigger the alarm, such as min/max/I-don't-know.
Thank you for your help. Happy to provide any other details if needed.

You should not create an alarm that adds instances when True and removes instances when False. This will cause a continual 'flip-flop' situation rather than trying to find a steady-state.
You could have each server regularly send a custom metric to Amazon CloudWatch. You could then use this with Target tracking scaling policies for Amazon EC2 Auto Scaling - Amazon EC2 Auto Scaling, which will calculate the average value of the metric and automatically launch/terminate instances to keep the target value around 10.
This would work well with long-running processes (perhaps 5+ minutes with several processes running concurrently), but would not be good with short sub-minute processes because it takes time to launch new instances.

I think you could look at metric math. So instead of directly triggering your alarm based on your process-count-metric only, you could perhabs calculate the average count yourself using metric math. You could use the GroupTotalInstances metric from your ASG, or just publish second custom metric having the number of instances.
In both cases, your metric for the alarm would use metric math to divide number of processes by size of ASG for each evaluation period.

Is there a way to scale in "instance" (part of ASG ) on certain custom metric?

I'm using the AutoScalingGroup to launch a group of EC2 instances. These instances are acting as workers which are continuously listening to SQS for any new request.
Requirement:
Do upscale on something like throughput (i.e Total number of messages present in SQS by total number instances).
And I want to downscale whenever any instance which is part of ASG is sitting idle (CPUIdle) for let's say more than 15 mins.
Note: I am not looking for any metric which applies as whole to a particular ASG (eg: Average CPU).
One way of doing that could be defining the custom metric and allowing it to trigger a cloudwatch alarm to do that.
Is there a better way to accomplish this?

If you are defining the scaling policy at instance level, then you defeating the entire purposes of ASG. If you need to scale based on changing conditions, such as the queue size, then you can configure ASG based on the conditions specified here
https://docs.aws.amazon.com/autoscaling/ec2/userguide/as-using-sqs-queue.html
A custom metric to send to Amazon CloudWatch that measures the number of messages in the queue per EC2 instance in the Auto Scaling group.
A target tracking policy that configures your Auto Scaling group to scale based on the >custom metric and a set target value. CloudWatch alarms invoke the scaling policy.
If you know a specific time window when the queue size goes up or down, you can also scale based on schedule.
You can always start with a very low instance count in ASG and set the desired capacity as such (say 1) and scale up based on queue, so you can continue using ASG policies.

Is it possible to have an AWS EC2 scale group that defaults to 0 and only contains instances when there is work to do?

I am trying to setup a EC2 Scaling group that scales depending on how many items are in an SQS queue.
When the SQS queue has items visible I need the Scaling group to have 1 instance available and when the SQS queue is empty (e.g. there are no visible or non-visible messages) I want there to be 0 instances.
Desired instances it set to 0, min is set to 0 and max is set to 1.
I have setup cloudwatch alarms on my SQS queue to trigger when visible messages are greater than zero, and also triggers an alarm when non visible messages are less than one (i.e no more work to do).
Currently the Cloudwatch Alarm Triggers to create an instance but then the scaling group automatically kills the instance to meet the desired setting. I expected the alarm to adjust the desired instance count within the min and max settings but this seems to not be the case.

Yes, you can certainly have an Auto Scaling group with:
Minimum = 0
Maximum = 1
Alarm: When ApproximateNumberOfMessagesVisible > 0 for 1 minute, Add 1 Instance
This will cause Auto Scaling to launch an instance when there are messages waiting in the queue. It will keep trying to launch more instances, but the Maximum setting will limit it to 1 instance.
Scaling-in when there are no messages is a little bit tricker.
Firstly, it can be difficult to actually know when to scale-in. If there are messages waiting to be processed, then ApproximateNumberOfMessagesVisible will be greater than zero. However, there are no messages waiting, it doesn't necessarily mean you wish to scale-in because messages might be currently processing ("in flight"), as indicated by ApproximateNumberOfMessagesNotVisible. So, you only want to scale-in if both of these are zero. Unfortunately, a CloudWatch alarm can only reference one metric, not two.
Secondly, when an Amazon SQS queue is empty, it does not send metrics to Amazon CloudWatch. This sort of makes sense, because queues are mostly empty, so it would be continually sending a zero metric. However, it causes a problem that CloudWatch does not receive a metric when the queue is empty. Instead, the alarm will enter the INSUFFICIENT_DATA state.
Therefore, you could create your alarm as:
When ApproximateNumberOfMessagesVisible = 0 for 15 minutes, Remove 1 instance but set the action to trigger on INSUFFICIENT_DATA rather than ALARM
Note the suggested "15 minutes" delay to avoid thrashing instances. This is where instances are added and removed in rapid succession because messages are coming in regularly, but infrequently. Therefore, it is better to wait a while before deciding to scale-in.
This leaves the problem of having instances terminated while they are still processing messages. This can be avoided by taking advantage of Auto Scaling Lifecycle Hooks, which send a signal when an instance is about to be terminated, giving the application the opportunity to delay the termination until work is complete. Your application should then signal that it is ready for termination only when message processing has finished.
Bottom line
Much of the above depends upon:
How often your application receives messages
How long it takes to process a message
The cost savings involved
If your messages are infrequent and simple to process, it might be worthwhile to continuously run a t2.micro instance. At 2c/hour, the benefit of scaling-in is minor. Also, there is always the risk when adding and removing instances that you might actually pay more, because instances are charged by the hour -- running an instance for 30 minutes, terminating it, then launching another instance for 30 minutes will actually be charged as two hours.
Finally, you could consider using AWS Lambda instead of an Amazon EC2 instance. Lambda is ideal for short-lived code execution without requiring a server. It could totally remove the need to use Amazon EC2 instances, and you only pay while the Lambda function is actually running.

for simple conf, with per sec aws ami/ubuntu billing dont worry about wasted startup/shutdown time, just terminate your ec2 by yourself, w/o any asg down policy add a little bash in client startup code or preinstal it in cron and poll for process presence or cpu load and term ec2 or shutdown (termination is better if you attach volumes and need 'em to autodestruct) after processing is done. there's ane annoying thing about asg defined as 0/0/1 (min/desired/max) with defaults and ApproximateNumberOfMessagesNotVisible on sqs - after ec2 is fired somehow it switches to 1/0/1 and it start to loop firing ec2 even if there's nothing is sqs (i'm doing video transcoding, queing jobs to do to sns/sqs and firing ffmpeg nodes with asg defined on non empty sqs)

AWS: Autoscaling based on the size of the queue

AWS auto scaling works based on the load (number of concurrent requests). It works perfectly for web sites and web APIs. However there are situations in which the number of required EC2 instances is not related to the requests but it depends on something else such as number of items in a queue.
For example an order processing system which pulls the orders from a custom queue (and not SQS) might need to scale out to process the order quicker. How can we make this happpen?

Auto scaling groups can be configured to scale in or out by linking their scaling policies to Cloud Watch alarms. Many people use CPU utilization as a scaling trigger but you can use any Cloud Watch metric you like. In your case you could use your queue's ApproximateNumberOfMessagesVisible metric.
For example, if you create an alarm that fires when the ApproximateNumberOfMessagesVisible > 500 and link that to the scale out policy of your auto scaling group, the group will create new instances whenever the queue has more that 500 messages in it.

What order are AWS AutoScaling policies applied in?

I am planning on using AWS Autoscaling to scale my EC2 services, I have 4 policies that need to control my instance behavior, 2 for scale out and 2 for scale in. My question is what order will they be evaluated in? Scale out first then scale in? or vice-versa? Random? or something else?
Thank you,

Policies are not evaluated in an order. Each policy is compared against the metrics that policy is set up to measure, and takes actions based on the results.
For example, perhaps you have the following four policies:
Add 1 instance when an SQS queue depth is > 1000 messages
Remove 1 instance when the same SQS queue depth is < 200 messages
Add 1 instance when the average CPU of all instances in the autoscaling group is > 80%
Remove 1 instance when the average CPU of all instances in the autoscaling group is < 30%
As you can see, ordering doesn't make sense in this context. The appropriate action(s) will be executed whenever the conditions are met.
Note that without planning and testing you can encounter loops of instances that constantly cycle up and down. Drawing from the previous example, imagine that a new instance is launched because there are > 1000 messages in the queue. But the CPU usage is only 20% for all the instances, so then the 4th policy fires to remove an instance. Thus all the policies should be considered in concert.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js