Spinnaker and custom CloudWatch metrics for scaling policies

I'm currently trying to get around the issue in AWS where CloudWatch alarms cannot contain more than one metric (in this case, SQS message counts).
Scenario:
I have an ASG that contains a set number of on-demand instances for my application. I have another ASG, where I plan on using spot instances to scale out when it gets busy.
What I'm trying to achieve, for my application that consumes from 3 SQS queues, is:
if at least 1 queue has a message count above the threshold, scale out the spot instance ASG
if ALL queues have a message count below the threshold for at least X minutes, scale in
To get around this I'm attempting to publish a custom metric with a count of how many queues have a message count above a certain limit, and then use this metric to decide whether to scale in my auto scaling groups.
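For illustration, a rough sketch of what publishing such a metric could look like (the queue URLs, the threshold, and the namespace are placeholders):

// Scheduled Lambda: count how many queues have a backlog above the threshold and publish that count.
const AWS = require('aws-sdk');
const sqs = new AWS.SQS();
const cloudwatch = new AWS.CloudWatch();

const QUEUE_URLS = [/* the three queue URLs */];
const THRESHOLD = 100;   // per-queue message threshold (placeholder)

exports.handler = async () => {
  let queuesOverThreshold = 0;
  for (const QueueUrl of QUEUE_URLS) {
    const attrs = await sqs.getQueueAttributes({
      QueueUrl,
      AttributeNames: ['ApproximateNumberOfMessages'],
    }).promise();
    if (Number(attrs.Attributes.ApproximateNumberOfMessages) > THRESHOLD) {
      queuesOverThreshold += 1;
    }
  }

  // Publish to a custom namespace (publishing to 'AWS/*' namespaces is rejected).
  await cloudwatch.putMetricData({
    Namespace: 'yourcustomnamespace',
    MetricData: [{ MetricName: 'QueuesOverThreshold', Value: queuesOverThreshold }],
  }).promise();
};

A scale-out alarm would then fire when this metric is >= 1, and a scale-in alarm when it has been 0 for X minutes.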
However... in Spinnaker, there doesn't seem to be a way to refer to a custom metric (from the UI at least) - am I missing something here, or is it just not possible?
From what I understand as well, you can only publish metric data to your own namespaces - attempting to publish to any 'AWS/*' namespace will result in an error?

In your settings.js file for deck, include the following block:
providers: {
  aws: {
    // ...
    metrics: {
      customNamespaces: ['yourcustomnamespace'],
    },
    // ...
  }
}
I don't think this is explicitly documented anywhere - you'd have to dig into the source code to find this bit of configuration.

Related

CDK: Set the "data points to alarm" parameter for a target tracking scaling policy

I have an ECS Fargate service with autoscaling set up to track a custom metric. How do I set the "datapoints to alarm" parameter for a target tracking autoscaling policy? I don't see it anywhere in the docs for CDK v2. This is how I've defined the scaling with a custom metric:
declare const fargate: FargateService;
const scaling: ScalableTaskCount = fargate.service.autoScaleTaskCount({
  minCapacity: 5,
  maxCapacity: 30
});
scaling.scaleToTrackCustomMetric(`QueueSizeScaling`, {
  metric,
  targetValue: props.itemsPerInstance,
  policyName: '10k-items-per-task',
  disableScaleIn: false,
  scaleOutCooldown: Duration.minutes(2),
  scaleInCooldown: Duration.minutes(2)
});
I can see it as an editable option in the console, but can't find where to set this in CDK.
If this is not possible/advisable for some reason (as I understand it, target tracking scaling is meant to be a managed action), is it sound to reduce the desired task count and then stop the ECS task from within the task itself, or are there drawbacks to this? When doing that, does the order of aws-sdk calls matter, i.e. should I call update_service first and then stop_task (or asg.terminate_instance_in_auto_scaling_group), or vice versa?
EDIT#1:
My scaling metric is defined like this:
const metric = new cloudwatch.Metric({
  namespace: "redisQueueSizeNamespace",
  metricName: `QueueSize`,
  period: Duration.minutes(1),
  statistic: cloudwatch.Statistic.MAXIMUM,
});
Where a lambda continuously publishes the queue size:
await cloudwatch.putMetricData({
  MetricData: [
    {
      MetricName: `QueueSize`,
      Value: Number(count)
    }
  ],
  Namespace: 'redisQueueSizeNamespace'
}).promise();
And the scaleToTrackCustomMetric method creates AlarmHigh and AlarmLow alarms for scaling in and out respectively. That was the source of the alarm screenshot.
The reason you aren't seeing it in the CDK but are seeing it in the console is that the console tends to merge multiple different types of resources together to make it easier to get started.
Specifically, the resource you're looking for is a CloudWatch alarm, and you can see the CDK documentation for this construct here: https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.aws_cloudwatch.Alarm.html which contains the datapointsToAlarm attribute. So you'll end up creating a CloudWatch custom metric (which you've already done), then add an alarm to that metric (which you're missing).
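As a rough sketch of that missing piece (the threshold and the alarm counts are assumptions based on the policy name, and this standalone alarm is separate from the alarms the target tracking policy manages for you):

import { aws_cloudwatch as cloudwatch } from 'aws-cdk-lib';

// Standalone alarm on the existing custom metric, with "datapoints to alarm" set.
// 'this' is your stack or construct scope; 10000 mirrors the '10k-items-per-task' name and is a guess.
metric.createAlarm(this, 'QueueSizeHigh', {
  threshold: 10000,
  evaluationPeriods: 5,     // look at the last 5 one-minute periods...
  datapointsToAlarm: 3,     // ...and go into ALARM when any 3 of them breach
  comparisonOperator: cloudwatch.ComparisonOperator.GREATER_THAN_THRESHOLD,
});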
That having been said, as you've mentioned, target scaling is meant to be more managed so you don't need alarms. You're already telling your service to create or destroy instances to meet a given threshold (presumably 10k, given the name of the policy). The way to control how often it does that is not the number of data points you need to hit, but the scale-out and scale-in cooldowns that you already have set up in your configuration. For example, if you're sending data every 30 seconds and you want a minimum of 2 data points, you'd set your scale-out cooldown to 60 seconds. It will be a little more aggressive the first time it scales out, but it will wait for two 30-second data points before every scale-out after that. Adding an alarm to the metric will let you alert someone when the metric goes out of bounds, but it won't impact your target scaling.
To your question about manually decreasing the task quantity, it's possible but there are a lot of race conditions. The order in which you call the API can have various impacts on your application:
If you call update_service first to decrease your task count, then exit the container on a task, it's possible that the ecs-agent will issue a kill command against a different application instance, and you will scale down 2 instances (one killed by the ecs-agent and one that you manually shut down) instead of 1. This might be OK, because the second application instance will immediately restart; however, if that takes you down to 0 instances, that's likely a problem.
If you call stop_task before you call update_service, it's possible the ecs-agent will restart a task instance because it believes it still needs to match the desired count. When you then call update_service, it may terminate a different instance, since it's now "+1" from where it thinks it should be.
You can still do this if you want, and you can use termination protection to avoid the race conditions. In general though, I would view both of these solutions as a "last ditch" measure; try to avoid them and let normal autoscaling handle your scale-out/in needs.
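For ECS specifically, the closest built-in mechanism is task scale-in protection; a rough sketch, assuming an aws-sdk version that supports UpdateTaskProtection and placeholder cluster/task names:

// Inside the worker, while it is still processing work: protect this task so a later
// decrease of the desired count only stops unprotected (idle) tasks.
const AWS = require('aws-sdk');
const ecs = new AWS.ECS();

await ecs.updateTaskProtection({
  cluster: 'my-cluster',        // placeholder cluster name
  tasks: [taskArn],             // ARN of this task, e.g. read from the task metadata endpoint
  protectionEnabled: true,
  expiresInMinutes: 60          // protection expires on its own as a safety net
}).promise();

Flip protectionEnabled back to false once the task is idle, so scale-in can pick it.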

Auto scale rule based on custom Cloudwatch alarm

I have an auto-scaling group of EC2 servers that run a number of processes.
This number of processes changes with the load and I'd like to trigger a scaling (up/down) based on the number of processes.
I've successfully set up a script that sends the number of processes on every server to CloudWatch, every minute, and I can see these on CloudWatch. (I haven't set a dimension, to be able to get the value for all the servers.)
Then, I created an Alarm that uses the average of the values sent; if it reaches a certain limit, it triggers the "Add a new server" action on the auto scaling group, and when it stops being in alarm, it triggers a "Remove a server".
My issue is that when I add the new server, the average drops, since there is one more server now, which moves the alarm to the OK state, removing the server and increasing the average again, triggering the alarm again, etc.
For instance, the limit is set to 10 processes on average. With 3 servers, if the average becomes 11, the alarm fires, adding a server. Now with the new server, I'm at 33 processes (3 x 11) spread over 4 servers: 8.25 processes on average, thus triggering the "OK" state.
My question is: is it possible to set up an alarm based on the number of processes without having the newly added server cause this up-down-up-down issue?
Instead of average, I can use something else to trigger the alarm, such as min/max/I-don't-know.
Thank you for your help. Happy to provide any other details if needed.
You should not create an alarm that adds instances when True and removes instances when False. This will cause a continual 'flip-flop' situation rather than trying to find a steady-state.
You could have each server regularly send a custom metric to Amazon CloudWatch. You could then use this with Target tracking scaling policies for Amazon EC2 Auto Scaling, which will calculate the average value of the metric and automatically launch/terminate instances to keep the target value around 10.
This would work well with long-running processes (perhaps 5+ minutes with several processes running concurrently), but would not be good with short sub-minute processes because it takes time to launch new instances.
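A rough sketch of that approach in CDK (the namespace, metric name, and target of 10 are placeholders, not part of the original answer):

import { Duration, aws_autoscaling as autoscaling, aws_cloudwatch as cloudwatch } from 'aws-cdk-lib';

declare const asg: autoscaling.AutoScalingGroup;

// Average number of running processes per instance, published by each server.
const processCount = new cloudwatch.Metric({
  namespace: 'MyApp',               // placeholder namespace
  metricName: 'RunningProcesses',
  statistic: 'Average',
  period: Duration.minutes(1),
});

// Target tracking keeps the fleet near ~10 processes per instance on average.
asg.scaleToTrackMetric('ProcessesPerInstance', {
  metric: processCount,
  targetValue: 10,
});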
I think you could look at metric math. So instead of triggering your alarm directly on your process-count metric only, you could perhaps calculate the average count yourself using metric math. You could use the GroupTotalInstances metric from your ASG, or just publish a second custom metric with the number of instances.
In both cases, your alarm metric would use metric math to divide the number of processes by the size of the ASG for each evaluation period.
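A minimal sketch of that expression in CDK, assuming a custom process-count metric and that group metrics collection is enabled on the ASG (names are placeholders):

import { Duration, aws_cloudwatch as cloudwatch } from 'aws-cdk-lib';

// Total processes across the fleet, summed over all servers.
const totalProcesses = new cloudwatch.Metric({
  namespace: 'MyApp',                  // placeholder namespace
  metricName: 'RunningProcesses',
  statistic: 'Sum',
  period: Duration.minutes(1),
});

// Current ASG size (requires enabling group metrics collection on the ASG).
const groupSize = new cloudwatch.Metric({
  namespace: 'AWS/AutoScaling',
  metricName: 'GroupTotalInstances',
  dimensionsMap: { AutoScalingGroupName: 'my-asg' },   // placeholder ASG name
  statistic: 'Average',
  period: Duration.minutes(1),
});

// Processes per instance = total processes / ASG size; alarm on this expression instead.
const processesPerInstance = new cloudwatch.MathExpression({
  expression: 'procs / instances',
  usingMetrics: { procs: totalProcesses, instances: groupSize },
  period: Duration.minutes(1),
});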

Is there a way to scale in "instance" (part of ASG ) on certain custom metric?

I'm using the AutoScalingGroup to launch a group of EC2 instances. These instances are acting as workers which are continuously listening to SQS for any new request.
Requirement:
Scale up on something like throughput (i.e. total number of messages present in SQS divided by total number of instances).
And I want to scale down whenever any instance which is part of the ASG is sitting idle (CPU idle) for, let's say, more than 15 mins.
Note: I am not looking for any metric which applies as whole to a particular ASG (eg: Average CPU).
One way of doing that could be defining a custom metric and allowing it to trigger a CloudWatch alarm.
Is there a better way to accomplish this?
If you are defining the scaling policy at the instance level, then you are defeating the entire purpose of an ASG. If you need to scale based on changing conditions, such as the queue size, then you can configure the ASG based on the approach described here:
https://docs.aws.amazon.com/autoscaling/ec2/userguide/as-using-sqs-queue.html
A custom metric to send to Amazon CloudWatch that measures the number of messages in the queue per EC2 instance in the Auto Scaling group.
A target tracking policy that configures your Auto Scaling group to scale based on the custom metric and a set target value. CloudWatch alarms invoke the scaling policy.
If you know a specific time window when the queue size goes up or down, you can also scale based on schedule.
You can always start with a very low instance count in the ASG, set the desired capacity accordingly (say 1), and scale up based on the queue, so you can continue using ASG policies.
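A rough sketch of publishing that backlog-per-instance metric from a scheduled Lambda (the queue URL, ASG name, and namespace are placeholders, not from the linked guide):

const AWS = require('aws-sdk');
const sqs = new AWS.SQS();
const autoscaling = new AWS.AutoScaling();
const cloudwatch = new AWS.CloudWatch();

exports.handler = async () => {
  // Messages waiting in the queue.
  const attrs = await sqs.getQueueAttributes({
    QueueUrl: process.env.QUEUE_URL,                      // placeholder env var
    AttributeNames: ['ApproximateNumberOfMessages'],
  }).promise();
  const backlog = Number(attrs.Attributes.ApproximateNumberOfMessages);

  // In-service instances in the ASG (never divide by zero).
  const groups = await autoscaling.describeAutoScalingGroups({
    AutoScalingGroupNames: [process.env.ASG_NAME],        // placeholder env var
  }).promise();
  const instances = Math.max(groups.AutoScalingGroups[0].Instances.length, 1);

  // Backlog per instance: the value a target tracking policy can keep near a target.
  await cloudwatch.putMetricData({
    Namespace: 'MyApp',
    MetricData: [{ MetricName: 'BacklogPerInstance', Value: backlog / instances }],
  }).promise();
};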

AutoScaling Based on Comparing Query Metrics

I have a not-so-complicated situation, but it can get complicated with AWS CloudFormation:
I would like to autoscale up and down based on the number of messages on SQS.
But I am not sure what I need to specify in AWS CloudFormation. I would imagine that I would need:
1. some sort of Lambda/CloudFormation resource that queries the current number of instances in the AutoScalingGroup
2. some sort of Lambda/CloudFormation resource that queries the current number of messages in SQS
3. some comparison operation that compares #1 and #2
4. a scale-up policy for when #1 < #2
5. a scale-down policy for when #1 > #2
Not sure where I should get started... can someone be kind enough to show some examples?
You have several different concepts all mixed together (CloudFormation, Auto Scaling, Lambda). It is best to keep things simple, at least for an initial deployment. You can then automate it with CloudFormation later.
The most difficult part of Auto Scaling is actually determining the best Scaling Policies to use. A general rule is to quickly add capacity when it is needed, and then slowly remove capacity when it is no longer needed. This way, you can avoid churn, where instances are added and removed within short spaces of time.
The simplest setup would be:
Scale-out when the queue size is larger than X (To be determined by testing)
Scale-in when the queue is empty (You can later tweak this to be more efficient)
Use the ApproximateNumberOfMessagesVisible metric for your scaling policies. (See Amazon SQS Metrics and Dimensions). It provides a count of messages waiting to be processed. However, I have seen situations where a zero count is not actually sent as a metric, so also trigger your scale-in policy on an alarm status of INSUFFICIENT_DATA, which also means that the queue is empty.
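If you build the alarms with CDK rather than in the console, one way to approximate that INSUFFICIENT_DATA advice is to treat missing data as breaching on the scale-in alarm; a minimal sketch (the queue, scope, and 15-minute window are placeholders):

import { Duration, aws_cloudwatch as cloudwatch, aws_sqs as sqs } from 'aws-cdk-lib';

declare const queue: sqs.Queue;

// Scale-in alarm: queue has been (approximately) empty for 15 minutes.
// Treating missing data as breaching covers the case where an empty queue
// stops emitting the metric at all, similar to alarming on INSUFFICIENT_DATA.
new cloudwatch.Alarm(this, 'QueueEmpty', {       // 'this' is your stack/construct scope
  metric: queue.metricApproximateNumberOfMessagesVisible({ period: Duration.minutes(1) }),
  threshold: 0,
  comparisonOperator: cloudwatch.ComparisonOperator.LESS_THAN_OR_EQUAL_TO_THRESHOLD,
  evaluationPeriods: 15,
  treatMissingData: cloudwatch.TreatMissingData.BREACHING,
});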
There is no need to use AWS Lambda functions unless you have very complex requirements for when to scale.
If your requests come on a regular basis throughout the day, set the minimum to one instance to always have capacity available.
If your requests are infrequent (and there could be several hours with no requests coming in), then set the minimum to zero instances so you save money.
You will need to experiment to determine the best queue size that should trigger a scale-out event. This depends upon how frequently the messages arrive and how long they take to process. You can also experiment with the Instance Type -- figure out whether it is better to have many smaller (eg T2) instances, or fewer larger instances (eg M4 or C4, depending upon need).
If you do not need to process the requests within a short time period (that is, you can be a little late sometimes), you could consider using spot pricing that will dramatically lower your costs, with the potential to occasionally have no instances running due to a high spot price. (Or, just bid high and accept that occasionally you'll pay more than on-demand prices but in general you will save considerable costs.)
Create all of the above manually in the console, then experiment and measure results. Once it is finalized, you can then implement it as a CloudFormation stack if desired.
Update:
The Auto Scaling console screens will only create an alarm based on EC2 metrics. To create an alarm on a different metric, first create the alarm, then reference it in the policy.
To add a rule based on an Amazon SQS queue (a CDK sketch of these steps follows the list):
Create an SQS queue
Put a message in the queue (otherwise the metrics will not flow through to CloudWatch)
Create an alarm in CloudWatch based on the ApproximateNumberOfMessagesVisible metric (which will appear after a few minutes)
Edit your Auto Scaling policies to use the above alarm
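As a rough equivalent of those steps in code (the queue, the ASG, and the 0/500-message breakpoints are placeholders; step scaling here stands in for the console alarm-plus-policy wiring):

import { Duration, aws_autoscaling as autoscaling, aws_sqs as sqs } from 'aws-cdk-lib';

declare const queue: sqs.Queue;
declare const asg: autoscaling.AutoScalingGroup;

// Messages waiting in the queue (ApproximateNumberOfMessagesVisible).
const backlog = queue.metricApproximateNumberOfMessagesVisible({
  period: Duration.minutes(1),
});

// Step scaling: add capacity when the backlog grows, remove it when the queue drains.
// The breakpoints are placeholders to be tuned by testing.
asg.scaleOnMetric('ScaleOnQueueBacklog', {
  metric: backlog,
  adjustmentType: autoscaling.AdjustmentType.CHANGE_IN_CAPACITY,
  scalingSteps: [
    { upper: 0, change: -1 },          // queue empty: scale in
    { lower: 500, change: 1 },         // backlog above threshold: scale out
  ],
});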

AWS: Autoscaling based on the size of the queue

AWS auto scaling works based on the load (number of concurrent requests). It works perfectly for web sites and web APIs. However, there are situations in which the number of required EC2 instances is not related to the requests, but depends on something else, such as the number of items in a queue.
For example, an order processing system which pulls the orders from a custom queue (and not SQS) might need to scale out to process the orders quicker. How can we make this happen?
Auto scaling groups can be configured to scale in or out by linking their scaling policies to CloudWatch alarms. Many people use CPU utilization as a scaling trigger, but you can use any CloudWatch metric you like. In your case, you could use your queue's ApproximateNumberOfMessagesVisible metric (or, for a non-SQS queue, a custom queue-depth metric that you publish yourself).
For example, if you create an alarm that fires when ApproximateNumberOfMessagesVisible > 500 and link that to the scale-out policy of your auto scaling group, the group will create new instances whenever the queue has more than 500 messages in it.