Currently my read replicas only scale out to keep the average cpu use at 70%, this leads to a brief period where there is very few and they're taking all the load making my application slow.
How do I preemptively scale them out in preparation for this sudden load, from what I see you can't actually trigger the scaling operations manually using cloudwatch alarms?
Since you wish to preemptively modify the number of read replicas, it is not possible to trigger the scaling from an Amazon CloudWatch Alarm (since the need to scale has not yet happened).
Instead, you could call modify_db_cluster(), specifying a higher MinCapacity. This will cause the cluster to launch at least that many Read Replicas, so it will be ready for your usage spike.
Later in the day, you could make another call to lower the MinCapacity that it can scale down to.
You could put these calls in AWS Lambda functions, using Amazon CloudWatch Events to trigger the functions at the desired times.
Related
I have an auto-scaling group of EC2 servers that run a number of processes.
This number of processes changes with the load and I'd like to trigger a scaling (up/down) based on the number of processes.
I've successfully set up a script that sends to Cloudwatch the number of processes on every servers, for every minutes, and I can see these on Cloudwatch. (I haven't set a dimension, to be able to get the value for all the servers).
Then, I created an Alarm, that uses the average for the values sent, and if it reach a certain limit, it triggers the "Add a new server" to the auto scaling group, and when it stop being on alarm, it triggers a "Remove a server".
My issue is that when I add the new server, the average drops, since there is one more server now, which move the alarm to the ok state, removing the server, and increasing again the average, triggering again the alarm, etc.
For instance, the limit is set to 10 processes on average. With 3 servers, if the average becomes 11, I trigger the alarm state, adding a server. Now with the new server, I'm at 33 processes (3 x 11) for 4 servers : 8,25 processes on average, thus triggering the "OK" alarm.
My question is: Is it possible to set up an alarm based on the number of processes without having the new trigger causes a up-down-up-down issue?
Instead of average, I can use something else to trigger the alarm, such as min/max/I-don't-know.
Thank you for your help. Happy to provide any other details if needed.
You should not create an alarm that adds instances when True and removes instances when False. This will cause a continual 'flip-flop' situation rather than trying to find a steady-state.
You could have each server regularly send a custom metric to Amazon CloudWatch. You could then use this with Target tracking scaling policies for Amazon EC2 Auto Scaling - Amazon EC2 Auto Scaling, which will calculate the average value of the metric and automatically launch/terminate instances to keep the target value around 10.
This would work well with long-running processes (perhaps 5+ minutes with several processes running concurrently), but would not be good with short sub-minute processes because it takes time to launch new instances.
I think you could look at metric math. So instead of directly triggering your alarm based on your process-count-metric only, you could perhabs calculate the average count yourself using metric math. You could use the GroupTotalInstances metric from your ASG, or just publish second custom metric having the number of instances.
In both cases, your metric for the alarm would use metric math to divide number of processes by size of ASG for each evaluation period.
I have a complex REST API deployed in AWS ECS. The autoscaling policy for the same is based on RequestCount of 2000.
The scale out will happen when RequestCount is consistently higher than 2000 with standard resolution per 60 seconds. This takes at least 2 minutes before scaling happens. This is becoming a problem with short-time request surge when request count increases to 10k and above. The containers start rejecting requests(throttling).
I need to at least make the scaling happen more quickly within a minute if not within seconds. AWS CloudWatch seems to offer High-Resolution metrics, but there's very less information about:
Can I enable specific metrics with high-resolution. Is it possible that I can have request counts resolved at high granularity of 5 seconds and CPUUtilization at standard granularity of 1 minute?
How can I enable high resolution on AWS metrics?
The AWS CloudWatch Documentation seems to be insufficient to understand this process.
There's two different things that can be 'high resolution', the alarm and the metric.
A High Resolution metric just means the source is pushing values more frequently. You can't control this if your using an AWS metric, and most of them don't push more often than once a minute.
A High Resolution alarm is one where the period is less than 60 seconds and will be billed at a higher rate than standard alarms. However, this isn't very useful in most cases if the metric your basing it on only gets pushed once per minute
EDIT:
To directly answer your questions
No, I don't think any of the AWS RequestCount metrics for things like ELB have a 'high resolution on/off' toggle (although ELB might push more frequently than 1 minute by default, I'm not sure)
its based on how often the source pushes data points to cloudwatch. If the AWS metrics don't work for what you need, you would need to add something like the CloudWatch agent (or just a script in your instance) pushing metric more frequently. Be careful about the CloudWatch API call charges if you do this from a lot of sources at a high frequency though
We have a service built in AWS which only gets traffic for few minutes in entire day and then there is no traffic at all. During the burst, say, we get traffic at 200 TPS otherwise, traffic is almost zero during the entire day. This dynamodb has auto scaling enabled.
The thing I wanted to know is how should we set minWCU and minWCU for it. Should it be determined by the most traffic we expected to traffic or the minimum traffic we receive? If I do minimum traffic, say 10, and set utilization as 50%, then I see that some events gets throttled since autoscaling takes time to increase capacity units. But setting the min capacity units according to most traffic that we receive increases the cost of dynamodb, in which case we are incurring cost even when we are not using the dynamodb at all. So, are there any best practices regarding this case?
For your situation, you might be better going with on-demand mode.
DynamoDB on-demand offers pay-per-request pricing for read and write requests so that you pay only for what you use.
This frees you from managing RCUs, WCUs, and autoscaling. There would be no need for pro-active scaling
Be sure to review the considerations before making that change
If you do not have consistent traffic then its better to set to close to what the burst is, as it takes 5 minutes before scaling up you might find your credits depleted before it scales.
I have a not-so-complicated situation but it can be complicated on AWS cloudformations:
I would like to autoscale up and down based on the number of messages on SQS.
But I am not sure what I need to specify on AWS cloudformation, I would imagine that I would need:
some sort of lambda/cloudformation that perform query on the current number of instances on AutoScalingGroup
some sort of lambda/cloudformation that perform query on the current number of messages on SQS.
some comparison operations that compares #1 and #2.
create scale up policy when #1 < #2
create scale down policy when #1 > #2
Not sure where I should get started... can someone kind enough to show some examples?
You have several different concepts all mixed together (CloudFormation, Auto Scaling, Lambda). It is best to keep things simple, at least for an initial deployment. You can then automate it with CloudFormation later.
The most difficult part of Auto Scaling is actually determining the best Scaling Policies to use. A general rule is to quickly add capacity when it is needed, and then slowly remove capacity when it is no longer needed. This way, you can avoid churn, where instances are added and removed within short spaces of time.
The simplest setup would be:
Scale-out when the queue size is larger than X (To be determined by testing)
Scale-in when the queue is empty (You can later tweak this to be more efficient)
Use the ApproximateNumberOfMessagesVisible metric for your scaling policies. (See Amazon SQS Metrics and Dimensions). It provides a count of messages waiting to be processed. However, I have seen situations where a zero count is not actually sent as a metric, so also trigger your scale-in policy on an alarm status of INSUFFICIENT_DATA, which also means that the queue is empty.
There is no need to use AWS Lambda functions unless you have very complex requirements for when to scale.
If your requests come on a regular basis throughout the day, set the minimum to one instance to always have capacity available.
If your requests are infrequent (and there could be several hours with no requests coming in), then set the minimum to zero instances so you save money.
You will need to experiment to determine the best queue size that should trigger a scale-out event. This depends upon how frequently the messages arrive and how long they take to process. You can also experiment with the Instance Type -- figure out whether it is better to have many smaller (eg T2) instances, or fewer larger instances (eg M4 or C4, depending upon need).
If you do not need to process the requests within a short time period (that is, you can be a little late sometimes), you could consider using spot pricing that will dramatically lower your costs, with the potential to occasionally have no instances running due to a high spot price. (Or, just bid high and accept that occasionally you'll pay more than on-demand prices but in general you will save considerable costs.)
Create all of the above manually in the console, then experiment and measure results. Once it is finalized, you can then implement it as a CloudFormation stack if desired.
Update:
The Auto Scaling screens will only create an alarm based on EC2. To create an alarm on a different metric, first create the alarm, then put it in the policy.
To add a rule based on an Amazon SQS queue:
Create an SQS queue
Put a message in the queue (otherwise the metrics will not flow through to CloudWatch)
Create an alarm in CloudWatch based on the ApproximateNumberOfMessagesVisible metric (which will appear after a few minutes)
Edit your Auto Scaling policies to use the above alarm
I am trying to setup a EC2 Scaling group that scales depending on how many items are in an SQS queue.
When the SQS queue has items visible I need the Scaling group to have 1 instance available and when the SQS queue is empty (e.g. there are no visible or non-visible messages) I want there to be 0 instances.
Desired instances it set to 0, min is set to 0 and max is set to 1.
I have setup cloudwatch alarms on my SQS queue to trigger when visible messages are greater than zero, and also triggers an alarm when non visible messages are less than one (i.e no more work to do).
Currently the Cloudwatch Alarm Triggers to create an instance but then the scaling group automatically kills the instance to meet the desired setting. I expected the alarm to adjust the desired instance count within the min and max settings but this seems to not be the case.
Yes, you can certainly have an Auto Scaling group with:
Minimum = 0
Maximum = 1
Alarm: When ApproximateNumberOfMessagesVisible > 0 for 1 minute, Add 1 Instance
This will cause Auto Scaling to launch an instance when there are messages waiting in the queue. It will keep trying to launch more instances, but the Maximum setting will limit it to 1 instance.
Scaling-in when there are no messages is a little bit tricker.
Firstly, it can be difficult to actually know when to scale-in. If there are messages waiting to be processed, then ApproximateNumberOfMessagesVisible will be greater than zero. However, there are no messages waiting, it doesn't necessarily mean you wish to scale-in because messages might be currently processing ("in flight"), as indicated by ApproximateNumberOfMessagesNotVisible. So, you only want to scale-in if both of these are zero. Unfortunately, a CloudWatch alarm can only reference one metric, not two.
Secondly, when an Amazon SQS queue is empty, it does not send metrics to Amazon CloudWatch. This sort of makes sense, because queues are mostly empty, so it would be continually sending a zero metric. However, it causes a problem that CloudWatch does not receive a metric when the queue is empty. Instead, the alarm will enter the INSUFFICIENT_DATA state.
Therefore, you could create your alarm as:
When ApproximateNumberOfMessagesVisible = 0 for 15 minutes, Remove 1 instance but set the action to trigger on INSUFFICIENT_DATA rather than ALARM
Note the suggested "15 minutes" delay to avoid thrashing instances. This is where instances are added and removed in rapid succession because messages are coming in regularly, but infrequently. Therefore, it is better to wait a while before deciding to scale-in.
This leaves the problem of having instances terminated while they are still processing messages. This can be avoided by taking advantage of Auto Scaling Lifecycle Hooks, which send a signal when an instance is about to be terminated, giving the application the opportunity to delay the termination until work is complete. Your application should then signal that it is ready for termination only when message processing has finished.
Bottom line
Much of the above depends upon:
How often your application receives messages
How long it takes to process a message
The cost savings involved
If your messages are infrequent and simple to process, it might be worthwhile to continuously run a t2.micro instance. At 2c/hour, the benefit of scaling-in is minor. Also, there is always the risk when adding and removing instances that you might actually pay more, because instances are charged by the hour -- running an instance for 30 minutes, terminating it, then launching another instance for 30 minutes will actually be charged as two hours.
Finally, you could consider using AWS Lambda instead of an Amazon EC2 instance. Lambda is ideal for short-lived code execution without requiring a server. It could totally remove the need to use Amazon EC2 instances, and you only pay while the Lambda function is actually running.
for simple conf, with per sec aws ami/ubuntu billing dont worry about wasted startup/shutdown time, just terminate your ec2 by yourself, w/o any asg down policy add a little bash in client startup code or preinstal it in cron and poll for process presence or cpu load and term ec2 or shutdown (termination is better if you attach volumes and need 'em to autodestruct) after processing is done. there's ane annoying thing about asg defined as 0/0/1 (min/desired/max) with defaults and ApproximateNumberOfMessagesNotVisible on sqs - after ec2 is fired somehow it switches to 1/0/1 and it start to loop firing ec2 even if there's nothing is sqs (i'm doing video transcoding, queing jobs to do to sns/sqs and firing ffmpeg nodes with asg defined on non empty sqs)