AWS High Resolution Metrics for faster ECS scaling

I have a complex REST API deployed on AWS ECS. Its autoscaling policy is based on a RequestCount of 2000.
Scale-out happens when RequestCount is consistently above 2000 at the standard resolution of one datapoint per 60 seconds, so it takes at least 2 minutes before scaling kicks in. This is becoming a problem during short-lived request surges, when the request count jumps to 10k and above and the containers start rejecting requests (throttling).
I need scaling to happen within a minute at most, if not within seconds. AWS CloudWatch seems to offer high-resolution metrics, but there is very little information about:
1) Can I enable high resolution for specific metrics? For example, can I have request counts resolved at a granularity of 5 seconds while CPUUtilization stays at the standard granularity of 1 minute?
2) How can I enable high resolution on AWS metrics?
The AWS CloudWatch documentation seems insufficient for understanding this process.

There are two different things that can be 'high resolution': the alarm and the metric.
A high-resolution metric just means the source is pushing values more frequently. You can't control this if you're using an AWS-provided metric, and most of them don't push more often than once a minute.
A high-resolution alarm is one whose period is less than 60 seconds; it is billed at a higher rate than standard alarms. However, this isn't very useful in most cases if the metric you're basing it on only gets pushed once per minute.
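For illustration, a high-resolution alarm is created simply by choosing a sub-60-second period (10 or 30 seconds are the valid sub-minute values). A minimal boto3 sketch, assuming a custom namespace; the alarm name, metric, and dimension values are placeholders:

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# A Period below 60 seconds (10 or 30) makes the alarm high resolution,
# billed at the higher rate. Namespace, metric, and dimension values
# below are placeholders, not real AWS-provided metrics.
cloudwatch.put_metric_alarm(
    AlarmName="api-request-surge",
    Namespace="MyApp",          # assumed custom namespace
    MetricName="RequestCount",
    Dimensions=[{"Name": "Service", "Value": "my-rest-api"}],
    Statistic="Sum",
    Period=10,                  # sub-60s period => high-resolution alarm
    EvaluationPeriods=3,        # breach across 3 x 10s = ~30 seconds
    Threshold=2000,
    ComparisonOperator="GreaterThanThreshold",
    TreatMissingData="notBreaching",
)
```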
EDIT:
To directly answer your questions:
No, I don't think any of the AWS RequestCount metrics for things like ELB have a 'high resolution on/off' toggle (although ELB might push more frequently than once a minute by default; I'm not sure).
It's based on how often the source pushes datapoints to CloudWatch. If the AWS metrics don't work for what you need, you would need to add something like the CloudWatch agent (or just a script on your instance) pushing the metric more frequently, as sketched below. Be careful about CloudWatch API call charges if you do this from a lot of sources at high frequency, though.
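A minimal boto3 sketch of such a script; the namespace, dimensions, and the request counter are assumptions, and StorageResolution=1 is what marks the datapoint as high resolution:

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

current_request_count = 1234  # placeholder: your app's own request counter

# StorageResolution=1 stores the datapoint at 1-second granularity
# (the default is 60). Each call is billed as a CloudWatch API request,
# so batch datapoints where possible if you run this at high frequency.
cloudwatch.put_metric_data(
    Namespace="MyApp",  # assumed custom namespace
    MetricData=[{
        "MetricName": "RequestCount",
        "Dimensions": [{"Name": "Service", "Value": "my-rest-api"}],
        "Value": current_request_count,
        "Unit": "Count",
        "StorageResolution": 1,
    }],
)
```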

Related

How to increase the resolution of the CpuUtilization metric of an ECS cluster past the 1-minute mark?

I'm trying to create a robust autoscaling process for my ECS cluster but am facing problems with the resolution of the CpuUtilization metric. I have turned on 'Detailed metrics' for 1-minute resolution but am not able to achieve good scaling results. I am deploying an ML model that takes roughly 1.5 s to infer. I am not facing any memory bottleneck, so I am using CpuUtilization for scaling.
I need fast scaling: when requests start piling up, the response time easily shoots up to 3-5 s. Currently, with 'Detailed Metrics' enabled, scale-out takes around 3-5 minutes to start, since 3 datapoints are checked for 1-minute-resolution metrics. With a metric at 5-10 s resolution, I could look at 6 datapoints within 30 s and start the scale-out job faster.
I tried using Lambda, Step Functions and EventBridge from this blog, but I am not able to get CpuUtilization or MemoryUtilization, only the task, service and container counts.
Is there a way to get CPU and memory metrics directly from ECS? I know we can use cloudwatch.get_metric_statistics(), but that only returns datapoints already reported to CloudWatch, so it is not useful here.
You can't change that; the 1-minute resolution is set by AWS. The only thing you can do to get better resolution is to publish your own custom metrics, which can have a resolution as fine as 1 second.
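One way to get CPU figures from inside the task itself, rather than waiting for CloudWatch, is the ECS task metadata endpoint. This is only a sketch: it assumes the v4 endpoint is available (ECS_CONTAINER_METADATA_URI_V4 is set) and that its stats payload follows the Docker stats format; the namespace and dimension names are placeholders.

```python
import json
import os
import urllib.request

import boto3

# Read per-container stats from the ECS task metadata endpoint (v4) and
# publish CPU utilization as a 1-second-resolution custom metric.
stats_url = os.environ["ECS_CONTAINER_METADATA_URI_V4"] + "/task/stats"
stats = json.load(urllib.request.urlopen(stats_url))

cloudwatch = boto3.client("cloudwatch")
for container_id, s in stats.items():
    if not s or "system_cpu_usage" not in s.get("precpu_stats", {}):
        continue  # first sample has no previous reading to diff against
    # Standard Docker-stats CPU calculation: usage delta over system delta,
    # scaled by the number of online CPUs.
    cpu_delta = (s["cpu_stats"]["cpu_usage"]["total_usage"]
                 - s["precpu_stats"]["cpu_usage"]["total_usage"])
    sys_delta = s["cpu_stats"]["system_cpu_usage"] - s["precpu_stats"]["system_cpu_usage"]
    if sys_delta <= 0:
        continue
    cpu_percent = (cpu_delta / sys_delta) * s["cpu_stats"].get("online_cpus", 1) * 100
    cloudwatch.put_metric_data(
        Namespace="Custom/ECS",  # placeholder namespace
        MetricData=[{
            "MetricName": "CpuUtilization",
            "Dimensions": [{"Name": "ContainerId", "Value": container_id}],
            "Value": cpu_percent,
            "Unit": "Percent",
            "StorageResolution": 1,  # high-resolution custom metric
        }],
    )
```

Run on a short loop (every few seconds), this gives the 5-10 s resolution datapoints that a high-resolution scaling alarm can then evaluate.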

Google Cloud Monitoring Writing Datapoints Faster than Maximum Sampling Period

Context:
I'm attempting to use Google Cloud's monitoring SDK to publish metrics on error status codes, latency, and other server-side metrics.
Due to the rate of requests per second on my machines, this will exceed Google's limit of one datapoint per 10 seconds per time series.
I am using the instance_id as one of the labels, so the time series are unique per machine, but I will still exceed one datapoint per 10 seconds.
Question:
As mentioned in a similar question, here, an option would be to log, buffer, and forward the messages (sketched below). It seems strange for each customer to have to implement this for common high-rate metric use cases.
Is there an alternative way of recording high-rate metrics such as latency, request counts, and error counts with the SDK?
Resources:
https://cloud.google.com/monitoring/quotas
https://cloud.google.com/monitoring/custom-metrics/creating-metrics#monitoring_create_metric-nodejs
The error we receive is: 'One or more points were written more frequently than the maximum sampling period configured for the metric.'
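For what it's worth, a minimal sketch of the buffer-and-forward pattern with the Python SDK: count events in memory and flush one aggregated point per time series per interval, staying under the one-point-per-10-seconds limit. The project ID, metric type, resource labels, and flush interval are all assumptions.

```python
import threading
import time

from google.cloud import monitoring_v3

PROJECT_ID = "my-project"  # placeholder
FLUSH_INTERVAL = 15        # seconds; must stay >= the 10 s sampling period

client = monitoring_v3.MetricServiceClient()
project_name = f"projects/{PROJECT_ID}"
lock = threading.Lock()
request_count = 0

def record_request():
    """Called on every request: just bump an in-memory counter."""
    global request_count
    with lock:
        request_count += 1

def flush_loop():
    """Write one aggregated point per interval instead of one per request."""
    global request_count
    while True:
        time.sleep(FLUSH_INTERVAL)
        with lock:
            count, request_count = request_count, 0
        series = monitoring_v3.TimeSeries()
        series.metric.type = "custom.googleapis.com/request_count"  # placeholder
        series.resource.type = "gce_instance"
        series.resource.labels["instance_id"] = "1234567890"        # placeholder
        series.resource.labels["zone"] = "us-central1-a"            # placeholder
        now = time.time()
        interval = monitoring_v3.TimeInterval(
            {"end_time": {"seconds": int(now), "nanos": int((now % 1) * 1e9)}}
        )
        series.points = [monitoring_v3.Point(
            {"interval": interval, "value": {"int64_value": count}}
        )]
        client.create_time_series(name=project_name, time_series=[series])

threading.Thread(target=flush_loop, daemon=True).start()
```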

What should minRCU and minWCU be set to for DynamoDB when traffic spikes only a few times a day?

We have a service built on AWS that gets traffic for only a few minutes in the entire day and then no traffic at all. During the burst we get, say, 200 TPS; otherwise traffic is almost zero. The DynamoDB table has auto scaling enabled.
What I wanted to know is how we should set minRCU and minWCU for it. Should they be determined by the highest traffic we expect or by the minimum traffic we receive? If I set them for the minimum traffic, say 10, with a target utilization of 50%, then some requests get throttled, since autoscaling takes time to increase capacity units. But setting the minimum capacity units according to the peak traffic increases the cost of DynamoDB, in which case we are incurring cost even when we are not using the table at all. So, are there any best practices for this case?
For your situation, you might be better going with on-demand mode.
DynamoDB on-demand offers pay-per-request pricing for read and write requests so that you pay only for what you use.
This frees you from managing RCUs, WCUs, and autoscaling; there is no need for proactive scaling.
Be sure to review the considerations before making that change.
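Switching an existing table is a single API call. A minimal boto3 sketch with a placeholder table name (note that AWS only lets you switch billing modes once per 24 hours per table):

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Move the table to on-demand (pay-per-request) billing; RCUs, WCUs, and
# autoscaling policies no longer apply afterwards.
dynamodb.update_table(
    TableName="my-spiky-table",      # placeholder
    BillingMode="PAY_PER_REQUEST",
)
```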
If you do not have consistent traffic, then it's better to set the minimum close to what the burst requires: since it takes about 5 minutes before scaling up, you might find your burst credits depleted before the table scales.

How can I tell if CPUUtilization in AWS is the reason my website sign-up is timing out?

Our website is hosted on AWS on a t2.small instance. User-facing sign-up is currently timing out.
Initially, I was getting a load balancer latency alarm notification for this instance, so I increased the limit, which seemed to work as a temporary solution.
However, once I increased the limit, I started getting 2 other alarm notifications, as follows:
1) production-remove-capacity-alarm
Description: None
Threshold: CPUUtilization <= 40 for 3 datapoints within 15 minutes
2) AWSEBCloudwatchAlarmLow
Description: ElasticBeanstalk Default Scale Down alarm
Threshold: NetworkOut < 2,000,000 for 1 datapoints within 5 minutes
It seems to me that I should simply change the alarm notifications so that I'm no longer alerted to #2, as I don't see how this is interfering with anything, but please correct me if I seem to be missing something.
Regarding #1, does it seem likely that somehow adjusting CPU Utilization in AWS will solve the timeout issue with website sign-up?
And if so, what specifically ought to be done?
Everything is okay. Don't panic.
The first priority is that your application operates correctly. Hopefully your adjustment to the instance type satisfactorily fixed this (but it is still worth watching).
The above two alarms are basically saying:
CPU is under 40%
There's not a lot of network traffic
These alarms can be used to scale-in instances (reduce the number of instances) so that you are not paying for excess capacity. There would be similar alarms that let you scale-out (add additional instances).
ALARM simply means the check is True. That is, the condition has been satisfied. It does not necessarily indicate a problem.
I'm going to presume that you currently have only one instance running. If so, you can ignore those alarms (and Auto Scaling will ignore them too) because you are already at the minimum capacity.
If Auto Scaling has been configured to scale-out to more instances, these alarms would later scale-in to save you money. They're probably a bit trigger-happy, looking at only 15 minutes of CPU and 5 minutes of network traffic; it would normally be better to wait for a longer period before deciding to remove capacity.
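If you do later run multiple instances and want the scale-in alarm to be less trigger-happy, it could be redefined with a longer look-back. A sketch, not a prescription: the alarm name matches the one above, but the Auto Scaling group name and policy ARN are placeholders.

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Require 30 minutes of low CPU (6 x 5-minute datapoints) instead of 15
# before removing capacity.
cloudwatch.put_metric_alarm(
    AlarmName="production-remove-capacity-alarm",
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "AutoScalingGroupName", "Value": "production-asg"}],  # placeholder
    Statistic="Average",
    Period=300,
    EvaluationPeriods=6,
    Threshold=40,
    ComparisonOperator="LessThanOrEqualToThreshold",
    AlarmActions=["arn:aws:autoscaling:us-east-1:123456789012:scalingPolicy:..."],  # placeholder ARN
)
```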
Bottom line: If your application is running correctly and you are only operating a single instance, there's nothing to worry about. It's all working as expected.

Limit AWS Lambda budget

AWS Lambda seems nice for running stress tests.
I understand that it should be able to scale up to 1000 concurrent instances, and that you are charged per 0.1 s rather than per hour, which is handy for short stress tests. On the other hand, automatic scaling gives you even less control over costs than EC2. I understand that Amazon doesn't allow explicit budget caps, since such caps could bring down websites in their moment of fame. Still, for development, having an explicit budget would be nice.
Is there a workaround, or are there best practices for managing the cost of AWS Lambda during development? (For example, reducing the maximum time per request.)
Yes, every AWS Lambda function has a setting defining its maximum duration. The default is 3 seconds, but this can be extended to 5 minutes.
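The timeout can be set in the console or via the API. A minimal boto3 sketch with a placeholder function name; Timeout is in seconds, and a low value caps the cost of any single invocation:

```python
import boto3

lambda_client = boto3.client("lambda")

# Cap the function's maximum duration so a runaway invocation
# cannot run (and bill) for the full 5 minutes.
lambda_client.update_function_configuration(
    FunctionName="my-stress-test-worker",  # placeholder
    Timeout=10,                            # seconds
)
```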
AWS also has the ability to define Budgets and Forecasts so that you can set a budget per service, per AZ, per region, etc. You can then receive notifications at intervals such as 50%, 80% and 100% of budget.
You can also create Billing Alarms to be notified when expenditure passes a threshold.
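A minimal boto3 sketch of such a billing alarm; the threshold and SNS topic are placeholders. Note that billing metrics are only published to us-east-1 and require 'Receive Billing Alerts' to be enabled in the account's billing preferences:

```python
import boto3

# Billing metrics live only in us-east-1.
cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

cloudwatch.put_metric_alarm(
    AlarmName="monthly-spend-over-50-usd",  # placeholder
    Namespace="AWS/Billing",
    MetricName="EstimatedCharges",
    Dimensions=[{"Name": "Currency", "Value": "USD"}],
    Statistic="Maximum",
    Period=21600,               # billing data is only updated a few times a day
    EvaluationPeriods=1,
    Threshold=50.0,             # placeholder monthly threshold in USD
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:billing-alerts"],  # placeholder
)
```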
AWS Lambda comes with a monthly free usage tier that includes 3 million seconds of time (at 128MB of memory).
It is unlikely that you will experience high bills with AWS Lambda if it is being used for its correct purpose, which is running many small functions (rather than long-running workloads, for which EC2 is better suited).