AWS Lambda - Provisioned and Reserved Concurrency

I have both provisioned and reserved concurrency set on my Lambda. Some of the connections are time consuming, so I establish them in the Lambda INIT block and have set provisioned concurrency to avoid cold-start issues. I also need reserved concurrency, because I want a way to cap the connections made to the EC2-hosted DB read-only nodes and avoid overwhelming them.
I noticed that the ConcurrentExecutions metric always reports ReservedConcurrency - ProvisionedConcurrency, and I wonder why it doesn't scale up to the full ReservedConcurrency target configured on the Lambda. For example, if ProvisionedConcurrency is 10 and ReservedConcurrency is 30, I would have expected 10 execution environments provisioned during deployment, with the function then able to scale up to a maximum of 30 concurrent executions, whereas I see the ConcurrentExecutions metric peak at 20 in CloudWatch.
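For reference, this is roughly how the two settings in question are applied; a minimal boto3 sketch, with the function name and alias as placeholders:

```python
import boto3

lambda_client = boto3.client("lambda")

# Reserved concurrency: caps the function at 30 concurrent executions.
lambda_client.put_function_concurrency(
    FunctionName="my-function",  # placeholder
    ReservedConcurrentExecutions=30,
)

# Provisioned concurrency: keeps 10 pre-initialized environments warm.
# This applies to a published version or alias, not $LATEST.
lambda_client.put_provisioned_concurrency_config(
    FunctionName="my-function",
    Qualifier="live",  # placeholder alias
    ProvisionedConcurrentExecutions=10,
)
```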

Related

AWS High Resolution Metrics for faster ECS scaling

I have a complex REST API deployed in AWS ECS. The autoscaling policy for it is based on a RequestCount of 2000.
The scale-out happens when RequestCount is consistently higher than 2000 at standard resolution (per 60 seconds), which takes at least 2 minutes before scaling kicks in. This is becoming a problem during short request surges, when the request count climbs to 10k and above and the containers start rejecting requests (throttling).
I need to make scaling happen more quickly, within a minute if not within seconds. AWS CloudWatch seems to offer high-resolution metrics, but there's very little information about:
Can I enable high resolution for specific metrics? Is it possible to have request counts resolved at a high granularity of 5 seconds while CPUUtilization stays at the standard granularity of 1 minute?
How can I enable high resolution on AWS metrics?
The AWS CloudWatch documentation seems insufficient for understanding this process.
There are two different things that can be 'high resolution': the alarm and the metric.
A high-resolution metric just means the source is pushing values more frequently. You can't control this if you're using an AWS-provided metric, and most of them don't push more often than once a minute.
A high-resolution alarm is one whose period is less than 60 seconds, and it is billed at a higher rate than standard alarms. However, it isn't very useful in most cases if the metric you're basing it on only gets pushed once per minute.
EDIT:
To directly answer your questions
No, I don't think any of the AWS RequestCount metrics for things like ELB have a 'high resolution' on/off toggle (although ELB might push more frequently than once a minute by default, I'm not sure).
It's based on how often the source pushes data points to CloudWatch. If the AWS metrics don't work for what you need, you would need to add something like the CloudWatch agent (or just a script on your instance) pushing metrics more frequently. Be careful about the CloudWatch API call charges if you do this from a lot of sources at high frequency, though.
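For example, a script can push a custom metric at high resolution via the PutMetricData API; here's a minimal boto3 sketch (the namespace and metric name are made up). A high-resolution alarm with a period under 60 seconds could then be based on this metric:

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# StorageResolution=1 stores the metric at 1-second resolution;
# the default of 60 would make it a standard-resolution metric.
cloudwatch.put_metric_data(
    Namespace="Custom/MyService",  # hypothetical namespace
    MetricData=[{
        "MetricName": "RequestCount",  # hypothetical custom metric
        "Value": 123.0,
        "Unit": "Count",
        "StorageResolution": 1,
    }],
)
```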

What should minRCU and minWCU be set to for DynamoDB when spikes happen only a few times a day?

We have a service built on AWS which only gets traffic for a few minutes in the entire day and then there is no traffic at all. During the burst we get traffic at, say, 200 TPS; otherwise traffic is almost zero for the rest of the day. The DynamoDB table has auto scaling enabled.
What I wanted to know is how we should set minRCU and minWCU for it. Should they be determined by the peak traffic we expect or by the minimum traffic we receive? If I go by minimum traffic, say 10, with a target utilization of 50%, then I see that some events get throttled, since autoscaling takes time to increase capacity units. But setting the minimum capacity units according to the peak traffic we receive increases the cost of DynamoDB, in which case we incur cost even when we are not using the table at all. So, are there any best practices for this case?
For your situation, you might be better off going with on-demand mode.
DynamoDB on-demand offers pay-per-request pricing for read and write requests so that you pay only for what you use.
This frees you from managing RCUs, WCUs, and autoscaling; there would be no need for proactive scaling.
Be sure to review the considerations before making that change.
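For illustration, switching an existing table over is a single UpdateTable call; a minimal boto3 sketch with a placeholder table name:

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Move the table from provisioned capacity to on-demand billing.
dynamodb.update_table(
    TableName="my-table",  # placeholder
    BillingMode="PAY_PER_REQUEST",
)
```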
If you do not have consistent traffic, it's better to set the minimum close to what the burst requires: as it takes around 5 minutes before scaling up happens, you might find your burst credits depleted before the table scales.
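If you do stay on provisioned mode, that floor is the MinCapacity on the table's scalable target; a hedged boto3 sketch with placeholder values:

```python
import boto3

autoscaling = boto3.client("application-autoscaling")

# Set the write-capacity floor close to the burst requirement so the
# table doesn't have to scale from near-zero when the spike arrives.
autoscaling.register_scalable_target(
    ServiceNamespace="dynamodb",
    ResourceId="table/my-table",  # placeholder table
    ScalableDimension="dynamodb:table:WriteCapacityUnits",
    MinCapacity=100,  # illustrative value, sized near the burst
    MaxCapacity=200,
)
```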

Amazon Web Service Lambda Low Invocations from SQS Trigger

I have an AWS Lambda function set up with a trigger from an SQS queue. Currently the queue has about 1.3m messages available. According to CloudWatch, the Lambda function has only ever reached 431 invocations in a given minute. I have read that Lambda supports 1000 concurrent executions at a time, so I'm not sure why it would max out at 431 in a given minute. Also, my function only runs for about 5.55s on average, so each of those 1000 available concurrent slots should be turning over multiple times per minute, therefore giving a much higher rate of invocations.
How can I figure out what is going on here and get my Lambda function to process through that SQS queue in a more timely manner?
The 1000 concurrent execution limit you mention assumes that you have provided enough capacity.
Take a look at this, particularly the last bit.
https://docs.aws.amazon.com/lambda/latest/dg/vpc.html
If your Lambda function accesses a VPC, you must make sure that your VPC has sufficient ENI capacity to support the scale requirements of your Lambda function. You can use the following formula to approximately determine the ENI capacity.
Projected peak concurrent executions * (Memory in GB / 3GB)
Where:
Projected peak concurrent executions – Use the information in Managing Concurrency to determine this value.
Memory – The amount of memory you configured for your Lambda function.
The subnets you specify should have sufficient available IP addresses to match the number of ENIs.
We also recommend that you specify at least one subnet in each Availability Zone in your Lambda function configuration. By specifying subnets in each of the Availability Zones, your Lambda function can run in another Availability Zone if one goes down or runs out of IP addresses.
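To make the formula concrete, a hypothetical worked example (the numbers are illustrative, not taken from the question):

```python
# ENI capacity formula from the Lambda VPC documentation above.
projected_peak_concurrency = 1000  # from Managing Concurrency
memory_gb = 1.5                    # configured function memory

eni_capacity = projected_peak_concurrency * (memory_gb / 3.0)
print(eni_capacity)  # 500.0 -> subnets need roughly 500 free IPs/ENIs
```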
Also read this article which points out many things that might be affecting you: https://read.iopipe.com/5-things-to-know-about-lambda-the-hidden-concerns-of-network-resources-6f863888f656
As a last note, make sure your SQS Lambda trigger has a BatchSize of 10 (the maximum available).
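For reference, the batch size can be set on an existing event source mapping like this; a minimal boto3 sketch where the mapping UUID is a placeholder:

```python
import boto3

lambda_client = boto3.client("lambda")

# Raise the SQS trigger's batch size to the maximum of 10 messages.
lambda_client.update_event_source_mapping(
    UUID="mapping-uuid",  # placeholder for the real mapping UUID
    BatchSize=10,
)
```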

How can I tell if CPUUtilization in AWS is the reason my website sign-up is timing out?

Our website is hosted on AWS in a t2.small instance. User-facing sign-up is currently timing out.
Initially, I was getting a load balancer latency alarm notification for this instance, so I increased the limit, which seemed to work as a temporary solution.
However, once I increased the limit, I started getting 2 other alarm notifications, which were as follows:
1) production-remove-capacity-alarm
Description: None
Threshold: CPUUtilization <= 40 for 3 datapoints within 15 minutes
2) AWSEBCloudwatchAlarmLow
Description: ElasticBeanstalk Default Scale Down alarm
Threshold: NetworkOut < 2,000,000 for 1 datapoints within 5 minutes
It seems to me that I should simply change the alarm notifications so that I'm no longer alerted to #2, as I don't see how this is interfering with anything, but please correct me if I seem to be missing something.
Regarding #1, does it seem likely that somehow adjusting CPU Utilization in AWS will solve the timeout issue with website sign-up?
And if so, what specifically ought to be done?
Everything is okay. Don't panic.
The first priority is that your application operates correctly. Hopefully your adjustment (to the instance type) satisfactorily fixed this, but it is still worth watching.
The above two alarms are basically saying:
CPU is under 40%
There's not a lot of network traffic
These alarms can be used to scale-in instances (reduce the number of instances) so that you are not paying for excess capacity. There would be similar alarms that let you scale-out (add additional instances).
ALARM simply means the check is True. That is, the condition has been satisfied. It does not necessarily indicate a problem.
I'm going to presume that you currently have only one instance running. If so, you can ignore those alarms (and Auto Scaling will ignore them too) because you are already at the minimum capacity.
If Auto Scaling has been configured to scale out to more instances, these alarms would later scale in to save you money. They're probably a bit trigger-happy, only looking at 15 minutes of CPU and 5 minutes of network traffic; it would normally be better to wait for a longer period before deciding to remove capacity.
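If you did want a more patient scale-in alarm, you could give it a longer evaluation window; a hedged boto3 sketch, assuming an Auto Scaling group dimension and reusing the alarm name from the question:

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Require 30 minutes of low CPU (6 x 5-minute datapoints) before the
# scale-in alarm fires, instead of 15 minutes.
cloudwatch.put_metric_alarm(
    AlarmName="production-remove-capacity-alarm",
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Statistic="Average",
    Dimensions=[{"Name": "AutoScalingGroupName", "Value": "my-asg"}],  # assumed
    Period=300,
    EvaluationPeriods=6,
    Threshold=40.0,
    ComparisonOperator="LessThanOrEqualToThreshold",
    AlarmActions=[],  # attach the scale-in policy ARN here
)
```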
Bottom line: If your application is running correctly and you are only operating a single instance, there's nothing to worry about. It's all working as expected.

How to configure EC2 autoscaling based on multiple limits on same metric?

My primary requirement is as follows:
When CPU consumption on an instance exceeds 50%, adjust the capacity of the autoscaling group to 5 instances; when CPU consumption exceeds 80%, adjust the capacity to 10 instances.
However, if I use CloudWatch alarms to set capacity, I can imagine the following race condition:
5 instances exist
CPU consumption exceeds 80%
Alarm is triggered
Capacity is changed to 10 instances
CPU consumption drops below 50%
Eventually CPU consumption again exceeds 50%, but now capacity will be changed back down to 5 instances (which is something I don't want to happen)
So what I would ideally like is that, in response to alarm triggers, capacity is guaranteed to be at least the value corresponding to the threshold that fired.
I am aware that this can be done by manually setting the capacity through the AWS SDK, which could be triggered in response to lifecycle events monitored by a supervisor, but is there a better approach, preferably one that does not require setting up additional supervisors or webhooks for alarms?
A general approach is to fine-grain the scaling actions rather than jumping in big steps:
if the ASG avg CPU is over 70% > add an instance
if the ASG avg CPU is over 90% > add "n" instances
if the ASG avg CPU is under 40% > remove an instance
if the ASG avg CPU is under 10% > remove "n" instances
All of these values are averages over the last 5 minutes, so if you have a really fast spike you need more aggressive scaling. Even so, in half an hour you can easily add 6 servers or more.
Also, scaling works better with higher instance counts. So if your system needs only 1-3 instances, it may make sense to decrease the instance size so that you run 2-6 instances instead; it gives your system some extra flexibility.
But again, the question is: what is your expected load? Big spikes, or a predictable rise and fall during the day?
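A minimal sketch of one of those step-scaling actions with boto3 (group name, policy name, and step bounds are placeholders; the policy is triggered by attaching its ARN to a CloudWatch alarm):

```python
import boto3

autoscaling = boto3.client("autoscaling")

# Add 1 instance when the alarm breaches by up to 20 points of CPU,
# and 3 instances when it breaches by more than that.
autoscaling.put_scaling_policy(
    AutoScalingGroupName="my-asg",   # placeholder
    PolicyName="scale-out-on-cpu",   # placeholder
    PolicyType="StepScaling",
    AdjustmentType="ChangeInCapacity",
    MetricAggregationType="Average",
    StepAdjustments=[
        {"MetricIntervalLowerBound": 0, "MetricIntervalUpperBound": 20,
         "ScalingAdjustment": 1},
        {"MetricIntervalLowerBound": 20, "ScalingAdjustment": 3},
    ],
)
```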
I would suggest looking into an AWS Lambda function triggered by an SNS message from CloudWatch; it should give you free rein to put as much logic into the scaling decision as you want.
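A rough sketch of what that function could look like, assuming hypothetical alarm names and an Auto Scaling group called "my-asg"; it only ever raises the desired capacity to the floor implied by the alarm, which sidesteps the race described in the question:

```python
import json
import boto3

autoscaling = boto3.client("autoscaling")

# Hypothetical mapping from alarm name to the minimum capacity it implies.
CAPACITY_FLOORS = {
    "cpu-over-50": 5,
    "cpu-over-80": 10,
}

def handler(event, context):
    # SNS delivers the CloudWatch alarm payload as a JSON string.
    message = json.loads(event["Records"][0]["Sns"]["Message"])
    floor = CAPACITY_FLOORS.get(message.get("AlarmName"))
    if floor is None:
        return

    group = autoscaling.describe_auto_scaling_groups(
        AutoScalingGroupNames=["my-asg"]  # assumed group name
    )["AutoScalingGroups"][0]

    # Only raise capacity toward the floor; never lower it here.
    if group["DesiredCapacity"] < floor:
        autoscaling.set_desired_capacity(
            AutoScalingGroupName="my-asg",
            DesiredCapacity=floor,
            HonorCooldown=False,
        )
```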
Good Luck!