I have a few AWS Lambda functions, but I am troubleshooting only one of them. This Lambda function is triggered by a message queue, reads from DynamoDB, processes the data, and writes back to DynamoDB. It receives up to 10 requests per second, and I have configured provisioned concurrency. The average duration is 60 ms, which I am very happy with. But every day there are around 10 invocations whose duration is more than 1 second, up to the 3-second timeout.
I added logging to my Lambda function. During the duration spikes, the DynamoDB reads and writes (GetItem/PutItem) took more than 1 second. The DynamoDB table is set to on-demand. It is a very simple table with two columns: an ID (auto number) and a JSON string (about 1 KB). I have tried Redis but, oddly enough, still saw spikes. The Lambda function is not in a VPC. The DynamoDB connection is configured with an HTTP timeout of 500 and a maximum of 2 retries.
Code to read DynamoDB:
Log for Duration:
When you use provisioned concurrency, the Lambda service keeps a set number of the underlying containers "warm" to minimize start-up time. Since you mention that you intermittently see higher execution durations, try the following debugging steps:
Check the "Concurrent Executions" metric for the Lambda function against the "Duration" metric: if the number of instances of the function executing at a particular time is higher than the configured provisioned concurrency, then a few of those instances had cold starts, which would explain the higher duration.
Enable X-Ray tracing for the Lambda function and also add X-Ray instrumentation to your code: this shows which network call takes too much time and also gives you the cold start "init" duration (if any).
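The first check above can be sketched in a few lines of Python. This is only an illustration: the MinuteSample type, the classify_slow_minutes helper and the sample numbers are all made up for the example, and it assumes you have already exported per-minute maximums of the two CloudWatch metrics.

```python
from typing import List, NamedTuple, Tuple


class MinuteSample(NamedTuple):
    minute: str             # label for the one-minute period, e.g. "09:14"
    max_concurrent: int     # Maximum of ConcurrentExecutions in that minute
    max_duration_ms: float  # Maximum of Duration in that minute


def classify_slow_minutes(
    samples: List[MinuteSample], provisioned: int, slow_ms: float = 1000.0
) -> Tuple[List[str], List[str]]:
    """Split slow minutes into (above provisioned concurrency, everything else)."""
    spillover, other = [], []
    for s in samples:
        if s.max_duration_ms < slow_ms:
            continue  # not a spike, nothing to explain
        if s.max_concurrent > provisioned:
            spillover.append(s.minute)  # some instances were likely cold starts
        else:
            other.append(s.minute)      # look at downstream calls instead
    return spillover, other


# Hypothetical data, assuming provisioned concurrency is set to 10.
samples = [
    MinuteSample("09:13", 4, 72.0),
    MinuteSample("09:14", 12, 2400.0),
    MinuteSample("09:15", 5, 1800.0),
]
spillover, other = classify_slow_minutes(samples, provisioned=10)
print("likely cold starts:", spillover)
print("look elsewhere:", other)
```

Minutes that land in the second list are the ones where X-Ray traces of the DynamoDB calls are most likely to help.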
Related
I have an Android app that uses AWS Lambda as a backend to fetch data, which usually takes a few seconds of execution time per request.
I would like to be able to fetch data and cache it at the end of the month if there is still free execution time available. However, I could not find a way to access that information inside a Lambda function. Is there a way to do this?
if there is still free execution time available
You cannot do that directly; you need to aggregate the CloudWatch logs as described by @Dan M. The better approach, though, is to set an alarm and notification that fires once your overall execution time (across all functions) exceeds your desired value.
It is a very straightforward approach using CloudWatch alarms:
Choose the metric by the dimension called "Across All Functions" so that you can view the aggregated execution time for all functions in that region. Check how to view all functions.
There are two types of metrics, invocation metrics and performance metrics. Focus on the performance metrics, and use their Average statistic.
Now create a CloudWatch alarm on the Duration metric (under performance metrics) and add an SNS notification that fires whenever the alarm breaches your desired value.
According to the docs for the Duration metric:
The amount of time that your function code spends processing an event. The billed duration for an invocation is the value of Duration rounded up to the nearest millisecond.
Docs for creating a CloudWatch alarm
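As a rough sketch of the alarm described above, the parameters could be assembled like this. The alarm name, the 5-minute period and the topic ARN are placeholders I made up, and the boto3 call is left commented out so that the snippet runs without AWS credentials.

```python
from typing import Any, Dict


def build_duration_alarm(threshold_ms: float, sns_topic_arn: str) -> Dict[str, Any]:
    """Parameters for an alarm on Lambda Duration across all functions."""
    return {
        "AlarmName": "lambda-duration-all-functions",  # hypothetical name
        "Namespace": "AWS/Lambda",
        "MetricName": "Duration",
        # No "Dimensions" key: this targets the metric that is published
        # without dimensions, i.e. the "Across All Functions" view.
        "Statistic": "Average",
        "Period": 300,               # seconds; an assumption, tune as needed
        "EvaluationPeriods": 1,
        "Threshold": threshold_ms,   # the Duration metric is in milliseconds
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],
    }


# Placeholder ARN, not a real topic.
params = build_duration_alarm(3000, "arn:aws:sns:us-east-1:123456789012:my-topic")

# With credentials configured, the alarm would be created with:
#   import boto3
#   boto3.client("cloudwatch").put_metric_alarm(**params)
print(params["MetricName"], params["Threshold"])
```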
You need to have your Lambda function send logs to CloudWatch. Then, whenever it executes, you get something like:
REPORT RequestId: c9999f19-0b99-4996-8dea-94c9999999de Duration: 5495.59 ms Billed Duration: 5496 ms Memory Size: 128 MB Max Memory Used: 82 MB Init Duration: 306.78 ms
You can then export the logs and, using whatever tool you like, compute your monthly total.
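For example, once the logs are exported, a small Python script could add up the billed time. This is a minimal sketch: the total_gb_seconds helper is my own name, and it assumes the REPORT lines look like the one shown above. It converts billed milliseconds and memory size into GB-seconds, the unit Lambda compute usage is measured in.

```python
import re
from typing import Iterable

# Matches the billed duration and memory size fields of a REPORT line.
REPORT_RE = re.compile(r"Billed Duration: (\d+) ms\s+Memory Size: (\d+) MB")


def total_gb_seconds(lines: Iterable[str]) -> float:
    """Sum billed duration x memory over all REPORT lines, in GB-seconds."""
    total = 0.0
    for line in lines:
        match = REPORT_RE.search(line)
        if match is None:
            continue  # START/END lines and application output are skipped
        billed_ms = int(match.group(1))
        memory_mb = int(match.group(2))
        total += (billed_ms / 1000.0) * (memory_mb / 1024.0)
    return total


log_lines = [
    "START RequestId: c9999f19-0b99-4996-8dea-94c9999999de Version: $LATEST",
    "REPORT RequestId: c9999f19-0b99-4996-8dea-94c9999999de Duration: 5495.59 ms "
    "Billed Duration: 5496 ms Memory Size: 128 MB Max Memory Used: 82 MB "
    "Init Duration: 306.78 ms",
]
usage = total_gb_seconds(log_lines)
print(f"{usage:.3f} GB-seconds")
```

The result can then be compared against whatever monthly allowance applies to your account.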
I have a Lambda function that accepts a parameter (a category_id), pulls some data from an API, and updates the database based on the response.
I have to execute the same Lambda function for multiple IDs at one-minute intervals, on a daily basis.
For example, run the Lambda for category 1 at 12:00 AM, then for category 2 at 12:01 AM, and so on for 500+ categories.
What could be the best possible solution to achieve this?
This is what I am currently thinking:
Write the Lambda using AWS SAM
Add a Lambda Layer for shared dependencies
Attach the Lambda to AWS CloudWatch Events to run it on a schedule
Add an environment variable for category_id in the Lambda
Update the SAM template to reuse the same Lambda function again and again, changing only the cron expression of the schedule and the value of the category_id environment variable
Problems in the Above Solution:
Number of Lambda functions will increase in the account.
Each Lambda will be attached to a CloudWatch Event, so the number of events will also increase
There is a quota of at most 300 CloudWatch Events per account (though we can ask support to increase that limit)
It will require nested stacks because of the SAM template size limit as well as the number of resources per template, which is 200 max.
I will be able to create only 50 Lambda functions per nested stack, which means the number of nested stacks will also increase, because 1 Lambda = 4 resources (Lambda + Role + Rule + Event)
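The arithmetic behind these limits can be checked with a short Python sketch. The function name is mine; the 200-resource limit, the 300-event quota and the 4 resources per Lambda are the figures quoted in the list above.

```python
import math
from typing import Tuple


def stacks_needed(
    categories: int,
    resources_per_function: int = 4,     # Lambda + Role + Rule + Event
    max_resources_per_stack: int = 200,  # limit quoted in the question
) -> Tuple[int, int]:
    """Return (functions per nested stack, nested stacks required)."""
    per_stack = max_resources_per_stack // resources_per_function
    return per_stack, math.ceil(categories / per_stack)


per_stack, stacks = stacks_needed(500)
print(f"{per_stack} functions per stack, {stacks} nested stacks")

# One schedule rule per category also exceeds the quoted quota of 300.
rules_over_quota = 500 - 300
print(f"{rules_over_quota} rules over the default quota")
```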
Other solutions (not sure if they can be used):
Use of Step Functions
Trigger only the first Lambda function with a cron schedule, and have the current Lambda invoke the Lambda for the next category (only one CloudWatch Event is needed, to invoke the function for the first category, but the time difference will vary, i.e. the next Lambda will not execute exactly one minute later).
Use only one Lambda and one CloudWatch schedule event. The Lambda function will have a list of all category IDs and will invoke itself recursively, using one category ID at a time and removing the used category ID from the list (the only problem is that the Lambda will not execute exactly one minute later for the next category_id in the list).
Looking forward to hearing about the best solution.
I would suggest using a standard Worker pattern:
Create an Amazon SQS queue
Configure the AWS Lambda function so that it is triggered to run whenever a message is sent to the SQS queue
Trigger a separate process at midnight (e.g. another Lambda function) that sends the 500 messages to the SQS queue, each with a different category ID
This will cause the Lambda functions to execute. If you want only one of the Lambda functions to be running at any time (with no parallel executions), set the function's Concurrency Limit to 1. When one function completes, Lambda will automatically grab another message from the queue and start executing. There will be practically no "wasted time" between executions of the function.
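A minimal sketch of both halves of this pattern in Python follows. The message format ({"category_id": ...}) and the helper names are assumptions made for the example, process_category is a placeholder for the real work, and the SQS call itself is only shown in a comment so the snippet runs without AWS access.

```python
import json
from typing import Any, Dict, Iterator, List


def build_batches(category_ids: List[int], batch_size: int = 10) -> Iterator[List[Dict[str, str]]]:
    """Yield lists of entries sized for SQS SendMessageBatch (max 10 per call)."""
    for start in range(0, len(category_ids), batch_size):
        chunk = category_ids[start:start + batch_size]
        yield [
            {"Id": str(cid), "MessageBody": json.dumps({"category_id": cid})}
            for cid in chunk
        ]


def process_category(category_id: int) -> None:
    """Placeholder for the real work: call the API, update the database."""


def handler(event: Dict[str, Any], context: Any) -> List[int]:
    """Worker Lambda: handles every SQS record delivered in this invocation."""
    processed = []
    for record in event["Records"]:
        category_id = json.loads(record["body"])["category_id"]
        process_category(category_id)
        processed.append(category_id)
    return processed


# Midnight producer: each batch would be sent with
#   boto3.client("sqs").send_message_batch(QueueUrl=queue_url, Entries=batch)
batches = list(build_batches(list(range(1, 501))))
print(len(batches), "batches")

# Simulate what Lambda would deliver for the first batch.
fake_event = {"Records": [{"body": e["MessageBody"]} for e in batches[0]]}
print(handler(fake_event, None))
```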
Given that you are doing a large amount of processing, an Amazon EC2 instance might be more appropriate.
If the bandwidth requirements are low (e.g. if it is just making API calls), then a T3a.micro ($0.0094 per hour) or even a T3a.nano instance ($0.0047 per hour) can be quite cost-effective.
A script running on the instance could process a category, then sleep for 30 seconds, in a big loop. Running 500 categories at one minute each would take about 8 hours. That's under 10c each day!
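The cost estimate works out as follows, using only the hourly prices and the one-minute-per-category figure quoted above (the helper name is mine):

```python
def daily_cost(categories: int, minutes_each: float, price_per_hour: float) -> float:
    """Cost of running the instance just long enough to process every category."""
    hours = categories * minutes_each / 60.0
    return hours * price_per_hour


hours = 500 * 1 / 60.0
micro = daily_cost(500, 1, 0.0094)
nano = daily_cost(500, 1, 0.0047)
print(f"{hours:.2f} hours per day")
print(f"T3a.micro: ${micro:.4f} per day, T3a.nano: ${nano:.4f} per day")
```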
The instance can then stop or self-terminate when the work is complete. See: Auto-Stop EC2 instances when they finish a task - DEV Community
I have an AWS Lambda function that I invoke every minute with >1000 SNS events. This is a problem because my account concurrency is set at 3000, so if I start adding more jobs then eventually I'm going to have >3000 concurrent Lambda instances.
Each job takes around 2-5 seconds to complete which means that within each 1 minute window the concurrency limit will only be threatened within the first 5 seconds and I'll have 0 concurrency for the remaining 55 seconds.
If I set a concurrency limit (e.g. 1000) for the lambda will it handle the first 1000 SNS events and then automatically pick up the remainder once the concurrency frees up? And will I only be charged for the actual runtime rather than time spent waiting for concurrency to reduce?
Otherwise, is there a way that AWS will allow me to spread the load of jobs throughout the 1 minute window so that I can invoke the lambda every ~5 seconds with a subset of the total number of jobs?
If I set a concurrency limit (e.g. 1000) for the lambda will it handle the first 1000 SNS events and then automatically pick up the remainder once the concurrency frees up? And will I only be charged for the actual runtime rather than time spent waiting for concurrency to reduce?
Yes. Setting the concurrency limit definitely comes in handy in your use case and is the way to go. This is one of the reasons why the concurrency limit exists :)
Unfortunately, you can't take advantage of batching with SNS because it always sends one and only one event per invocation. What you could do is hook up an SQS queue to your SNS topic and have the Lambda function subscribe to the SQS queue instead. Then you can take advantage of batching (a maximum batch size of 10 at the time of this answer), greatly reducing the number of concurrent Lambda executions. You would still need to set a concurrency limit to make sure you don't use up all the available concurrency.
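To illustrate, here is a small Python sketch of what the SQS-triggered function would see and how much batching saves. It assumes raw message delivery is left off on the SNS subscription, so each SQS body is a JSON envelope whose "Message" field holds the original SNS payload; the function names and sample payloads are made up.

```python
import json
import math
from typing import Any, Dict, List


def extract_jobs(event: Dict[str, Any]) -> List[str]:
    """Unwrap the SNS envelope from every SQS record in the batch."""
    jobs = []
    for record in event["Records"]:
        envelope = json.loads(record["body"])  # SNS notification as JSON
        jobs.append(envelope["Message"])       # the original SNS message
    return jobs


def invocations_needed(events: int, batch_size: int = 10) -> int:
    """Lower bound on invocations if every batch is completely full."""
    return math.ceil(events / batch_size)


# Hypothetical batch of two SNS notifications delivered through SQS.
fake_event = {
    "Records": [
        {"body": json.dumps({"Type": "Notification", "Message": "job-1"})},
        {"body": json.dumps({"Type": "Notification", "Message": "job-2"})},
    ]
}
jobs = extract_jobs(fake_event)
print(jobs)
print(invocations_needed(1000), "invocations instead of 1000")
```

In practice batches are not always full, so the real number of invocations will sit somewhere between the two figures.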
Otherwise, is there a way that AWS will allow me to spread the load of jobs throughout the 1 minute window so that I can invoke the lambda every ~5 seconds with a subset of the total number of jobs?
No, but this is unnecessary because of the above.
I have a Lambda function which does some work. I want to create a CloudWatch alarm on its duration, i.e. how much time the Lambda takes to run.
I tried the following values for the alarm, but I am having an issue with it, probably due to the cold start problem. These are the values I am setting:
Statistic : Average
ComparisonOperator : "GreaterThanThreshold"
Threshold: 1000
EvaluationPeriods: 5
Period: 60
Unit: Milliseconds
The issue I am facing is that it keeps going into alarm, probably because of cold starts, since the function does not get called that often.
What are the best values to set for a Lambda? How do other people set alarms on Lambda functions?
Also, how long does a Lambda have to go without being called before it is shut down and a cold start can occur?
Use Blue Matador. The thresholds are dynamic, account for daily variation and cold starts, and use machine learning to detect real anomalies. It does the same thing for all the services that Lambda interacts with (Dynamo, SQS, API gateway, RDS, Kinesis, S3, etc.).
Disclaimer: I'm the founder of Blue Matador
If you're looking to do it yourself with CloudWatch, I would recommend timing out after a certain period of time and returning an error. Then you can use the Errors metric to tell how many invocations failed over a given time period. It's not a perfect solution, but it could correctly ignore cold starts. We wrote a blog post, How to Monitor AWS Lambda with CloudWatch, which covers errors, throttles, and more metrics to watch out for.
I have an AWS Lambda function set up with a trigger from an SQS queue. Currently the queue has about 1.3m messages available. According to CloudWatch, the Lambda function has only ever reached 431 invocations in a given minute. I have read that Lambda supports 1000 concurrent functions running at a time, so I'm not sure why it would be maxing out at 431 in a given minute. It also looks like my function only runs for about 5.55s on average, so each one of those 1000 available concurrent slots should be turning over multiple times per minute, therefore giving a much higher rate of invocations.
How can I figure out what is going on here and get my Lambda function to process through that SQS queue in a more timely manner?
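The expectation above can be put into numbers with a quick Python calculation. This is a back-of-the-envelope sketch (average concurrency is roughly arrival rate times average duration) using only the figures from the question; the function names are mine.

```python
def implied_concurrency(invocations_per_minute: float, avg_duration_s: float) -> float:
    """Average number of executions in flight at the observed rate."""
    return invocations_per_minute / 60.0 * avg_duration_s


def max_invocations_per_minute(concurrency: int, avg_duration_s: float) -> float:
    """Throughput if every concurrent slot is kept busy all the time."""
    return concurrency * 60.0 / avg_duration_s


observed = implied_concurrency(431, 5.55)
ceiling = max_invocations_per_minute(1000, 5.55)
print(f"observed: about {observed:.1f} concurrent executions")
print(f"1000 slots would allow about {ceiling:.0f} invocations per minute")
```

So the function is running at roughly 40 concurrent executions, far below 1000, which suggests that something other than the account concurrency limit is holding it back.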
The limit of 1000 concurrent executions you mention assumes that you have provided enough capacity.
Take a look at this, particularly the last bit.
https://docs.aws.amazon.com/lambda/latest/dg/vpc.html
If your Lambda function accesses a VPC, you must make sure that your
VPC has sufficient ENI capacity to support the scale requirements of
your Lambda function. You can use the following formula to
approximately determine the ENI capacity.
Projected peak concurrent executions * (Memory in GB / 3GB)
Where:
Projected peak concurrent execution – Use the information in Managing Concurrency to determine this value.
Memory – The amount of memory you configured for your Lambda function.
The subnets you specify should have sufficient available IP addresses
to match the number of ENIs.
We also recommend that you specify at least one subnet in each
Availability Zone in your Lambda function configuration. By specifying
subnets in each of the Availability Zones, your Lambda function can
run in another Availability Zone if one goes down or runs out of IP
addresses.
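The formula from the quoted documentation is easy to evaluate in Python. The sketch below only applies it as quoted; the function name, the rounding up, and the example values (1000 concurrent executions at 1.5 GB) are my own choices for illustration.

```python
import math


def eni_capacity(peak_concurrent_executions: int, memory_gb: float) -> int:
    """Projected peak concurrent executions * (Memory in GB / 3GB), rounded up."""
    return math.ceil(peak_concurrent_executions * (memory_gb / 3.0))


# Hypothetical function configured with 1.5 GB of memory.
needed = eni_capacity(1000, 1.5)
print(f"about {needed} ENIs, so the subnets need at least that many free IP addresses")
```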
Also read this article which points out many things that might be affecting you: https://read.iopipe.com/5-things-to-know-about-lambda-the-hidden-concerns-of-network-resources-6f863888f656
As a last note, make sure your SQS Lambda trigger has a batchSize of 10 (the maximum available at the time of writing).