Is there a clear way to identify "cold starts"? Either in runtime in the Lambda itself, or via the logs? I know that cold starts are characterized by longer runtimes, which I can actually see, but I'm looking for a clear cut way.
I'm using Node.js if that matters.
Update: There are two good answers below, for two use cases:
- Identifying the cold start as the lambda runs.
- Identifying the cold start from the CloudWatch log.
If you add some initialization code to the top of your NodeJS script, you will be able to tell in the code that it is a cold start, and you will then be able to log that if you want to see it in the logs. For example:
var coldStart = true;
console.log("This line of code exists outside the handler, and only executes on a cold start");
exports.myHandler = function(event, context, callback) {
if (coldStart) {
console.log("First time the handler was called since this function was deployed in this container");
}
coldStart = false;
...
callback(...);
}
Update:
If you only care about seeing cold starts in the logs, Lambda now logs an extra "Init Duration" value in CloudWatch Logs for cold starts.
As an update, AWS now provide visible info on cold starts in the form of "Init Duration" , inside the Report section of a Cloudwatch Log. The calls that do not suffer from a cold start will not contains this information in the log
Duration: 1866.19 ms Billed Duration: 1900 ms Memory Size: 512 MB Max Memory Used: 163 MB Init Duration: 2172.14 ms
If you're looking at CloudWatch logs, each LogGroup for your Lambda function represents a separate container and therefore the first invocation for that LogGroup is your cold start.
Related
I have defined a lambda function that is invoked from API Gateway with proxy integration. Thus, I have defined an eager resource path for it:
And referenced my lambda function:
My lambda is able to process request like GET /myresource, POST /myresource.
I have tried this strategy to keep it warm, described in acloudguru. It consists of setting up a CloudWatch event rule that invokes the lambda every 5 minutes to keep it warm. Unfortunately it isn't working.
This is the behaviour I have seen:
After some period, let's say 20 minutes, I call GET /myresource from API Gateway and it takes around 15 seconds. Subsequent requests last ~30ms. The CloudWatch event is making no difference...
Let's suppose another long period without calling the gateway. If I go to the Lambda console and invoke it directly (test button) it answers right away (less than 1ms) with a 404 (that's normal because my lambda expects GET /myresource or POST /myresource).
Immediately after this lambda console execution I call GET /myresource from API Gateway and it still takes ~20 seconds. That is to say, the function was still cold despite having being invoked from the Lambda console. This might explain why the CloudWatch event doesn't work since it calls the lambda without setting the method/resource-url.
So, how can I make this particular case with API Gateway with proxy integration + Lambda stay warm to prevent those slow first request?
As of now (2019-02-27) [1], A periodic CloudWatch event rule does not deterministically solve the cold start issue. But a periodic CloudWatch event rule will reduce the probability of cold starts.
The reason is it's upto the Lambda server to decide whether to use a new Lambda container instead of an existing container to process an incoming request. Some of the related details regarding how Lambda containers are reused is explained in [1]
In order to reduce the cold start time (not to reduce the number cold starts), can you try followings? 1. increasing the memory allocated to the function, 2. reduce the deployment package size (eg- remove unnecessary dependencies), and 3. use a language like NodeJS, Python instead of Java, .Net
[1]According to reinvent session, (39:50 at https://www.youtube.com/watch?v=QdzV04T_kec), the Lambda team expects to improve the VPC cold start latency in Lambda.
[2] https://aws.amazon.com/blogs/compute/container-reuse-in-lambda/
Denis is quite right about the non deterministic lambda behaviour regarding the number of containers hit by CloudWatch events. I'll follow his advice to improve the startup time.
On the other hand I have managed to make my CloudWatch events hit the lambda function properly, reducing (in many cases) the number of cold starts.
I just had to add an additional controller mapped to "/" with a hardcoded response:
#Controller("/")
class WarmUpController {
private val logger = LoggerFactory.getLogger(javaClass)
#Get
fun warmUp(): String {
logger.info("Warming up")
return """{"message" : "warming up"}"""
}
}
With this in place the default (/) invocation from CloudWatch does keep the container warm most of the time.
After finishing successfully, a Lambda function insists on timing out.
The function's triggering event is s3:ObjectCreated:*.
The function uses MongoDB Atlas and does so according to the optimisation suggestions on https://www.mongodb.com/blog/post/optimizing-aws-lambda-performance-with-mongodb-atlas-and-nodejs, including setting:
context.callbackWaitsForEmptyEventLoop = false; before using the DB.
The function also calls some AWS SDK methods with successfully resolved promises.
After finishing my code successfully and doing everything it set out to do, I get the following in my CloudWatch logs, (both the request's END event and its timeout):
START RequestId: XXX
... my logs...
END RequestId: XXX
REPORT RequestId: XXX Duration: 6001.12 ms Billed Duration: 6000 ms Memory Size: 1024 MB Max Memory Used: 49 MB
XXX Task timed out after 6.00 seconds
The function then repeats itself twice more with the same unfortunate result.
Any immediate suspects? Where should I look?
You need to call callback(null, <any>) in order to end your function handler and tell Lambda that your function executed successfully.
Without that, Lambda will retry the same invocation after a delay and it will again finish but without telling Lambda that it finished successfully.
I'm building a server-less web-tracking system which serves its tracking pixel using AWS API Gateway, which calls a Lambda function whenever a tracking request arrives to write the tracking event into a Kinesis stream.
The Lambda function itself does not do anything fancy. It just a takes the incoming event (its own argument) and writes it to the stream. Essentially, it's just:
import boto3
kinesis_client = boto3.client("kinesis")
kinesis_stream = "my_stream_name"
def return_tracking_pixel(event, context):
...
new_record = ...(event)
kinesis_client.put_record(
StreamName=kinesis_stream,
Data=new_record,
PartitionKey=...
)
return ...
Sometimes I experience a weird spike in the Lambda execution duration that causes some of my Lambda function invocations to time-out and the tracking requests to be lost.
This is the graph of 1-minute invocation counts of the Lambda function in the in affected time period:
Between 20:50 and 23:10 I suddenly see many invocation errors (1-minute error counts):
which are obviously caused by the Lambda execution time-out (maximum duration in 1-minute intervals):
There is nothing weird going on neither with my Kinesis stream (data-in, number of put records, put_record success count etc., all looks normal), nor with my API GW (number of invocations corresponds to number of API GW calls, well within the limits of the API GW).
What could be causing the sudden (and seemingly randomly occurring) spike in the Lambda function execution duration?
EDIT: neither the lambda functions are being throttled, which was my first idea.
Just to add my 2 cents, because there's not much investigative work without extra logging or some X-Ray analysis.
AWS Lambda sometimes will force recycle containers which will feel like cold starts even though your function is being reasonably exercised and warmed up. This might bring all cold start related issues, like extra delays for ENIs if your Lambda has an attached VPC and so on... but even for a simple function like yours, 1 second timeout is sometimes too optimistic for a cold start.
I don't know of any documentation on those forced recycles, other than some people having evidence for it.
"We see a forced recycle about 7 times a day." source
"It also appears that even once warmed, high concurrency functions get recycled much faster than those with just a few in memory." source
I wonder how you could confirm this is the case. Perhaps you could check those errors appearing in Cloud Watch log streams to be from containers that never appeared before.
Hello I am new to AWS Lambda.I want to know what do we mean by Hot Lambda function (Hot Start) and Cold Lambda function (Cold Start) ?? Can anyone please explain me in detail & what is the difference between Hot Lambda and Cold Lambda
After uploading your code or after periods of inactivity your Lambda is shut down or "cold". When a new event comes in there is a brief moment where Lambda spins up a new instance of your code - this includes whatever initializing AWS does to start up the "container" as well as initializing the code that you uploaded.
So an event that is able to hit an initialized("hot") Lambda will in theory be processed faster than hitting a cold one. There isn't a guarantee on how long a Lambda will stay hot after the last event but it could be as long as 5 minutes.
It's a common belief that when people refer to "warm starts" they mean that the same container/sandbox is ready to receive a new connection - but that's not accurate.
Warm Start - invoking a function using a warm container with a prebaked unused sandbox resources from previous invocations are not recycled.
Cold Start - invoking a function when no container/sandbox is ready to receive the request. A new container must be created and the runtime and user code loaded. Cold start's latency is mostly an internal metric, externally, cold starts are only a part of the total overhead that can affect the end-user experience. In some scenarios, we can encounter a portion of the full cold start, think about scale prediction and statistical algorithms
However, using the terms "cold start" and "warm start" might be misleading, as a developer you should care about "Invocation Overheads" - The time it takes to call the user's function and return the response.
I am using aws lambda function to convert uploaded wav file in a bucket to mp3 format and later move file to another bucket. It is working correctly. But there's a problem with triggering. When i upload small wav files,lambda function is called once. But when i upload a large sized wav file, this function is triggered multiple times.
I have googled this issue and found that it is stateless, so it will be called multiple times(not sure this trigger is for multiple upload or a same upload).
https://aws.amazon.com/lambda/faqs/
Is there any method to call this function once for a single upload?
Short version:
Try increasing timeout setting in your lambda function configuration.
Long version:
I guess you are running into the lambda function being timed out here.
S3 events are asynchronous in nature and lambda function listening to S3 events is retried atleast 3 times before that event is rejected. You mentioned your lambda function is executed only once (with no error) during smaller sized upload upon which you do conversion and re-upload. There is a possibility that the time required for conversion and re-upload from your code is greater than the timeout setting of your lambda function.
Therefore, you might want to try increasing the timeout setting in your lambda function configuration.
By the way, one way to confirm that your lambda function is invoked multiple times is to look into cloudwatch logs for the event id (67fe6073-e19c-11e5-1111-6bqw43hkbea3) occurrence -
START RequestId: 67jh48x4-abcd-11e5-1111-6bqw43hkbea3 Version: $LATEST
This event id represents a specific event for which lambda was invoked and should be same for all lambda executions that are responsible for the same S3 event.
Also, you can look for execution time (Duration) in the following log line that marks end of one lambda execution -
REPORT RequestId: 67jh48x4-abcd-11e5-1111-6bqw43hkbea3 Duration: 244.10 ms Billed Duration: 300 ms Memory Size: 128 MB Max Memory Used: 20 MB
If not a solution, it will at least give you some room to debug in right direction. Let me know how it goes.
Any event Executing Lambda several times is due to retry behavior of Lambda as specified in AWS document.
Your code might raise an exception, time out, or run out of memory. The runtime executing your code might encounter an error and stop. You might run out concurrency and be throttled.
There could be some error in Lambda which makes the client or service invoking the Lambda function to retry.
Use CloudWatch logs to find the error and resolving it could resolve the problem.
I too faced the same problem, in my case it's because of application error, resolving it helped me.
Recently AWS Lambda has new property to change the default Retry nature. Set the Retry attempts to 0 (default 2) under Asynchronous invocation settings.
For some in-depth understanding on this issue, you should look into message delivery guarantees. Then you can implement a solution using the idempotent consumers pattern.
The context object contains information on which request ID you are currently handling. This ID won't change even if the same event fires multiple times. You could save this ID for every time an event triggers and then check that the ID hasn't already been processed before processing a message.
In the Lambda Configuration look for "Asynchronous invocation" there is an option "Retry attempts" that is the maximum number of times to retry when the function returns an error.
Here you can also configure Dead-letter queue service
Multiple retry can also happen due read time out. I fixed with '--cli-read-timeout 0'.
e.g. If you are invoking lambda with aws cli or jenkins execute shell:
aws lambda invoke --cli-read-timeout 0 --invocation-type RequestResponse --function-name ${functionName} --region ${region} --log-type Tail --```payload {""} out --log-type Tail \
I was also facing this issue earlier, try to keep retry count to 0 under 'Asynchronous Invocations'.