AWS Lambda Hot and Cold Start - amazon-web-services

Hello I am new to AWS Lambda.I want to know what do we mean by Hot Lambda function (Hot Start) and Cold Lambda function (Cold Start) ?? Can anyone please explain me in detail & what is the difference between Hot Lambda and Cold Lambda

After uploading your code or after periods of inactivity your Lambda is shut down or "cold". When a new event comes in there is a brief moment where Lambda spins up a new instance of your code - this includes whatever initializing AWS does to start up the "container" as well as initializing the code that you uploaded.
So an event that is able to hit an initialized("hot") Lambda will in theory be processed faster than hitting a cold one. There isn't a guarantee on how long a Lambda will stay hot after the last event but it could be as long as 5 minutes.

It's a common belief that when people refer to "warm starts" they mean that the same container/sandbox is ready to receive a new connection - but that's not accurate.
Warm Start - invoking a function using a warm container with a prebaked unused sandbox resources from previous invocations are not recycled.
Cold Start - invoking a function when no container/sandbox is ready to receive the request. A new container must be created and the runtime and user code loaded. Cold start's latency is mostly an internal metric, externally, cold starts are only a part of the total overhead that can affect the end-user experience. In some scenarios, we can encounter a portion of the full cold start, think about scale prediction and statistical algorithms
However, using the terms "cold start" and "warm start" might be misleading, as a developer you should care about "Invocation Overheads" - The time it takes to call the user's function and return the response.

Related

Need to run a aws lambda function which takes more than 15 minutes to complete?

My Lambda function has limit 15 minutes which was 5 minutes ealier.Lambda process is automatically terminated after 15 minutes but my process takes more than 15 minutes. How I can manage ?
There is no way around this. If you're doing some sort of long running processing then your other option may be to run this task on an EC2 instance. If this long running process can be broken down in to multiple steps then you could look in to Lambda Step Functions.
15 Minutes is the max and this max can not be extended.
EDIT:
Recently I started running some long running tasks that are variable in length (anywhere from a couple minutes to several hours). To accomplish this I've been using AWS Fargate and my task is node.js script that is stored as a Docker container in ECR. Doing this was fairly easy and also is fairly cheap (I think we spent a little over $1 for running this task daily in a month). This may be something worth looking in to for others who may come across this answer.
https://docs.aws.amazon.com/AmazonECS/latest/userguide/scheduled_tasks.html
Typically use a Fat Lambda strategy or Step Function
Fat Lambda Strategy
A Fat Lambda strategy is used when your task is singular but has a
long-running execution time and/or you have heavy hardware
requirements. The idea is that you would create a script that executes your long
process and put it into a docker container that's hosted in Fargate.
Meaning no limits to execution time and access to powerful hardware (How to create a Fat Lambda https://youtu.be/XUp9SHIHU8w)
Step Function strategy
A Step Function strategy is used to break down your entire process
into smaller steps. Usually, a step function strategy would work for
you if your process could have lots of miniature stages linked
together instead of a big colossal job attempting to do everything
simultaneously. Bear in mind that a "Fat Lambda" can also be triggered
by a Step Function (How to create a Step Function
https://www.youtube.com/watch?v=s0XFX3WHg0w)
Also, another note, remember lambdas can also trigger other lambdas. So you might even be able to have different lambdas run bits of your lambda code. For example, a FOR loop sends off a lot of mini lambdas to run small tasks. You might not even need a Step Function or a Fat Lambda.
If you're stuck on what to choose, follow the below. It will help you reason with your problem.
Singular Lambda >> Lambda invoke another Lambda? >> Step Functions? >> Fargate (Fat Lambda)?
If you can checkpoint the task then you can check the getRemainingTimeInMillis (docs) and if the time is running out then invoke the same lambda with a parameter where to continue.
Something like this flow:
start working (0% done)
time is running low (40% done) => start a new lambda telling it to start from 40%
old lambda is terminated, new lambda starts working (40%)
when its time is running low, start a new lambda again (80%)
the third lambda finishes the job
But it requires a very specific type of task to support this. If your require a single execution from start to finish then lambda is not a good choice for this.
What do you think about using a lambda to trigger an ECS task? An ECS task just runs a containerized application for as long as it needs to run.
This blog post is relevant: https://www.gravitywell.co.uk/insights/using-ecs-tasks-on-aws-fargate-to-replace-lambda-functions/.
Aws lambda is meant to be used for quick processing. if your task is this long then better choose some other way to develop that functionality. Although you can define the timeout property for AWS lambda, but that can not exceed 15 minutes.
As per you use case better to use EC2 for deploying you application and then terminate the EC2 instance when the processing is done or it remains idle more than the threshold time.
Refer the AW Lambda documentation - https://docs.aws.amazon.com/lambda/index.html
To add to the Step function answer - here's a very simple playbook:
Work for 10 minutes
Write progress to S3
kick off another lambda to consume your progress
terminate
Once you're done, output. Viola, infinite runtime lambda with very little effective overhead.
No, you cannot run a lambda for more than 15 mins!
But Yes you can manage this using Signals.
Basically, this will inform you to start plan B when plan A is not enough within 15 mins. If you can decouple the tasks in your process and add checkpoints in your process then the next lambda invocation can be picked up in plan B or you can somehow create entries in db in the plan B for the unprocessed parts. And reprocess them as a part of another run.
Framework here -
https://gist.github.com/kuharan/c2bfddac7bd8dc5702f6eec31729fb48

Lambdas calls speed changes

I've created a simple lambda that reads data from dynamodb.
First time I call the lambda it takes about 1500ms to complete, but then after I run the lambda again it takes about 150ms. How is it possible?
What type of caching response does AWS preform to achieve this?
AWS Lambda is provision infrastructure on your first call and it's required time also AWS needs to start a JVM with the code to be able to call the function. Starting the JVM takes time and thus will incur some overhead.
Another issue is cold ,if there is no idle container available waiting to run the code. This is all invisible to the user and AWS has full control over when to kill containers.
So above steps are involved during first call and you can see 1500 ms
Next call you have everything on place so lambda give you response in 150 ms or less .
This is as per design of serverless to save infrastructure cost ,only provision infrastructure when needed and get first call.
I would suggest please read documents
- https://aws.amazon.com/lambda/
This happens due to cold start. This happens mainly when we invoke the lambda for the first time after deployment or when a lambda function is idle for sometime.
These articles explains about how language, memory or size of the lambda affects the cold start
https://read.acloud.guru/does-coding-language-memory-or-package-size-affect-cold-starts-of-aws-lambda-a15e26d12c76
https://mikhail.io/serverless/coldstarts/aws/

How do I force a complete cold start of an AWS lambda function on a VPC?

I have a lambda function written in Python that uses a couple of heavyweight dependencies (NumPy, pandas, goodtables, etc.) and is also connected to a VPC (for access to a Postgres RDS instance)
The cold start execution time of this function is huge (16.2 seconds) when it's executed after a while (> 4-6 hours)
However, if I update the function code and invoke it a second time (shortly after the first execution), the cold start execution time reduces drastically (3 seconds)
If I invoke the function again without updating it so it's a warm start, the execution time goes down even further (313 ms)
I suspect the first cold start (16.2 secs) is when Lambda sets up an ENI for access to VPC resources and the ENI is reused during the second cold start (3 secs) so the time taken to re-create the ENI is avoided.
I am trying to optimize the cold start time of this function and want it to start from scratch to see how fast it can execute when starting completely cold (i.e. no ENI + cold start).
Is there a way to do this and do it repeatedly?
You can toggle the memory up, save and reset it back again.
You can also add a new environment variable.
This forces all existing warm lambda's to be disposed and a new cold start on the next invocation of the lambda.
Instead of just modifying the code, you can try to publish new version of you lambda function for testing purposes. According to AWS, each time you publish a new version of your lambda function, all the containers in which your function is running are destroyed and then recreated, which should force complete cold start.
I was wondering the same thing and while you could "throttle" the reserved count to zero in a testing scenario, that most likely won't be a viable option in a production one. For that, take a look at the answers in either Force Discard AWS Lambda Container or Restarting AWS lambda function to clear cache.

AWS Lambda 'full' vs 'incremental' cold start?

I'm looking into cold start issues and have the impression that there's a difference between:
a 'full' cold start, i.e. going from 0 to 1 active instance
an 'incremental' cold start, i.e. going from n to n+1 active instances
It's rather difficult to accurately test the 'incremental' scenario though.
Can anybody confirm my impression, and possibly explain why there's a difference?
When there is no instance with your Lambda in its memory, AWS does not know that an instance will be needed, that's why the first cold start takes longer. If your Lambda is already under load, AWS can predict a new instance will be needed. So it prepares a new instance upfront, after that requests are distributed to that instance. Therefore your observation is correct, cold start is an issue only for functions with zero activity.
Easy to test, write a Jmeter test and shoot requests with little delay (50 ms). Repeat after several hours of inactivity. Probably best to leave it run for a weekend On my tests the duration of the cold start increases then stabilize (though I used python with some heavy web frameworks and zappa). If needed add logs.

AWS Lambda execution duration randomly spikes and causes time-outs

I'm building a server-less web-tracking system which serves its tracking pixel using AWS API Gateway, which calls a Lambda function whenever a tracking request arrives to write the tracking event into a Kinesis stream.
The Lambda function itself does not do anything fancy. It just a takes the incoming event (its own argument) and writes it to the stream. Essentially, it's just:
import boto3
kinesis_client = boto3.client("kinesis")
kinesis_stream = "my_stream_name"
def return_tracking_pixel(event, context):
...
new_record = ...(event)
kinesis_client.put_record(
StreamName=kinesis_stream,
Data=new_record,
PartitionKey=...
)
return ...
Sometimes I experience a weird spike in the Lambda execution duration that causes some of my Lambda function invocations to time-out and the tracking requests to be lost.
This is the graph of 1-minute invocation counts of the Lambda function in the in affected time period:
Between 20:50 and 23:10 I suddenly see many invocation errors (1-minute error counts):
which are obviously caused by the Lambda execution time-out (maximum duration in 1-minute intervals):
There is nothing weird going on neither with my Kinesis stream (data-in, number of put records, put_record success count etc., all looks normal), nor with my API GW (number of invocations corresponds to number of API GW calls, well within the limits of the API GW).
What could be causing the sudden (and seemingly randomly occurring) spike in the Lambda function execution duration?
EDIT: neither the lambda functions are being throttled, which was my first idea.
Just to add my 2 cents, because there's not much investigative work without extra logging or some X-Ray analysis.
AWS Lambda sometimes will force recycle containers which will feel like cold starts even though your function is being reasonably exercised and warmed up. This might bring all cold start related issues, like extra delays for ENIs if your Lambda has an attached VPC and so on... but even for a simple function like yours, 1 second timeout is sometimes too optimistic for a cold start.
I don't know of any documentation on those forced recycles, other than some people having evidence for it.
"We see a forced recycle about 7 times a day." source
"It also appears that even once warmed, high concurrency functions get recycled much faster than those with just a few in memory." source
I wonder how you could confirm this is the case. Perhaps you could check those errors appearing in Cloud Watch log streams to be from containers that never appeared before.