What's going on behind the scenes when an AWS lambda function freezes?
That is -- many of the Lambda Runtime Docs refer broadly to the concept of a function freezing or unfreezing
The runtime and each extension indicate completion by sending a Next API request. Lambda freezes the execution environment when the runtime and each extension have completed and there are no pending events.
My understanding of this is that after a Lambda function initializes (or "cold starts") and executes the first invocation request, if there are no other invocations to process the function's execution environment will "freeze". Then, when there's another function invocation to process, the function's execution environment will "unfreeze" almost instantly without needing to initialize/cold-start again. If a frozen function goes too long without being invoked it will shutdown, and the next invocation request will need to cold start.
Does anyone know what this freezing is? It's my understanding that these execution environments are firecracker virtual machines. Is this freezing something that firecracker supports, or is it some extra magic that AWS Web Services has that they keep to themselves? Put another way, if I have a Firecracker VM running can I freeze and unfreeze it?
We can understand the freeze as after each execution, AWS Lambda putting the instance to sleep. In other words, the instance freezes (similar to a laptop in hibernate mode). The virtual CPU is turned off. This frees up resources on the worker node. The overhead from waking up such a function is negligible.
For understand how Firecracker works under the hood, take a look on this AWS re:Invent of 2019 video: AWS re:Invent 2019: Firecracker open-source innovation (OPN402)
Also, take a look on this posts:
Understanding Container Reuse in AWS Lambda
A look behind the scenes of AWS Lambda
Related
My Lambda function has limit 15 minutes which was 5 minutes ealier.Lambda process is automatically terminated after 15 minutes but my process takes more than 15 minutes. How I can manage ?
There is no way around this. If you're doing some sort of long running processing then your other option may be to run this task on an EC2 instance. If this long running process can be broken down in to multiple steps then you could look in to Lambda Step Functions.
15 Minutes is the max and this max can not be extended.
EDIT:
Recently I started running some long running tasks that are variable in length (anywhere from a couple minutes to several hours). To accomplish this I've been using AWS Fargate and my task is node.js script that is stored as a Docker container in ECR. Doing this was fairly easy and also is fairly cheap (I think we spent a little over $1 for running this task daily in a month). This may be something worth looking in to for others who may come across this answer.
https://docs.aws.amazon.com/AmazonECS/latest/userguide/scheduled_tasks.html
Typically use a Fat Lambda strategy or Step Function
Fat Lambda Strategy
A Fat Lambda strategy is used when your task is singular but has a
long-running execution time and/or you have heavy hardware
requirements. The idea is that you would create a script that executes your long
process and put it into a docker container that's hosted in Fargate.
Meaning no limits to execution time and access to powerful hardware (How to create a Fat Lambda https://youtu.be/XUp9SHIHU8w)
Step Function strategy
A Step Function strategy is used to break down your entire process
into smaller steps. Usually, a step function strategy would work for
you if your process could have lots of miniature stages linked
together instead of a big colossal job attempting to do everything
simultaneously. Bear in mind that a "Fat Lambda" can also be triggered
by a Step Function (How to create a Step Function
https://www.youtube.com/watch?v=s0XFX3WHg0w)
Also, another note, remember lambdas can also trigger other lambdas. So you might even be able to have different lambdas run bits of your lambda code. For example, a FOR loop sends off a lot of mini lambdas to run small tasks. You might not even need a Step Function or a Fat Lambda.
If you're stuck on what to choose, follow the below. It will help you reason with your problem.
Singular Lambda >> Lambda invoke another Lambda? >> Step Functions? >> Fargate (Fat Lambda)?
If you can checkpoint the task then you can check the getRemainingTimeInMillis (docs) and if the time is running out then invoke the same lambda with a parameter where to continue.
Something like this flow:
start working (0% done)
time is running low (40% done) => start a new lambda telling it to start from 40%
old lambda is terminated, new lambda starts working (40%)
when its time is running low, start a new lambda again (80%)
the third lambda finishes the job
But it requires a very specific type of task to support this. If your require a single execution from start to finish then lambda is not a good choice for this.
What do you think about using a lambda to trigger an ECS task? An ECS task just runs a containerized application for as long as it needs to run.
This blog post is relevant: https://www.gravitywell.co.uk/insights/using-ecs-tasks-on-aws-fargate-to-replace-lambda-functions/.
Aws lambda is meant to be used for quick processing. if your task is this long then better choose some other way to develop that functionality. Although you can define the timeout property for AWS lambda, but that can not exceed 15 minutes.
As per you use case better to use EC2 for deploying you application and then terminate the EC2 instance when the processing is done or it remains idle more than the threshold time.
Refer the AW Lambda documentation - https://docs.aws.amazon.com/lambda/index.html
To add to the Step function answer - here's a very simple playbook:
Work for 10 minutes
Write progress to S3
kick off another lambda to consume your progress
terminate
Once you're done, output. Viola, infinite runtime lambda with very little effective overhead.
No, you cannot run a lambda for more than 15 mins!
But Yes you can manage this using Signals.
Basically, this will inform you to start plan B when plan A is not enough within 15 mins. If you can decouple the tasks in your process and add checkpoints in your process then the next lambda invocation can be picked up in plan B or you can somehow create entries in db in the plan B for the unprocessed parts. And reprocess them as a part of another run.
Framework here -
https://gist.github.com/kuharan/c2bfddac7bd8dc5702f6eec31729fb48
I am studying Lambdas of my project and I've seen that one of them is idle. In the top of dashboard page I see block with text:
The function __ is idle. To reactivate your function, choose Restore.
I slightly confused of it because this function is very similar to others which isn't marked as idle but as well haven't been launched for couple of months.
Since I haven't find answers in AWS documentation i'd appreciate somebody to explain me what difference between functions in idle state and not, and how/why function becomes Idle?
It is related to VPC, please check this doc.
If your functions aren't active for a long period of time, Lambda
reclaims its network interfaces, and the functions become Idle. To
reactivate an idle function, invoke it. This invocation fails, and the
function enters a Pending state again until a network interface is
available.
https://docs.aws.amazon.com/lambda/latest/dg/configuration-vpc.html
Instead of a Cloudwatch event, I would suggest using Provisioned Concurrency to keep a lambda(s) warm.
https://aws.amazon.com/about-aws/whats-new/2019/12/aws-lambda-announces-provisioned-concurrency/
You will need to invoke functions often using cloudwatch events regularly if you want your Lambda functions to stay alive & warm. Otherwise they will cool down as #Traycho Ivanov says.
Setup up cloudwatch events to invoke the Lambdas you need alive every so often, but how often is a bit debated it's not clear how AWS manages this and this could easily be changed into the future that your event is not frequent enough, or maybe it is too frequent costing you a couple more cents than you would otherwise want!
i have gone through the site but unable get the root cause of my issue.
we have a lambda that will run for every 50 seconds. the first run of lambda is a cold start. during the start all the necessary dependencies for the lambda are prepared ( all the interfaces ).. Lambda handler will have its own code to interact with SQS and SWF. during the first run from the cloud watch logs it is clear that it is reading the base file to get all the services. then lambda handler will start. from second run only lambda handler will get invoked after 50th second. So far everything is going smooth.
All of sudden we noticed the lambda took more than 50 seconds ( in general it finishes below 10s). log shows that lambda got timed out and freshly it started to initializing all the dependencies again.
This is not giving any clue to us as after the timeout the subsequent run works smooth. Its not good to see lambda timed out. Definitely lambda code is without errors.
Could this be any container issue? Does the container have any time period that it will keep data active till it reaches the expiry time out.
Can we able to access the container object to find out more information? we have 2 or more dev environments. this behavior is different for different environments. for some it happens for every 3 days. some time in a day it happens thrice.
if we want to understand the properties of the container object how can we do it? Is it a grey zone that only AWS can access it? Lambda code is written in c# using net core App 2.0. thought of checking the cloud trail log for this lambda during the invocation. there too i am not able to find the reason behind the timeout.
we have more than 20 lambda's for dev and 10 for test in each different regions. its not getting clear to us which lambda will time out.
Any suggestions or idea's will help me a lot???????
thankyou.
Lambda containers will not live indefinitely. If you are seeing occasional "cold starts" then that is normal behavior. If you're running only 1 invocation at a time (i.e. you only have a single lambda instance) you can still expect to see the container recycled every few hours. In general, I understand AWS is trying to give us fewer cold starts but you can still expect to get a new container and new cold start from time to time.
I've created a simple lambda that reads data from dynamodb.
First time I call the lambda it takes about 1500ms to complete, but then after I run the lambda again it takes about 150ms. How is it possible?
What type of caching response does AWS preform to achieve this?
AWS Lambda is provision infrastructure on your first call and it's required time also AWS needs to start a JVM with the code to be able to call the function. Starting the JVM takes time and thus will incur some overhead.
Another issue is cold ,if there is no idle container available waiting to run the code. This is all invisible to the user and AWS has full control over when to kill containers.
So above steps are involved during first call and you can see 1500 ms
Next call you have everything on place so lambda give you response in 150 ms or less .
This is as per design of serverless to save infrastructure cost ,only provision infrastructure when needed and get first call.
I would suggest please read documents
- https://aws.amazon.com/lambda/
This happens due to cold start. This happens mainly when we invoke the lambda for the first time after deployment or when a lambda function is idle for sometime.
These articles explains about how language, memory or size of the lambda affects the cold start
https://read.acloud.guru/does-coding-language-memory-or-package-size-affect-cold-starts-of-aws-lambda-a15e26d12c76
https://mikhail.io/serverless/coldstarts/aws/
We're using some asynchronous Lambda functions to process images and store them to S3. In all my testing before we deployed to production the functions were run as soon as we invoked them. However, over the weekend we had a batch where there was a gap of several hours between us invoking the function and it actually running.
For example, our logs showed we invoked a function at 2015-12-13 02:46:10 (UTC), but Lambda's logs show it didn't run until 2015-12-13T12:09:33.909Z. I don't think we were being throttled as not a single Lambda function ran between 2am and 9:46am UTC.
We moved to Lambda to speed up our image processing. Is this normal? Is this just a delay we're going to have to live with?