We have a Slack slash command that invokes an AWS Lambda function written in Node.js. The Lambda calls one of our internal services and returns JSON. It often takes several attempts to get the slash command to work; the caller gets the following message:
Darn - that slash command didn't work. If you see this message more than once we suggest you contact "name".
We ran a bash script that calls the Lambda once a minute for 12 hours. The average duration of the calls was about 1.5 seconds, well below Slack's expectation that a slash command returns a response within 3 seconds. Has anyone else experienced this issue?
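An average of about 1.5 seconds can still hide individual calls that exceed 3 seconds, so it is worth looking at the tail of the distribution as well as the mean. Below is a minimal Node.js sketch. It assumes the per-call durations from the bash script were collected as an array of milliseconds. `summarizeDurations` is a hypothetical helper, not part of any AWS or Slack API.

```javascript
// Summarize measured call durations (ms) and count calls over a budget,
// e.g. the 3000 ms Slack allows a slash command to respond.
function summarizeDurations(durationsMs, budgetMs = 3000) {
  if (durationsMs.length === 0) throw new Error("no durations to summarize");
  const sorted = [...durationsMs].sort((a, b) => a - b);
  // Nearest-rank percentile over the sorted samples.
  const percentile = (p) => sorted[Math.max(0, Math.ceil((p / 100) * sorted.length) - 1)];
  const mean = sorted.reduce((sum, d) => sum + d, 0) / sorted.length;
  return {
    count: sorted.length,
    mean,
    p50: percentile(50),
    p99: percentile(99),
    max: sorted[sorted.length - 1],
    overBudget: sorted.filter((d) => d > budgetMs).length,
  };
}
```

Suppose `overBudget` is non-zero while `mean` stays around 1,500 ms. In that case the failures come from the slow tail (for example a cold start or a slow response from the internal service), not from typical calls.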
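Increase the Lambda function's timeout above 3 seconds, even though your estimated run time is around 1.5 seconds. Lambda's default timeout is 3 seconds, so any invocation that runs longer than usual is cut off.
Also note that AWS Lambda limits the total concurrent executions across all functions within a given region to 100 by default (AWS now documents a default of 1,000), and this limit can be increased on request.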
AWS Lambda initialization issue:
We have an app (Lambda-based architecture) that uses a 10 MB Excel spreadsheet, which is loaded into an Apache POI XSSFWorkbook instance. Here is what we have tried:
First, we tried to load the spreadsheet in the init phase. This was problematic: the Lambda's startup took so long that it restarted several times before it finally came up. (This is due to Lambda's 10-second limit on the init phase; see this.) Sometimes the response was just a timeout error (the API Gateway limit).
We moved the spreadsheet loading into the handler function so that it is no longer part of the init phase. (Once the workbook instance is created, that same instance is used for as long as the Lambda stays up.) This solved the timeout error and the init restarts. However, the Lambda takes close to 15 seconds (which is not acceptable) to return the response on the first call; after that it behaves normally.
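The approach above is lazy initialization: load the resource on first use and cache it at module scope, so that later invocations in the same execution environment reuse it. Below is a minimal Node.js sketch of that pattern. The original app is Java with Apache POI, and `createHandler`, `loadWorkbook` and `sheetNames` are hypothetical names.

```javascript
// Wrap an expensive loader so it runs at most once per execution environment.
function createHandler(loadWorkbook) {
  // Lives as long as the module does, i.e. as long as the execution environment.
  let workbookPromise = null;

  function getWorkbook() {
    if (workbookPromise === null) {
      workbookPromise = loadWorkbook().catch((err) => {
        workbookPromise = null; // forget a failed load so the next call retries
        throw err;
      });
    }
    return workbookPromise;
  }

  return async function handler(event) {
    const workbook = await getWorkbook(); // slow only on the first call
    return { sheets: workbook.sheetNames.length };
  };
}
```

In a deployment, the handler would be created once at module scope, for example `exports.handler = createHandler(loadWorkbookFromS3)` with a hypothetical loader. This pattern only moves the cost: the first invocation in each new execution environment still pays for the load.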
We tried provisioned concurrency, which keeps Lambda instances 'warm', but this only helps while requests keep coming. It works fine when the Lambda is in frequent demand. When it is not (two requests within 20 minutes or less), the Lambda has to go through the whole spreadsheet loading process again, because only the init phase is kept warm. As a result, we again get a 16-second request.
Is there any advice or possible solution you can think of to handle this, so that we never get this >16-second request? (That is, how can we have the spreadsheet available in memory even when the Lambda has not received any requests?)
Thanks in advance.
I have recently deployed a Django backend application to AWS Lambda using Zappa.
When the Lambda function has not been invoked for some time, the first request takes 10 to 15 seconds to be processed. At first I thought this was a cold start, but even for a cold start this time is unacceptable. Then, reading through Zappa's documentation, I saw that it enables the keep_warm feature by default, which sends a dummy request to the Lambda function every 4 minutes to keep it warm. So this excessive delay in the response to the first request is not due to a cold start.
Then I started using tools such as AWS X-Ray and CloudWatch Insights to find the explanation for the delay. Here is what I found:
The invocation that takes a very long time to be processed is the following:
Crossed out in red are the names of the environment variables the application uses. They are all defined and assigned a value directly in the AWS Console. There are two things I don't understand: first, why it takes so long, and second, why it says the environment variables are cast to None. The application works perfectly (apart from the massive delay on the first request), so the environment variables are correctly set somewhere.
This invocation happens religiously every two hours. It also happens the first time someone invokes the Lambda function after it has been idle for a while, as seen in the following chart:
The dots on the x axis correspond to Zappa's dummy requests to keep the server warm. The elevated dots correspond to the invocation shown in the previous image. Finally, the spike corresponds to a user invocation. The time it took to process is the sum of two parts: the time for the long invocation (the one shown in the first image) and the time for the longest HTTP request the client makes to the server. That request was the following:
It was a regular login request that should be resolved much faster. Other requests that are probably more demanding than this one were resolved in less than 100ms.
So, to sum up:
There is one Lambda invocation that takes more than 10 seconds to be resolved; it corresponds to the first image shown. It happens every 2 hours, and also whenever a user makes a request to the server after it has been idle for some time.
Some requests take more than 2 seconds to be resolved, and I have no idea why.
Apart from these previous function invocations, all other requests are resolved in a reasonable time frame.
Any ideas as to why these invocations take so long would be very much appreciated. I have spent quite some time trying to figure it out on my own and have run out of ideas. Thank you in advance!
Edit 1 (28/07/21): to further support my suspicion that this delay is not due to a cold start, here is the "Segments Timeline" of the function in CloudWatch/Application monitoring/Traces:
If it were a cold start, the delay should appear in the "Initialization" segment and not in the "Invocation" one.
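When the 10-15 seconds shows up in the Invocation segment, the next step is to find which part of the invocation is slow. Below is a minimal Node.js sketch of that idea; the same pattern works with module-level variables in Python. A module-level flag tells whether an invocation is the first one in its execution environment, and each step of the invocation is timed separately. `runInvocation` and the step names are hypothetical.

```javascript
// Module scope runs once per execution environment, during the Init phase.
const moduleLoadedAt = Date.now();
let isColdStart = true;

// Run each named step of an invocation, recording how long it took in ms.
async function runInvocation(steps) {
  const coldStart = isColdStart; // true only for the first invocation in this environment
  isColdStart = false;
  const timings = {};
  for (const [name, step] of Object.entries(steps)) {
    const start = Date.now();
    try {
      await step();
    } finally {
      timings[name] = Date.now() - start;
    }
  }
  const record = { coldStart, msSinceModuleLoad: Date.now() - moduleLoadedAt, timings };
  console.log(JSON.stringify(record));
  return record;
}
```

Suppose a slow invocation logs `coldStart: false` and most of its time falls in one step, such as a lazily loaded part of the app. That points at work deferred from init into the first request, rather than at the cold start itself.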
Edit 2 (30/07/21): I forgot to mention that I had previously deployed the application using Elastic Beanstalk and didn't face this problem at all, so my code's performance is probably not the problem here.
Edit 3 (30/07/21): I found this thread in an AWS forum from 2016 regarding this exact issue. An AWS engineer mentioned that this behaviour is not by any means expected for a Lambda function outside of a VPC (like mine). Nevertheless, no answer was provided that explained the cause of the 10-15 seconds delay.
Edit 4 (03/08/21): I tried doubling the function's assigned memory (from 512 MB to 1024 MB) but it did not help.
I have also added some comments to the question to explain that this is probably not due to a cold start. As you rightly stated, cold starts are explicitly indicated and seem to only take about 500 ms in your case.
Cold starts this long usually only manifested themselves when lambdas were run in a VPC. And AWS has since changed the way lambdas get their network interface which has dramatically sped up that process.
That being said, a quick Google search led me to some interesting discussions on other sites about Django applications and lazy loading. I'll share some links here (even though they are not related to Lambda) in the hope they can help you find a solution:
https://community.webfaction.com/questions/11560/django-app-seems-very-slow-to-start-up-10-seconds
https://ses4j.github.io/2015/11/23/optimizing-slow-django-rest-framework-performance/
As a last note about keep_warm: sending those requests is quite an old trick. However, be aware that there are no guarantees as to how long AWS keeps a Lambda warm. That said, if an Init Duration is indicated in the logs, you can be sure that it was a cold start.
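To check this across many invocations, you can parse the REPORT line Lambda writes at the end of each invocation and flag the ones that contain `Init Duration`. Below is a minimal Node.js sketch. It assumes the fields are tab-separated, as in the raw CloudWatch output. `parseReportLine` is a hypothetical helper.

```javascript
// Parse a Lambda "REPORT" log line into numbers. "Init Duration" is only
// present when the invocation had to initialize a new execution environment.
function parseReportLine(line) {
  if (!line.startsWith("REPORT ")) return null;
  const fields = {};
  // Fields look like "Label: value unit" and are separated by tabs.
  for (const part of line.slice("REPORT ".length).split("\t")) {
    const sep = part.indexOf(": ");
    if (sep === -1) continue;
    fields[part.slice(0, sep).trim()] = part.slice(sep + 2).trim();
  }
  // parseFloat ignores the trailing unit, e.g. "158.90 ms" -> 158.9.
  const number = (label) => (label in fields ? parseFloat(fields[label]) : null);
  const initMs = number("Init Duration");
  return {
    requestId: fields["RequestId"] ?? null,
    durationMs: number("Duration"),
    billedMs: number("Billed Duration"),
    maxMemoryMb: number("Max Memory Used"),
    initMs,
    coldStart: initMs !== null,
  };
}
```

Suppose a slow request's REPORT line has a large `durationMs` and no `initMs`. In that case the time was spent inside the handler, not in a cold start.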
If you need to ensure that a lambda function is warm and quick to respond to incoming requests, you'll have to use provisioned concurrency, which of course has its own price tag.
I can see some suggestions here to increase the memory for your Lambda (and I also saw that you tried going from 512 MB to 1024 MB). Have you tried increasing it further, say to about 3072 MB? It's a significant increase, but the point is to rule out resource limitations as the cause first.
As far as I've seen, the keep_warm feature isn't guaranteed, and the bulk of the (cold) start time is due to initialisation. The vCPU allocated to a Lambda is proportional to the memory you assign to it, so with more memory your Lambda may initialise more quickly, which could mitigate these cold starts.
I have gone through the site but was unable to find the root cause of my issue.
We have a Lambda that runs every 50 seconds. The first run of the Lambda is a cold start. During that start, all the dependencies the Lambda needs (all the interfaces) are prepared. The Lambda handler has its own code to interact with SQS and SWF. The CloudWatch logs of the first run clearly show that it reads the base file to get all the services, and then the Lambda handler starts. From the second run on, only the Lambda handler is invoked, every 50 seconds. So far everything goes smoothly.
All of a sudden, we noticed the Lambda took more than 50 seconds (it generally finishes in under 10 s). The logs show that the Lambda timed out and then started initializing all the dependencies again from scratch.
This gives us no clue, because after the timeout the subsequent runs work smoothly. It's not good to see the Lambda time out, and we are certain the Lambda code has no errors.
Could this be a container issue? Does the container keep data active only for some period, until it reaches an expiry timeout?
Can we access the container object to find out more? We have two or more dev environments, and this behavior differs between them: in some it happens every 3 days, and sometimes it happens three times in a day.
If we want to understand the properties of the container object, how can we do it? Or is it a grey zone that only AWS can access? The Lambda code is written in C# using .NET Core App 2.0. We thought of checking the CloudTrail log for this Lambda during the invocation, but there too I am not able to find the reason for the timeout.
We have more than 20 Lambdas for dev and 10 for test in each region, and it's not clear to us which Lambda will time out.
Any suggestions or ideas would help me a lot.
Thank you.
Lambda containers will not live indefinitely. If you are seeing occasional "cold starts" then that is normal behavior. If you're running only 1 invocation at a time (i.e. you only have a single lambda instance) you can still expect to see the container recycled every few hours. In general, I understand AWS is trying to give us fewer cold starts but you can still expect to get a new container and new cold start from time to time.
I can't get a Google Cloud Function to run for more than 60 seconds, even when the timeout is set to 540 seconds! Any suggestions?
I set the timeout flag on deployment to --timeout=540, and I know the setting goes through, because the 540-second timeout appears in the GCP web UI. I have also tried manually setting the timeout to 540 through the GCP web UI. Either way, I still get DEADLINE_EXCEEDED after just ~62000 ms.
I have tried both Pub/Sub and HTTPS as the function trigger, but I still get the premature function timeout at ~60 s.
I'm running the latest CLI, with these function settings:
trigger: http/pubsub (both tested, same result)
availableMemoryMb: 2048
runtime: nodejs6
status: ACTIVE
timeout: 540s
Thanks for any inputs!
Br Markus
I used the delay code from the documentation and executed a Cloud Function with the same specifications as yours. In the documentation, the execution is delayed by 120000 ms (2 minutes); I changed that to 500000 ms. Adding the normal execution time of the CF gives roughly the desired execution time (around 9 minutes). If you use 540000 to test the code, it will end with a timeout error at ~540025 ms. That delay alone already reaches the function's timeout, which is also the maximum timeout of a Cloud Function: 9 minutes.
I also tried creating the function using this command:

```shell
gcloud functions deploy [FUNCTION_NAME] --trigger-http --timeout=540
```

After a successful deployment, I updated the code manually in the GCP Cloud Function UI as follows:
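```javascript
// HTTP function that waits 500000 ms (~8.3 minutes) before responding,
// to check that the configured 540 s timeout is honored.
exports.timeoutTest = (req, res) => {
  setTimeout(() => {
    const message = req.query.message || (req.body && req.body.message) || 'Hello World today!';
    // send() also ends the response, so no separate res.end() is needed.
    res.status(200).send(message);
  }, 500000);
};
```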
Both times the Cloud Function executed and returned status code 200. This means that you can set a timeout longer than the default of 60 seconds.
If you have checked everything and still have this issue, I recommend starting afresh: create a new CF using the documentation link I provided.
The 60-second timeout does not necessarily come from the Cloud Function setting. For instance, if this is a Django/Gunicorn app, the timeout can come from Gunicorn's own timeout, which is set in app.yaml:

```yaml
entrypoint: gunicorn -t 3600 -b :$PORT project_name.wsgi
```

For instance, this sets Gunicorn's timeout to 3600 seconds.
I believe I'm some years late but here is my suggestion.
If you're using the "Test the function" button in the "Testing" tab of the Cloud Function (in the GCP Cloud Console), it says right next to the button:
Testing in the Cloud Console has a 60s timeout. Note that this is different from the limit set in the function configuration.
I hope you fixed it and this answer can help someone in the future.
Update: the second try ("Test the function") ran for precisely 9 minutes:
From: 23:15:38
Till: 23:24:38
That is exactly 9 minutes, although the message again mentioned only 60 seconds and popped up much earlier than the actual stop:
Function execution took 540004 ms, finished with status: 'timeout'
This time, with plenty of memory (2 GB), the timeout is clearly what stopped it. My guess is that the message pops up early simply because it has not been programmed in detail. You should always look at the logs to see what is actually happening.
So I guess the core of your question is outdated: at least as of 01/2022, you get the configured timeout regardless of what the messages say, so you can ignore them.
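Checking the logs is easier with a small parser for the log line quoted above, which gives the actual duration and status of each execution. Below is a minimal Node.js sketch. `parseExecutionLog` is hypothetical. It also handles a `status code: 200` form, on the assumption that successful HTTP executions are logged that way.

```javascript
// Parse a Cloud Functions log line such as
// "Function execution took 540004 ms, finished with status: 'timeout'".
function parseExecutionLog(line) {
  const match = line.match(/Function execution took (\d+) ms, finished with status(?: code)?: '?([\w-]+)'?/);
  if (!match) return null;
  const durationMs = Number(match[1]);
  return { durationMs, minutes: durationMs / 60000, status: match[2] };
}
```

For the line quoted above, this gives a status of `timeout` and just over 9 minutes. That matches the 540 s setting, not the 60 s in the console message.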
First try ("Test the function"): stopped after about 8 minutes, after reaching the memory limit.
This is the message shown in 2022/01 when a test runs past 60 seconds (with a 540 s maximum timeout set for this example function in the "Edit" menu of the CF):
Function being tested has exceeded the 60s timeout imposed by the Cloud Functions testing utility.
Yet in reality, when using the "Testing" tab, the timeout is at least 300 s (5 minutes), as stated next to the "Test the function" button:
Testing in the Cloud Console has a 5 minute timeout. Note that this is different from the limit set in the function configuration.
But you get even more. I know from testing (started from the "Testing" tab --> "Test the function" in the Cloud Function) that you have at least 8 minutes:
From: 22:31:43
Till: 22:39:53
This run was stopped first by the 256 MB memory limit and only then by the time limit (it is a bit unclear why both messages appeared).
Therefore, your question about why you get only a 60-second timeout might really be a question of why these messages are wrong (as in my case). Perhaps GCP did not make the effort to parametrize the messages for each function.
Perhaps you get slightly more time when you start the function with gcloud from a terminal, but that is not very likely, since 9 minutes is the maximum anyway.
I am trying to set up a function that will run somewhere on a server. It is a simple GET request, and I want to trigger it every second.
I tried Google Cloud Functions and AWS. Neither has a straightforward way to run it every second (only every minute).
Could you please suggest a service, or a combination of services, that would allow me to do this? (Preferably not costly.)
Here are some options on AWS ...
Launch a t2.nano EC2 instance to run a script that issues the GET request, sleeps for 1 second, and repeats. You can't use cron, because it doesn't support per-second schedules. This costs about 13 cents per day.
If you are going to do this for months/years then reduce the cost by using Reserved Instances.
If you can tolerate periods where the GET requests don't happen then reduce the cost even further by using Spot instances.
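Below is a minimal Node.js sketch of such a script, assuming Node 18+ for the built-in `fetch`. `runEvery` and `delayUntilNextTick` are hypothetical helpers. Each tick is scheduled from a fixed start time, so the time spent on each request does not add up as drift, which a plain "request, then sleep 1 s" loop would allow.

```javascript
// Delay until the next tick, measured from a fixed start time so that the
// time spent on each request does not accumulate as drift.
function delayUntilNextTick(startMs, tick, nowMs, intervalMs = 1000) {
  return Math.max(0, startMs + tick * intervalMs - nowMs);
}

// Call `task` roughly once per interval, `ticks` times (Infinity = forever).
async function runEvery(task, { intervalMs = 1000, ticks = Infinity } = {}) {
  const start = Date.now();
  for (let tick = 1; tick <= ticks; tick++) {
    try {
      await task();
    } catch (err) {
      console.error("request failed:", err.message);
    }
    const delay = delayUntilNextTick(start, tick, Date.now(), intervalMs);
    await new Promise((resolve) => setTimeout(resolve, delay));
  }
}
```

For example, call `runEvery(() => fetch(url))` with `url` set to your endpoint. If one request takes longer than the interval, the following ones start immediately until the schedule catches up.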
That said, why do you need to issue a GET request every second? Perhaps there is a better solution here.
You can create an AWS Lambda function that simply loops, issues the GET request every second, and exits after 240 requests (i.e. 4 minutes). Then create a CloudWatch event that fires every 4 minutes and calls the Lambda function.
Every 4 minutes, because the maximum timeout you could set for a Lambda function was 5 minutes when this was written (it has since been raised to 15 minutes).
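Below is a minimal Node.js sketch of that handler. `makeLoopingHandler`, its options and `task` are hypothetical. `context.getRemainingTimeInMillis()` is the method Lambda's Node.js context object provides; here it makes the loop stop before the function's timeout even if some requests are slow. The sleep only covers whatever is left of each second, so 240 iterations fit in 4 minutes only when each request is fast.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Build a Lambda handler that calls `task` about once per `intervalMs`,
// up to `maxRequests` times, stopping early near the function's timeout.
function makeLoopingHandler(task, { maxRequests = 240, intervalMs = 1000, safetyMarginMs = 5000 } = {}) {
  return async (event, context) => {
    let sent = 0;
    while (sent < maxRequests) {
      // Leave room for one more interval plus a margin before the timeout.
      if (context.getRemainingTimeInMillis() < intervalMs + safetyMarginMs) break;
      const started = Date.now();
      try {
        await task();
      } catch (err) {
        console.error("request failed:", err.message);
      }
      sent += 1;
      // Sleep for whatever is left of this interval.
      await sleep(Math.max(0, intervalMs - (Date.now() - started)));
    }
    return { sent };
  };
}
```

A deployment would then export something like `exports.handler = makeLoopingHandler(() => fetch(url));`, with `url` set to your endpoint.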
This setup will likely incur only some trivial cost:
At 1 event per 4 minutes, it's $1/month for the CloudWatch events generated.
At 1 call per 4 minutes to a minimally configured (128MB) Lambda function, it's 324,000 GB-second worth of execution per month, just within the free tier of 400,000 GB-second.
Since network transfer into AWS is free, the response size of your GET request is irrelevant. And the first 1GB of transfer out to the Internet is free, which should cover all the GET requests themselves.
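The 324,000 GB-second figure can be checked directly. A run every 4 minutes is 10,800 runs in a 30-day month, and each 240-second run at 128 MB (0.125 GB) uses 30 GB-seconds. Below is a small sketch of the same arithmetic. `monthlyGbSeconds` is a hypothetical helper, and the 30-day month is an assumption.

```javascript
// Lambda compute usage in GB-seconds per month for a schedule that starts one
// invocation every `everyMinutes` minutes, each running `secondsPerRun` seconds.
function monthlyGbSeconds({ memoryMb, secondsPerRun, everyMinutes, days = 30 }) {
  const invocationsPerMonth = (days * 24 * 60) / everyMinutes;
  const gbSecondsPerRun = (memoryMb / 1024) * secondsPerRun; // 1 GB = 1024 MB
  return { invocationsPerMonth, gbSeconds: invocationsPerMonth * gbSecondsPerRun };
}

const usage = monthlyGbSeconds({ memoryMb: 128, secondsPerRun: 240, everyMinutes: 4 });
// usage.invocationsPerMonth === 10800, usage.gbSeconds === 324000
```

Because the 4-minute runs back to back cover the whole month, the result equals 0.125 GB running continuously for 30 days.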