I am using the Serverless Framework with Node.js (version 4.4) to create AWS Lambda functions. The default timeout for Lambda execution is 6 seconds. I am connecting to a MySQL database using the Sequelize ORM. I see errors like "execution timed out". Sometimes my code works properly even with this error, but sometimes nothing works after the timeout. It's really hard for me to make sense of this timeout. I am afraid that increasing the timeout will incur more charges.
If you are seeing errors like 'execution timed out', then you are probably cutting off the execution of your Lambdas with a timeout that is too low.
There might be several reasons for this:
The initialization of the container can be slow. This should only occur on the first call to a container. If you have a low memory setting and load lots of libraries, it can take quite a while (usually this shouldn't be a problem with Node).
Connecting to a database can be slow
If you reuse database connections, it's possible that they are stale and this can lead to a timeout.
Your database queries may be slow.
To mitigate the problem, you should temporarily add some logging to your Lambda and increase the timeout so that you can figure out what actually takes so long. Unless you are already a heavy Lambda user, you are unlikely to use up your 400,000 free GB-seconds a month. If you run your Lambdas with 128 MB, this equates to 3,200,000 seconds per month, or roughly 103,225 seconds (about 28.7 hours) of execution per day. Try testing with higher memory settings as well; depending on the case, this can even reduce the total GB-seconds consumed.
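The temporary logging suggested above could look something like the following. This is only a sketch, written in Python for brevity (in Node.js the same idea works with `console.time`/`console.timeEnd`); the `connect` and `run_query` callables are hypothetical placeholders for the real database code, not part of any library.

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("lambda_timing")
logger.setLevel(logging.INFO)


@contextmanager
def timed(step, sink=None):
    """Log how long the wrapped block took, in milliseconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        logger.info("%s took %.1f ms", step, elapsed_ms)
        if sink is not None:
            sink[step] = elapsed_ms


def handler(event, context, connect=lambda: None, run_query=lambda conn: []):
    # `connect` and `run_query` are hypothetical stand-ins for the real
    # database calls; replace them with your own connection and query code.
    timings = {}
    with timed("db_connect", timings):
        conn = connect()
    with timed("db_query", timings):
        rows = run_query(conn)
    return {"rows": rows, "timings_ms": timings}
```

With the timeout raised temporarily, the log lines show whether the connection step or the query itself is eating the time.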
As others have already pointed out, you only pay for the time actually used, so if your Lambda finishes faster than the timeout, you only pay for the actual time consumed (in 100 ms increments).
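The free-tier arithmetic and the per-increment billing can be checked with a few lines of Python. This is a rough sketch: the 400,000 GB-seconds allowance and the 100 ms increment are the figures quoted in this answer and are treated here as assumptions rather than current pricing.

```python
import math

FREE_TIER_GB_SECONDS = 400_000  # monthly allowance quoted in the answer


def free_seconds(memory_mb, free_gb_seconds=FREE_TIER_GB_SECONDS):
    """Seconds of execution per month covered by the free tier."""
    return free_gb_seconds / (memory_mb / 1024)


def billed_gb_seconds(duration_ms, memory_mb, increment_ms=100):
    """GB-seconds billed for one invocation, rounded up to the increment."""
    billed_ms = math.ceil(duration_ms / increment_ms) * increment_ms
    return (billed_ms / 1000) * (memory_mb / 1024)


if __name__ == "__main__":
    print(free_seconds(128))            # seconds per month at 128 MB
    print(billed_gb_seconds(230, 128))  # a 230 ms call is billed as 300 ms
```

The configured timeout does not appear anywhere in the calculation: only the measured duration and the memory size do.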
Related
We have a lambda setup which will occasionally take ten minutes between initialization and invocation, leading to severe performance degradation of the apps that depend on it. The lambda request is handled by API gateway, sent to the Lambda Context for our handler, and then sent to the lambda function itself. At this point it looks like the lambda is initialized, but will take anywhere from 1 to 10 minutes to invoke for the slow performing requests.
Provisioned concurrency seems to address the cold start problem, but we can't seem to find any indication that concurrency is the problem. Furthermore, we have no reason to believe this is a cold start problem, given that the request takes 10 minutes rather than 10 seconds. I have no idea where to start to address this problem. Can someone give me some tips?
I'm using axios in a Lambda function to download a file from a user-provided URL. Obviously that file could be any size and might be served at any speed. I am concerned that this might create Denial of Service and Denial of Wallet risks.
I don't know whether AWS charges for Lambda ingress; I haven't been able to find a definitive answer yet. Even if it doesn't, large files could still force my Lambdas to run for longer (costing me money) and potentially push me up against the rate limits I have set, in part, to mitigate the risk of flooding attacks (denying people service).
Likewise, very slow downloads might cause my Lambdas to run until they time out. My timeouts are set fairly high because there is processing to do once the file is downloaded. I'd rather bail after a small handful of seconds, as the input data should always be small and fast.
So what I want is for downloads to abort if they hit a preset maximum size in bytes OR a maximum download time.
If adding these limits isn't possible with Axios then I'm open to using different libraries like node-fetch.
On the axios side, you can set timeout and maxContentLength to limit the request time and the response size. The maximum Lambda timeout is 15 minutes.
If you are likely to have many lengthy requests, it is better to use EC2. Huge numbers of Lambda requests at high memory and long duration end up more costly than EC2. Serverless is cost-effective and operationally easy, especially for spiky workloads. For steady 24/7 workloads with long processing times, a VM is the better choice.
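The two limits asked for (a byte cap and a total time cap) can also be enforced by hand while streaming the response. The following is a language-neutral illustration written in Python, not axios code; `read_limited` and `DownloadLimitError` are hypothetical names made up for this sketch.

```python
import io
import time


class DownloadLimitError(Exception):
    """Raised when a download exceeds the size or time budget."""


def read_limited(stream, max_bytes, max_seconds, chunk_size=8192,
                 clock=time.monotonic):
    """Read `stream` in chunks, aborting once either limit is exceeded."""
    deadline = clock() + max_seconds
    chunks = []
    total = 0
    while True:
        if clock() > deadline:
            raise DownloadLimitError("download exceeded time limit")
        chunk = stream.read(chunk_size)
        if not chunk:
            break  # end of stream
        total += len(chunk)
        if total > max_bytes:
            raise DownloadLimitError("download exceeded size limit")
        chunks.append(chunk)
    return b"".join(chunks)


if __name__ == "__main__":
    # Stand-in for a real HTTP response body.
    body = io.BytesIO(b"x" * 1000)
    print(len(read_limited(body, max_bytes=4096, max_seconds=5)))
```

With a real HTTP client, the stream would be the response body (for example the object returned by `urllib.request.urlopen(url, timeout=5)`). The deadline is only checked between chunks, so a socket-level timeout is still needed to bound a single stalled read.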
I have recently deployed a Django backend application to AWS Lambda using Zappa.
After the lambda function has not been invoked for some time, the first request to be made takes from 10 to 15 seconds to be processed. At first I thought it would be because of the cold start but even for a cold start this time is unacceptable. Then, reading through Zappa's documentation I saw that it enables by default the keep_warm feature that sends a dummy request to the lambda function every 4 minutes to keep it warm; so this excessive delay in the response to the first request to the lambda is not due to a cold start.
Then, I started using tools such as AWS X-Ray and Cloudwatch Insights to try to find the explanation for the delay. Here is what I found out:
The invocation that takes a very long time to be processed is the following:
Crossed out in red are the names of the environment variables the application uses. They are all defined and assigned a value directly in the AWS Console. What I don't understand is, first of all, why it takes so long, and secondly, why it says the environment variables are cast as None. The application works perfectly (apart from the massive delay in the first request), so the environment variables are correctly set somewhere.
This request is made every two hours religiously and the first time someone invokes the lambda function in some time, as seen in the following chart:
The dots in the x axis correspond to Zappa's dummy requests to keep the server warm. The elevated dots correspond to the invocation shown in the previous image. Finally, the spike corresponds to a user invocation. The time it took to process is the sum of the time it takes to process the long invocation (the one shown in the first image) and the time it takes to process the longest http request the client makes to the server. This request was the following:
It was a regular login request that should be resolved much faster. Other requests that are probably more demanding than this one were resolved in less than 100ms.
So, to sum up:
There is one lambda invocation that takes more than 10 seconds to be resolved. This corresponds to the first image shown. It is done every 2 hours and when a user makes a request to the server after it has been idle for some time.
Some requests take more than 2 seconds to be resolved and I have no idea as to why this could be.
Apart from these previous function invocations, all other requests are resolved in a reasonable time frame.
Any ideas as to why these invocations could be taking so much time are very much appreciated, as I have spent quite some time trying to figure it out on my own and I have run out of ideas. Thank you in advance!
Edit 1 (28/07/21): to further support my suspicion that this delay is not due to a cold start here is the "Segments Timeline" of the function in Cloudwatch/Application monitoring/Traces:
If it were a cold start, the delay should appear in the "Initialization" segment and not in the "Invocation" one.
Edit 2 (30/07/21): I forgot to mention that I had previously deployed the application using Elastic Beanstalk and didn't face this problem whatsoever so my code's performance is probably not the problem here.
Edit 3 (30/07/21): I found this thread in an AWS forum from 2016 regarding this exact issue. An AWS engineer mentioned that this behaviour is not by any means expected for a Lambda function outside of a VPC (like mine). Nevertheless, no answer was provided that explained the cause of the 10-15 seconds delay.
Edit 4 (03/08/21): I tried doubling the function's assigned memory (from 512 MB to 1024 MB) but it did not help.
I have also added some comments to the question to explain that this is probably not due to a cold start. As you rightly stated, cold starts are explicitly indicated and seem to only take about 500 ms in your case.
Cold starts this long usually only manifested themselves when lambdas were run in a VPC. And AWS has since changed the way lambdas get their network interface which has dramatically sped up that process.
That being said, a quick Google search led me to some interesting discussions on other sites about Django applications and lazy loading. I'll share some links here (even though they are not related to Lambda) in the hope they can help you find a solution:
https://community.webfaction.com/questions/11560/django-app-seems-very-slow-to-start-up-10-seconds
https://ses4j.github.io/2015/11/23/optimizing-slow-django-rest-framework-performance/
As a last note about the keep_warm. Sending those requests is quite an old trick in the book. However, be aware that there are no guarantees as to how long a lambda is kept warm by AWS. If an Init duration is indicated in the logs, however, you can be sure that it was a cold start.
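To check for cold starts in bulk, the REPORT lines in the function's CloudWatch logs can be scanned for the Init Duration field. A small sketch follows; the sample lines are made-up values in the usual REPORT layout, not taken from the question.

```python
import re

# Matches the "Init Duration: <n> ms" field that appears on cold starts only.
INIT_RE = re.compile(r"Init Duration:\s*([\d.]+)\s*ms")


def init_duration_ms(report_line):
    """Return the init duration in ms, or None if the start was warm."""
    match = INIT_RE.search(report_line)
    return float(match.group(1)) if match else None


if __name__ == "__main__":
    # Hypothetical sample lines for illustration.
    cold = ("REPORT RequestId: abc\tDuration: 102.25 ms\tBilled Duration: 103 ms\t"
            "Memory Size: 512 MB\tMax Memory Used: 80 MB\tInit Duration: 500.12 ms")
    warm = ("REPORT RequestId: def\tDuration: 12.40 ms\tBilled Duration: 13 ms\t"
            "Memory Size: 512 MB\tMax Memory Used: 80 MB")
    print(init_duration_ms(cold), init_duration_ms(warm))
```

A slow invocation whose REPORT line has no Init Duration field spent its time in the handler, not in a cold start.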
If you need to ensure that a lambda function is warm and quick to respond to incoming requests, you'll have to use provisioned concurrency, which of course has its own price tag.
I can see some suggestions here on trying to increase the memory for your lambda (and I also saw that you tried from 512 to 1024). Have you tried increasing it further, say to about 3072? It's a significant increase, but this is just to prove that the problem is not due to resource limitations first.
The keep_warm feature isn't guaranteed as far as I've seen, and the bulk of the (cold) start time is due to initialisation. Since the vCPU allocated to the Lambda is proportional to the memory you assign to it, your Lambda may initialise more quickly and so mitigate these cold starts to some extent.
I'm refactoring a job that uploads ~1.2 million small files to AWS; previously this upload was done file by file on a 64-CPU machine using processes. I switched to an async + multiprocess approach, following the S3 rate limits and the best-practices and performance guidelines, to make it faster. With sample data I can achieve execution times as low as 1/10th of the original. With production loads, S3 returns "SlowDown" errors.
Actually the business logic makes the folder structure like this:
s3://bucket/this/will/not/change/<shard-key>/<items>
The objects will be split evenly across ~30 shard keys, so every prefix contains ~40k items.
Every process writes to its own prefix and launches batches of 3k async PUT requests until completion. There is a sleep after each batch write to ensure that we do not send another batch before 1.1 seconds have passed, so that we respect the limit of 3,500 PUT requests per second.
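For readers trying to picture the setup, the batching described above might look roughly like this. It is a sketch of the pacing only, not the actual job: `put_one` is a hypothetical coroutine standing in for the aiobotocore `put_object` call, and there is no handling of SlowDown responses.

```python
import asyncio
import time


async def send_in_batches(items, put_one, batch_size=3000, min_interval=1.1):
    """Send items in concurrent batches, starting at most one batch per interval."""
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        began = time.monotonic()
        # Fire the whole batch concurrently and wait for it to finish.
        await asyncio.gather(*(put_one(item) for item in batch))
        # Sleep out the rest of the interval before the next batch.
        remaining = min_interval - (time.monotonic() - began)
        if remaining > 0 and start + batch_size < len(items):
            await asyncio.sleep(remaining)


if __name__ == "__main__":
    async def fake_put(item):
        await asyncio.sleep(0)  # placeholder for the real PUT request

    # asyncio.run needs Python 3.7+; on 3.6 use loop.run_until_complete.
    asyncio.run(send_in_batches(list(range(10)), fake_put,
                                batch_size=4, min_interval=0.1))
```

Note that this shape sends each batch as a burst at the start of its interval and leaves the rest of the interval idle, which is different from spreading the requests evenly over the second.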
The problem is that we receive SlowDown errors for ~1 hour and then the job writes all the files in ~15 minutes. If we lower the limit to 1k/sec this gets even worse, running for hours and never finishing.
This is the distribution of the errors over time for the 3k/sec limit:
We are using Python 3.6 with aiobotocore to run async.
Using trial and error to understand how to mitigate this takes forever on production data, and testing with a smaller quantity of data gives us different results (it works flawlessly).
Did I miss any documentation regarding how to make the system scale up correctly?
I am trying to set up a function that will run somewhere on a server. It is a simple GET request, and I want to trigger it every second.
I tried Google Cloud Functions and AWS. Neither of them has a straightforward way to run it every second (the minimum interval is 1 minute).
Could you please suggest a service, or a combination of services, that would allow me to do this (preferably at low cost)?
Here are some options on AWS ...
Launch a t2.nano EC2 instance to run a script that issues GET, then sleeps for 1 second, and repeats. You can't use cron (doesn't support every second). This costs about 13 cents per day.
If you are going to do this for months/years then reduce the cost by using Reserved Instances.
If you can tolerate periods where the GET requests don't happen then reduce the cost even further by using Spot instances.
That said, why do you need to issue a GET request every second? Perhaps there is a better solution here.
You can create an AWS Lambda function that simply loops, issues the GET request every second, and exits after 240 requests (i.e. 4 minutes). Then create a CloudWatch event that fires every 4 minutes and calls the Lambda function.
Every 4 minutes because the maximum timeout you can set for a Lambda function is 5 minutes.
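A handler along these lines might look like the sketch below. It is written in Python purely as an illustration; `do_get` is a hypothetical placeholder for the real HTTP call, and the `clock` and `sleep` parameters exist only so the loop can be exercised without waiting.

```python
import time


def handler(event, context, do_get=lambda: None, count=240, interval=1.0,
            clock=time.monotonic, sleep=time.sleep):
    """Issue `count` requests, one per `interval` seconds, then return."""
    start = clock()
    for i in range(count):
        do_get()  # placeholder for the actual GET request
        # Schedule against the start time so slow requests do not add drift.
        next_at = start + (i + 1) * interval
        delay = next_at - clock()
        if delay > 0:
            sleep(delay)
    return {"requests": count}
```

Scheduling each tick relative to the start time, instead of always sleeping a full second, keeps the loop at 240 seconds in total even when individual requests take a few hundred milliseconds.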
This setup will likely incur only some trivial cost:
At 1 event per 4 minutes, it's $1/month for the CloudWatch events generated.
At 1 call per 4 minutes to a minimally configured (128 MB) Lambda function, it's 324,000 GB-seconds' worth of execution per month, just within the free tier of 400,000 GB-seconds.
Since network transfer into AWS is free, the response size of your GET request is irrelevant. And the first 1GB of transfer out to the Internet is free, which should cover all the GET requests themselves.