Lambda cold start possible solution? - amazon-web-services

Is scheduling a Lambda function to be called every 20 minutes with CloudWatch the best way to reduce Lambda cold start times? (Not to eliminate them completely.)
Will this get pricey, or is there something I am missing? I have it set up right now and I think it is working.
Before, my cold start time was about 10 seconds and every subsequent call completed in about 80 ms. Now every call, no matter how frequent, takes around 80 ms. Is this a good method until your user base grows, at which point you can turn it off?
My second option is just using Beanstalk and having a server running 24/7, but that sounds expensive, so I'd rather not.

As far as I know, this is currently the only way to keep the function hot. It gets pricey only when you have a lot of such functions.
You'd have to calculate how much you'd pay to keep your functions alive, given how many you have, how long each run takes, and how much memory you need.
But once every 20 minutes is about 2,160 invocations per 30-day month, so if you use e.g. 128 MB and have them finish in under 100 ms, you could keep quite a lot of such functions alive at 20-minute intervals and still stay under the free tier: that is only about 216 seconds of compute per month per function. You don't even need to turn it off once you get a bigger load, because by then it will be irrelevant. Besides, you can never be sure of getting a uniform load all the time, so you might keep your heartbeat code active even then.
My guess is that since it is so cheap to keep a function alive (especially if you have a special argument that makes it return immediately) and the difference is so great (10 seconds vs. 80 ms), pretty much everyone will do it; there is really no excuse not to. In that case I expect Amazon either to fight the practice (by making it hard or more expensive than it currently is, which wouldn't be a smart move) or to make it unnecessary in the future. If the difference between a hot and a cold start were 100 ms, no one would bother; at 10 seconds, everyone needs to work around it.
There will always be some difference between running code that ran a second ago and code that ran a month ago, because keeping all of it in RAM and ready to go would waste a lot of resources. But I see no reason why that difference couldn't be made less noticeable, or even have a few more steps instead of just hot and cold.

You can improve the cold start time by allocating more memory to your Lambda function. With the default 512MB, I am seeing cold start times of 8-10 seconds for functions written in Java. This improves to 2-3 seconds with 1536MB of memory.
Amazon says that it is the CPU allocation that really matters, but there is no way to directly change it. CPU allocation increases proportionately to memory.
And if you want close-to-zero cold start times, keeping the function warm is the way to go, as rsp suggested above.

Starting from December 2019, AWS Lambda supports Provisioned Concurrency, so you can set the number of execution environments that will be initialized and waiting for new calls. [1]
The downside is that you are charged for the provisioned concurrency. If you provision a concurrency of 1 for a Lambda with 128 MB, active 24 hours a day for the whole month, you will be charged: 1 instance x 30 days x 24 hr x 60 min x 60 sec x (128/1024 GB) = 324,000 GB-seconds (almost all of the capacity AWS gives in the Lambda free tier). [2]
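That GB-second figure is easy to check directly; the unit comes from the pricing model described in [2]:

```javascript
// Cost of keeping one 128 MB provisioned-concurrency instance warm
// for a 30-day month, in GB-seconds (the unit Lambda bills in).
const gb = 128 / 1024;              // 0.125 GB
const seconds = 30 * 24 * 60 * 60;  // 2,592,000 seconds in 30 days
const gbSeconds = gb * seconds;
console.log(gbSeconds);             // 324000
```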
With this you get a Lambda instance that responds very fast; additional concurrent calls beyond the provisioned level may still suffer a cold start, though.
What is more, you can configure application autoscaling to dynamically manage the provisioned concurrency of your lambda. [3]
Refs:
https://aws.amazon.com/blogs/aws/new-provisioned-concurrency-for-lambda-functions/
https://aws.amazon.com/lambda/pricing/
https://docs.aws.amazon.com/lambda/latest/dg/configuration-concurrency.html

Besides adding more memory to the Lambda, there is one more approach to reducing cold starts for Java: the GraalVM native-image tool. It compiles the jar ahead of time into a native binary, so part of the work that would otherwise happen on AWS at startup is done at build time. When uploading your build to AWS, select "Custom runtime", not java8.
Helpful article: https://engineering.opsgenie.com/run-native-java-using-graalvm-in-aws-lambda-with-golang-ba86e27930bf
Beware:
but it also has its limitations; it does not support dynamic class loading, and reflection support is also limited

Azure has a pre-warming solution for serverless instances (Link). This would be a great feature in AWS Lambda if and when they implement it.
Instead of the user warming the instance at the application level, it is handled by the cloud provider in the platform.

Hitting the server on a schedule does not cover the case of simultaneous requests from users, or the same page sending a few API requests asynchronously.
A better solution is to checkpoint the warmed-up container with Docker (CRIU). It is especially useful for dynamic languages, where warm-up is fast but loading all the libraries is slow.
For details read
https://criu.org/Docker
https://www.imperial.ac.uk/media/imperial-college/faculty-of-engineering/computing/public/1819-ug-projects/StenbomO-Refunction-Eliminating-Serverless-Cold-Starts-Through-Container-Reuse.pdf
Other hints:
use more memory
use Python or JavaScript with only the most basic libraries; try to eliminate bulky ones
create several 'microservices' to reduce the chance of several users hitting the same service
see more at https://www.jeremydaly.com/15-key-takeaways-from-the-serverless-talk-at-aws-startup-day/

Lambda's cold start depends on multiple factors, such as your implementation, the language runtime you use, and the code size. Giving your Lambda function more memory can reduce the cold start too. You can read the best practices: https://docs.aws.amazon.com/lambda/latest/dg/best-practices.html
The serverless community also has recommendations for performance https://atlas.serverless.tech-field-community.aws.a2z.com/Performance/index.html
Lambda team also launched Provisioned Concurrency. You can now request multiple Lambda containers be kept in a "hyper ready" state, ready to re-run your function. This is the new best practice for reducing the likelihood of cold starts.
Official docs https://docs.aws.amazon.com/lambda/latest/dg/configuration-concurrency.html?icmpid=docs_lambda_console

Related

How to handle delays of scheduled jobs and thus duplications by mistake? (caching? message brokers?)

In our project, we have scheduled jobs which send shipment requests for orders every 60 seconds. There must be exactly one request per order. Some jobs are delayed (taking around 70 seconds instead), which results in a request being sent twice for the same order, just because the previous job was delayed and a new one has already started. How can we ensure that only one request is sent per order, no matter what the delay is?
My assumptions so far:
Add a flag to the database and look it up before processing a request for an order (we use DynamoDB)
Temporarily store the result in a cache (I'd assume even something like 10 minutes, because delayed jobs usually don't take longer than 1.5 minutes, so that would be a safe assumption)
Temporarily store it in some message broker (similar to caching). We already use SQS and SNS in our project. Would it be appropriate to store messages there about orders which were already processed? Are message brokers ever used for scheduled jobs to ensure they don't duplicate each other?
Increase the interval between jobs to 2 minutes. Even though delays are no longer than 1.5 minutes in total now, that would not guarantee prevention of possibly longer delays in the future. However, this solution would be simple enough
What do you think? What would be a good solution in this case, in terms of simple implementation, fast performance and preventing duplicates?
So, if you want to make your operation idempotent by using de-duplication logic, you should ask the following questions to narrow down the possible options:
In the worst case, how many times would you receive the exact same request?
In the worst case, how much time would pass between the first and last duplicate request?
In the worst case, how many requests would need to be evaluated at nearly the same time during peak hours?
Which storage system allows me to use a point query instead of a scan?
Which storage system has the lowest write overhead for recording the "I have seen this" flag?
...
Depending on your answers, you can justify whether a given storage system is suitable for your needs or not.
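For the database-flag option from the question, the important detail is making the check-and-set atomic; with DynamoDB that would be a PutItem using the ConditionExpression 'attribute_not_exists(orderId)'. Here is a sketch of the logic with an in-memory Map standing in for the table (the function names are illustrative, not an AWS API):

```javascript
// Idempotent send: the first job run to claim an orderId wins; a
// duplicate run (from a delayed previous job) becomes a no-op.
const processed = new Map(); // stand-in for the DynamoDB table

function claimOrder(orderId) {
  if (processed.has(orderId)) return false; // condition would fail
  processed.set(orderId, Date.now());       // first claim succeeds
  return true;
}

function sendShipmentRequest(orderId, send) {
  if (!claimOrder(orderId)) return 'skipped-duplicate';
  send(orderId); // exactly one real request per order
  return 'sent';
}
```

In DynamoDB you would also put a TTL attribute on the flag item so old flags expire on their own.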

Benchmark AWS lambda performance

For security issues I wrote a lambda function that adds metadata to the data that should be saved in a DynamoDB database. Now I want to evaluate the impact of the additional metadata functionality. Therefore I want to check the performance of two lambda functions, one function with metadata and one without. I want to use this information for my Thesis.
How accurate are the metrics provided by lambda (memory used and execution time)? Can I use these parameters to evaluate the performance impact or is there a better way?
The metrics provided by Lambda are very accurate, as they are used to determine billing. One caveat: the reported execution time does not include the network latency of calling the Lambda, and AFAIK it also does not include the time spent provisioning the container on cold starts (it does include the time spent initializing your code in the container, but I don't think it includes starting the container itself, copying code, etc.).
In conclusion, I don't think you can get more accurate measurements for memory used, but for the execution duration there are other ways you may want to look at the data, depending on what you care about.
However, if you just want to A/B test two different function implementations, executing in the same environment with the same input data, you can probably rely on the Lambda reported metrics.
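If you do export the per-invocation durations (e.g. from the REPORT lines in CloudWatch Logs), percentiles are more informative than a single average, because one cold start skews the mean a lot. A small sketch for summarizing a sample of durations; the numbers are made up:

```javascript
// Summarize a sample of Lambda-reported durations (milliseconds).
function summarize(durations) {
  const sorted = [...durations].sort((a, b) => a - b);
  const pct = p => sorted[Math.min(sorted.length - 1, Math.floor(p * sorted.length))];
  const mean = sorted.reduce((sum, x) => sum + x, 0) / sorted.length;
  return { mean, p50: pct(0.5), p95: pct(0.95), max: sorted[sorted.length - 1] };
}

// One cold start among warm calls drags the mean up but barely moves p50.
console.log(summarize([80, 82, 79, 81, 10000]));
```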

Do serverless functions get dedicated CPU resources?

I was telling a friend that an advantage of running a load with a lambda function is that each instance, and thus each execution, gets dedicated resources - memory and CPU (and perhaps disk, network,... but that's less relevant). And then I started wondering...
For instance, if you have a function with some CPU-intensive logic that is used by multiple tenants, then one execution should never be affected by another. If some calculation takes 5 seconds to execute, it will always take 5 seconds, no matter how many requests are processed simultaneously.
This seems self-evident for memory, but less so for CPU. From a quick test I seem to get mixed results.
So, does every function instance get its own dedicated CPU resources?
My main focus is AWS Lambda, but the same question arises for Azure (on a Consumption plan, I guess) and Google.
Lambda uses fractional CPU allocations of instance CPU, running on an instance type comparable to compute optimized EC2 instance. That CPU share is dedicated to the Lambda, and its allocation is based on the amount of memory allocated to the function.
The CPU share dedicated to a function is based off of the fraction of its allocated memory, per each of the two cores. For example, an instance with ~3 GB memory available for lambda functions, where each function can have up to 1 GB memory, means at most you can utilize ~1/3 * 2 cores = 2/3 of the CPU. The details may be revisited in the future.
The explanation is supported by the Lambda Function Configuration documentation, which states:
Performance testing your Lambda function is a crucial part of ensuring you pick the optimum memory size configuration. Any increase in memory size triggers an equivalent increase in CPU available to your function.
So yes, you get a dedicated share of an instance's total CPU, based on your memory allocation and the formula above.
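Written as a formula, with the ~3 GB host memory and two cores taken from the quoted example (assumed figures, not documented AWS values):

```javascript
// CPU share = (function memory / host memory available to functions) * host cores
function cpuShare(fnMemMB, hostMemMB, hostCores) {
  return (fnMemMB / hostMemMB) * hostCores;
}

// A 1 GB function on a ~3 GB, 2-core host gets at most ~2/3 of the CPU.
console.log(cpuShare(1024, 3072, 2)); // ≈ 0.667
```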
Perhaps I should have indicated more clearly that I wasn't looking for documentation, but for facts. The core question is whether we can assume that one execution is never affected by another.
As I said, a first quick test gave me mixed results, so I took the time to delve in a little deeper.
I created a very simple lambda that, for a specified number of seconds, generates and sums up random numbers (code here):
var start = process.hrtime();   // [seconds, nanoseconds] reference point
var random = 0, rounds = 0;
var randomizer = Math.random;   // stand-in for the randomizer in the linked code
while (process.hrtime(start)[0] < duration) { // `duration` is the requested run time in seconds
var nextRandom = randomizer();
random = random + nextRandom - 0.5;
rounds++;
}
Now, if executions on different instances are really independent, then there should be no difference between executing this lambda just once or multiple times in parallel, all other factors being equal.
But the figures indicate otherwise. Here's a graph, showing the number of 'rounds' per second that was achieved.
Every datapoint is the average of 10 iterations with the same number of parallel requests - which should rule out cold start effects and other variations. The raw results can be found here.
The results look rather shocking: they indicate that avoiding parallel executions of the same lambda can almost double the performance....?!
But sticking to the original question: this looks like the CPU fraction 'dedicated' to a lambda instance is not fixed, but depends on certain other factors.
Of course I welcome any remarks on the test, and of course, explanations for the observed behavior!

AWS Lambda function times out on 1st invocation, works on 2nd invocation

My AWS Lambda function, integrated with an API Gateway request URL, times out on every first request but works for the next request.
Note: We also tried to keep the Lambdas warm by scheduling them in CloudWatch, but it didn't work.
It is a problem with cold starts.
You can do a few of the following to improve cold start speed:
If you are using Node.js,
Webpack:
Pack all the modules that are in separate files into a single file.
If you are using other languages,
Number of files:
Keep the number of files low.
Lazy load:
Don't load everything upfront; load modules only when needed.
Hope it helps.
Without knowing too much about your specific use case, here are two general suggestions:
Increase the memory allocated to your functions, which also increases CPU proportionally. Because your functions are called very infrequently, the added cost of increasing memory size will be balanced by faster cold start times and thus lower billed duration.
Reduce your code size: a smaller .zip, removing unnecessary require()'s in Node.js, etc. For example, if you are including the Async library just to remove a nested callback, consider forgoing that to improve performance.
Refer https://forums.aws.amazon.com/thread.jspa?threadID=181348 for more options.

Optimizing coldfusion resources for scheduled tasks

I've been tasked with optimizing some scheduled tasks that run for hours. One of the tasks runs through data from 1995-present. My first thought (besides revising queries,etc) was to create a cfloop through all the years and start a thread for each year.
Is this a good approach? Is there a better way to divide up the workload on such a task?
You really need to work out what is slow before you optimize anything. Otherwise you may spend a lot of time tweaking code for relatively small gains. Databases are often the bottleneck but you need to find that out first.
At a simple level, you can enable debugging (on a dev machine) and see where time is being spent. There are also tools like FusionReactor which will give you more insight. Alternatively, you can just add some <cflog> calls to your script and then analyze them to identify the slow blocks. Whichever way you decide to do it, you need to know where your effort is best spent.
Some other thoughts....
Does the data change?
If not then compile the data once and store it so that the scheduled tasks don't have to redo the work each time
Which version of CF are you on?
You could run out of threads if you are not careful, which would be particularly bad if your server is running other things as well. But yes, threads could be part of your solution.