Optimal way to do a CPU-bound task in AWS Lambda

I have to process a lot of data in my Lambda code, and the computation could be parallelized. I am currently using single-threaded Python code and want to optimize it. I thought about converting it to multi-threaded Python code, but it seems that AWS Lambda may not have enough resources anyway. What is the best way to do this?

AWS Lambda now supports up to 10 GB of memory and 6 vCPUs for Lambda functions.
If you want to run CPU-bound, parallelized code on Lambda, always remember these core behaviours:
The total number of vCPUs (which correlates with the optimal thread count) is dictated by how much memory you assign to that Lambda function:
Lambda allocates CPU power in proportion to the amount of memory configured. Memory is the amount of memory available to your Lambda function at runtime. You can increase or decrease the memory and CPU power allocated to your function using the Memory (MB) setting. To configure the memory for your function, set a value between 128 MB and 10,240 MB in 1-MB increments. At 1,769 MB, a function has the equivalent of one vCPU (one vCPU-second of credits per second).
Lambda is also severely limited by the maximum time it can run: 900 seconds (15 minutes).
Depending on how your application is architected, you can improve performance with these things in mind:
Lambda does support multi-threaded / multi-core processing. A how-to in Python can be found here, and a minimal sketch follows below.
When you hit the upper limits of a single Lambda run, think about ways to break the work across multiple Lambdas running in parallel, if possible. That level of horizontal scaling is what Lambda excels at.
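Below is a minimal sketch of fanning CPU-bound work out across the available vCPUs inside one invocation, assuming a Python runtime. One caveat worth hedging: Lambda's execution environment has historically lacked /dev/shm, so multiprocessing.Pool and multiprocessing.Queue (which rely on shared-memory semaphores) can fail there; Process plus Pipe is the commonly cited workaround. All names and the workload here are illustrative.

import multiprocessing
import os

def crunch(chunk, conn):
    # Stand-in for your actual CPU-bound work on one slice of the data.
    conn.send(sum(x * x for x in chunk))
    conn.close()

def handler(event, context):
    data = list(range(1_000_000))   # placeholder workload
    n = os.cpu_count() or 1         # vCPU count scales with configured memory
    chunks = [data[i::n] for i in range(n)]
    procs, parents = [], []
    for chunk in chunks:
        parent, child = multiprocessing.Pipe()
        p = multiprocessing.Process(target=crunch, args=(chunk, child))
        p.start()
        procs.append(p)
        parents.append(parent)
    results = [conn.recv() for conn in parents]
    for p in procs:
        p.join()
    return {"total": sum(results)}

Remember that below roughly 1,769 MB of memory you have less than one full vCPU, so spawning processes can add overhead rather than remove it.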

Related

Lambda instance RAM allocation

In AWS Lambda, is the RAM allocated to a Lambda for one instance of that Lambda, or shared across all running instances of it? Until now I believed it was per instance.
Let's consider a Lambda 'testlambda' configured with a 5-minute timeout, 3008 MB (the current max) of RAM, and the "Use unreserved account concurrency" option selected:
At time T, one instance of 'testlambda' starts running; assume it will run for 100 seconds and use 100 MB of RAM the whole time. If one more instance of 'testlambda' starts at T+50s, how much RAM will be available to the second instance: 3008 MB or 2908 MB?
I used to believe that the second instance would also have 3008 MB. But after seeing the recent execution logs of my Lambda, I am inclined to say that the second instance will have 2908 MB.
The allocation is for each container.
Containers are not used by more than one invocation at any given time -- that is, containers are reused, but not concurrently. (And not by more than one version of one function).
If your code is leaking memory, this means subsequent but non-concurrent invocations spaced relatively close together in time will be observed as using more and more memory because they are running in the same container... but this would never happen in the scenario you described, because with the second invocation at T+50, it would never share the container with the 100-second process started at T+0.
From what I have seen, at least so far, the RAM is not shared. We had a lot of concurrent requests with the default RAM for Lambdas; if it were shared for some reason, we would have seen memory-related problems, but that never happened.
You could test this yourself: reduce the RAM of a dummy Lambda that executes for X seconds, call it several times concurrently, and check whether the memory used ever exceeds the memory you selected. A sketch of such a dummy function follows.
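Here is what that dummy function could look like, assuming a Python runtime; the event keys are an arbitrary convention. Configure the function with, say, 256 MB, invoke it many times concurrently, and compare the "Max Memory Used" value in each invocation's REPORT log line: if containers shared RAM, concurrent invocations would overshoot or fail.

import time

def handler(event, context):
    seconds = event.get("seconds", 30)      # how long to hold the memory
    mb = event.get("mb", 100)               # how much memory to hold
    ballast = bytearray(mb * 1024 * 1024)
    for i in range(0, len(ballast), 4096):  # touch every page so it is really resident
        ballast[i] = 1
    time.sleep(seconds)
    return {"held_mb": mb, "held_s": seconds}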

How does memory allocation impact processing time in AWS Lambda?

My Lambda function was taking about 120 ms with a 1024 MB memory size. When I checked the log, it was using only 22 MB at max, so I tried optimizing it by reducing the memory to 128 MB.
But when I did this, the ~120 ms of processing went up to about ~350 ms, while still only 22 MB was being used.
I'm a bit confused: if I only used 22 MB, why does having 128 MB or 1024 MB available impact the processing time?
The underlying CPU power is directly proportional to the memory footprint that you select. So basically that memory knob controls your CPU allocation as well.
That is why reducing the memory causes your Lambda to take more time to execute.
The following is what the AWS docs for Lambda state:
Compute resources that you need – You only specify the amount of memory you want to allocate for your Lambda function. AWS Lambda allocates CPU power proportional to the memory by using the same ratio as a general purpose Amazon EC2 instance type, such as an M3 type. For example, if you allocate 256 MB memory, your Lambda function will receive twice the CPU share than if you allocated only 128 MB.
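As a back-of-the-envelope check, using the roughly 1,769 MB ≈ one vCPU figure quoted at the top of this page (the exact ratio has varied over the years, and this answer's quote dates from the M3-based era):

# Approximate vCPU share at a given memory setting, assuming the
# 1,769 MB ≈ 1 vCPU ratio quoted earlier on this page.
def approx_vcpus(memory_mb):
    return memory_mb / 1769

print(approx_vcpus(128))    # ~0.07 vCPU
print(approx_vcpus(1024))   # ~0.58 vCPU, i.e. roughly 8x the CPU of 128 MB

An 8x CPU difference but only a ~3x slowdown (120 ms vs. 350 ms) would suggest that part of the 120 ms was not CPU-bound in the first place, e.g. network I/O.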

Do serverless functions get dedicated CPU resources?

I was telling a friend that an advantage of running a load with a lambda function is that each instance, and thus each execution, gets dedicated resources - memory and CPU (and perhaps disk, network,... but that's less relevant). And then I started wondering...
For instance, if you have a function with some CPU-intensive logic that is used by multiple tenants, then one execution should never be affected by another. If some calculation takes 5 seconds to execute, it will always take 5 seconds, no matter how many requests are processed simultaneously.
This seems self-evident for memory, but less so for CPU. From a quick test I seem to get mixed results.
So, does every function instance get its own dedicated CPU resources?
My main focus is AWS Lambda, but the same question arises for Azure (on a Consumption plan, I guess) and Google.
Lambda uses fractional allocations of the underlying instance's CPU, running on an instance type comparable to a compute-optimized EC2 instance. That CPU share is dedicated to the Lambda, and its size is based on the amount of memory allocated to the function.
The CPU share dedicated to a function is based off of the fraction of its allocated memory, per each of the two cores. For example, an instance with ~3 GB memory available for Lambda functions, where each function can have up to 1 GB memory, means at most you can utilize ~1/3 × 2 cores = 2/3 of the CPU. The details may be revisited in the future.
The explanation is supported by the Lambda Function Configuration documentation, which states:
Performance testing your Lambda function is a crucial part in ensuring you pick the optimum memory size configuration. Any increase in memory size triggers an equivalent increase in CPU available to your function.
So yes, you get a dedicated share of an instance's total CPU, based on your memory allocation and the formula above.
Perhaps I should have indicated more clearly that I wasn't looking for documentation, but for facts. The core question was whether we can assume that one execution is never affected by another.
As I said, a first quick test gave me mixed results, so I took the time to delve in a little deeper.
I created a very simple lambda that, for a specified number of seconds, generates and sums up random numbers (code here):
// Busy-loop for `duration` seconds (taken from the invocation event),
// summing random numbers so the work cannot be optimized away.
const randomizer = Math.random;
const start = process.hrtime();
let random = 0, rounds = 0;
while (process.hrtime(start)[0] < duration) {
  const nextRandom = randomizer();
  random = random + nextRandom - 0.5;
  rounds++;
}
Now, if executions on different instances are really independent, then there should be no difference between executing this lambda just once or multiple times in parallel, all other factors being equal.
But the figures indicate otherwise. Here's a graph, showing the number of 'rounds' per second that was achieved.
Every datapoint is the average of 10 iterations with the same number of parallel requests - which should rule out cold start effects and other variations. The raw results can be found here.
The results look rather shocking: they indicate that avoiding parallel executions of the same Lambda can almost double the performance...?!
But sticking to the original question: it looks like the CPU fraction 'dedicated' to a Lambda instance is not fixed, but depends on certain other factors.
Of course I welcome any remarks on the test, and of course, explanations for the observed behavior!
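For reference, a test like this can be driven with a few lines of boto3; the function name and payload shape below are illustrative, not the author's actual harness.

import json
from concurrent.futures import ThreadPoolExecutor

import boto3

lam = boto3.client("lambda")

def invoke_once(duration_s=5):
    resp = lam.invoke(
        FunctionName="cpu-benchmark",               # hypothetical function name
        Payload=json.dumps({"duration": duration_s}),
    )
    return json.load(resp["Payload"])["rounds"] / duration_s

def rounds_per_sec(parallelism):
    # Fire `parallelism` invocations at once and average the reported rate.
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        results = list(pool.map(lambda _: invoke_once(), range(parallelism)))
    return sum(results) / len(results)

for p in (1, 2, 5, 10):
    print(p, rounds_per_sec(p))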

AWS Lambda function needs more than max memory reported in CloudWatch

I have a Lambda function that reads messages off an SQS queue and inserts items into Dynamo. At first, I had it at 512MB of memory. In cloud watch, it reported the max memory used was around 58MB. I assumed I could then lower the memory to 128MB and see the same rate of processing SQS messages. However, that wasn't the case. Things noticeably slowed. Can anyone explain?
Here is CloudWatch showing max memory with the 512 MB Lambda:
Here is CloudWatch showing max memory with the 128 MB Lambda:
Here you can see the consumed capacity of the DynamoDB table really dropped:
Here you can see the number of messages being processed really slowed, as evidenced by the shallower slope:
This seems counter-intuitive, but there's a logical explanation:
Reducing memory also reduces the available CPU cycles. You're paying for very short term use of a fixed fraction of the resources of an EC2 instance, which has a fixed ratio of CPU to memory.
Q: How are compute resources assigned to an AWS Lambda function?
In the AWS Lambda resource model, you choose the amount of memory you want for your function, and are allocated proportional CPU power and other resources. For example, choosing 256MB of memory allocates approximately twice as much CPU power to your Lambda function as requesting 128MB of memory and half as much CPU power as choosing 512MB of memory. You can set your memory in 64MB increments from 128MB to 1.5GB.
https://aws.amazon.com/lambda/faqs/
So, how much CPU capacity are we talking about?
AWS Lambda allocates CPU power proportional to the memory by using the same ratio as a general purpose Amazon EC2 instance type, such as an M3 type.
http://docs.aws.amazon.com/lambda/latest/dg/lambda-introduction-function.html
We can extrapolate.
In the M3 class, regardless of instance size, the provisioning factors look like this:
CPU = Xeon E5-2670 v2 (Ivy Bridge) × 8 cores
Relative Compute Performance = 26 ECU
Memory = 30 GiB
An ECU is an EC2 (or possibly "Elastic" or "Equivalent") Compute Unit, where 1.0 ECU is approximately equivalent to the compute capacity of a 1GHz Opteron. It's a dimensionless quantity for simplifying comparison of the relative CPU capacity of differing instance types.
So the provisioning ratios look like this:
8/30 Cores/GiB
26/30 ECU/GiB
So at 512 MiB memory, your Lambda function's container's share of this machine would be...
8 ÷ 30 ÷ (1024/512) = 0.133 of 1 core (~13.3% CPU)
26 ÷ 30 ÷ (1024/512) = 0.433 ECU (~433 MHz equivalent)
At 128 MiB, it's only about 1/4 of that.
These numbers seem really small, but they are not inappropriate for the typical Lambda use-case -- single-threaded, asynchronous actions that are not CPU intensive.
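The same arithmetic, spelled out, assuming the M3-class figures above (8 cores, 26 ECU, 30 GiB):

# Fractional share of an M3-class host at a given Lambda memory setting.
cores_per_gib = 8 / 30
ecu_per_gib = 26 / 30

for mem_gib in (0.5, 0.125):                      # 512 MiB and 128 MiB
    print(mem_gib,
          round(cores_per_gib * mem_gib, 3),      # fraction of one core
          round(ecu_per_gib * mem_gib, 3))        # ECU (~GHz-equivalent)
# 0.5   -> 0.133 of a core, 0.433 ECU
# 0.125 -> 0.033 of a core, 0.108 ECU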

Lambda cold start possible solution?

Is scheduling a Lambda function to be called every 20 minutes with CloudWatch the best way to get rid of Lambda cold start times (well, not completely get rid of them)?
Will this get pricey, or is there something I am missing? Because I have it set up right now and I think it is working.
Before, my cold start time would be around 10 seconds, and every subsequent call would complete in around 80 ms. Now every call, no matter how frequent, is around 80 ms. Is this a good method until, say, your userbase grows, at which point you can turn it off?
My second option is just using Beanstalk and having a server running 24/7, but that sounds expensive, so I don't prefer it.
As far as I know this is the only way to keep the function hot right now. It can get pricey only when you have a lot of those functions.
You'd have to calculate for yourself how much you pay to keep your functions alive, considering how many of them you have, how long each takes to run, and how much memory you need.
But once every 20 minutes is something like 2,000 times per month, so if you use e.g. 128 MB and make them finish in under 100 ms, then you could keep quite a lot of such functions alive at 20-minute intervals and still be under the free tier - it would be roughly 200 seconds of execution per month per function. You don't even need to turn it off after you get a bigger load, because it will be irrelevant at that point. Besides, you can never be sure of getting a uniform load all the time, so you might keep your heartbeat code active even then.
Though my guess is that since it is so cheap to keep a function alive (especially if you have a special argument that makes it return immediately), and since the difference is so great (10 seconds vs. 80 ms), pretty much everyone will do it - there is practically no excuse not to. In that case I expect Amazon to either fight that practice (by making it harder or more expensive than it currently is - which wouldn't be a smart move) or to make it unnecessary in the future. If the difference between a hot and a cold start were 100 ms, no one would bother. If it is 10 seconds, then everyone needs to work around it.
There would always have to be some difference between running code that was run a second ago and code that was run a month ago, because keeping everything in RAM and ready to go would waste a lot of resources, but I see no reason why that difference couldn't be made less noticeable, or even have a few more steps instead of just a hot and a cold start.
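The "special argument that makes it return immediately" pattern mentioned above could look like this in Python; the warmup key is an arbitrary convention, not an AWS feature, and a scheduled CloudWatch Events rule would send {"warmup": true} every ~20 minutes.

def do_real_work(event):
    # Placeholder for the function's actual logic.
    return {"ok": True}

def handler(event, context):
    # Short-circuit on the scheduled warm-up ping so it bills near the minimum.
    if event.get("warmup"):
        return {"warmed": True}
    return do_real_work(event)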
You can improve the cold start time by allocating more memory to your Lambda function. With the default 512MB, I am seeing cold start times of 8-10 seconds for functions written in Java. This improves to 2-3 seconds with 1536MB of memory.
Amazon says that it is the CPU allocation that really matters, but there is no way to directly change it. CPU allocation increases proportionately to memory.
And if you want close to zero cold start times, keeping the function warm is the way to go, as rsp suggested above.
Starting from December 2019, AWS Lambda supports Provisioned Concurrency, so you can set the number of Lambda instances that will be kept ready and waiting for new calls [1].
The downside is that you are charged for the provisioned concurrency. If you provision a concurrency of 1 for a Lambda with 128 MB, kept active 24 hours a day for a whole month, you will be charged: 1 instance × 30 days × 24 hr × 60 min × 60 sec × (128/1024) = 324,000 GB-sec (almost all of the capacity AWS gives in the Lambda free tier) [2].
From the above you will get a Lambda instance that responds very fast; subsequent concurrent calls may still suffer a "cold start", though.
What is more, you can configure Application Auto Scaling to dynamically manage the provisioned concurrency of your Lambda [3].
Refs:
https://aws.amazon.com/blogs/aws/new-provisioned-concurrency-for-lambda-functions/
https://aws.amazon.com/lambda/pricing/
https://docs.aws.amazon.com/lambda/latest/dg/configuration-concurrency.html
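For completeness, a minimal boto3 sketch of [1]; the function and alias names are illustrative, and note that provisioned concurrency must target a published version or alias, not $LATEST.

import boto3

lam = boto3.client("lambda")
lam.put_provisioned_concurrency_config(
    FunctionName="my-function",      # hypothetical function name
    Qualifier="live",                # an alias or version number
    ProvisionedConcurrentExecutions=1,
)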
Besides adding more memory to the Lambda, there is one more approach to reduce cold starts: use the GraalVM native-image tool. The jar is compiled ahead of time into a native binary, so basically part of the startup work that would otherwise happen on AWS is done at build time. When uploading your code to AWS, select "Custom runtime", not java8.
Helpful article: https://engineering.opsgenie.com/run-native-java-using-graalvm-in-aws-lambda-with-golang-ba86e27930bf
Beware that it also has its limitations: it does not support dynamic class loading, and reflection support is limited.
Azure has a pre-warming solution for serverless instances (link). This would be a great feature in AWS Lambda if and when they implement it.
Instead of the user warming the instance at the application level, it is handled by the cloud provider at the platform level.
Hitting the server on a schedule would not cover the case of simultaneous requests from users, or the same page sending a few API requests asynchronously.
A better solution is to dump the 'warmed-up' state into a Docker checkpoint. This is especially useful for dynamic languages, where warm-up is fast yet loading all the libraries is slow.
For details, read:
https://criu.org/Docker
https://www.imperial.ac.uk/media/imperial-college/faculty-of-engineering/computing/public/1819-ug-projects/StenbomO-Refunction-Eliminating-Serverless-Cold-Starts-Through-Container-Reuse.pdf
Other hints:
use more memory
use Python or JavaScript with the most basic libraries, and try to eliminate bulky ones
create several 'microservices' to reduce the chance of several users hitting the same service
see more at https://www.jeremydaly.com/15-key-takeaways-from-the-serverless-talk-at-aws-startup-day/
Lambda's cold start depends on multiple factors, such as your implementation, the language runtime you use, and the code size. If you give your Lambda function more memory, you can reduce the cold start too. You can read the best practices: https://docs.aws.amazon.com/lambda/latest/dg/best-practices.html
The serverless community also has recommendations for performance https://atlas.serverless.tech-field-community.aws.a2z.com/Performance/index.html
The Lambda team also launched Provisioned Concurrency. You can now request that multiple Lambda containers be kept in a "hyper ready" state, ready to re-run your function. This is the new best practice for reducing the likelihood of cold starts.
Official docs https://docs.aws.amazon.com/lambda/latest/dg/configuration-concurrency.html?icmpid=docs_lambda_console