How AWS Lambda allocates CPU when concurrent lambda invoked? - amazon-web-services

I have one lambda function to test the URLs using puppeteer and chrome.
When I invoke 50 lambdas at the same time chrome is not able to load all the passed URLs.
What could be the reason for it?
I suspect it shares the CPU with time slicing.

One of the best features of AWS Lambda functions is scalability. It means it will increase the needed resources to perform the task. It is impossible to share the CPU because it will destroy the whole concept of Serverless in Lambda Functions. BUT, these scenarios could be your problem:
Multiple invocations at the same will share /tmp directory. Your code might store more than allowed ephemeral storage in your invocation which might be the reason of your problem. I suggest checking to invocation logs to see if you can find any errors for regarding the ephemeral storage.
As you said, you are sending 50 requests at same time. If the target server is just a single server, it might be flooded and the memory might get full. In that case, the server can't respond to you anymore.

Related

AWS Serverless: Force parallel lambda execution based on request or HTTP API parameters

Is there a way to force AWS to execute a Lambda request coming from an API Gateway resource in a certain execution environment? We're in a use-case where we use one codebase with various models that are 100-300mb, so on their own small enough to fit in the ephemeral storage, but too big to play well together.
Currently, a second invocation with a different model will use the existing (warmed up) lambda function, and run out of storage.
I'm hoping to attach something like a parameter to the request that forces lambda to create parallel versions of the same function for each of the models, so that we don't run over the 512 MB limit and optimize the cold-boot times, ideally without duplicating the function and having to maintain the function in multiple places.
I've tried to investigate Step Machines but I'm not sure if there's an option for parameter-based conditionality there. AWS are suggesting to use EFS to circumvent the ephemeral storage limits, but from what I can find, using EFS will be a lot slower than reading from the ephemeral /tmp/ directory.
To my knowledge: no. You cannot control the execution environments. Only thing you can do is limit the concurrent executions.
So you never know, if it is a single Lambda serving all your events triggered from API Gateway or several running in parallel. You also have no control over which one of the execution environments is serving the next request.
If your issues is the /temp directory limit for AWS Lambda, why not try EFS?

Does AWS Lambda run every invocation in a separate Firecracker VM?

I am aware of the cold-start and warm-start in AWS Lambda.
However, I am not sure during the warm-start if the Lambda architecture reuses the Firecracker VM in the backend? Or does it do the invocation in a fresh new VM?
Is there a way to enforce VM level isolation for every invocation through some other AWS solution?
Based on what stated on the documentation for Lambda execution context, Lambda tries to reuse the execution context between subsequent executions, this is what leads to cold-start (when the context is spun up) and warm-start (when an existing context is reused).
You typically see this latency when a Lambda function is invoked for the first time or after it has been updated because AWS Lambda tries to reuse the execution context for subsequent invocations of the Lambda function.
This is corroborated by another statement in the documentation for the Lambda Runtime Environment where it's stated that:
When a Lambda function is invoked, the data plane allocates an execution environment to that function, or chooses an existing execution environment that has already been set up for that function, then runs the function code in that environment.
A later passage of the same page gives a bit more info on how environments/resources are shared among functions and executions in the same AWS Account:
Execution environments run on hardware virtualized virtual machines (microVMs). A microVM is dedicated to an AWS account, but can be reused by execution environments across functions within an account. [...] Execution environments are never shared across functions, and microVMs are never shared across AWS accounts.
Additionally, there's also another doc page that gives some more details on isolation among environments but again, no mention to the ability to enforce 1 execution per environment.
As far as I know there's no way to make it so that a new execution will use a new environment rather than an existing one. AWS doesn't provide much insight in this but the wording around the topic seems to suggest that most people actually try to do the opposite of what you're looking for:
When you write your Lambda function code, do not assume that AWS Lambda automatically reuses the execution context for subsequent function invocations. Other factors may dictate a need for AWS Lambda to create a new execution context, which can lead to unexpected results, such as database connection failures.
I would say that if your concern is isolation from other customers/accounts, AWS guarantees isolation by means of virtualisation that although not being at the physical level, depending on their SLAs and your SLAs/requirements might be enough. If instead you're thinking on doing some kind of multi-tenant infrastructure that requires Lambda executions to be isolated from one another then this component might not be what you're looking for.

How to improve lambda performance?

Hi I am trying to understand the lambda architecture in depth. Below is my understanding about lambda.
Whenever we create lambda function, container will spin up. If we select python as run time the python container will spin up. Now there is cold start. For example, If we dint call lambda for long time, container will become inactive. It will call new container and it will take some time to spin up new container. This is cold start. Now I am bit confused here. If I want to avoid this delay what is the right approach? We can trigger lambda every 5 min using cloud watch. Any other good approaches to handle this?
Also there is /tmp folder where we can store static files. So /tmp is not part of container? Whenever new container spins up, /tmp data will be lost or remain? Can someone help me to understand this concepts and tell me to use best approaches to handle this? Any help would be appreciated. Thank you.
You are correct there is a cold start issue but it's been observed that it depends on a lot of factors(runtime, memory, zip size....for e.g. a java lambda will have more cold start compared to python) and basically it was a big problem for lambdas inside a user-defined VPC. wherein there is an overhead of creating an elastic network interface and then invoking the lambda. But the recent rollout has changed this and now you should not see this problem. improved-vpc-networking for lambda.
Also just in the reinvent 2019 aws have announced the Provisioned Concurrency So for lambda Functions using Provisioned Concurrency will execute with consistent start-up latency.
With Provisioned Concurrency, functions can instantaneously serve a
burst of traffic with consistent start-up latency for every invoke up
to the specified scale. Customers only pay for the amount of concurrency that they configure and for the period of time that it is configured.
Regarding the /tmp please note that Each Lambda function receives 512MB of non-persistent disk space in its own /tmp directory. So you cannot rely on it. Lambda limits If you are looking for persistent storage you should be using S3.

Serverless Database Connection Pooling

I’m trying to build an application on aws that is 100% serverless (minus the database for now) and what I’m running into is that the database is the bottleneck. My application can scale very well but my database has a finite number of connections it can accommodate and at some point, my lambdas will run into that limit. I can do connection pooling outside of the handler in my lambdas so that there is a database connection per lambda container instead of per invocation and while that does increase the number of concurrent invocations before I hit my connection limit, the limit still exists.
I have two questions.
1. Does serverless aurora solve this by autoscaling to increase the number of instances to meet the need for more connections.
2. Are there any other solutions to this problem?
Also, from other developers interested in serverless, am I trying to do something that’s not worth doing? I love how easy deployment is with serverless framework but is it better just to work with Microservices in something like Kubernetes instead?
I believe there are two potential solutions to that problem:
The first and the simplest option is to take advantage of "lambda hot state", it's the concept when Lambda reuses the execution context for subsequent invocations. As per AWS suggestion
Any declarations in your Lambda function code (outside the handler code, see Programming Model) remains initialized, providing additional optimization when the function is invoked again. For example, if your Lambda function establishes a database connection, instead of reestablishing the connection, the original connection is used in subsequent invocations. We suggest adding logic in your code to check if a connection exists before creating one.
Basically, while the lambda function is the hot stage it "might/should" reuse opened connection(s).
The limitations of the following:
you only reuse connection for single lambda type, so if you have 5 lambda functions invoked all the time you still will be using 5 connections
when you have a spike in lambda invocations, including parallel executions this approach becomes less effective since, lambda will be executed in a new execution context for majority of requests
The second option would be to use a connection pool, connection pool is an array of established database connections, so that the connections can be reused when future requests to the database are required.
While the second option provides a more consistent solution it requires much more infrastructure.
you would be required to run a separate instance for the pool, and if you want to do things properly probably at least two instances and a load balancer (unless use containers).
While it might be overwhelming to provision that much additional infrastructure for connection pooler, it still might be a valid option depending on the scale of the project, your existing infrastructure (may be you already using containers) and cost benefits
Best practices by AWS recommends to take advantage of hot start. You can read more about it here: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.Lambda.BestPracticesWithDynamoDB.html

AWS Lambda, Scaling, Implementation

I'm looking for a specific piece of documentation about the scaling of AWS Lambda.
How I think the scaling works:
Scenario: high traffic
AWS spins up multiple instances of the same Lambda Function
AWS distributes the events (probably evenly) among the instances
So what am I looking for specifically?
Is there a document where AWS states how lambda works internally or any information that concerns the process I described above (I need something to quote).
Thank you.
Officially, none of the implementation details of how AWS Lambda operates should impact your usage of the service. All you need to know is that something triggers a Lambda function, it runs and exits.
There is a limit on the number of simultaneous functions that can run (but you can ask for an increase in this limit). There is no guarantee that the functions run in a specific order.
The reality, however, is that Lambda functions are deployed as containers and those containers might be reused. For example, if you have a function that runs once per second for 200ms, it is quite likely that the container will be reused. The benefit of this is that there is no initialization time for the container if it is reused. This is particularly beneficial for Java functions that require creation of a JVM.
It also means that your function should assume that the environment will be reused — it should cleanup temporary files and reset global variables.
For more details, see: Understanding Container Reuse in AWS Lambda | AWS Compute Blog