Can we use JDBC connection pooling with AWS Lambda? Since an AWS Lambda function gets called on a specific event, does its lifetime persist even after it finishes one of its calls?
No. Technically, you could create a connection pool outside of the handler function, but since you can only make use of a single connection per invocation, all you would be doing is tying up database connections and allocating a pool of which you could only ever use one.
After uploading your Lambda function to AWS, the first time it is invoked AWS will create a container and run the setup code (the code outside of your handler function that creates the pool, let's say of N connections) before invoking the handler code.
When the next request arrives, AWS may re-use the container again (or may not. It usually does, but that's down to AWS and not under your control).
Assuming it reuses the container, your handler function will be invoked (the setup code will not be run again) and your function will use one of the N connections to your database from the pool (held at the container level). This is most likely the first connection in the pool, as it is guaranteed not to be in use: it is impossible for two invocations to run at the same time within the same container. Read on for an explanation.
If AWS does not reuse the container, it will create a new container and your code will allocate another pool of N connections. Depending on the turnover of containers, you may exhaust the database's available connections entirely.
If two requests arrive concurrently, AWS will not invoke the same handler in the same container at the same time. If that were possible, you'd have a shared-state problem with the variables defined at container scope. Instead, AWS will use two separate containers, and these will each allocate a pool of N connections, i.e. 2N connections to your database in total.
A single invocation never requires more than one connection (unless, of course, you need to communicate with two independent databases within the same context).
The only time a connection pool would be useful is if it were at one level above the container scope, that is, handed down by the AWS environment itself to the container. This is not possible.
The best case you can hope for is to have a single connection per container. Even then, you would have to manage this single connection to ensure the database server hasn't disconnected or rebooted. If it has, your container's connection will die and your handler will never be able to connect again (until the container dies), unless you write some code in your function to check for dropped connections. On a busy server, the container might take a long time to die.
Also keep in mind that if your handler function fails, for example half way through a transaction or having locked a table, the next request invocation will get the dirty connection state from the container. The first invocation may have opened a transaction and died. The second invocation may commit and include all the previous queries up to the failure.
I recommend not managing state outside of the handler function at all, unless you have a specific need to optimise. If you do, then use a single connection, not a pool.
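A minimal sketch of that single-connection approach in Java (the JDBC URL and credentials are placeholders): the connection lives at container scope, is validated and re-opened at the start of each invocation, and is rolled back first to clear any dirty state a failed previous invocation may have left behind.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;

public class SingleConnectionLambda implements RequestHandler<String, String> {

    // Container-scoped: survives across invocations while the container lives.
    private static Connection connection;

    private static Connection getConnection() throws SQLException {
        // Re-open the connection if it was never created or the server dropped it.
        if (connection == null || !connection.isValid(2)) {
            connection = DriverManager.getConnection(
                    "jdbc:mysql://example-host:3306/mydb", "user", "password"); // placeholders
            connection.setAutoCommit(false);
        }
        return connection;
    }

    public String handleRequest(String request, Context context) {
        try {
            Connection conn = getConnection();
            // Discard anything a previous, failed invocation may have left open.
            conn.rollback();
            // ... run queries here, then commit ...
            conn.commit();
            return "ok";
        } catch (SQLException e) {
            throw new RuntimeException(e);
        }
    }
}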
Yes, the Lambda container is mostly persistent, so JDBC connection pooling should work. The first time a Lambda function is invoked, the environment is created, and it may or may not get reused. In practice, subsequent invocations will often reuse the same Lambda process, along with all program state, if your triggering events occur often.
This short lambda function demonstrates this:
package test;

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;

public class TestLambda implements RequestHandler<String, String> {

    // Container-scoped state: persists between invocations while the container lives.
    private int invocations = 0;

    public String handleRequest(String request, Context context) {
        invocations++;
        System.out.println("invocations = " + invocations);
        return request;
    }
}
Invoke this from the AWS console with any string as the test event. In the CloudWatch logs, you'll see the invocations number increment each time the container is reused (and reset to 1 whenever a new container is created).
Thanks to AWS RDS Proxy, you can now use pooled MySQL and PostgreSQL connections without any extra configuration in your Java (or other) code specific to AWS Lambda. All you need to do is create a database proxy and add it to the AWS Lambda function whose connections you want to reuse/pool. See the how-to here.
Note: AWS RDS Proxy is not included in the Free Tier (more here).
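For illustration, here is a sketch of what "no extra code" means in practice; the endpoint below is made up, and the only change from a direct connection is that the JDBC URL points at the proxy instead of the database instance.

import java.sql.Connection;
import java.sql.DriverManager;

public class RdsProxyExample {
    public static void main(String[] args) throws Exception {
        // Hypothetical proxy endpoint; find the real one in the RDS console.
        String url = "jdbc:mysql://my-proxy.proxy-abc123.us-east-1.rds.amazonaws.com:3306/mydb";
        // Same JDBC code as before: the proxy pools connections behind this endpoint.
        try (Connection conn = DriverManager.getConnection(url, "user", "password")) {
            System.out.println("connected via RDS Proxy: " + conn.isValid(2));
        }
    }
}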
It has caveats:
There is no destroy method that ensures the pool is closed; one may argue that the database's connection idle timeout handles this.
What if the same database is also used by other clients, such as a pool maintained on a regular machine like EC2?
As many have said, a sudden spike in requests can create chaos for the database, as there will always be some maximum connection setting on the database side per user.
Related
Our current implementation is to open one database connection outside of the Lambda handler. When the backing Lambda container terminates, the connection is then left open/idle.
Can I make the new container close the previous old container's database connection?
Are there any hooks available like an onContainerClose()?
How can we close the previous open connection which cannot be used anymore, when the Lambda cold starts?
In the background, AWS Lambda functions execute in a container that isolates them from other functions & provides the resources, such as memory, specified in the function’s configuration.
Any variable outside the handler function will be 'frozen' between Lambda invocations and possibly reused. 'Possibly' because, depending on the volume of executions, the container is almost always reused, though this is not guaranteed.
You can personally test this by invoking a Lambda with the below source code multiple times & taking a look at the response:
let counter = 0

exports.handler = async (event) => {
    // counter persists for as long as this container is reused
    counter++
    const response = {
        statusCode: 200,
        body: JSON.stringify(counter),
    };
    return response;
};
This also includes database connections that you may want to create outside of the handler, to maximise the chance of reuse between invocations & to avoid creating a new connection every time.
Regardless of whether the Lambda container is reused or not, a connection made outside of the handler will eventually be closed when the container is terminated by AWS. Granted, the issue of "zombie" connections is much smaller when the connection is reused, but it is still there.
When you start to reach a high number of concurrent Lambda executions, the main question is how to end the unused connections leftover by terminated Lambda function containers. AWS Lambda is quite good at reliably terminating connections when the container expires but you may still run into issues getting close to your max_connections limit.
How can we close the previous open connection which cannot be used anymore, when the Lambda cold starts?
There is no native workaround via your application code or Lambda settings to completely get rid of these zombie connections, unless you handle opening and closing connections yourself and take the added duration hit of creating a new connection each time (still a very small number).
To clear zombie connections (if you must), a workaround would be to trigger a Lambda which would then list, inspect & kill idle leftover connections. You could either trigger this via an EventBridge rule operating on a schedule or trigger it when you are close to maxing out the database connections.
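If you do go down that route, a sketch of such a reaper might look like the following, assuming MySQL, placeholder credentials, and a database user with the PROCESS privilege; the 900-second threshold matches the wait_timeout suggestion below.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;

public class IdleConnectionReaper implements RequestHandler<Object, Integer> {

    public Integer handleRequest(Object event, Context context) {
        int killed = 0;
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:mysql://example-host:3306/mydb", "admin", "password"); // placeholders
             Statement listStmt = conn.createStatement();
             Statement killStmt = conn.createStatement();
             // Find connections that have sat idle ("Sleep") for over 15 minutes.
             ResultSet rs = listStmt.executeQuery(
                     "SELECT id FROM information_schema.processlist " +
                     "WHERE command = 'Sleep' AND time > 900")) {
            while (rs.next()) {
                killStmt.execute("KILL " + rs.getLong("id"));
                killed++;
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return killed;
    }
}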
These are also great guidelines to follow:
Ensure your Lambda concurrent executions limit does not exceed your database maximum connections limit: this is to prevent the database from maxing out connections
Reduce database timeouts (if supported): limit the amount of time that connections can be left open and idle; for example, in MySQL, tweaking the wait_timeout variable from the default 28800 seconds (8 hours) down to 900 seconds (15 minutes) can be a great start
Reduce the number of database connections: try your best to reduce the connections you need to make to the database via good application design & caching
If all else fails, look into increasing the max connections limit on the database
The short version:
If I am caching values in my lambda container, how can I clear this cache? I guess I could redeploy the lambda, which will force all new requests to initiate a new cold start, but this doesn't seem like a nice solution.
The long version:
I am writing a custom authorizer for AWS API Gateway (in Python) that does two things:
It gets an api-key from an http header and looks it up in a dynamo table to verify it is valid (and get some attributes attached to it).
It verifies a JWT token (using some of the attributes from #1).
After following some code (this code), I learnt that I can cache values "globally" that can be re-used across invocations of the lambda, great! But if I cache say, the dynamodb response when looking up the api key, what if I have to revoke / issue a new api key at some point?
I'd like to be able to ensure that my lambda cache gets wiped somehow.
Short answer: you can force a new container for each invocation by calling UpdateFunctionCode or UpdateFunctionConfiguration before exiting the execution of the same function. For example, you can keep changing the function timeout before returning the response, and the next invocation will spin up a new execution environment (container/sandbox) with a cold start penalty.
The right approach: if you are caching values in function variables, you can clear them inside the handler and continue with the execution logic. This ensures you are not facing cold start penalties for subsequent invocations, and you remain in control of choosing the "right" values.
This can be better explained using database clients. You can create the client outside the handler, but on every invocation verify that the client is still valid, and recreate it inside the handler if the original has become invalid. This will save you some processing time, as the CPU is throttled when the function hits the handler.
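For the API-key caching case specifically, one middle ground is a container-scoped cache with a TTL, so a revoked key falls out of the cache within a bounded time. A sketch in Java (the question is Python, but the idea is the same); fetchFromDynamo is a hypothetical placeholder for the real DynamoDB lookup.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ApiKeyCache {

    private static final long TTL_MILLIS = 5 * 60 * 1000; // entries live for 5 minutes

    private static class Entry {
        final String attributes; // whatever you looked up in DynamoDB
        final long fetchedAt;
        Entry(String attributes, long fetchedAt) {
            this.attributes = attributes;
            this.fetchedAt = fetchedAt;
        }
    }

    // Container-scoped cache, reused across invocations.
    private static final Map<String, Entry> cache = new ConcurrentHashMap<>();

    static String lookup(String apiKey) {
        Entry e = cache.get(apiKey);
        if (e == null || System.currentTimeMillis() - e.fetchedAt > TTL_MILLIS) {
            // Missing or expired: re-fetch and refresh the cache.
            e = new Entry(fetchFromDynamo(apiKey), System.currentTimeMillis());
            cache.put(apiKey, e);
        }
        return e.attributes;
    }

    private static String fetchFromDynamo(String apiKey) {
        // Hypothetical helper; real code would call the DynamoDB SDK here.
        return "attributes-for-" + apiKey;
    }
}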
Since you are working with API Gateway, cold start penalties count towards the API's integration timeout (a hard limit of 29 seconds for auth and backend combined), so I would avoid forcing cold starts as much as possible.
I've scoured for an answer, but everything I've read is about concurrent Lambda executions and the async keyword syntax in Node; I can't find information about Lambda instance execution.
The genesis of this was that I was at a meetup and someone mentioned that Lambda instances (i.e. an ephemeral container hosted by AWS containing my code) can only execute one request at a time. This means that if I had 5 requests come in (for the sake of simplicity let's say to an already warm instance) they would all run in separate instances, i.e. in 5 separate containers.
The bananas thing to me is that this undermines years of development in async programming. Starting back in 2009, Node.js popularized programming with I/O in mind, given that for a boring, run-of-the-mill CRUD app most of your request time is spent waiting on external DB calls or something. Writing async code allowed a single thread of execution to seemingly execute many simultaneous requests. While Node didn't invent it, I think it's fair to say it popularized it and has been a massive driver of backend technology development over the last decade. Many languages have added features to make async programming easier (callbacks/tasks/promises/futures or whatever you want to call them) and web servers have shifted to event-loop based (Node, Vert.x, Kestrel, etc.) away from the single-thread-per-request models of yesteryear.
Anyway, enough with the history lesson. My point is that if what I heard is true, then developing with Lambdas throws most of that out the window. If the Lambda runtime will never send multiple requests through my running instance, then programming in an async style will just waste resources. Say, for example, I'm talking C# and my Lambda is for retrieving widgets. Then this code, var response = await db.GetWidgets(), is actually inefficient because it pushes the current thread context onto the stack so it can allow other code to execute while it waits for that call to come back. Since no other request will be invoked until the original one completes, it makes more sense to program in a synchronous style save for places where parallel calls can be made.
Is this correct?
If so, I'm honestly shocked it's not discussed more. Async programming is the biggest paradigm shift I've seen in the last few years, and this totally changes that.
TL;DR: does Lambda really only allow one request execution at a time per instance? If so, this upends a major shift in server development towards asynchronous code.
Yes, you are correct - Lambda will spin up multiple containers to handle concurrent requests even if your Lambda does some work asynchronously (I have confirmed this through my own experience, and many other people have observed the same behavior - see this answer to a similar question). This is true for every supported runtime (C#, Node.js, etc).
This means that async code in your Lambda functions won't allow one Lambda container to handle multiple requests at once, as you stated. That being said, you still get all the other benefits of async code and you could still potentially improve your Lambda's performance by, for example, making many web service or database calls at once asynchronously - so this property of Lambda does not make async programming useless on the platform.
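As a rough sketch of that still-useful concurrency, here is what two downstream calls issued in parallel within a single invocation might look like in Java (the URLs are placeholders):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.concurrent.CompletableFuture;

public class ParallelCalls {
    public static void main(String[] args) {
        HttpClient client = HttpClient.newHttpClient();

        // Kick off two independent downstream calls concurrently (placeholder URLs).
        CompletableFuture<HttpResponse<String>> users = client.sendAsync(
                HttpRequest.newBuilder(URI.create("https://api.example.com/users")).build(),
                HttpResponse.BodyHandlers.ofString());
        CompletableFuture<HttpResponse<String>> orders = client.sendAsync(
                HttpRequest.newBuilder(URI.create("https://api.example.com/orders")).build(),
                HttpResponse.BodyHandlers.ofString());

        // The invocation still serves only one request, but it waits on both
        // calls together instead of sequentially, roughly halving the latency.
        CompletableFuture.allOf(users, orders).join();
        System.out.println(users.join().statusCode() + " " + orders.join().statusCode());
    }
}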
Your question is:
Since no other request will be invoked until the original one completes it makes more sense to program in a synchronous style save for places where parallel calls can be made.
No, because you no longer have to wait for the answer as you would with a synchronous process. Your trigger itself must die after the call, which frees memory. Either the Lambda sends a notification or triggers a new service once it is completed, or a watcher looks at the result value (it is possible to wait for the answer with a synchronous Lambda invocation, but it is not reliable due to the underlying async process beneath the Lambda system itself). As an Android developer, you can compare this to intents and broadcasts, which are completely async.
It is a completely different way to design a solution, because the async mechanism must be managed at the workflow layer itself and no longer in the core of the app; the solution becomes an aggregation of notifiers/watchers that trigger micro-services, no longer a single binary of thousands of lines of code.
Each Lambda function must be an individual micro-service.
Coming back to handling heavy traffic: you can run millions of Lambdas in parallel, and as long as each micro-service finishes quickly, it won't cost much.
To ensure that your workflow does not drop anything, you can add SQS (message queuing) to the solution; a sketch follows.
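A minimal sketch of that hand-off pattern in Java, assuming the AWS SDK v1 SQS client and a placeholder queue URL; a second Lambda subscribed to the queue would do the slow processing:

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.sqs.AmazonSQS;
import com.amazonaws.services.sqs.AmazonSQSClientBuilder;

public class EnqueueLambda implements RequestHandler<String, String> {

    private static final AmazonSQS sqs = AmazonSQSClientBuilder.defaultClient();
    // Placeholder queue URL; find the real one in the SQS console.
    private static final String QUEUE_URL =
            "https://sqs.us-east-1.amazonaws.com/123456789012/work-queue";

    public String handleRequest(String request, Context context) {
        // Hand the work off to the queue and return immediately.
        sqs.sendMessage(QUEUE_URL, request);
        return "queued";
    }
}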
Further to the above answer, please see here. From what I understand, it's a synchronous loop, so the only way to make things async from a request-handling perspective is to delegate the work to a message queue, e.g. SQS, as written here. I think this is similar to how Celery is used to make Django asynchronous. Lastly, if you truly want async handling of requests in line with async/await in Node.js/Python/C#/C++, you may need to use AWS Fargate / EC2 instead of Lambda. Otherwise, in Lambda, as you have mentioned yourself, it's bananas indeed. On the other hand, for heavy traffic, where async/await shows its benefits, Lambda is not a good fit. There is a break-even analysis here of the three services: EC2, Lambda and Fargate.
I have already gone through this link (published in Dec 2014), and also referred to this and this.
How does AWS Lambda container reuse work as of May 2016? Could you please share any specific link that explains it in detail? Below are a few questions, all around AWS Lambda container reuse.
Consider a use case:
A Lambda function named "processMessageLambda" receives a request when it has to process a message; it receives that message from a POST REST API (via AWS API Gateway, which the Lambda function is connected to).
Now this 'processMessageLambda' processes the message and stores it in the database.
So logically it does the following :
connect to the database, store the message, and shut down the connection (this works fine in the normal case).
If requests arrive at, say, 10 per second, and each Lambda execution takes 30 seconds, then it actually opens many database connections.
Q1: May we use 'connection pooling' in this case (e.g. BoneCP), as the number of calls to "processMessageLambda" would be something like a hundred, or ten, per second?
(Refer to this simple example of container reuse. It works as it says, but what will happen when many requests arrive, say 10 requests per second?)
Q2: If it's possible to use connection pooling, then how would this AWS Lambda container be reused?
Consider a case:
Let's say the requests received by this Lambda function number ten per second. In this case, would 10 different containers of this Lambda function be created, or would a single container of the Lambda function be created and used for all 10 requests?
Q3: If 10 different containers of the Lambda function are created, does that mean 10 database connections would be used, and would those 10 containers be reused for further requests?
Q4: Could you please explain, from a developer's point of view, how AWS Lambda container reuse actually works, and how a developer should think about it when relying on container reuse?
Q5: If container reuse is already in place, how does a developer maintain the state of variables, so they know which variables will be reused?
I build and maintain multiple serverless applications. One thing to remember which helps me is: two Lambda functions live in different universes. This immediately answers a few questions:
1: No, you can only do connection pooling per Lambda. If you fire 100 Lambda functions, you have 100 connections.
2: -
3: The exact algorithm behind container re-use is not specified. The next requests may use an existing container or a new container. There is no way to tell. Read this blog post for more info.
4: The best way, imo, is to think of containers as not being re-used at all, with the added rule to always use unique (random) filenames in scratch space (the /tmp folder) if you need it. When I tried to keep the connection open for re-use, the connection timed out and THEN got re-used, which resulted in database connection issues. Now I just open and close the connection during each invocation.
5: Keep your code stateless (except modules; NEVER use global variables) and always use a unique (random) name if you need to store files in scratch space. These two general rules of thumb save me from lots of issues; a sketch of both follows below.
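A minimal sketch of both rules of thumb in Java (the JDBC URL and credentials are placeholders):

import java.io.File;
import java.sql.Connection;
import java.sql.DriverManager;
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;

public class StatelessLambda implements RequestHandler<String, String> {

    public String handleRequest(String request, Context context) {
        try {
            // Rule 4: open and close the connection within each invocation.
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:mysql://example-host:3306/mydb", "user", "password")) { // placeholders
                // ... do the database work here ...
            }

            // Rule 5: unique (random) filename in scratch space, in case the
            // container (and its /tmp folder) is reused by a later invocation.
            File scratch = File.createTempFile("processed-", ".tmp", new File("/tmp"));
            // ... write to scratch as needed ...
            return "ok";
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}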
I'm working on a project where I'm using a Lambda function to connect to a relational database and to DynamoDB at the same time. To access that function I'm using API Gateway, but I found a problem: my Lambda function, written in Java, takes more than 10 seconds to start due to the creation of both database connections.
I know the API Gateway timeout is 10 seconds, and that's not a problem when executing my function, which takes less than 1 second; the problem is when it has to start.
I would like to know how to catch this timeout exception and notify the user that they need to start the request again.
Is there a way to do so without moving to Node.js or accessing lambda function directly?
Since the cost of establishing a connection to a relational database is so high, I would encourage you to open the connection in the initialization code of your Lambda function (outside of the handler).
The database connection will then be re-used across multiple invocations for the lifetime of the Lambda container. Within your Lambda function handler you may want to ensure the connection is alive and hasn't timed out, and re-open as required.
The first call through API Gateway may timeout, but subsequent calls will reuse the connection for the lifetime of the container.
Another trick is to create a scheduled function to periodically call your function to keep the container "warm".
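A sketch of that warming trick, assuming a scheduled rule (e.g. CloudWatch Events / EventBridge) that sends a hypothetical {"warmup": true} payload the handler can short-circuit on:

import java.util.Map;
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;

public class WarmableLambda implements RequestHandler<Map<String, Object>, String> {

    public String handleRequest(Map<String, Object> event, Context context) {
        // Hypothetical convention: the scheduled rule sends {"warmup": true}.
        // Return early so warm-up pings keep the container (and its database
        // connection) alive without doing any real work.
        if (event != null && Boolean.TRUE.equals(event.get("warmup"))) {
            return "warmed";
        }
        // ... normal request handling, using the container-level connection ...
        return "ok";
    }
}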
Cheers,
Ryan