How to handle EventProcessorClient error callback "Attempting to renew all of the processor's partition ownership in the storage service."? - azure-eventhub

I implemented a service that uses the EventProcessorClient component to process events from an Azure EventHub.
The error callback
private Task OnErrorAsync(ProcessErrorEventArgs args)...
is executed many times, with
args.Operation set to "Executing a load balancing cycle."
args.Exception set to "Azure.Messaging.EventHubs.EventHubsException(GeneralError): Attempting to renew all of the processor's partition ownership in the storage service."
Is it a proper error or just a notification that is safe to ignore?

Related

Handling concurrent requests

I am building a recommendation service that recommends items based on a use case. For this, the client needs to call our API.
Functionality of API:
Clients call with the list of required items and the use case.
Based on that, we return the exact items.
Stack:
AWS Lambda
Amazon DynamoDB
Problem: How do we handle concurrent fetch requests for the same use case?
Solutions:
Flow using pessimistic locking:
Acquire a DB lock on the list of available items for the use case.
Remove the items from the original DB.
Release the lock.
This will increase the latency of the API.
Flow using Optimistic locking:
Fetch the available items.
Remove from the list and return.
If another request has already deleted those items from the available list, return an error so the client retries the API call.
Is there any other more efficient way of handling the concurrent requests?
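For reference, the optimistic flow above maps naturally onto a DynamoDB conditional write, which fails atomically if another request claimed the item first. Below is a rough sketch only; the table name, key names, and the availability condition are assumptions for illustration:

import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
import software.amazon.awssdk.services.dynamodb.model.ConditionalCheckFailedException;
import software.amazon.awssdk.services.dynamodb.model.DeleteItemRequest;

import java.util.Map;

public class ItemAllocator {

    private final DynamoDbClient dynamo = DynamoDbClient.create();

    // Tries to claim an item for a use case. The condition expression makes the delete
    // fail atomically if another request already removed (claimed) the item.
    public boolean tryClaim(String useCase, String itemId) {
        DeleteItemRequest request = DeleteItemRequest.builder()
                .tableName("AvailableItems")                        // assumed table name
                .key(Map.of(
                        "useCase", AttributeValue.fromS(useCase),   // assumed partition key
                        "itemId", AttributeValue.fromS(itemId)))    // assumed sort key
                .conditionExpression("attribute_exists(itemId)")    // only delete if still available
                .build();
        try {
            dynamo.deleteItem(request);
            return true;    // we won the race: the item is ours to return to the client
        } catch (ConditionalCheckFailedException e) {
            return false;   // another request claimed it first: pick another item or retry
        }
    }
}

With this shape there is no lock to hold, and a retry only happens on an actual conflict.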

How do I handle idle database connections made outside of a Lambda function handler?

Our current implementation is to open one database connection outside of the Lambda handler. When the backing Lambda container terminates, the connection is then left open/idle.
Can I make the new container close the previous old container's database connection?
Are there any hooks available like an onContainerClose()?
How can we close the previous open connection which cannot be used anymore, when the Lambda cold starts?
In the background, AWS Lambda functions execute in a container that isolates them from other functions & provides the resources, such as memory, specified in the function’s configuration.
Any variable outside the handler function will be 'frozen' between Lambda invocations and possibly reused. "Possibly" because, depending on the volume of executions, the container is almost always reused, though this is not guaranteed.
You can personally test this by invoking a Lambda with the below source code multiple times & taking a look at the response:
let counter = 0;

exports.handler = async (event) => {
    counter++;
    const response = {
        statusCode: 200,
        body: JSON.stringify(counter),
    };
    return response;
};
This also includes database connections that you may want to create outside of the handler, to maximise the chance of reuse between invocations & to avoid creating a new connection every time.
Regardless of whether the Lambda container is reused or not, a connection made outside of the handler will eventually be closed when the container is terminated by AWS. Granted, the issue of "zombie" connections is much smaller when the connection is reused, but it is still there.
When you start to reach a high number of concurrent Lambda executions, the main question is how to end the unused connections leftover by terminated Lambda function containers. AWS Lambda is quite good at reliably terminating connections when the container expires but you may still run into issues getting close to your max_connections limit.
How can we close the previous open connection which cannot be used anymore, when the Lambda cold starts?
There is no native workaround via your application code or Lambda settings to completely get rid of these zombie connections, unless you handle opening and closing them yourself and take the added duration hit of creating a new connection on every invocation (still a very small number).
To clear zombie connections (if you must), a workaround would be to trigger a Lambda which would then list, inspect & kill idle leftover connections. You could either trigger this via an EventBridge rule operating on a schedule or trigger it when you are close to maxing out the database connections.
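As a rough illustration of that cleanup Lambda, and not a drop-in implementation: the query below is MySQL-specific, and the credentials, idle threshold, and required KILL privilege are all assumptions.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ZombieConnectionReaper {

    // Finds MySQL sessions that have been idle longer than the threshold and kills them.
    public static void reap(String jdbcUrl, String user, String password, int idleSeconds) throws Exception {
        try (Connection admin = DriverManager.getConnection(jdbcUrl, user, password);
             Statement query = admin.createStatement();
             Statement killer = admin.createStatement()) {
            // List idle ("Sleep") sessions older than the threshold; the session running
            // this query shows up as "Query", so it will not kill itself.
            ResultSet idle = query.executeQuery(
                    "SELECT id FROM information_schema.processlist " +
                    "WHERE command = 'Sleep' AND time > " + idleSeconds);
            while (idle.next()) {
                killer.execute("KILL " + idle.getLong(1));  // terminate the leftover connection
            }
        }
    }
}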
These are also great guidelines to follow:
Ensure your Lambda concurrent executions limit does not exceed your database maximum connections limit: this is to prevent the database from maxing out connections
Reduce database timeouts (if supported): limit the amount of time that connections can be left open and idle; for example, in MySQL, tweaking the wait_timeout variable from the default 28800 seconds (8 hours) down to 900 seconds (15 minutes) can be a great start (see the session-level sketch after this list)
Reduce the number of database connections: try your best to reduce the connections you need to make to the database via good application design & caching
If all else fails, look into increasing the max connections limit on the database
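A minimal sketch of the session-level variant of that timeout tweak, assuming a MySQL-compatible engine (on RDS the server-wide default would normally be changed via a parameter group instead):

import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

public final class ConnectionTimeouts {

    private ConnectionTimeouts() {
    }

    // MySQL: ask the server to drop this session's connection after 15 idle minutes,
    // even if the Lambda container that opened it has already been terminated.
    public static void shortenIdleTimeout(Connection connection) throws SQLException {
        try (Statement statement = connection.createStatement()) {
            statement.execute("SET SESSION wait_timeout = 900");
        }
    }
}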

Using AWS Lambda to periodically monitor state of a remote resource

I need to implement the following feature in the backend on AWS:
- API endpoint which allows a user to start a particular long running "process" in a remote system
- the process status in this remote system should be monitored periodically (every few to several seconds), and when status == complete, an action should be triggered (the remote system does not support sending/triggering notifications or callbacks)
We use primarily lambda functions so I'm thinking about approaching it in the following way:
- my endpoint, which is triggered by the user, would call the remote system to start the process, store a record in an internal DB, and generate a message to SQS (with a delivery delay of X seconds)
- there would be a second Lambda that would read messages from SQS and check the status of the process in this remote system. When status == complete, it triggers an action; when status != complete, it generates another message to SQS, which the same Lambda would pick up again after X seconds of delay, repeat the check, and so on
I'm wondering if there is a better solution/tools to implement this kind of monitoring/notification pattern in the AWS since I'm not that familiar with all the services that AWS provides.
Would anyone comment on this approach and perhaps suggest an alternative if there is one?
Take a look at AWS Step Functions, which I think is the best fit for your use case.
All you need to do is, instead of generating an SQS message, start an execution of a state machine in Step Functions.
The following tutorial explains an iterator pattern with a counter, but you can use the same logic to check the status and keep looping until status == complete:
https://docs.aws.amazon.com/step-functions/latest/dg/tutorial-create-iterate-pattern-section.html
Another useful resource which I think is very close to your use case:
https://docs.aws.amazon.com/step-functions/latest/dg/sample-project-job-poller.html
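To make the shape of the loop concrete: the state machine would repeatedly invoke a status-check task, pause in a Wait state for X seconds, and branch on the result in a Choice state. Here is a hypothetical sketch of that status-check Lambda (not taken from the linked samples; the input field name and the remote-system call are placeholders):

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;

import java.util.HashMap;
import java.util.Map;

public class CheckProcessStatus implements RequestHandler<Map<String, Object>, Map<String, Object>> {

    public Map<String, Object> handleRequest(Map<String, Object> input, Context context) {
        String processId = (String) input.get("processId");    // assumed field in the state machine input

        Map<String, Object> output = new HashMap<>(input);
        output.put("status", fetchRemoteStatus(processId));    // e.g. "RUNNING" or "COMPLETE"
        return output;                                          // a Choice state branches on "status"
    }

    private String fetchRemoteStatus(String processId) {
        // Placeholder: call the remote system's status API here.
        return "COMPLETE";
    }
}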

Handle timeout in AWS API Gateway

I'm working on a project where I'm using a Lambda function to connect to a relational database and to DynamoDB at the same time. To access that function I'm using API Gateway, but I found a problem: my Lambda function, written in Java, takes more than 10 seconds to start due to the creation of both database connections.
I know the API Gateway timeout is 10 seconds, and that's not a problem when executing my function, which takes less than 1 second, but it is a problem when the function has to start.
I would like to know how to catch this timeout exception and notify the user that they need to retry the request.
Is there a way to do so without moving to Node.js or accessing lambda function directly?
Since the cost of establishing a connection to a relational database is so high, I would encourage you to open the connection in the initialization code of your Lambda function (outside of the handler).
The database connection will then be re-used across multiple invocations for the lifetime of the Lambda container. Within your Lambda function handler you may want to ensure the connection is alive and hasn't timed out, and re-open as required.
The first call through API Gateway may timeout, but subsequent calls will reuse the connection for the lifetime of the container.
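A minimal sketch of that pattern, assuming a plain JDBC connection and placeholder configuration via environment variables:

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class ReusableConnectionHandler implements RequestHandler<String, String> {

    // Opened once per container, outside the handler, so warm invocations reuse it.
    private static Connection connection = open();

    private static Connection open() {
        try {
            return DriverManager.getConnection(
                    System.getenv("JDBC_URL"),      // placeholder configuration
                    System.getenv("DB_USER"),
                    System.getenv("DB_PASSWORD"));
        } catch (SQLException e) {
            throw new RuntimeException("Could not open database connection", e);
        }
    }

    public String handleRequest(String request, Context context) {
        try {
            // Re-open the connection if the server closed it while the container sat idle.
            if (connection == null || !connection.isValid(2)) {
                connection = open();
            }
            // ... run queries using `connection` ...
            return "ok";
        } catch (SQLException e) {
            throw new RuntimeException(e);
        }
    }
}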
Another trick is to create a scheduled function to periodically call your function to keep the container "warm".
Cheers,
Ryan

How we can use JDBC connection pooling with AWS Lambda?

Can we use JDBC connection pooling with AWS Lambda? Since an AWS Lambda function gets called on a specific event, does its lifetime persist even after it finishes one of its calls?
No. Technically, you could create a connection pool outside of the handler function, but since you can only make use of a single connection per invocation, all you would be doing is tying up database connections by allocating a pool of which you could only ever use one.
After uploading your Lambda function to AWS, the first time it is invoked AWS will create a container and run the setup code (the code outside of your handler function that creates the pool, let's say of N connections) before invoking the handler code.
When the next request arrives, AWS may reuse the container again (or it may not; it usually does, but that's down to AWS and not under your control).
Assuming it reuses the container, your handler function will be invoked (the setup code will not be run again) and your function would use one of the N connections to your database from the pool (held at the container level). This will most likely be the first connection from the pool, as it is guaranteed not to be in use: it's impossible for two invocations to run at the same time within the same container. Read on for an explanation.
If AWS does not reuse the container, it will create a new container and your code will allocate another pool of N connections. Depending on the turnover of containers, you may exhaust the database pool entirely.
If two requests arrive concurrently, AWS cannot invoke the same handler at the same time. If this were possible, you'd have a shared state problem with the variables defined at the container scope level. Instead, AWS will use two separate containers and these will both allocate a pool of N connections each, i.e. 2N connections to your database.
It's never necessary for a single function invocation to require more than one connection (unless of course you need to communicate with two independent databases within the same context).
The only time a connection pool would be useful is if it were at one level above the container scope, that is, handed down by the AWS environment itself to the container. This is not possible.
The best case you can hope for is to have a single connection per container. Even then you would have to manage this single connection to ensure the database server hasn't disconnected or rebooted. If it has, your container's connection will die and your handler will never be able to connect again (until the container dies), unless you write some code in your function to check for dropped connections. On a busy server, the container might take a long time to die.
Also keep in mind that if your handler function fails, for example half way through a transaction or having locked a table, the next request invocation will get the dirty connection state from the container. The first invocation may have opened a transaction and died. The second invocation may commit and include all the previous queries up to the failure.
I recommend not managing state outside of the handler function at all, unless you have a specific need to optimise. If you do, then use a single connection, not a pool.
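If you do keep a container-scoped connection, one hedge against the dirty-state problem described above is to reset the connection at the top of every invocation. A minimal sketch:

import java.sql.Connection;
import java.sql.SQLException;

public final class ConnectionHygiene {

    private ConnectionHygiene() {
    }

    // Call this at the top of the handler, before running any new queries.
    public static void resetState(Connection connection) throws SQLException {
        if (connection != null && !connection.getAutoCommit()) {
            connection.rollback();  // discard any half-finished transaction and release its locks
        }
    }
}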
Yes, the Lambda execution environment is mostly persistent, so JDBC connection pooling should work. The first time a Lambda function is invoked, the environment is created, and it may or may not get reused. In practice, subsequent invocations will often reuse the same Lambda process, along with all program state, if your triggering events occur often enough.
This short lambda function demonstrates this:
package test;

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;

public class TestLambda implements RequestHandler<String, String> {

    private int invocations = 0;

    public String handleRequest(String request, Context context) {
        invocations++;
        System.out.println("invocations = " + invocations);
        return request;
    }
}
Invoke this from the AWS console with any string as the test event. In the CloudWatch logs, you'll see the invocations number increment each time.
Kudos to AWS RDS Proxy: you can now use pooled MySQL and PostgreSQL connections without any extra configuration in your Java code, or in any other code specific to AWS Lambda. All you need to do is create a database proxy and add it to the AWS Lambda function whose connections you want to reuse/pool. See the how-to here.
Note: AWS RDS proxy is not included in the Free-Tier (more here).
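Illustratively, the Lambda code keeps using plain JDBC and only the endpoint changes to the proxy's; the pooling happens inside RDS Proxy. The endpoint, database name, and credentials below are placeholders:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class ProxyConnectionExample {

    public static Connection connect() throws SQLException {
        // Placeholder proxy endpoint and database name.
        String proxyEndpoint = "my-proxy.proxy-abc123.us-east-1.rds.amazonaws.com";
        String url = "jdbc:mysql://" + proxyEndpoint + ":3306/mydb";
        return DriverManager.getConnection(url,
                System.getenv("DB_USER"),       // placeholder credentials
                System.getenv("DB_PASSWORD"));
    }
}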
It has caveats:
There is no destroy method that guarantees the pool is closed; one could argue the database's connection idle timeout would handle that.
What if the same database is also used by other clients that maintain their own pools on regular machines, like EC2?
As many have said, a sudden spike in requests can create chaos for the database, as there will always be some maximum connection limit configured per user on the database side.