My Lambda function contains code that queries DynamoDB. Once a query has executed, the Lambda continues with the rest of the code, which depends on the result of that query. What happens if I exceed the capacity limit of the DynamoDB table? I could push the query to SQS and process it later, but then I would not be able to continue the execution of the Lambda. Another option would be to retry each query that fails, but if DynamoDB is extremely busy, my Lambda might exceed the 5-minute limit. It seems like a lose-lose situation. What would you do?
The most fault-tolerant solution would be to decouple the querying and the processing of the results.
Instead of processing results immediately, write the results to another SQS queue and send an SNS notification.
Move the processing to a second Lambda function. This new function can be triggered by the SNS notification. It can read the results queue and process any pending messages.
Modify the original function to queue any failed queries for later.
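A minimal sketch of that decoupling in Python with boto3; the queue URL, topic ARN, and the run_dynamodb_query helper are placeholders for your own resources and logic, not anything from your setup:

```python
import json
import boto3

sqs = boto3.client("sqs")
sns = boto3.client("sns")

# Placeholder resources; substitute your own queue URL and topic ARN.
RESULTS_QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/query-results"
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:results-ready"

def handler(event, context):
    # run_dynamodb_query stands in for your existing query logic.
    result = run_dynamodb_query(event)

    # Park the result on the queue instead of processing it here...
    sqs.send_message(QueueUrl=RESULTS_QUEUE_URL, MessageBody=json.dumps(result))
    # ...and notify the second function that there is work to pick up.
    sns.publish(TopicArn=TOPIC_ARN, Message="results pending")
```

The second function, subscribed to the topic, drains the results queue in a receive/delete loop and runs the rest of your original logic; failed queries can be parked on their own retry queue in exactly the same way.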
So, I am putting some entries in an SQS queue which is set as an event source for a Lambda, and this flow is working fine. As soon as an entry arrives in the SQS queue, the Lambda processes it. So far so good.
But I have a situation where I want the entries to stay in SQS for 3-4 days and then let a Lambda process them.
So basically, if I see that I have 100 entries in my SQS queue and it has been 4 days, I want to let the Lambda drain them and run some logic. Is this possible? Kindly guide me.
I think disabling the Lambda is not the way to fulfil the requirement, as you would miss other messages too.
SQS is a messaging service, and when it is integrated with Lambda you can only configure retries and process the messages; how long a message stays in SQS is not under your control, because Lambda consumes it by design:
Lambda polls the queue and invokes your function synchronously with an event that contains queue messages. Lambda reads messages in batches and invokes your function once for each batch. When your function successfully processes a batch, Lambda deletes its messages from the queue.
Source: https://docs.aws.amazon.com/lambda/latest/dg/with-sqs.html
One solution that can address your requirement:
But I have a situation where I want the entries to stay in SQS for 3-4 days and then let a Lambda process them.
You need to decide which SQS messages should not be processed at the moment, push those messages to DynamoDB, and then process them after 4 or 5 days based on the DynamoDB TTL that was set during insertion. You can follow the steps below:
Add a property such as is_dynamodb to each SQS message, to mark the messages that should not be processed at the moment
Push such messages to DynamoDB
Add a TTL during insertion
In the Lambda function fed by the DynamoDB stream, check whether the event is a removal rather than an insertion
Process a message only if its event is REMOVE (see the sketch below)
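A minimal sketch of steps 4 and 5, assuming the table's stream is configured with the OLD_IMAGE (or NEW_AND_OLD_IMAGES) view type so the expired item's attributes are available; process_message is a placeholder for your own delayed-processing logic:

```python
def handler(event, context):
    for record in event["Records"]:
        # TTL expiry arrives on the stream as a REMOVE event; skip INSERT/MODIFY.
        if record["eventName"] != "REMOVE":
            continue
        # TTL deletions (as opposed to manual deletes) carry this service identity.
        if record.get("userIdentity", {}).get("principalId") != "dynamodb.amazonaws.com":
            continue
        old_image = record["dynamodb"]["OldImage"]  # the expired message's attributes
        process_message(old_image)  # placeholder for your delayed processing
```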
I'd like to execute a Lambda function with multiple pieces of data, but only after a fixed amount of data has been gathered. The fixed amount could be, for example, a specific number of messages, or the messages sent within a specific time range.
I thought of solving this problem with SQS, writing the messages to a queue and polling to check its status. But I don't like this solution, because I'd like to trigger the Lambda instantly once the criterion is met (for example: elapsed time since the first message was sent, or a fixed number of messages).
Ideally, all the messages gathered would be sent, for example, 1 minute after the first message arrives.
To be clear:
First message arrives in the queue
From then on a timer runs (e.g. 1 min)
When the timer ends, it triggers the lambda with all the messages gathered so far
Moreover, I'd like to handle different queues in parallel, based on different ids
Is there an elegant way to do so?
I already have a system in place that works with sequential Lambdas and handles the whole process per single message.
Unfortunately, it's not an easy task to do on AWS Lambda (we have a similar use case).
SQS or Kinesis data stream as a trigger can be helpful, but have several limitations:
SQS will be polled by AWS Lambda at a very high frequency. You will have to add a concurrency limit to your Lambda to get it triggered with more than a single item, and the maximum batch size is just 10.
The base rate for Kinesis trigger is one per second for each shard, and cannot be changed.
Aggregating records across different invocations is not a good idea, because you never know whether the next invocation will start on a different container, in which case the records would be lost.
Kinesis Firehose can be helpful, as you can configure a maximum batch size and a maximum time range for sending a new batch. You can configure it to write to an S3 bucket and have a Lambda triggered by the newly created files.
Make sure that if you use a Kinesis data stream as the source of a Kinesis Firehose, the data from each shard of the data stream is separately batched in the Firehose (this is not documented by AWS).
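If you go the Firehose route, the batching thresholds are set through the delivery stream's buffering hints. A rough boto3 sketch, with placeholder names and ARNs:

```python
import boto3

firehose = boto3.client("firehose")

firehose.create_delivery_stream(
    DeliveryStreamName="event-batcher",  # placeholder name
    DeliveryStreamType="DirectPut",
    ExtendedS3DestinationConfiguration={
        "RoleARN": "arn:aws:iam::123456789012:role/firehose-to-s3",  # placeholder
        "BucketARN": "arn:aws:s3:::my-batched-events",               # placeholder
        # Flush whichever limit is hit first: 5 MB of data or 60 seconds.
        "BufferingHints": {"SizeInMBs": 5, "IntervalInSeconds": 60},
    },
)
```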
You can do this in a few ways. I'd do it like this:
Have the queue be an event source for a lambda function
That lambda function can either trigger a state machine or do nothing. It triggers the state machine if one isn't already running (meaning we're within that 1-minute window); a sketch of this check follows the steps below.
The state machine has the following steps:
Wait for 1 minute
Do its processing
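A rough sketch of the triggering function's check in Python with boto3; the state machine ARN is a placeholder:

```python
import json
import boto3

sfn = boto3.client("stepfunctions")

# Placeholder ARN for the state machine described above.
STATE_MACHINE_ARN = "arn:aws:states:us-east-1:123456789012:stateMachine:batch-window"

def handler(event, context):
    # Is an execution (i.e. an open 1-minute window) already in flight?
    running = sfn.list_executions(
        stateMachineArn=STATE_MACHINE_ARN,
        statusFilter="RUNNING",
        maxResults=1,
    )
    if running["executions"]:
        return  # window already open; the message just waits in the queue
    # Otherwise open a new window: the state machine waits 1 minute,
    # then its processing step drains the queue.
    sfn.start_execution(
        stateMachineArn=STATE_MACHINE_ARN,
        input=json.dumps({"source": "first message of the window"}),
    )
```

Note that the check-then-start above is not atomic; with Standard workflows you can close that race by starting the execution under a fixed name, since Step Functions rejects a duplicate name while an execution using it is still running.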
I have an AWS Lambda function which processes events from S3. I'd like to aggregate them before processing and let lambda process the batch.
Ideally, I'd like to be able to specify a batch size and a timeout (say, if a single event arrives and then nothing for 5 sec, I'd like to send a 1-event batch).
Is there an idiomatic way to do it using Lambda or other AWS services?
There are a few things you can do:
1. Make upstream do the aggregation:
Make publishing the publisher's responsibility, and get the publisher to give you one event per group of objects to process. This works well if the publisher is already working in batches.
2. Insert your own aggregation step:
Trigger on each event.
Store the event somewhere.
If enough events have been stored, empty the store and pass all the contents to the processing step.
This works well if your processing step is much more expensive per event than just handling the event. Often, this can take the form of {aggregating lambda} -> {processing batch job}, since Lambda isn't great for very expensive processing.
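One possible shape for that store is a DynamoDB item used as an atomic counter. A rough sketch, where the table name, threshold, and the store_event/flush helpers are all placeholders (and the reset is not race-free; it's only meant to show the pattern):

```python
import boto3

dynamodb = boto3.client("dynamodb")

TABLE = "event-buffer"  # placeholder table with partition key "pk"
BATCH_SIZE = 20         # flush threshold; pick whatever fits your workload

def handler(event, context):
    store_event(event)  # persist the event body somewhere (placeholder)

    # Atomically bump a shared counter and read the new value.
    resp = dynamodb.update_item(
        TableName=TABLE,
        Key={"pk": {"S": "counter"}},
        UpdateExpression="ADD n :one",
        ExpressionAttributeValues={":one": {"N": "1"}},
        ReturnValues="UPDATED_NEW",
    )
    if int(resp["Attributes"]["n"]["N"]) >= BATCH_SIZE:
        # Reset the counter and hand the accumulated events
        # to the expensive processing step.
        dynamodb.put_item(TableName=TABLE,
                          Item={"pk": {"S": "counter"}, "n": {"N": "0"}})
        flush_store_to_processor()  # kick off the batch job (placeholder)
```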
3. Do aggregation on a time basis:
Send your events to an SQS queue.
Trigger on a timer (e.g. Cloudwatch events).
When triggered, empty the queue and process everything in it. If it's too much to process in a single invocation, immediately trigger an additional lambda.
This works well if processing is fairly cheap, and you want to minimize your number of Lambda invocations. The trigger schedule (how long you wait in between invocations) is determined by weighing how long you're willing to wait to process an event against how many invocations you're willing to pay for. Things to watch out for: 1. if you get no events at all, you will still be invoking your Lambda, and 2. if you get events faster than they can be processed, your queue will grow more and more and your processing will fall further and further behind.
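A minimal sketch of the scheduled drain, assuming a CloudWatch Events schedule invokes this function; the queue URL and the process helper are placeholders:

```python
import boto3

sqs = boto3.client("sqs")

QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/events"  # placeholder

def handler(event, context):
    # Runs on a schedule; empties everything queued since the last run.
    while True:
        resp = sqs.receive_message(
            QueueUrl=QUEUE_URL,
            MaxNumberOfMessages=10,  # the SQS per-call maximum
            WaitTimeSeconds=1,
        )
        messages = resp.get("Messages", [])
        if not messages:
            break  # queue is empty; done until the next scheduled run
        for msg in messages:
            process(msg["Body"])  # your per-event logic (placeholder)
            sqs.delete_message(QueueUrl=QUEUE_URL,
                               ReceiptHandle=msg["ReceiptHandle"])
```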
I think you can achieve the batch operation by setting an SQS queue as the destination for the S3 notifications. Let's say you want a batch size of 20: all your S3 events go to SQS, and you create a CloudWatch alarm to trigger a Lambda when your SQS queue holds 20 items. Your Lambda would then poll SQS for the batch of 20 items and process them.
You can also set up an SQS trigger directly, but it has a maximum batch size of 10.
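A CloudWatch alarm cannot invoke a Lambda directly, so in practice the alarm publishes to an SNS topic and the Lambda subscribes to that topic. A rough sketch of the alarm, with placeholder queue and topic names:

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="sqs-batch-ready",
    Namespace="AWS/SQS",
    MetricName="ApproximateNumberOfMessagesVisible",
    Dimensions=[{"Name": "QueueName", "Value": "s3-events"}],  # placeholder queue
    Statistic="Maximum",
    Period=60,
    EvaluationPeriods=1,
    Threshold=20,
    ComparisonOperator="GreaterThanOrEqualToThreshold",
    # The alarm fires into SNS; the draining Lambda subscribes to this topic.
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:batch-ready"],  # placeholder
)
```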
I have a Lambda function that’s triggered by a PUT to an S3 bucket.
I want to limit this Lambda function so that it’s only running one instance at a time – I don’t want two instances running concurrently.
I’ve had a look through the Lambda configuration and docs, but I can’t see anything obvious. I could write my own locking system, but it would be nice if this were already a solved problem.
How can I limit the number of concurrent invocations of a Lambda?
AWS Lambda now supports concurrency limits on individual functions:
https://aws.amazon.com/about-aws/whats-new/2017/11/set-concurrency-limits-on-individual-aws-lambda-functions/
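With that feature, capping the function at a single concurrent execution is one call; the function name below is a placeholder:

```python
import boto3

# Reserve exactly one concurrent execution for the function; extra
# invocations are throttled (and retried, for async event sources).
boto3.client("lambda").put_function_concurrency(
    FunctionName="my-s3-processor",  # placeholder name
    ReservedConcurrentExecutions=1,
)
```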
I would suggest you use Kinesis Streams (or alternatively DynamoDB + DynamoDB Streams, which behave essentially the same way).
You can see a Kinesis Stream as a queue. The good part is that you can use a Kinesis Stream as a trigger for your Lambda function, so anything that gets inserted into this queue will automatically be passed to your function, in order. You will be able to process those S3 events one by one, one Lambda execution after the other (one instance at a time).
In order to do that, you'll need to create a Lambda function with the simple purpose of getting S3 Events and putting them into a Kinesis Stream. Then you'll configure that Kinesis Stream as your Lambda Trigger.
When you configure the Kinesis Stream as your Lambda Trigger I suggest you to use the following configuration:
Batch size: 1
This means that your Lambda will be called with only one event from Kinesis. You can select a higher number and you'll get a list of events of that size (for example, if you want to process the last 10 events in one Lambda execution instead of 10 consecutive Lambda executions).
Starting position: Trim horizon
This means it'll behave as a queue (FIFO)
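The forwarder function from the step above could look roughly like this; the stream name is a placeholder, and the constant partition key is what keeps every event on a single shard and therefore strictly ordered:

```python
import json
import boto3

kinesis = boto3.client("kinesis")

def handler(event, context):
    # Triggered by S3; pushes each notification record onto the stream.
    for record in event["Records"]:
        kinesis.put_record(
            StreamName="s3-events",  # placeholder stream name
            Data=json.dumps(record),
            # A constant partition key routes everything to one shard,
            # which is what gives one-at-a-time, in-order processing.
            PartitionKey="s3",
        )
```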
A bit more info on AWS May Webinar Series - Streaming Data Processing with Amazon Kinesis and AWS Lambda.
I hope this helps anyone with a similar problem.
P.S. Bear in mind that Kinesis Streams have their own pricing. Using DynamoDB + DynamoDB Streams might be cheaper (or even free due to the non-expiring Free Tier of DynamoDB).
No, this is one of the things I'd really like to see Lambda support, but currently it does not. One of the problems is that if there were a lot of S3 PUT operations happening AWS would have to queue up all the Lambda invocations somehow, and there is currently no support for that.
If you built a locking mechanism into your Lambda function, what would you do with the requests you don't process due to a lock? Would you just throw those S3 notifications away?
The solution most people recommend is to have S3 send the notifications to an SQS queue, and then have your Lambda function scheduled to run periodically, like once a minute, and check if there is an item in the queue that needs to be processed.
Alternatively, have S3 send the notifications to SQS and just have a t2.nano EC2 instance with a single-threaded service polling the queue.
I know this is an old thread, but I ran across it while trying to figure out how to make sure my time-sequenced SQS messages coming out of a FIFO queue were processed in order, rather than simultaneously or out of order by multiple Lambda instances.
Per the documentation:
For FIFO queues, Lambda sends messages to your function in the order that it receives them. When you send a message to a FIFO queue, you specify a message group ID. Amazon SQS ensures that messages in the same group are delivered to Lambda in order. Lambda sorts the messages into groups and sends only one batch at a time for a group. If your function returns an error, the function attempts all retries on the affected messages before Lambda receives additional messages from the same group.

Your function can scale in concurrency to the number of active message groups.
Link: https://docs.aws.amazon.com/lambda/latest/dg/with-sqs.html
So essentially, as long as you use a FIFO queue and submit your messages that need to stay in sequence with the same MessageGroupID, SQS/Lambda automatically handles the sequencing without any additional settings necessary.
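For reference, a minimal sketch of the sending side; the queue URL and the IDs are placeholders, and the queue name must end in .fifo:

```python
import boto3

sqs = boto3.client("sqs")

sqs.send_message(
    QueueUrl="https://sqs.us-east-1.amazonaws.com/123456789012/events.fifo",
    MessageBody="event payload",
    # Everything sharing a MessageGroupId is delivered in order,
    # one batch at a time.
    MessageGroupId="device-42",
    # Required unless the queue has content-based deduplication enabled.
    MessageDeduplicationId="evt-0001",
)
```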
Have the S3 PUT events place a message on the queue (instead of invoking a Lambda function directly). The message should contain a reference to the S3 object. Then schedule a Lambda to short-poll the entire queue.
PS: S3 events cannot trigger a Kinesis Stream, only SQS, SNS, or Lambda (see http://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html#supported-notification-destinations). Kinesis Streams are expensive and are meant for real-time event handling.
I have a Lambda function which spawns a number of worker Lambda functions, and each worker function posts to an SQS queue if there is any error.
There is a UI which long-polls the SQS queue for any errors. My problem is that how do I know when the processing is completed?
Since the first Lambda function (which spawns the worker Lambda functions) runs asynchronously, that is, it splits the data across the worker functions and then returns/finishes, I need a way to figure out when the processing is complete.
The reason I post only the errors to the SQS queue, and not the successes, is that if I have 10,000 objects to process and 9,000 of them succeed, I would have to make a great many ReceiveMessage calls (the SQS API call that retrieves items from the queue) on the UI/client side: around 900 calls if I specify the maximum of 10 messages per call, since you cannot retrieve more than 10 messages from the queue per call.
How can I overcome this design issue?
I'm using API Gateway, AWS Lambda and Dynamo DB (feel free to suggest any other Amazon/AWS service that could make this easier to get the job done.)