I have a requirement where, at any given time, each SQS message in the same SQS queue should trigger a separate Lambda instance.
For example, if the SQS queue holds n messages at a given time, the expectation is that it triggers n Lambda instances:
Message_1 --> Lambda_function_A_instance_1
Message_2 --> Lambda_function_A_instance_2
Message_n --> Lambda_function_A_instance_n
Use case:
I need a separate Lambda instance for each message in order to process data files that are larger than 10 GB.
So each message should invoke a new Lambda instance of the same function.
To make Lambda read only one message from SQS at a time, set the BatchSize parameter to 1 when creating the SQS event source mapping. Each invocation then receives exactly one message.
Additionally, set the queue's VisibilityTimeout attribute to a value greater than the Lambda function's timeout (AWS recommends at least six times the function timeout), so that a message does not become visible again and get processed a second time while the first invocation is still running.
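A minimal sketch of those two settings, assuming boto3 and hypothetical queue and function names. VisibilityTimeout is an attribute of the queue itself, so it is set on the queue rather than on the event source mapping; the factor of six follows the AWS recommendation for queues polled by Lambda. The helpers only build the request arguments, and the commented lines show where they would be passed to boto3.

```python
def mapping_kwargs(queue_arn: str, function_name: str) -> dict:
    """Arguments for lambda_client.create_event_source_mapping."""
    return {
        "EventSourceArn": queue_arn,
        "FunctionName": function_name,
        "BatchSize": 1,  # hand the function one message per invocation
        "Enabled": True,
    }


def visibility_kwargs(queue_url: str, function_timeout_s: int, factor: int = 6) -> dict:
    """Arguments for sqs_client.set_queue_attributes."""
    return {
        "QueueUrl": queue_url,
        # SQS attribute values are passed as strings
        "Attributes": {"VisibilityTimeout": str(function_timeout_s * factor)},
    }


# Usage (needs boto3 and AWS credentials; QUEUE_ARN, QUEUE_URL and the
# function name are hypothetical placeholders):
# import boto3
# boto3.client("lambda").create_event_source_mapping(
#     **mapping_kwargs(QUEUE_ARN, "process-large-file"))
# boto3.client("sqs").set_queue_attributes(
#     **visibility_kwargs(QUEUE_URL, function_timeout_s=900))
```

Note that a batch size of 1 guarantees one message per invocation, not one fresh execution environment per message: a warm instance may be reused for the next message.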
I am building an application using SQS standard queue, which will trigger Lambda function when a new message is available in the queue. I am aware that my Lambda function might receive a duplicated message which it has processed before:
On rare occasions, one of the servers that stores a copy of a message might be unavailable when you receive or delete a message. If this occurs, the copy of the message isn't deleted on that unavailable server, and you might get that message copy again when you receive messages.
Based on my understanding, the Lambda service will launch multiple instances of my Lambda function, and that different instances of my Lambda function might process the same message. But according to the developer guide,
Immediately after a message is received, it remains in the queue. To prevent other consumers from processing the message again, Amazon SQS sets a visibility timeout, a period of time during which Amazon SQS prevents other consumers from receiving and processing the message.
I would like to know whether one instance of my Lambda function might receive a duplicated message while another instance of my Lambda function is still processing that message. That is, I want to know whether it is possible for one Lambda instance to receive a message while the message is in its visibility timeout.
Thank you in advance.
After reading these docs:
No. If the message is still being processed (and its visibility timeout has not expired), another instance can't receive it; if the processing fails, it might.
In my view, it depends on the number of messages and the batch size.
Lambda reads messages in batches and invokes your function once for each batch. When your function successfully processes a batch, Lambda deletes its messages from the queue.
This establishes that Lambda's concurrency depends on the number of batches, since each batch results in one invocation of the function.
When Lambda reads a batch, the messages stay in the queue but are hidden for the length of the queue's visibility timeout. If your function successfully processes the batch, Lambda deletes the messages from the queue. By default, if your function encounters an error while processing a batch, all messages in that batch become visible in the queue again
This establishes that every message in a batch must be processed successfully; otherwise, all messages in that batch become visible in the queue again.
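As an illustration of the alternative to that default, here is a handler sketch that reports only the failed messages instead of failing the whole batch. It assumes the event source mapping has ReportBatchItemFailures enabled in its FunctionResponseTypes; process_message is a hypothetical placeholder for the real work.

```python
import json


def process_message(body: str) -> None:
    """Hypothetical business logic; raises on input it cannot handle."""
    payload = json.loads(body)
    if "file" not in payload:
        raise ValueError("missing 'file' key")


def handler(event, context):
    failures = []
    for record in event.get("Records", []):
        try:
            process_message(record["body"])
        except Exception:
            # Only this message becomes visible again after its visibility
            # timeout; the other messages in the batch are deleted.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

Without that setting enabled on the mapping, the return value is ignored and the default all-or-nothing behaviour quoted above applies.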
For standard queues, Lambda uses long polling to poll a queue until it becomes active. When messages are available, Lambda reads up to five batches and sends them to your function. If messages are still available, Lambda increases the number of processes that are reading batches by up to 60 more instances per minute. The maximum number of batches that an event source mapping can process simultaneously is 1,000.
This establishes that the Lambda service starts by reading up to 5 batches at once, invoking your function once per batch, and scales out from there.
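To make the scaling arithmetic concrete, here is a small sketch that uses only the figures from the quoted documentation (5 batches to start, up to 60 more per minute, at most 1,000). The function names are made up for illustration, and the real limits may differ over time.

```python
def concurrent_batches(minutes: int, start: int = 5,
                       per_minute: int = 60, cap: int = 1000) -> int:
    """Upper bound on batches in flight after `minutes` of sustained backlog."""
    return min(cap, start + per_minute * minutes)


def minutes_to_cap(start: int = 5, per_minute: int = 60, cap: int = 1000) -> int:
    """Whole minutes of sustained backlog needed to reach the cap."""
    minutes = 0
    while concurrent_batches(minutes, start, per_minute, cap) < cap:
        minutes += 1
    return minutes


print(concurrent_batches(0))  # 5
print(concurrent_batches(1))  # 65
print(minutes_to_cap())       # 17
```

With these numbers, a queue with a large backlog needs about 17 minutes of continuous scaling before 1,000 batches are processed in parallel.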
Combining all three:
With up to 5 batches available, Lambda starts up to 5 concurrent invocations, one per batch. If there are more than 5 batches, more instances are invoked. If, in the meantime, an earlier invocation fails to process its batch, those messages reappear in the queue once the visibility timeout expires, and a newly invoked instance can then read the messages that could not be processed before.
This blog post explains the same thing better than the docs: https://data.solita.fi/lessons-learned-from-combining-sqs-and-lambda-in-a-data-project/
I have a use case with an Amazon SQS FIFO queue and a Lambda function. I need to make sure that the FIFO queue triggers the Lambda only when the previous Lambda execution has completed (and that the events arrive in order). According to the AWS docs, FIFO supports exactly-once processing, but it is not mentioned anywhere that it would not push more events to Lambda until the first message is completely processed.
I need to make sure that the next message is processed only when the previous message is completely processed by the lambda function.
Is there a way to ensure that message 2 is processed by Lambda only after message 1 has been completely processed?
FIFO supports exactly-once processing, but it is not mentioned anywhere
that it would not push more events to Lambda until the first message
is completely processed.
SQS never pushes anything anywhere. You have to poll SQS for messages. When you configure the Lambda integration with SQS, Lambda actually runs a process behind the scenes that polls SQS for you.
AWS FIFO queues allow you to force messages to be processed in order by specifying a Message Group ID. When you specify the same Message Group ID for multiple messages, the FIFO queue will make only one of those messages available at a time (in first-in-first-out order). Only after the first message is removed from the queue is the second message made available, and so on.
In addition to this, you should configure AWS Lambda SQS integration with a Batch Size of 1, so that it doesn't try to wait for multiple messages to be available before processing. And you could configure the Reserved Concurrency on the Lambda function to 1, as mentioned in the other answer, so that only one instance of the Lambda function can be running at a time.
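A minimal sketch of sending messages with a shared Message Group ID, assuming boto3 on the producer side. The queue URL and group name are hypothetical, and the helper only builds the arguments for send_message so that it can be inspected without AWS access.

```python
import hashlib


def fifo_message_kwargs(queue_url: str, body: str, group_id: str) -> dict:
    """Arguments for sqs_client.send_message on a FIFO queue."""
    return {
        "QueueUrl": queue_url,
        "MessageBody": body,
        # Messages that share a group ID are delivered strictly in order.
        "MessageGroupId": group_id,
        # Required unless content-based deduplication is enabled on the queue.
        "MessageDeduplicationId": hashlib.sha256(body.encode("utf-8")).hexdigest(),
    }


# Usage (needs boto3 and AWS credentials; QUEUE_URL is a placeholder):
# import boto3
# sqs = boto3.client("sqs")
# for body in ("message 1", "message 2"):
#     sqs.send_message(**fifo_message_kwargs(QUEUE_URL, body, "orders"))
```

Using one group ID for everything serializes the whole queue; separate group IDs could be used for streams that may safely run in parallel.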
It is actually pretty easy to do this. The docs do not spell it out because, by default, Lambda simply uses up the available account concurrency and handles as many messages in parallel as possible.
You can influence this by setting the reserved concurrency for the Lambda function to 1. This ensures that no more than one instance of the function is executing at any time.
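For reference, a sketch of setting reserved concurrency programmatically, assuming a boto3 Lambda client is passed in. The function name in the usage comment is hypothetical.

```python
def limit_to_one(lambda_client, function_name: str) -> dict:
    """Reserve exactly one concurrent execution for the function."""
    return lambda_client.put_function_concurrency(
        FunctionName=function_name,
        ReservedConcurrentExecutions=1,
    )


# Usage (needs boto3 and AWS credentials):
# import boto3
# limit_to_one(boto3.client("lambda"), "my-fifo-consumer")
```

Taking the client as a parameter keeps the helper easy to exercise with a stub in tests.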
I have a Lambda set up with event sources configured for 5 different SQS queues. If the batch size of the Lambda is configured to be 10, will the 10 records in the SQSEvent in the Lambda handler all be from the same queue, or can the 10 records in the batch be from any of the 5 queues?
The behavior is undocumented, but it's almost certainly the case that the Lambda service will batch events from one, and only one, queue in a single Lambda invocation.
That said, if it's critical to your application that you be able to distinguish one queue source from another, then either:
create one Lambda handler per queue (it could simply call your common handler function with an indicator of which queue was the source), or
check the value of the eventSourceARN in each record in the event
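The second option can be sketched as follows. group_by_queue is a hypothetical helper name; the eventSourceARN and body fields are part of the standard SQS event record.

```python
from collections import defaultdict


def group_by_queue(event: dict) -> dict:
    """Map each source queue ARN to the message bodies it contributed."""
    grouped = defaultdict(list)
    for record in event.get("Records", []):
        grouped[record["eventSourceARN"]].append(record["body"])
    return dict(grouped)


def handler(event, context):
    for queue_arn, bodies in group_by_queue(event).items():
        print(f"{queue_arn}: {len(bodies)} message(s)")
```

If batches really do come from a single queue, the resulting dictionary simply has one key, so the check costs nothing.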
I have an SQS FIFO queue, and I want to know if there is a way to trigger an AWS Lambda once the queue is no longer empty.
For example, if my queue is empty and a new message arrives, the Lambda should be triggered; but if the queue already contains at least one message and a new message arrives, no Lambda should be triggered.
Is it possible?
There is an Amazon CloudWatch metric called ApproximateNumberOfMessagesVisible that shows the number of messages in the queue. The documentation says that "For FIFO queues, the result is exact."
You could create a CloudWatch Alarm that triggers when the number of messages drops to zero for a period of time. The Alarm can send a message to an Amazon SNS topic. If you subscribe your AWS Lambda function to this topic, it will be triggered when the queue is empty for the specified duration (eg over a period of 5 minutes). It will only be triggered when the alarm enters the 'Alarm' state and it will not trigger again until the alarm exits the state and enters the state again.
Important: When configuring the alarm, go to the Additional configuration and set Missing data treatment to "Treat missing data as bad (breaching threshold)". This is required because the SQS queue will not send metrics if the queue is empty. (Many queues are empty, so this saves a lot of metric storage!)
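The alarm described above could be defined roughly like this with boto3. The alarm name, the 5-minute period, and the Maximum statistic are assumptions chosen for illustration; the helper only builds the arguments for put_metric_alarm.

```python
def empty_queue_alarm_kwargs(queue_name: str, sns_topic_arn: str) -> dict:
    """Arguments for cloudwatch_client.put_metric_alarm."""
    return {
        "AlarmName": f"{queue_name}-empty",  # hypothetical naming scheme
        "Namespace": "AWS/SQS",
        "MetricName": "ApproximateNumberOfMessagesVisible",
        "Dimensions": [{"Name": "QueueName", "Value": queue_name}],
        "Statistic": "Maximum",
        "Period": 300,  # one 5-minute window
        "EvaluationPeriods": 1,
        "Threshold": 0,
        "ComparisonOperator": "LessThanOrEqualToThreshold",
        # Treat missing data as bad (breaching threshold)
        "TreatMissingData": "breaching",
        "AlarmActions": [sns_topic_arn],  # SNS topic the Lambda subscribes to
    }


# Usage (needs boto3 and AWS credentials; TOPIC_ARN is a placeholder):
# import boto3
# boto3.client("cloudwatch").put_metric_alarm(
#     **empty_queue_alarm_kwargs("my-queue.fifo", TOPIC_ARN))
```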
Unusual pattern.
You could perhaps set the Lambda function concurrency to 1, meaning that only one invocation can happen concurrently, and then have your Lambda function kick off your workflow and then remove the actual SQS event trigger that caused the Lambda to be invoked in the first place. That should prevent further invocations. Add the SQS event trigger back when you're done to get ready for the next batch of messages.
You may set the concurrent execution limit to 1 to make sure only one Lambda instance reads the queue. But I'm not sure this is something you want to do: Lambda can read at most 10 messages in a single execution, and if your queue gets too many incoming messages, consuming them may take too long.
I've observed abnormal (well, from my POV) behaviour: when I set up SQS to trigger a Lambda and new messages arrive, the Lambdas get triggered with more than one record/message inside the event body.
Full setup is S3 (PutObjectEvent) -> SNS topic -> SQS -> Lambda.
The abnormal behaviour is this: say I put 15 objects into S3, which forwards one event to SNS per object, and I can then observe SQS being populated with 15 messages. However, when the Lambdas start triggering, only 11 Lambdas trigger for those 15 messages, and some of them contain more than one record/message inside the event body.
I've scoured the AWS documentation but haven't found a concrete answer. Please note, these Lambdas do NOT poll SQS, fail, or keep retrying. They execute perfectly fine; it's just that the inspected event body shows more than one record inside it.
Look at the sample event data for an SQS Lambda message here. The message is an array of records, which directly implies that there may be more than one SQS record in the message.
The documentation on SQS Lambda integration also clearly states that the Batch Size setting controls how many records a Lambda function may receive from SQS in a single call, with the default being 10. If you only want your Lambda functions to receive one message at a time you need to modify the Batch Size setting to be 1.
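Whatever the batch size, it is safest to write the handler so that it loops over every record. A minimal sketch, in which the print call stands in for the real processing:

```python
def handler(event, context):
    """Process every record in the batch, however many arrive."""
    bodies = [record["body"] for record in event.get("Records", [])]
    for body in bodies:
        print(f"processing: {body}")  # placeholder for the real work
    return {"processed": len(bodies)}
```

With this shape, 15 messages delivered across 11 invocations still result in 15 processed records.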