Can an SQS consumer receive a duplicate message during the message's visibility timeout? - amazon-web-services

I am building an application using an SQS standard queue, which will trigger a Lambda function when a new message is available in the queue. I am aware that my Lambda function might receive a duplicate of a message it has already processed:
On rare occasions, one of the servers that stores a copy of a message might be unavailable when you receive or delete a message. If this occurs, the copy of the message isn't deleted on that unavailable server, and you might get that message copy again when you receive messages.
Based on my understanding, the Lambda service will launch multiple instances of my Lambda function, and that different instances of my Lambda function might process the same message. But according to the developer guide,
Immediately after a message is received, it remains in the queue. To prevent other consumers from processing the message again, Amazon SQS sets a visibility timeout, a period of time during which Amazon SQS prevents other consumers from receiving and processing the message.
I would like to know whether one instance of my Lambda function might receive a duplicate message while another instance of my Lambda function is still processing that message. That is, I want to know whether it is possible for one Lambda instance to receive a message while the message is still within its visibility timeout.
Thank you in advance.

After reading the docs: no, not while the message is still being processed, but if processing fails, then it might.
In my view, it depends on the number of messages and the batch size.
Lambda reads messages in batches and invokes your function once for each batch. When your function successfully processes a batch, Lambda deletes its messages from the queue.
This establishes that the concurrency of Lambda depends on the batch.
When Lambda reads a batch, the messages stay in the queue but are hidden for the length of the queue's visibility timeout. If your function successfully processes the batch, Lambda deletes the messages from the queue. By default, if your function encounters an error while processing a batch, all messages in that batch become visible in the queue again
This establishes that all messages in a batch must be processed successfully; otherwise they become visible in the queue again.
For standard queues, Lambda uses long polling to poll a queue until it becomes active. When messages are available, Lambda reads up to five batches and sends them to your function. If messages are still available, Lambda increases the number of processes that are reading batches by up to 60 more instances per minute. The maximum number of batches that an event source mapping can process simultaneously is 1,000.
This establishes that a single Lambda can read up to 5 batches.
Combining all three:
With up to 5 batches, a single Lambda can process them. With more than 5 batches, multiple Lambdas are invoked. If, during that time, an earlier Lambda fails to process a message, the message reappears in the queue, and a newly invoked Lambda can read the message that the earlier one was not able to process.
This blog post also explains the same thing better than the docs: https://data.solita.fi/lessons-learned-from-combining-sqs-and-lambda-in-a-data-project/
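The hide-then-reappear behaviour described above can be sketched with a toy in-memory model. This is only an illustration, not the real service: the class `VisibilityQueue` and its methods are hypothetical names, time is passed in explicitly, and the rare duplicate delivery that standard queues allow is not modelled.

```python
class VisibilityQueue:
    """Toy in-memory model of an SQS-style visibility timeout (not the real service)."""

    def __init__(self, visibility_timeout):
        self.visibility_timeout = visibility_timeout
        self._visible_at = {}  # message id -> time at which it is visible again

    def send(self, message_id, now=0.0):
        # A new message is visible immediately.
        self._visible_at[message_id] = now

    def receive(self, now):
        """Return one visible message id and hide it, or None if nothing is visible."""
        for message_id, visible_at in self._visible_at.items():
            if visible_at <= now:
                # Hide the message for the length of the visibility timeout.
                self._visible_at[message_id] = now + self.visibility_timeout
                return message_id
        return None

    def delete(self, message_id):
        # A consumer deletes the message after processing it successfully.
        self._visible_at.pop(message_id, None)


queue = VisibilityQueue(visibility_timeout=30)
queue.send("m1")
first = queue.receive(now=0)    # consumer A receives "m1"
second = queue.receive(now=10)  # consumer B receives nothing: "m1" is in flight
third = queue.receive(now=31)   # A never deleted "m1", so it is visible again
```

In this model a second consumer can only see the message once the timeout has expired without a delete, which corresponds to the failed-processing case in the answer.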

Related

How does AWS Lambda determine if messages are still in SQS queue?

When using AWS Lambda with a SQS queue (as event source), it is written in the doc
If messages are still available, Lambda increases the number of
processes that are reading batches by up to 60 more instances per
minute.
https://docs.aws.amazon.com/lambda/latest/dg/with-sqs.html
My question here is: how does the Lambda service determine "If messages are still available"?
Answering the "how" question in a slightly different way:
Behind the scenes, Lambda operates a "State Manager" control-plane service that discovers work from the queue. State Manager also manages scaling of the fleet of "Poller" workers that do the actual retrieving, batching, invoking, and deleting.
These implementation details are from the Event Source Mapping section of the re:Invent 2022 video A closer look at AWS Lambda (SVS404-R).
One of the calls to the SQS API is to get queue attributes (Java API, others similar). This returns a response and one of the attributes of the response is "approximate number of messages". With this you or AWS can determine about how many messages are in the queue.
From this, AWS can determine if it's worth spinning up additional instances. You too can get this information from the queue.
I imagine it uses the ApproximateNumberOfMessagesVisible metric on the SQS queue to check how many messages are available, and uses that number, plus your batch size configuration, to determine how many more Lambda instances your function needs to be scaled out to.
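As a rough sketch of the estimate described in these answers, the function below reads the queue attribute and divides by the batch size. It assumes a boto3 SQS client is passed in; `estimate_pending_batches` is a hypothetical helper, and nothing here claims to be how the Lambda service actually decides to scale. Note that the queue attribute is named `ApproximateNumberOfMessages`, while `ApproximateNumberOfMessagesVisible` is the name of the CloudWatch metric.

```python
import math


def estimate_pending_batches(sqs_client, queue_url, batch_size):
    """Estimate how many batches of `batch_size` the visible backlog would fill."""
    response = sqs_client.get_queue_attributes(
        QueueUrl=queue_url,
        AttributeNames=["ApproximateNumberOfMessages"],
    )
    # SQS returns attribute values as strings.
    visible = int(response["Attributes"]["ApproximateNumberOfMessages"])
    return math.ceil(visible / batch_size)


# Usage (assumes boto3 is installed and AWS credentials are configured):
#   import boto3
#   estimate_pending_batches(boto3.client("sqs"), queue_url, batch_size=10)
```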
I believe the documentation refers to Lambda polling the queue to know whether there are still messages. Read more about it here.
Lambda polls the queue and invokes your Lambda function synchronously
with an event that contains queue messages. Lambda reads messages in
batches and invokes your function once for each batch. When your
function successfully processes a batch, Lambda deletes its messages
from the queue.
Event Source Mapping:
Lambda only sees messages that are currently visible, as governed by the visibility timeout setting on the SQS queue. The timeout prevents other queue consumers from processing the same message. I believe that, as an event source, the SQS queue delivers its messages to Lambda through the event source mapping.
As per the documentation you shared, long polling is in effect for standard queues. Long polling waits for a certain amount of time for a message to arrive in the queue before returning. Refer to the following docs:
https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-short-and-long-polling.html
https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/confirm-queue-is-empty.html

How to implement Amazon SQS (fifo)-lambda with message processing EXACTLY ONE BY ONE

I have a use case with an Amazon SQS FIFO queue and a Lambda function. I need to make sure that the FIFO queue triggers the Lambda only when the previous Lambda execution has completed (and that the events arrive in order). According to the AWS docs, FIFO supports exactly-once processing, but they do not mention anywhere that it would not push more events to Lambda until the first message is completely processed.
I need to make sure that the next message is processed only when the previous message is completely processed by the Lambda function.
Is there a way to ensure that message 2 is only processed by Lambda when message 1 is completely processed by Lambda?
FIFO supports exactly-once processing, but they do not mention anywhere
that it would not push more events to Lambda until the first message
is completely processed.
SQS never pushes anything anywhere. You have to poll SQS for messages. When you configure Lambda integration with SQS, Lambda is actually running a process behind the scenes to poll SQS for you.
AWS FIFO queues allow you to force messages to be processed in order by specifying a Message Group ID. When you specify the same Message Group ID for multiple messages, the FIFO queue will only make one of those messages available at a time (in first-in-first-out order). Only after the first message is removed from the queue is the second message made available, and so on.
In addition to this, you should configure AWS Lambda SQS integration with a Batch Size of 1, so that it doesn't try to wait for multiple messages to be available before processing. And you could configure the Reserved Concurrency on the Lambda function to 1, as mentioned in the other answer, so that only one instance of the Lambda function can be running at a time.
It is actually pretty easy to do this. The docs do not spell it out because, by default, Lambda will simply use up the available account concurrency and handle as many messages in parallel as possible.
You can influence this by setting the reserved concurrency for the lambda function to 1. This will ensure no more than 1 lambda function will be executed at the same time.
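The two settings discussed in these answers (a reserved concurrency of 1 and a batch size of 1) could be applied with boto3 roughly as follows. This is a sketch under assumptions: `configure_serial_processing` is a hypothetical helper, the client is passed in by the caller, and the function name and queue ARN are placeholders you would supply.

```python
def configure_serial_processing(lambda_client, function_name, queue_arn):
    """Limit the function to one concurrent execution and one message per invocation."""
    # Reserved concurrency of 1: at most one instance of the function runs at a time.
    lambda_client.put_function_concurrency(
        FunctionName=function_name,
        ReservedConcurrentExecutions=1,
    )
    # Batch size of 1: each invocation receives a single message.
    return lambda_client.create_event_source_mapping(
        EventSourceArn=queue_arn,
        FunctionName=function_name,
        BatchSize=1,
    )


# Usage (assumes boto3 is installed and AWS credentials are configured):
#   import boto3
#   configure_serial_processing(boto3.client("lambda"), "my-function", fifo_queue_arn)
```

Ordering across messages still relies on the producer sending them with the same Message Group ID, as described above.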

Trigger AWS Lambda once SQS fifo queue is not empty

I have an SQS FIFO queue, and I want to know if there is a way to trigger an AWS Lambda function when the queue becomes non-empty.
For example, if my queue is empty and a new message arrives, the Lambda is triggered; but if the queue already contains at least one message and a new message arrives, no Lambda is triggered.
Is it possible?
There is an Amazon CloudWatch metric called ApproximateNumberOfMessagesVisible that shows the number of messages in the queue. The documentation says that "For FIFO queues, the result is exact."
You could create a CloudWatch Alarm that triggers when the number of messages drops to zero for a period of time. The Alarm can send a message to an Amazon SNS topic. If you subscribe your AWS Lambda function to this topic, it will be triggered when the queue is empty for the specified duration (eg over a period of 5 minutes). It will only be triggered when the alarm enters the 'Alarm' state and it will not trigger again until the alarm exits the state and enters the state again.
Important: When configuring the alarm, go to the Additional configuration and set Missing data treatment to "Treat missing data as bad (breaching threshold)". This is required because the SQS queue will not send metrics if the queue is empty. (Many queues are empty, so this saves a lot of metric storage!)
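A sketch of the alarm described above, expressed as the keyword arguments for the CloudWatch `put_metric_alarm` call in boto3. The helper name `empty_queue_alarm_params`, the alarm name, the `Maximum` statistic, and the 5-minute period are illustrative assumptions; `TreatMissingData="breaching"` corresponds to the "Treat missing data as bad" setting.

```python
def empty_queue_alarm_params(queue_name, sns_topic_arn, period_seconds=300):
    """Build put_metric_alarm arguments for an alarm that fires when the queue is empty."""
    return {
        "AlarmName": f"{queue_name}-empty",  # illustrative name
        "Namespace": "AWS/SQS",
        "MetricName": "ApproximateNumberOfMessagesVisible",
        "Dimensions": [{"Name": "QueueName", "Value": queue_name}],
        "Statistic": "Maximum",
        "Period": period_seconds,
        "EvaluationPeriods": 1,
        "Threshold": 0,
        "ComparisonOperator": "LessThanOrEqualToThreshold",
        # An empty queue may stop reporting metrics, so missing data counts as breaching.
        "TreatMissingData": "breaching",
        "AlarmActions": [sns_topic_arn],
    }


# Usage (assumes boto3 is installed and AWS credentials are configured):
#   import boto3
#   boto3.client("cloudwatch").put_metric_alarm(
#       **empty_queue_alarm_params("my-queue.fifo", topic_arn)
#   )
```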
Unusual pattern.
You could perhaps set the Lambda function concurrency to 1, meaning that only one invocation can happen concurrently, and then have your Lambda function kick off your workflow and then remove the actual SQS event trigger that caused the Lambda to be invoked in the first place. That should prevent further invocations. Add the SQS event trigger back when you're done to get ready for the next batch of messages.
You may set a concurrent execution limit to 1 to make sure only 1 lambda instance reads the queue. But I'm not sure this is something you may want to do. Lambda can read 10 messages at most on single execution and if your queue gets too many incoming messages then your message consumption process may take too much time.

How can I ensure that a downstream API called by Lambda integrated with SQS is called at least 2 times before message going to DLQ?

I have lambda using SQS events as inputs. The SQS queue also has a DLQ.
The lambda function invokes a downstream Restful API (call this operation DoPostToAPI())
I need to guarantee that the lambda function attempts to call DoPostToAPI() at least 2 times (before message goes to DLQ)
What configuration of Lambda Retries and SQS Redrive policy would I need to set in order to accomplish the above requirement?
I need to be 100% certain that messages arrive on the DLQ only because sending them to the downstream API via DoPostToAPI() has been attempted 2 times, and that messages don't arrive on the DLQ for any other reason, if possible.
To me, it makes sense that messages should only arrive on the DLQ if the operation was attempted, and not for other reasons (i.e. I don't want messages to arrive on the DLQ purely because of throttling, since DoPostToAPI() should be attempted first before sending to the DLQ). Why would I want messages on the DLQ if the Lambda function operation wasn't even attempted? In other words, I need the Lambda operation to be guaranteed to be invoked before an item moves to the DLQ.
Can I get some help on this? Is it possible to guarantee that messages on the DLQ have arrived because of failed DoPostToAPI() API calls? Or is it (more unfortunately) possible that messages arrive on the DLQ for reasons other than failed calls to the downstream API?
From what I have read online so far, it's possible that Lambda, after receiving an SQS message and making the message invisible on the queue, could run into throttling issues and re-attempt the Lambda invocation. But if it runs into Lambda throttling again, the message could end up back on the main queue, and if it then reaches its max receive count, it could be placed on the DLQ without the Lambda having been attempted at all. Is this correct?
For simplicity lets imagine the following inputs
SQSQueue1
SQSQueue1DLQ
LambdaFunction1 --> ServiceClient1.DoPostToAPI()
What is the interplay between the Lambda "maximum_retry_attempts" and the SQS redrive_policy "maxReceiveCount"?
To ensure your Lambda attempts retries when using SQS, you only need to set the SQS property maxReceiveCount.
This value controls how many times a message can be received, and therefore how many Lambda invocations will be attempted for a given batch, before the message goes to the dead-letter queue.
Unfortunately, the Lambda property maximum_retry_attempts does not apply to Lambda functions that use SQS as the function event trigger.
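For illustration, the redrive policy that carries maxReceiveCount can be built as a queue attribute like this. It is a minimal sketch: `redrive_attributes` is a hypothetical helper and the DLQ ARN is a placeholder. The resulting dictionary is what the SQS `set_queue_attributes` call in boto3 expects for its `Attributes` parameter.

```python
import json


def redrive_attributes(dlq_arn, max_receive_count=2):
    """Build the SQS queue attributes that route messages to a DLQ after N receives."""
    return {
        "RedrivePolicy": json.dumps(
            {
                "deadLetterTargetArn": dlq_arn,
                "maxReceiveCount": str(max_receive_count),
            }
        )
    }


# Usage (assumes boto3 is installed and AWS credentials are configured):
#   import boto3
#   boto3.client("sqs").set_queue_attributes(
#       QueueUrl=queue_url,
#       Attributes=redrive_attributes(dlq_arn, max_receive_count=2),
#   )
```

Keep in mind that the count is of receives, so a receive that ends in throttling rather than an invocation would still count toward it, which is the concern raised in the question.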

AWS Lambda misses a few SQS events, leaving messages in flight

My Lambda configuration is as below
Lambda Concurrency is set to 50
And SQS trigger batch size is set to 1
Issue:
When my queue is flooded with 200+ messages, some of the SQS triggers are missed and messages go into the in-flight state without ever triggering the Lambda. This adds latency equal to the timeout value set for the Lambda, because I need to wait for the message to come out of flight before it can be reprocessed.
Any inputs will be highly appreciated.
SQS is integrated with Lambda through event source mappings.
Thanks to the mappings, the Lambda service long-polls the SQS queue and invokes your function on your behalf. What's more, it automatically removes the messages from the queue if your Lambda successfully processes them.
Since you want to process 200+ messages, and you set concurrency to 50 with batch size of 1, it means that you can process only 50 messages in parallel. The rest will be throttled. When this happens:
If your function is throttled, returns an error, or doesn't respond, the message becomes visible again. All messages in a failed batch return to the queue, so your function code must be able to process the same message multiple times without side effects.
To rectify the issue, the following two immediate actions can be considered:
Increase the concurrency of your function to 200 or more.
Increase the batch size to 10. With a batch size of 10 and a concurrency of 50, you can process 500 (10 x 50) messages concurrently.
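The arithmetic behind these two options is just concurrency multiplied by batch size. The helper name below is hypothetical, and the result is an upper bound that assumes every invocation receives a full batch.

```python
def max_messages_in_parallel(concurrency, batch_size):
    """Upper bound on messages being processed at once, assuming full batches."""
    return concurrency * batch_size


print(max_messages_in_parallel(50, 1))   # 50: the original configuration
print(max_messages_in_parallel(200, 1))  # 200: higher concurrency
print(max_messages_in_parallel(50, 10))  # 500: larger batches
```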
Also, since you are heavily throttled, setting up a dead-letter queue can be useful. The DLQ helps capture problematic or missed messages from the queue, so that you can inspect them or process them later:
If a message fails to be processed multiple times, Amazon SQS can send it to a dead-letter queue. When your function returns an error, Lambda leaves it in the queue. After the visibility timeout occurs, Lambda receives the message again. To send messages to a second queue after a number of receives, configure a dead-letter queue on your source queue.
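Because throttled or failed messages return to the queue, the function has to tolerate seeing the same message more than once. A minimal sketch of such a handler follows. The in-memory set and the `process` function are stand-ins introduced for illustration only; a real deployment would need a durable store shared across function instances, since module-level state is local to one instance.

```python
_processed_ids = set()  # stand-in for a durable store shared across instances


def process(body):
    """Hypothetical business logic; replace with the real work."""
    return body.upper()


def handler(event, context=None):
    """Process each SQS record once, skipping message IDs already handled."""
    results = []
    for record in event["Records"]:
        message_id = record["messageId"]
        if message_id in _processed_ids:
            continue  # already handled on an earlier delivery
        results.append(process(record["body"]))
        # Mark the message as done only after processing succeeded.
        _processed_ids.add(message_id)
    return results
```

Called twice with the same event, the second invocation does no work, so a redelivered message has no additional side effects.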