SQS FIFO Lambda event source behaviour when MessageGroupId specified - amazon-web-services

I see that SQS FIFO queues were recently added as a Lambda event source.
I'm trying to understand how FIFO messages with the MessageGroupId parameter specified will be handled by Lambda.
From the boto3 SQS docs -
Messages that belong to the same message group are processed in a FIFO manner (however, messages in different message groups might be processed out of order). To interleave multiple ordered streams within a single queue, use MessageGroupId values (for example, session data for multiple users). In this scenario, multiple consumers can process the queue, but the session data of each user is processed in a FIFO fashion.
The FIFO behaviour, coupled with the phrase "interleave multiple ordered streams within a single queue", suggests to me that messages with a specific MessageGroupId will be handled by Lambda with a concurrency of one (i.e. no parallelisation, because of FIFO), but that you will get concurrent Lambda executions across different MessageGroupId values.
Is this a correct interpretation?

After a number of experiments and talking with an AWS moderator, I concluded that this isn't possible - you can't guarantee that an event passed to a Lambda from an SQS FIFO queue will only contain a single MessageGroupId :-(
Can't help but feel the SQS team have missed a trick here, though. Without that guarantee, the only way to preserve the FIFO principle is to limit Lambda concurrency to one :-(
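If you want to check this yourself, here is a minimal sketch of a handler that counts how many distinct MessageGroupId values arrive in a single batch (the attribute name follows the documented SQS event record shape; everything else is illustrative):

```python
import json
from collections import defaultdict

def handler(event, context):
    # Group incoming records by their FIFO message group ID.
    # For FIFO queues, each SQS event record carries the group ID
    # in record["attributes"]["MessageGroupId"].
    groups = defaultdict(list)
    for record in event["Records"]:
        group_id = record["attributes"].get("MessageGroupId", "<none>")
        groups[group_id].append(record["body"])

    # If this ever prints more than one key, the batch mixed group IDs.
    print(json.dumps({gid: len(bodies) for gid, bodies in groups.items()}))
```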

Related

Why can't an SQS FIFO queue with a Lambda trigger guarantee only-once delivery?

I came across an AWS article which mentions that only-once delivery of a message is not guaranteed when a FIFO queue is used with a Lambda trigger.
Amazon SQS FIFO queues ensure that the order of processing follows the message order within a message group. However, it does not guarantee only once delivery when used as a Lambda trigger. If only once delivery is important in your serverless application, it’s recommended to make your function idempotent. You could achieve this by tracking a unique attribute of the message using a scalable, low-latency control database like Amazon DynamoDB.
I am more interested in the reason behind this behaviour when it comes to the Lambda trigger. I assume that with standard queues only-once delivery is not guaranteed because SQS stores messages on multiple servers for redundancy and high availability, so there is a chance of the same message being delivered again while multiple Lambdas poll the queue.
Can someone please explain the reason for the same behaviour in a FIFO queue with a Lambda trigger, or how it works internally?
By default, Lambda polls SQS synchronously. When Lambda picks up messages from the queue they become invisible, i.e. the visibility timeout starts, until Lambda either finishes processing and deletes them from the queue, or fails and the messages become visible again to be retried.
That's why Lambda cannot guarantee exactly-once delivery: there can be a retry because of a timeout (15 min max) or other code or dependency errors.
To prevent this, either make your processing idempotent or use a partial batch response to report only the failed messages, so that successfully processed messages are deleted even when part of the batch fails.
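To make the idempotency suggestion from the AWS article concrete, here is a minimal sketch using a DynamoDB conditional write keyed on the SQS message ID (the table name processed_messages and the process() helper are hypothetical):

```python
import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("processed_messages")  # hypothetical table, partition key "message_id"

def handler(event, context):
    for record in event["Records"]:
        try:
            # Conditional put fails if this message ID was already recorded,
            # so a redelivered message is skipped instead of reprocessed.
            table.put_item(
                Item={"message_id": record["messageId"]},
                ConditionExpression="attribute_not_exists(message_id)",
            )
        except ClientError as e:
            if e.response["Error"]["Code"] == "ConditionalCheckFailedException":
                continue  # duplicate delivery, already processed
            raise
        process(record["body"])  # your business logic

def process(body):
    ...
```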

How to process messages from SQS queues by groups

I currently have an SQS queue that triggers a Lambda function, but SQS has a 120k limit on in-flight messages, meaning that only 120k messages can be in processing by the Lambda consumer at once. That works fine in most cases. But imagine I have clients A and B: if A sends 500k messages to the queue and B then sends only 1 message, B has to wait for all the messages in the queue to be processed. It does not make sense to create one queue for each client (at least not manually). How can I, for example, process messages from clients in a round-robin manner, giving everyone the same time to process their tasks?
I have been looking into Kinesis Data Streams, but I'm not sure it will solve my problem. Would I need an SNS topic that forwards to a Lambda responsible for round-robining the messages across queues that all do the same thing, so that client B never ends up in the same queue as client A?
You can check out SQS FIFO queues. FIFO queues have a concept of grouping using the message group ID attribute: all messages in one group are processed in order. The limitation is that you can only process one message per group at any given moment.
The above concept is the same as having one queue per tenant, just using one dedicated queue; see the sketch below.
Article on ordering inside SQS
https://aws.amazon.com/blogs/compute/solving-complex-ordering-challenges-with-amazon-sqs-fifo-queues/
Note:
FIFO queues are more expensive than standard queues; please refer to the pricing guide before making any decisions.
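As a rough sketch of this approach, assuming each client has a stable identifier you can use as the message group ID (the queue URL and helper are illustrative):

```python
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/tenants.fifo"  # placeholder queue

def enqueue(client_id: str, payload: str, task_id: str) -> None:
    # Messages within one group (client) stay ordered; different
    # groups can be consumed in parallel, so client B's single
    # message is not stuck behind client A's 500k backlog.
    sqs.send_message(
        QueueUrl=QUEUE_URL,
        MessageBody=payload,
        MessageGroupId=client_id,
        MessageDeduplicationId=task_id,  # or enable ContentBasedDeduplication instead
    )
```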

How to implement Amazon SQS (FIFO) with Lambda, processing messages EXACTLY ONE BY ONE

I have a use case with an Amazon SQS FIFO queue and a Lambda function. I need to make sure that the FIFO queue triggers the Lambda only when the previous Lambda execution is complete (and that events arrive in order). The AWS docs say FIFO supports exactly-once processing, but they do not mention anywhere that it would not push more events to Lambda until the first message is completely processed.
I need to make sure that the next message is processed only when the previous message has been completely processed by the Lambda function.
Is there a way to ensure that message 2 is only processed by Lambda when message 1 has been completely processed?
FIFO supports exactly-once processing but it does not mention anywhere that it would not push more events to Lambda until the first message is completely processed.
SQS never pushes anything anywhere. You have to poll SQS for messages. When you configure the Lambda integration with SQS, Lambda actually runs a poller behind the scenes that polls SQS for you.
AWS FIFO queues allow you to force messages to be processed in order by specifying a Message Group ID. When you specify the same Message Group ID for multiple messages, the FIFO queue will only make one of those messages available at a time (in first-in-first-out order). Only after the first message is removed from the queue is the second message made available, and so on.
In addition to this, you should configure AWS Lambda SQS integration with a Batch Size of 1, so that it doesn't try to wait for multiple messages to be available before processing. And you could configure the Reserved Concurrency on the Lambda function to 1, as mentioned in the other answer, so that only one instance of the Lambda function can be running at a time.
It is actually pretty easy to do this. The docs don't call it out because, by default, Lambda simply uses up the available account concurrency and handles as many messages in parallel as possible.
You can influence this by setting the reserved concurrency for the Lambda function to 1. This ensures no more than one instance of the function executes at the same time.
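Putting the two answers together, a minimal sketch of that configuration in boto3 (the queue ARN and function name are placeholders):

```python
import boto3

lambda_client = boto3.client("lambda")

# Deliver at most one message per invocation.
lambda_client.create_event_source_mapping(
    EventSourceArn="arn:aws:sqs:us-east-1:123456789012:orders.fifo",  # placeholder ARN
    FunctionName="process-order",  # placeholder function name
    BatchSize=1,
)

# Allow at most one concurrent execution of the function.
lambda_client.put_function_concurrency(
    FunctionName="process-order",
    ReservedConcurrentExecutions=1,
)
```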

AWS SQS redrive policy: which end of the queue do messages go to?

In an AWS SQS standard queue you can set a redrive policy which will cause messages to be retried if there is a failure whereby the message is not deleted from the queue.
In my case I have > 1,000,000 messages on the queue, which take a couple of hours to process. When a message fails and is put back on the queue, will it be put at the end of the queue or the front? Will the messages get retried in a minute or two, or in two or three hours when all the other messages have been processed?
There is no guarantee of the order in which messages are returned, so once you return a message it could be retried immediately, after all the others are processed, or anywhere in between. There may be some undocumented common patterns for when retries happen, but it's not something you can count on or design around.
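For reference, the redrive policy itself is just a queue attribute pointing at a dead-letter queue; a minimal sketch of setting one with boto3 (the queue URL and DLQ ARN are placeholders):

```python
import json
import boto3

sqs = boto3.client("sqs")

# After 5 failed receives a message moves to the dead-letter queue
# instead of reappearing in the source queue indefinitely.
sqs.set_queue_attributes(
    QueueUrl="https://sqs.us-east-1.amazonaws.com/123456789012/work-queue",  # placeholder
    Attributes={
        "RedrivePolicy": json.dumps({
            "deadLetterTargetArn": "arn:aws:sqs:us-east-1:123456789012:work-dlq",  # placeholder
            "maxReceiveCount": "5",
        })
    },
)
```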
Q: Does Amazon SQS provide message ordering?
Yes. FIFO (first-in-first-out) queues preserve the exact order in which messages are sent and received. If you use a FIFO queue, you don't have to place sequencing information in your messages. For more information, see FIFO Queue Logic in the Amazon SQS Developer Guide.
Standard queues provide a loose-FIFO capability that attempts to preserve the order of messages. However, because standard queues are designed to be massively scalable using a highly distributed architecture, receiving messages in the exact order they are sent is not guaranteed.
https://aws.amazon.com/sqs/faqs/

AWS SQS standard queue or FIFO queue when messages cannot be duplicated?

We plan to use the AWS SQS service to queue events created from a web service and then use several workers to process those events. Each event can only be processed one time. According to the AWS SQS documentation, an SQS standard queue can "occasionally" produce duplicate messages but has unlimited throughput, while an SQS FIFO queue will not produce duplicate messages but is limited to 300 API calls per second (with a batch size of 10, equivalent to 3,000 messages per second). Our current peak-hour traffic is only 80 messages per second, so both are fine in terms of throughput. But when I started to use an SQS FIFO queue, I found I need to do extra work, like providing the extra parameters
"MessageGroupId" and "MessageDeduplicationId", or enabling the "ContentBasedDeduplication" setting. So I am not sure which is the better solution. We just need the messages not to be duplicated; we don't need them to be FIFO.
Solution #1:
Use an AWS SQS FIFO queue. For each message, generate a UUID for the "MessageGroupId" and "MessageDeduplicationId" parameters.
Solution #2:
Use an AWS SQS FIFO queue with "ContentBasedDeduplication" enabled. For each message, generate a UUID for the "MessageGroupId" parameter.
Solution #3:
Use an AWS SQS standard queue with AWS ElastiCache (either Redis or Memcached). For each message, save the "MessageId" field in the cache server and check it for duplication later on; existence means the message has already been processed. (By the way, how long should the "MessageId" exist in the cache server? The AWS SQS documentation does not mention how far back a message could be duplicated.)
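A minimal sketch of Solution #3, assuming a Redis-backed ElastiCache endpoint (the endpoint and the TTL are illustrative guesses, since SQS does not document the duplication window):

```python
import redis

r = redis.Redis(host="my-cache.abc123.use1.cache.amazonaws.com", port=6379)  # placeholder endpoint

DEDUP_TTL_SECONDS = 6 * 60 * 60  # illustrative guess at the dedup window

def is_first_delivery(message_id: str) -> bool:
    # SET with NX returns True only for the first caller to claim this ID,
    # so two workers receiving the same duplicate can't both process it.
    return bool(r.set(f"seen:{message_id}", "1", nx=True, ex=DEDUP_TTL_SECONDS))
```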
You are making your system complicated with SQS.
We moved to Kinesis Streams and it works flawlessly. Here are the benefits we have seen:
Order of Events
Trigger an Event when data appears in stream
Deliver in Batches
Leave the responsibility to handle errors to the receiver
Go back in time in case of issues
Buggier Implementation of the process
Higher performance than SQS
Hope it helps.
My first question would be: why is it even so important that you don't get duplicate messages? An ideal solution would be to use a standard queue and design your workers to be idempotent. For example, if the messages contain something like a task ID, store the completed task's result in a database keyed by that ID and ignore messages whose task ID already exists in the DB.
Don't use receipt handles for application-side deduplication, because they change every time a message is received. In other words, SQS doesn't guarantee the same receipt handle for duplicate messages.
If you insist on deduplication, then you have to use a FIFO queue.
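If you do go the FIFO route, a minimal sketch of Solution #2 (the queue name and message body are placeholders):

```python
import uuid
import boto3

sqs = boto3.client("sqs")

# FIFO queue with content-based deduplication: SQS hashes the message
# body, so identical sends within the 5-minute dedup interval are dropped.
queue = sqs.create_queue(
    QueueName="events.fifo",  # placeholder name; FIFO queue names must end in .fifo
    Attributes={"FifoQueue": "true", "ContentBasedDeduplication": "true"},
)

sqs.send_message(
    QueueUrl=queue["QueueUrl"],
    MessageBody='{"event": "signup"}',  # placeholder payload
    MessageGroupId=str(uuid.uuid4()),  # random group ID: no ordering, max parallelism
)
```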