SQS Queues/ Visibility Timeouts/ message groups - amazon-web-services

I am new to AWS and trying to understand SQS. I have gone through a few trainings but could not get answers in their discussion forums, so I am repeating my questions here. I know that some of the questions below have obvious answers and are therefore more rhetorical. My confusion is that my current understanding leads me to conflicting answers to the follow-up questions that come after the obvious ones, which undermines my confidence in what I think I understand.
If I have a Standard queue named MyQueue and there are 100 messages, and if there are 2 completely separate applications (as consumers; note they are not a consumer group of the same applications like you have in Kafka; instead they are 2 separate applications) for this queue, then the consumers may receive
(i) out of order messages and
(ii) multiple copies of the messages
Neither of my applications needs to care about the order of the messages. But for the sake of the question, let's say we have perfect in-order delivery, no duplicate copies, and no network issues, and both consumers finish processing each message well within the Visibility Timeout window.
Q1: Will both applications individually receive 100 messages each, or will a message that is delivered to one consumer never be delivered to the other? If the latter is true (with no network issues, out-of-order delivery, or multiple deliveries), then:
Is SNS-SQS fanout the way to ensure that the same message is processed by multiple consumers?
Is the consumer supposed to delete the message from the queue after processing? That is, if a consumer picks up a message, finishes processing it before the visibility timeout expires, but never deletes it, will the message reappear for other consumers to pick up? If that is the case, then won't the same thing apply to a FIFO queue as well?
Other Questions:
Q2: Is the Visibility Timeout applicable to both Standard queues and FIFO queues? If it also applies to a FIFO queue, which promises exactly-once delivery, and the Visibility Timeout expires before the consumer finishes processing a message, then the message reappears in the queue only to be delivered again, which takes us back to at-least-once processing. Can someone confirm?
Q3: What are multiple message Groups within a FIFO Queue? Are they like partitions of a queue?

Q: Will both the applications individually receive 100 messages each?
A consumer can request up to 10 messages per API call. These will become 'invisible' and will not be provided to other consumers. (Well, there actually is a small possibility that a message might be provided to multiple consumers. It is rare, but it can happen. If this is bad for your use-case, then you should track the messages in a database to ensure they are only processed once each.)
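A minimal sketch of the "track the messages in a database" idea, assuming a SQLite table keyed by the SQS MessageId. The table name `processed` and the helpers `make_store` and `process_once` are made up for illustration; any store with a unique-key constraint would do.

```python
import sqlite3


def make_store(path=":memory:"):
    """Create the (hypothetical) table that remembers processed message IDs."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS processed (message_id TEXT PRIMARY KEY)"
    )
    return conn


def process_once(conn, message_id, body, handler):
    """Run handler(body) only if message_id has not been recorded before.

    Returns True if the handler ran, False for a duplicate delivery.
    """
    with conn:  # commits on success, rolls back if the handler raises
        cursor = conn.execute(
            "INSERT OR IGNORE INTO processed (message_id) VALUES (?)",
            (message_id,),
        )
        if cursor.rowcount == 0:
            return False  # ID already recorded: duplicate delivery, skip it
        handler(body)
    return True
```

Because the insert and the handler share one transaction, a handler that raises leaves no record behind, so a later redelivery of the same message is processed normally.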
Q: Is SNS-SQS fanout the way to ensure that the same message is processed by multiple consumers?
It is very strange to want a single message consumed by 'multiple consumers'. The normal desire is to process each message once. If you do want a message processed by multiple consumers then, yes, you could send the message to SNS, which could then send it to multiple queues.
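As a rough sketch of that fan-out, assuming `sns` is a boto3 SNS client (for example `boto3.client("sns")`) and that the topic and queues already exist. The helper names `fan_out` and `publish` are illustrative only.

```python
def fan_out(sns, topic_arn, queue_arns):
    """Subscribe every queue to the topic; return the subscription ARNs."""
    subscription_arns = []
    for queue_arn in queue_arns:
        response = sns.subscribe(
            TopicArn=topic_arn,
            Protocol="sqs",
            Endpoint=queue_arn,
            # Deliver the original body rather than the SNS JSON envelope.
            Attributes={"RawMessageDelivery": "true"},
            ReturnSubscriptionArn=True,
        )
        subscription_arns.append(response["SubscriptionArn"])
    return subscription_arns


def publish(sns, topic_arn, body):
    """Publish once; SNS copies the message to every subscribed queue."""
    return sns.publish(TopicArn=topic_arn, Message=body)["MessageId"]
```

Each queue also needs an access policy that allows the topic to send messages to it; that part is not shown here.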
Q: Is the consumer supposed to delete the message from the queue after processing?
Yes. Amazon SQS does not know when a message is processed. The consumer must delete the message via the ReceiptHandle provided when the message was received. If a message times out and another consumer receives it, SQS will provide a different ReceiptHandle so it knows which process requested the delete.
This also applies to FIFO queues.
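A minimal sketch of that receive, process, delete cycle, assuming `sqs` is a boto3 SQS client. The function name `drain_once` and its parameters are made up for illustration.

```python
def drain_once(sqs, queue_url, handler, visibility_timeout=30):
    """Receive up to 10 messages, process each, delete the ones that succeed.

    Returns the number of messages deleted.
    """
    response = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,   # the per-call maximum
        WaitTimeSeconds=20,       # long polling
        VisibilityTimeout=visibility_timeout,
    )
    deleted = 0
    for message in response.get("Messages", []):
        try:
            handler(message["Body"])
        except Exception:
            # Not deleted: the message becomes visible again once the
            # visibility timeout expires and will be delivered again.
            continue
        sqs.delete_message(
            QueueUrl=queue_url,
            ReceiptHandle=message["ReceiptHandle"],
        )
        deleted += 1
    return deleted
```

The delete uses the ReceiptHandle from this particular receive, not the MessageId.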
Q: Is the Visibility timeout applicable to both Standard Queue and FIFO Queue?
Yes. If the visibility timeout expires, the message will be provided to another consumer. The "exactly once delivery" avoids the rare situation mentioned above when a message in a Standard queue might be provided more than once. However, if the visibility timeout expires, even in a FIFO queue, then the message will intentionally be visible on the queue again.
Q: What are multiple message Groups within a FIFO Queue? Are they like partitions of a queue?
A message group is a way of grouping messages that must be delivered in-order.
Let's say there are two message groups, A and B, and they send messages in this order: A1, B1, A2, B2
Message B1 can be provided even if A1 is not yet deleted. However, message A2 will not be provided until A1 is deleted. Think of them as 'mini-queues'. This allows processing of lots of messages that are unrelated, without having to wait for all previous messages to be deleted.
See: Using the Amazon SQS Message Group ID - Amazon Simple Queue Service
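For illustration, sending A1, B1, A2, B2 with a group ID might look like the sketch below, assuming `sqs` is a boto3 SQS client and the queue is a FIFO queue. The helper `send_grouped` and its deduplication-ID scheme (group plus body, assumed unique per message) are hypothetical.

```python
def send_grouped(sqs, queue_url, messages):
    """Send (group_id, body) pairs in order; return the SQS message IDs."""
    message_ids = []
    for group_id, body in messages:
        response = sqs.send_message(
            QueueUrl=queue_url,
            MessageBody=body,
            MessageGroupId=group_id,  # ordering is kept within this group
            # Hypothetical scheme: assumes each body is unique in its group.
            MessageDeduplicationId=f"{group_id}:{body}",
        )
        message_ids.append(response["MessageId"])
    return message_ids
```

Usage would be something like `send_grouped(sqs, queue_url, [("A", "A1"), ("B", "B1"), ("A", "A2"), ("B", "B2")])`.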

Q1: Will both the applications individually receive 100 messages each or will a message that is made available to one consumer won't ever be delivered to the other consumer?
Neither of these is quite accurate.
Standard queues never intentionally deliver a message more than once. It is possible that messages may occasionally be delivered more than once -- but this is the exception, an artifact of the fact that SQS is a distributed system. Situations could arise where, for example, the queue had a message stored in multiple replicas and, due to an internal failure, the fact that the message had already been delivered or deleted was not known to all replicas.
If a message is inadvertently delivered more than once, it could be to multiple consumers or the same consumer. The consumer "connections" to SQS are actually stateless, resetting each time a list of messages is delivered, so SQS does not have a sense of which consumer it delivered each message to.
Consumers delete their messages after processing; otherwise the visibility timeout expires and the messages are delivered again and again -- to whichever consumer the luck of the draw delivers them to, each time. As noted, SQS has no concept of consumer identity or state. (In high volume applications, a single consumer may actually have multiple connections to SQS, all receiving messages in parallel, because the network round-trips and cycle of receive/delete will otherwise limit a single consumer to a few hundred messages per second. Whether these connections are handled using asynchronous I/O, threads, etc., is unimportant to SQS, which doesn't care which consumer is on a given connection.)
If you want all messages sent to all consumers, you need fan-out from SNS to SQS.
Q2: Is the Visibility timeout applicable to both Standard Queue and FIFO Queue?
Yes. Because (noted above) the connection to SQS is not a persistent, stateful connection, SQS uses visibility timeout as the indication that a consumer has lost the message or failed ungracefully, so the message needs to be made accessible again. (Dead letter queues prevent this from happening endlessly, moving a message to a different queue, since repeated failures indicate a problem with a consumer, or a "poison pill" message.)
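A sketch of wiring up a dead letter queue, assuming `sqs` is a boto3 SQS client and the dead letter queue already exists. The helper names and the default of 5 receives are made up for illustration.

```python
import json


def redrive_attributes(dead_letter_queue_arn, max_receive_count=5):
    """Build the RedrivePolicy attribute for set_queue_attributes."""
    policy = {
        "deadLetterTargetArn": dead_letter_queue_arn,
        # After this many receives without a delete, SQS moves the message.
        "maxReceiveCount": str(max_receive_count),
    }
    return {"RedrivePolicy": json.dumps(policy)}


def attach_dead_letter_queue(sqs, queue_url, dead_letter_queue_arn,
                             max_receive_count=5):
    """Point the source queue at the dead letter queue."""
    sqs.set_queue_attributes(
        QueueUrl=queue_url,
        Attributes=redrive_attributes(dead_letter_queue_arn, max_receive_count),
    )
```

For a FIFO source queue, the dead letter queue must also be a FIFO queue.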
FIFO queues retain in-order delivery, here, and you could argue that they revert to "at least once" delivery, but the idea is that this should never happen. If it does, then your visibility timeout is too short or your consumer is crashing or otherwise misplacing messages.
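If the timeout is too short only for the occasional slow message, one option (not something the answer above prescribes) is to keep extending the timeout while the handler is still running. A rough sketch, assuming `sqs` is a boto3 SQS client; `process_with_heartbeat` and its parameters are hypothetical names.

```python
import threading


def process_with_heartbeat(sqs, queue_url, message, handler,
                           timeout=30, interval=10):
    """Process one message, renewing its visibility timeout until done."""
    stop = threading.Event()

    def heartbeat():
        # wait() returns False on each interval tick, True once stop is set.
        while not stop.wait(interval):
            sqs.change_message_visibility(
                QueueUrl=queue_url,
                ReceiptHandle=message["ReceiptHandle"],
                VisibilityTimeout=timeout,  # counted from the time of this call
            )

    worker = threading.Thread(target=heartbeat, daemon=True)
    worker.start()
    try:
        handler(message["Body"])
    finally:
        stop.set()
        worker.join()
    # Reached only if the handler did not raise.
    sqs.delete_message(
        QueueUrl=queue_url,
        ReceiptHandle=message["ReceiptHandle"],
    )
```

The interval should be comfortably shorter than the timeout, so that a renewal always lands before the message becomes visible again.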
Q3: What are multiple message Groups within a FIFO Queue?
Message groups allow FIFO queues to support in-order, parallel processing of groups of messages whose ordering relative to each other across group boundaries doesn't matter. Messages are delivered in order, within each group.
In a FIFO queue, if all messages are sent with the same group ID, then only one consumer can be working at a time.
In-order delivery (simple illustration) means that message 2 will not be delivered to any consumer until message 1 has been received and deleted -- finished -- by a consumer. In-order delivery includes all processing (not merely the initial "delivery"). Or if 20 messages in the queue have the same group ID and two consumers request 10 messages each, one consumer gets 10 and the other gets nothing -- yet -- because those second 10 messages have to be sequestered, until the first 10 have been processed (else we are no longer "in order").
In the 20 messages scenario, if 14 were in group A and 6 were in group B, one consumer would receive A1-A10, A11-A14 would be sequestered until A1-A10 were complete, but while the first consumer is busy, another consumer could have B1-B6 at the same time.
Note again that there is no consumer affinity. If A1-A10 and B1-B6 were deleted at the same instant, A11-A14 would next be delivered to one consumer, but not necessarily the one that handled A1-A10.
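The 14-in-A, 6-in-B scenario can be played out with a toy in-memory model. This is not the SQS API, just a sketch of the rule described above: a group with messages in flight hands out nothing more until those are deleted. The class `ToyFifoQueue` is invented for illustration, and it simplifies by returning messages from only one group per receive.

```python
from collections import deque


class ToyFifoQueue:
    """Toy model of per-group sequestering; not a real SQS client."""

    def __init__(self):
        self.pending = {}      # group_id -> deque of waiting bodies
        self.in_flight = {}    # group_id -> batch currently handed out

    def send(self, group_id, body):
        self.pending.setdefault(group_id, deque()).append(body)

    def receive(self, max_messages=10):
        """Return (group_id, batch) from the first group that is not busy."""
        for group_id, waiting in self.pending.items():
            if waiting and group_id not in self.in_flight:
                count = min(max_messages, len(waiting))
                batch = [waiting.popleft() for _ in range(count)]
                self.in_flight[group_id] = batch
                return group_id, batch
        return None, []  # every group is either empty or sequestered

    def delete_batch(self, group_id):
        """The consumer finished the batch; unblock the group."""
        self.in_flight.pop(group_id, None)


queue = ToyFifoQueue()
for n in range(1, 15):
    queue.send("A", f"A{n}")
for n in range(1, 7):
    queue.send("B", f"B{n}")

first = queue.receive()    # one consumer: A1-A10
second = queue.receive()   # another consumer: B1-B6, A11-A14 are held back
third = queue.receive()    # nothing available until a batch is deleted
queue.delete_batch("A")
fourth = queue.receive()   # A11-A14, for whichever consumer asks next
```

Note that the model has no notion of which consumer made each call, which matches the point about there being no consumer affinity.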

Related

When can two AWS SQS FIFO queue consumers process messages with the same group message id?

We have provisioned a single AWS SQS FIFO queue. There is a single process that adds items to this queue. All items added have the same group message id.
We start two independent identical processes, Consumer A and Consumer B. The only thing the consumers do is pull items off the queue and throw away the results. Assume that there are no network or service interruptions on either the AWS end or at our end.
I have looked carefully through AWS's documentation and cannot find an answer to this question: When can two AWS SQS FIFO queue consumers process messages with the same group message id?
I did see this text:
When messages that belong to a particular message group ID are invisible, no other consumer can process messages with the same message group ID.
Does the above apply only when all messages with the same message group ID are invisible? Or is it enough for just some of the messages for a given message group ID to be invisible?
For example, imagine that when the above two consumers start there are already 10,000 messages on the queue all with the same group message id. Since a consumer can be sent a maximum of 10 messages at once, does this mean that while consumer A is processing 10 messages consumer B can't get any messages?
I am looking for a reference in the AWS documentation that clarifies this, or perhaps, someone has done an experiment that decides this one way or the other.
If any single message of a group is invisible (in-flight / currently being processed), no other consumer can receive or process any other message of that group.
That is the only way to ensure in-order / FIFO processing because if the processing of the currently invisible message fails it needs to be reprocessed and it must not happen that another message within the same group was already processed in the meantime.
Don't have a doc for that because in my mind this is obvious and the only logical thing to do. Anything else will break the FIFO principle.
And to answer "When can two AWS SQS FIFO queue consumers process messages with the same group message id?" - never. Unless the visibility timeout of one message expires, it becomes visible again and the second consumer picks it up. But at that point from the perspective of the queue only the second consumer is working on the message, the first does not count anymore.

Concurrent processing of user Messages in SQS

We have several consumer processes which poll from a standard SQS and process the message. Each message is associated with a user. For each user, we can process 100 messages per minute. Beyond that, the API which we are using for processing would start giving 500 errors.
Since the queue also contains messages for other users, we would like to skip ahead to users who are still under their quota, but we can't cherry-pick their messages.
One solution to this is using FIFO and implementing message groups. But FIFO has a peculiar limitation.
You can have a maximum of 20,000 in-flight messages
This would have been completely fine, but the issue is that when a message is in flight from a message group, SQS adds the count of all the messages in that group to the in-flight count.
This article explains more in detail:
https://tomgregory.com/3-surprising-facts-about-aws-sqs-fifo-queues/#:~:text=A%20FIFO%20queue%20has%20a%20maximum%20inflight%20message%20limit%20of%2020%2C000.
In this article read "20,000 message buffer" header. That might explain what's happening.
https://aws.amazon.com/premiumsupport/knowledge-center/sqs-message-backlog/
The second solution which I could think of is to make the producer of the microservice smart. But in our case, the producer is a completely different microservice. And the owners of that microservice hardly listen.
We definitely want our consumers to scale to provide minimum wait time to each user but can't because of the above reasons.
I genuinely feel SQS was not the correct choice for this design, but can't convince my superiors of the same.
Is there a way we can overcome this situation or did we hit a dead end?
"This would have been completely fine, but the issue is that when a message is in flight from a message group, SQS adds the count of all the messages in that group to the in-flight count"
I do not think this is the case for a FIFO queue. I am using a FIFO queue where each consumer (there are 3 of them) processes one message at a time. There are SQS messages from the same message group, but the in-flight message count for me is always 3, i.e. each of the 3 consumers is processing one message. The processing time for each SQS message is variable; whenever a consumer finishes its message, it picks up the next one in the queue. The in-flight message count remains 3 all the time.
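To check the in-flight count on your own queue, something like the sketch below should work, assuming `sqs` is a boto3 SQS client. The helper name `queue_depth` is made up; the attribute names are the ones SQS reports, and the values are approximate.

```python
def queue_depth(sqs, queue_url):
    """Return approximate counts of available and in-flight messages."""
    response = sqs.get_queue_attributes(
        QueueUrl=queue_url,
        AttributeNames=[
            "ApproximateNumberOfMessages",            # available to receive
            "ApproximateNumberOfMessagesNotVisible",  # in flight
        ],
    )
    attributes = response["Attributes"]  # values come back as strings
    return {
        "available": int(attributes["ApproximateNumberOfMessages"]),
        "in_flight": int(attributes["ApproximateNumberOfMessagesNotVisible"]),
    }
```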

Ensuring message from SQS being consumed by single consumer at a given time

I have configured an SQS FIFO queue with one publisher and three consumers. I would like to process all the messages published to the queue in order. All the messages published in the queue belong to the same group.
I have gone through the SQS documentation and understood that by configuring a suitable visibility timeout, we can ensure that only one consumer is processing a message at any given time. Within the visibility timeout period, the message will not be handed over to any other consumer.
Here my question is when one message is being processed by one consumer, is there any chance that other messages in the queue get consumed by another consumer?
If yes, what is the way to ensure, at a given time only one consumer is consuming messages from queue?
"other messages in the queue get consumed by another consumer?"
If they are in different message groups, then different messages can be read by other consumers. But if all your messages in the FIFO queue belong to the same group, their order and processing is guaranteed by AWS:
Messages that belong to the same message group are always processed one by one, in a strict order relative to the message group (however, messages that belong to different message groups might be processed out of order).

Can I view an available message that isn't receivable in a Fifo queue

I have an SQS FIFO queue that uses thousands of message group ids for ordering and exactly-once processing.
Most messages are processed quickly by the consumer, and deleted from the queue.
However since some messages can take a while to process, the VisibilityTimeout on the queue is 2 hours.
Occasionally I'll end up with one or two messages showing as available in my queue, but they're not receivable because a message with the same message group id is in-flight.
I know I can't receive these messages, but is there any way to view the messages to know which message group id is causing issues?
Unfortunately, you can't view in-flight messages as they are simply not visible to other consumers.
However, if you have some messages that cause issues, e.g., they are non-receivable, you may consider setting up dead letter queue (DLQ):
Dead-letter queues are useful for debugging your application or messaging system because they let you isolate problematic messages to determine why their processing doesn't succeed.
This way these "bad" messages will end up in a DLQ, which will allow you to inspect them, be automatically notified about their presence or process them in a different way.

Is it possible to set up SQS standard queue to be sure to process only once my messages?

Is it possible to set up my SQS queue on AWS so that each message is processed only once?
Maybe by tweaking long/short polling (is that going to have any impact on processing only once?)
or visibilityTimeout seconds,
or by adopting some best practice in my workers' application?
Or should I definitely move to a FIFO queue to be sure I am guaranteed exactly-once processing?
SQS will definitely deliver the message at least once, but there is a chance that a message is processed more than once. Say you have a visibility timeout of 30 seconds and the consumer takes 35 seconds to process the message; then the message will become available in the queue again for other consumers. If you don't have a problem with duplicate messages and expect high throughput, then SQS Standard would be the right choice. Even if you tweak short polling or long polling, you cannot guarantee that you will avoid duplication with SQS Standard.
If you need each message processed exactly once and cannot tolerate any duplication, then FIFO would be the right choice. Keep in mind that the throughput of FIFO is not as high as SQS Standard. By default, FIFO queues support up to 300 API calls per second per action (up to 3,000 messages per second with batching).
FIFO queues are designed to never introduce duplicate messages. However, your message producer might introduce duplicates in certain scenarios: for example, if the producer sends a message, does not receive a response, and then resends the same message. Amazon SQS APIs provide deduplication functionality that prevents your message producer from sending duplicates. Any duplicates introduced by the message producer are removed within a 5-minute deduplication interval.
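To make the 5-minute deduplication interval concrete, here is a toy model of it. This is not how SQS is implemented internally, only a sketch of the observable behaviour; `ToyDedupWindow` and `content_dedup_id` are invented names. The content-based ID is a SHA-256 hash of the message body.

```python
import hashlib
import time

DEDUP_INTERVAL_SECONDS = 300  # the 5-minute deduplication interval


def content_dedup_id(body):
    """Derive a deduplication ID from the message body (SHA-256 hex digest)."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


class ToyDedupWindow:
    """Toy model: drop a repeated dedup ID seen within the interval."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.first_seen = {}  # dedup_id -> time the ID was last accepted

    def accept(self, dedup_id):
        """Return True if the message would be enqueued, False if dropped."""
        now = self.clock()
        seen_at = self.first_seen.get(dedup_id)
        if seen_at is not None and now - seen_at < DEDUP_INTERVAL_SECONDS:
            return False  # producer retry inside the window: duplicate
        self.first_seen[dedup_id] = now
        return True
```

A producer that resends the same body a few seconds after a lost response gets the same ID and is dropped by the window; the same body sent again after the interval has passed is treated as a new message.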
Please read more about SQS standard here
Please read more about SQS FIFO here