Delete some messages from the AWS SQS queue before polling - amazon-web-services

I have a Node.js application that polls messages from an AWS SQS queue and processes them.
However, some of these messages are not relevant to my service, so they are filtered out and not processed.
I am wondering whether I can do this filtering on the AWS side, before my application receives the irrelevant messages.
For example, if a message does not have the attribute data.name, then it would be deleted before reaching my application.
Filtering these messages before sending them to the queue is not possible (according to my client).

No, that is not possible without polling the message itself. You would need another consumer that polls the messages and returns the relevant ones to the queue (by not calling DeleteMessage with the receipt handle it received). That is overkill in most cases, depending on the ratio of "good" to "bad" messages, and you would still have to receive every "good" message twice.
A better way would be to set up an additional consumer and two queues. The producer sends messages to the first queue, which is polled by a consumer whose sole purpose is to filter messages and forward the "good" ones to the second queue, which is then polled by your current consumer application. But again, this is much more costly.
If you can't filter messages before sending them to the queue, then either filter them in your consuming application or pay extra for this additional functionality.
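If the filtering stays in the consuming application, it could look roughly like the sketch below. It is written in Python with boto3 rather than Node.js, and it assumes the message body is JSON with a data.name field; is_relevant, poll_once and handle are hypothetical names, and sqs is assumed to be a boto3 SQS client passed in by the caller.

```python
import json


def is_relevant(body: str) -> bool:
    """Return True when the JSON body carries a non-empty data.name field."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    data = payload.get("data") if isinstance(payload, dict) else None
    return isinstance(data, dict) and bool(data.get("name"))


def poll_once(sqs, queue_url: str, handle) -> int:
    """Receive one batch, process the relevant messages, delete all of them.

    `sqs` is assumed to be a boto3 SQS client, e.g. boto3.client("sqs").
    Returns the number of messages that were passed to `handle`.
    """
    response = sqs.receive_message(
        QueueUrl=queue_url, MaxNumberOfMessages=10, WaitTimeSeconds=20
    )
    processed = 0
    for message in response.get("Messages", []):
        if is_relevant(message["Body"]):
            handle(message)
            processed += 1
        # Irrelevant messages are deleted as well, so they never come back.
        sqs.delete_message(
            QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"]
        )
    return processed
```

The irrelevant messages are still received (and billed), which is exactly the cost the answer says cannot be avoided; the sketch only makes sure they are dropped quickly.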

Related

Is there a way I can query a specific queue which is part of a larger queue in SQS?

I have a process which reads a message from SQS and processes it. Each message has a group_message_key, and the processing of each message is relatively fast, but if I read a message with a different group_message_key, there is extra processing time. Therefore I try to group the messages in order to avoid these context switches.
I would like to have the option to initially read from a general queue, where all the messages are queued, and only after I have read the first message, ask the queue to deliver only messages with that specific group_message_key.
I am currently using Amazon SQS, but I don't mind at all changing to another message broker which can provide the feature I am missing (e.g. RabbitMQ, Kafka).
I read from the same queue in parallel from many different processes, so the solution would still need to support this. Amazon SQS has FIFO queues which, although they don't allow requesting messages with a specific group_message_key, do try to group these messages together. The problem is that a FIFO queue doesn't allow many workers to process the same message_group_id.
I presume that you are referring to the Amazon SQS Message Group ID.
This Message Group ID is unique to FIFO queues and ensures that messages with the same Message Group ID are always processed in-order.
To provide an example, imagine a number of buses sending their GPS coordinates to an Amazon SQS queue. A queue consumer wishes to retrieve these coordinates from the queue and plot the path of each bus on a map. It is important to always retrieve messages from a specific bus in the order that the messages were sent, but messages from multiple buses can also be processed.
This need can be accomplished by having each bus send a unique ID in the Message Group ID when sending its coordinates. When a consumer pulls a message from the queue, no further messages with the same Message Group ID will be provided to a queue consumer until this particular message has been processed. This ensures that the messages from a given bus are never processed out-of-order.
However, it is not possible to request messages with a specific Message Group ID. If your request to receive messages from the queue indicates that you are willing to receive a batch of messages, then multiple messages with the same Message Group ID might be provided.
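To make the bus example concrete, here is a minimal sketch of how a producer might set the Message Group ID when sending to a FIFO queue with boto3. The function names, the body layout and the deduplication ID format are assumptions for illustration; sqs is assumed to be a boto3 SQS client.

```python
import json


def build_fifo_entry(bus_id: str, sequence: int, lat: float, lon: float) -> dict:
    """Build keyword arguments for SendMessage on a FIFO queue.

    The bus ID becomes the MessageGroupId, so positions from one bus stay
    ordered while different buses can be consumed in parallel.
    """
    return {
        "MessageBody": json.dumps({"bus": bus_id, "lat": lat, "lon": lon}),
        "MessageGroupId": bus_id,
        # Required unless content-based deduplication is enabled on the queue.
        "MessageDeduplicationId": f"{bus_id}-{sequence}",
    }


def send_position(sqs, queue_url, bus_id, sequence, lat, lon):
    """Send one GPS position; `sqs` is assumed to be a boto3 SQS client."""
    return sqs.send_message(
        QueueUrl=queue_url, **build_fifo_entry(bus_id, sequence, lat, lon)
    )
```

Note that the group ID is only chosen on the sending side; as stated above, the receiving side has no parameter to ask for a particular group.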

AWS SQS FIFO can't receive all messages

I'm learning AWS SQS. I sent 6 messages to a FIFO queue, all with the same GroupId, but when I poll for messages I can only receive 2 of them. I set MaxNumberOfMessages=10 using the boto3 API. Why do I only get 2, and how can I receive all of the messages?
(As shown in this picture, I have 5 messages available, but I can only receive 2 messages.)
I tried to delete one of two received messages and poll again. The deleted one is gone, and I received a new message. But in total, it's still 2 messages.
Using an Amazon SQS FIFO queue means that you want to receive messages in order. It will also try to ensure ordering within a Message Group.
This means that, if some messages for a given Message Group ID are currently being processed ("in flight"), no more messages for that Message Group will be provided since an earlier message might be returned to the queue if not fully processed. This could result in messages being processed out-of-order.
From Using the Amazon SQS message group ID - Amazon Simple Queue Service:
To interleave multiple ordered message groups within a single FIFO queue, use message group ID values (for example, session data for multiple users). In this scenario, multiple consumers can process the queue, but the session data of each user is processed in a FIFO manner.
When messages that belong to a particular message group ID are invisible, no other consumer can process messages with the same message group ID.
Therefore, your choices are:
Don't use a FIFO queue, or
Use different Message Group IDs, or
Be happy with what it is doing, because that is the desired FIFO behaviour
From AWS Docs:
The maximum number of messages to return. Amazon SQS never returns more messages than this value (however, fewer messages might be returned).
As the docs say, you may get fewer messages than you asked for. You have to call ReceiveMessage multiple times, usually in a loop. You can also increase WaitTimeSeconds so that ReceiveMessage does not return immediately when there are no messages.
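A loop of that kind might look like the following sketch. receive_all and max_empty_polls are hypothetical names, sqs is assumed to be a boto3 SQS client, and stopping after a couple of empty polls is just one possible stopping rule.

```python
def receive_all(sqs, queue_url: str, max_empty_polls: int = 2, wait_seconds: int = 20):
    """Yield messages until several consecutive polls come back empty.

    `sqs` is assumed to be a boto3 SQS client. The caller is expected to
    delete each message after processing it.
    """
    empty_polls = 0
    while empty_polls < max_empty_polls:
        response = sqs.receive_message(
            QueueUrl=queue_url,
            MaxNumberOfMessages=10,        # an upper bound, not a guarantee
            WaitTimeSeconds=wait_seconds,  # long polling
        )
        messages = response.get("Messages", [])
        if not messages:
            empty_polls += 1
            continue
        empty_polls = 0
        yield from messages
```

Because this is a generator, the caller processes and deletes each message before the next ReceiveMessage call is made. With a single Message Group ID that matters: the remaining messages of the group only become receivable once the earlier ones are deleted.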

Can I view an available message that isn't receivable in a FIFO queue?

I have an SQS FIFO queue that uses thousands of message group ids for ordering and exactly-once processing.
Most messages are processed quickly by the consumer, and deleted from the queue.
However since some messages can take a while to process, the VisibilityTimeout on the queue is 2 hours.
Occasionally I'll end up with one or two messages showing as available in my queue, but they're not receivable because a message with the same message group id is in-flight.
I know I can't receive these messages, but is there any way to view the messages to know which message group id is causing issues?
Unfortunately, you can't view in-flight messages as they are simply not visible to other consumers.
However, if you have some messages that cause issues, e.g., they are non-receivable, you may consider setting up a dead-letter queue (DLQ):
Dead-letter queues are useful for debugging your application or messaging system because they let you isolate problematic messages to determine why their processing doesn't succeed.
This way these "bad" messages will end up in a DLQ, which will allow you to inspect them, be automatically notified about their presence or process them in a different way.
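As a rough sketch, a dead-letter queue is attached by setting the RedrivePolicy attribute on the source queue. The function names and the default of 5 receives are assumptions; sqs is assumed to be a boto3 SQS client, and the DLQ is assumed to exist already.

```python
import json


def redrive_policy(dead_letter_queue_arn: str, max_receive_count: int = 5) -> dict:
    """Build the Attributes argument for SetQueueAttributes."""
    return {
        "RedrivePolicy": json.dumps(
            {
                "deadLetterTargetArn": dead_letter_queue_arn,
                # Messages received more often than this are moved to the DLQ.
                "maxReceiveCount": str(max_receive_count),
            }
        )
    }


def attach_dead_letter_queue(sqs, queue_url, dead_letter_queue_arn, max_receive_count=5):
    """`sqs` is assumed to be a boto3 SQS client.

    For a FIFO source queue, the dead-letter queue must be a FIFO queue too.
    """
    sqs.set_queue_attributes(
        QueueUrl=queue_url,
        Attributes=redrive_policy(dead_letter_queue_arn, max_receive_count),
    )
```

A message only moves to the DLQ after it has been received more than maxReceiveCount times, so with a 2-hour VisibilityTimeout it can take a long time before a problematic message shows up there.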

SQS Queues/ Visibility Timeouts/ message groups

I am new to AWS and am trying to understand SQS. I have gone through a few trainings, but I still could not get some answers in the discussion forum there, so I am repeating my questions here. Note that I know a few of the questions below have obvious answers and are therefore more rhetorical. My confusion stems from the fact that my current understanding of the topic leads me to give conflicting answers to the follow-up questions that spring to mind after the obvious ones, which takes away my confidence in whatever I think I understand.
If I have a Standard queue named MyQueue and there are 100 messages, and if there are 2 completely separate applications (as consumers; note they are not a consumer group of the same applications like you have in Kafka; instead they are 2 separate applications) for this queue, then the consumers may receive
(i) out of order messages and
(ii) multiple copies of the messages
Neither of my applications needs to care about the order of the messages. But for the sake of the question, let's say we have perfect in-order delivery, no duplicate copies and no network issues, and both consumers finish processing each message well within the Visibility Timeout window.
Q1: Will both applications individually receive 100 messages each, or will a message that is made available to one consumer never be delivered to the other consumer? If the latter is true (with no network issues, out-of-order delivery or multiple deliveries), then:
Is SNS-SQS fanout the way to ensure that the same message is processed by multiple consumers?
Is the consumer supposed to delete the message from the queue after processing? So, if a message is picked up by a processor, and it goes into visibility timeout while the processing happens and then is not deleted by the consumer even after the processing is complete before the visibility timeout, then will the message appear back for other consumers possibly to consume it? If that is the case, then won't the same thing apply to a FIFO queue as well?
Other Questions:
Q2: Is the Visibility Timeout applicable to both Standard queues and FIFO queues? If it also applies to FIFO queues, which promise exactly-once delivery, and the Visibility Timeout expires before the consumer finishes processing a message, then the message reappears in the queue only to be delivered again, which takes us back to at-least-once processing. Can someone confirm?
Q3: What are multiple message Groups within a FIFO Queue? Are they like partitions of a queue?
Q: Will both the applications individually receive 100 messages each?
A consumer can request up to 10 messages per API call. These will become 'invisible' and will not be provided to other consumers. (Well, there actually is a small possibility that a message might be provided to multiple consumers. It is rare, but it can happen. If this is bad for your use-case, then you should track the messages in a database to ensure they are only processed once each.)
Q: Is SNS-SQS fanout the way to ensure that the same message is processed by multiple consumers?
It is very strange to want a single message consumed by 'multiple consumers'. The normal desire is to process each message once. If you do want a message processed by multiple consumers then, yes, you could send the message to SNS, which could then send it to multiple queues.
Q: Is the consumer supposed to delete the message from the queue after processing?
Yes. Amazon SQS does not know when a message is processed. The consumer must delete the message via the ReceiptHandle provided when the message was received. If a message times out and another consumer receives it, SQS will provide a different ReceiptHandle so it knows which process requested the delete.
This also applies to FIFO queues.
Q: Is the Visibility timeout applicable to both Standard Queue and FIFO Queue?
Yes. If the visibility timeout expires, the message will be provided to another consumer. The "exactly once delivery" avoids the rare situation mentioned above when a message in a Standard queue might be provided more than once. However, if the visibility timeout expires, even in a FIFO queue, then the message will intentionally be visible on the queue again.
Q: What are multiple message Groups within a FIFO Queue? Are they like partitions of a queue?
A message group is a way of grouping messages that must be delivered in-order.
Let's say there are two message groups, A and B, and they send messages in this order: A1, B1, A2, B2
Message B1 can be provided even if A1 is not yet deleted. However, message A2 will not be provided until A1 is deleted. Think of them as 'mini-queues'. This allows processing of lots of messages that are unrelated, without having to wait for all previous messages to be deleted.
See: Using the Amazon SQS Message Group ID - Amazon Simple Queue Service
Q1: Will both the applications individually receive 100 messages each or will a message that is made available to one consumer won't ever be delivered to the other consumer?
Neither of these is quite accurate.
Standard queues never intentionally deliver a message more than once. Messages may occasionally be delivered more than once, but this is the exception. It is an artifact of SQS being a distributed system: a message is stored on multiple replicas, and after an internal failure one replica may not know that the message has already been delivered or deleted.
If a message is inadvertently delivered more than once, it could be to multiple consumers or the same consumer. The consumer "connections" to SQS are actually stateless, resetting each time a list of messages is delivered, so SQS does not have a sense of which consumer it delivered each message to.
Consumers delete their messages after processing, otherwise their visibility timeout expires and they are delivered again and again -- to whichever consumer the luck of the draw delivers them to, each time. As noted, SQS has no concept of consumer identity or state. (In high volume applications, a single consumer may actually have multiple connections to SQS, all receiving messages in parallel, because the network round-trips and cycle of receive/delete will otherwise limit a single consumer to a few hundred messages per second. Whether these connections are handled using asynchronous I/O, threads, etc., is unimportant to SQS, which doesn't care which consumer is on a given connection.)
If you want all messages sent to all consumers, you need fan-out from SNS to SQS.
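A minimal sketch of wiring up such a fan-out with boto3 is shown below. fan_out is a hypothetical helper, sns is assumed to be a boto3 SNS client, and the topic and queues are assumed to exist already.

```python
def fan_out(sns, topic_arn: str, queue_arns, raw: bool = True):
    """Subscribe each queue ARN to the topic and return the subscription ARNs.

    `sns` is assumed to be a boto3 SNS client, e.g. boto3.client("sns").
    Each queue also needs an access policy that allows the topic to call
    sqs:SendMessage; that step is not shown here.
    """
    subscription_arns = []
    for queue_arn in queue_arns:
        response = sns.subscribe(
            TopicArn=topic_arn,
            Protocol="sqs",
            Endpoint=queue_arn,
            # Deliver the published body as-is instead of an SNS JSON envelope.
            Attributes={"RawMessageDelivery": "true" if raw else "false"},
            ReturnSubscriptionArn=True,
        )
        subscription_arns.append(response["SubscriptionArn"])
    return subscription_arns
```

Each application then polls its own queue, so each one sees every published message and deletes its own copy independently.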
Q2: Is the Visibility timeout applicable to both Standard Queue and FIFO Queue?
Yes. Because (noted above) the connection to SQS is not a persistent, stateful connection, SQS uses visibility timeout as the indication that a consumer has lost the message or failed ungracefully, so the message needs to be made accessible again. (Dead letter queues prevent this from happening endlessly, moving a message to a different queue, since repeated failures indicate a problem with a consumer, or a "poison pill" message.)
FIFO queues retain in-order delivery, here, and you could argue that they revert to "at least once" delivery, but the idea is that this should never happen. If it does, then your visibility timeout is too short or your consumer is crashing or otherwise misplacing messages.
Q3: What are multiple message Groups within a FIFO Queue?
Message groups allow FIFO queues to support in-order, parallel processing of groups of messages whose ordering relative to each other across group boundaries doesn't matter. Messages are delivered in order within each group.
In a FIFO queue, if all messages are sent with the same group ID, then only one consumer can be working at a time.
In-order delivery (simple illustration) means that message 2 will not be delivered to any consumer until message 1 has been received and deleted -- finished -- by a consumer. In-order delivery includes all processing (not merely the initial "delivery"). Or if 20 messages in the queue have the same group ID and two consumers request 10 messages each, one consumer gets 10 and the other gets nothing -- yet -- because those second 10 messages have to be sequestered, until the first 10 have been processed (else we are no longer "in order").
In the 20 messages scenario, if 14 were in group A and 6 were in group B, one consumer would receive A1-A10, A11-A14 would be sequestered until A1-A10 were complete, but while the first consumer is busy, another consumer could have B1-B6 at the same time.
Note again that there is no consumer affinity. If A1-A10 and B1-B6 were deleted at the same instant, A11-A14 would next be delivered to one consumer, but not necessarily the one that handled A1-A10.
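The 14 + 6 scenario above can be replayed with a small toy model. This is only an in-memory illustration of the group-locking rule as described in this answer, not how SQS is implemented; ToyFifoQueue and its methods are made-up names, and visibility timeouts are left out entirely.

```python
from collections import deque


class ToyFifoQueue:
    """Toy model: a group with undeleted messages is locked for new receives."""

    def __init__(self, messages):
        # Each message is a (group_id, body) tuple, kept in send order.
        self.pending = deque(messages)
        self.in_flight = {}  # group_id -> number of received, undeleted messages

    def receive(self, max_messages=10):
        """Return up to max_messages from groups that have nothing in flight."""
        locked = {group for group, count in self.in_flight.items() if count > 0}
        batch, remaining = [], deque()
        for message in self.pending:
            if message[0] in locked or len(batch) >= max_messages:
                remaining.append(message)
            else:
                batch.append(message)
        self.pending = remaining
        for group, _ in batch:
            self.in_flight[group] = self.in_flight.get(group, 0) + 1
        return batch

    def delete(self, message):
        """Mark a received message as finished."""
        self.in_flight[message[0]] -= 1


queue = ToyFifoQueue(
    [("A", f"A{i}") for i in range(1, 15)] + [("B", f"B{i}") for i in range(1, 7)]
)
first = queue.receive(10)   # consumer 1
second = queue.receive(10)  # consumer 2, while consumer 1 is still busy
print([body for _, body in first])   # A1 .. A10
print([body for _, body in second])  # B1 .. B6
for message in first:
    queue.delete(message)
print([body for _, body in queue.receive(10)])  # A11 .. A14
```

In the model, as in the description above, nothing records which consumer made which receive call; the lock is purely per group.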

AWS SQS Receive Messages -- How to Know when Queue is Empty

I want to get all the messages in the queue to process them. However the property for MaxNumberOfMessages is 10 (based on documentation)
https://docs.aws.amazon.com/AWSSimpleQueueService/latest/APIReference/API_ReceiveMessage.html
How can I read in all messages so I can process them? Or how would I know when queue is empty?
thanks
When you receive messages from the queue, they are marked as "in flight." After you successfully process them, you send a call to the queue to delete them. This call includes the receipt handle of each message.
When the queue is empty, the next read will have an empty Messages array.
Usually when I do this I wrap my call to read the queue in a loop (a while loop) and only keep processing if I have Messages after doing a read.
It shouldn't make any difference if it's a FIFO queue or a standard one.
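For the delete step, a whole received batch can be removed in one call. The sketch below assumes sqs is a boto3 SQS client and messages is the Messages list from one ReceiveMessage response (at most 10 entries); delete_batch is a hypothetical helper name.

```python
def delete_batch(sqs, queue_url: str, messages) -> list:
    """Delete up to 10 received messages in one call.

    Returns the Ids of the entries that SQS reported as failed.
    """
    if not messages:
        return []
    entries = [
        # Id only needs to be unique within this batch; the index is enough.
        {"Id": str(index), "ReceiptHandle": message["ReceiptHandle"]}
        for index, message in enumerate(messages)
    ]
    response = sqs.delete_message_batch(QueueUrl=queue_url, Entries=entries)
    return [failure["Id"] for failure in response.get("Failed", [])]
```

Entries that come back as failed are not deleted and will become visible again after their visibility timeout, so it is worth checking the return value.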
To check if the queue is empty you have to verify the total number of messages in the queue is zero. SQS does not provide a single metric for this, rather you have to calculate the sum of three different metrics.
From the docs:
To confirm that a queue is empty (AWS CLI, AWS API):
1. Stop all producers from sending messages.
2. Repeatedly run one of the following commands:
   AWS CLI: get-queue-attributes
   AWS API: GetQueueAttributes
3. Observe the metrics for the following attributes:
   ApproximateNumberOfMessagesDelayed
   ApproximateNumberOfMessagesNotVisible
   ApproximateNumberOfMessages
When all of them are 0 for several minutes, the queue is empty.
https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/confirm-queue-is-empty.html
Getting an empty response from a ReceiveMessage call does NOT necessarily mean the queue is empty. You can have messages in the queue and still receive an empty response if:
Messages are delayed - You can set delays on individual messages for standard queues or at the queue level for standard and FIFO queues. During the delay period messages are invisible to consumers.
Messages are in-flight - When a consumer receives a message, that message remains in the queue until the consumer deletes it by calling DeleteMessage. While the message is in this state it is considered in-flight and is not available for other consumers.
Multiple messages have the same message group id in a FIFO queue - When a consumer receives a message from a FIFO queue, no other consumer can receive messages from the same message group. This ensures messages are processed in FIFO order.
By summing the metrics listed above, you can account for all of these scenarios.
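Summing the three attributes could be sketched like this. approximate_total and looks_empty are hypothetical names, sqs is assumed to be a boto3 SQS client, and the number of checks and the interval are arbitrary choices standing in for "several minutes".

```python
import time

COUNT_ATTRIBUTES = (
    "ApproximateNumberOfMessages",
    "ApproximateNumberOfMessagesNotVisible",
    "ApproximateNumberOfMessagesDelayed",
)


def approximate_total(sqs, queue_url: str) -> int:
    """Sum the visible, in-flight and delayed message counts."""
    response = sqs.get_queue_attributes(
        QueueUrl=queue_url, AttributeNames=list(COUNT_ATTRIBUTES)
    )
    attributes = response.get("Attributes", {})
    # Attribute values are returned as strings.
    return sum(int(attributes.get(name, 0)) for name in COUNT_ATTRIBUTES)


def looks_empty(sqs, queue_url, checks=5, interval_seconds=60, sleep=time.sleep):
    """Return True only if every one of `checks` readings is zero."""
    for attempt in range(checks):
        if approximate_total(sqs, queue_url) != 0:
            return False
        if attempt < checks - 1:
            sleep(interval_seconds)
    return True
```

The counts are approximate, which is why a single zero reading is not treated as proof and the check is repeated over time, with producers stopped.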