Process messages from Amazon SQS Dead Letter Queue - amazon-web-services

I want to process messages from an Amazon SQS Dead Letter Queue.
What is the best way to process them?
1. Receive messages from the dead letter queue and process them directly?
2. Receive messages from the dead letter queue, put them back in the main queue, and then process them?
I only need to process messages from the dead letter queue once in a while.

After careful consideration of the options, I am going with option 2 you mentioned: receive messages from the dead letter queue, put them back in the main queue, and then process them.
Make sure that messages are not lost while transferring them from one queue to the other.
Before moving messages from the DLQ back to the main queue, make sure that the errors the main listener faced (coding errors, network issues, and so on) have been resolved.
The listener of the main queue has already retried the message, and moving it back means it will be retried again. Make sure processing either skips steps that already succeeded when a message is retried, or reverts successfully processed steps when an error occurs. (This will help with regular message retries as well.)
The DLQ is meant for unexpected errors, so you may want an on-demand job for doing this.
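A minimal sketch of such a transfer with boto3 (the queue URLs below are placeholders): receive from the DLQ, send to the main queue, and only then delete from the DLQ, so a message can at worst be duplicated mid-transfer, never lost (the idempotent listener described above should tolerate the duplicate).

```python
import boto3

sqs = boto3.client("sqs")

# Placeholder queue URLs -- replace with your own.
DLQ_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/orders-dlq"
MAIN_QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/orders"

def redrive_dlq():
    """Move messages from the DLQ back to the main queue without losing any."""
    while True:
        resp = sqs.receive_message(
            QueueUrl=DLQ_URL,
            MaxNumberOfMessages=10,
            WaitTimeSeconds=1,
        )
        messages = resp.get("Messages", [])
        if not messages:
            break  # DLQ appears drained
        for msg in messages:
            # Send to the main queue first...
            sqs.send_message(QueueUrl=MAIN_QUEUE_URL, MessageBody=msg["Body"])
            # ...then delete from the DLQ, so a crash between the two calls
            # duplicates a message at worst instead of losing it.
            sqs.delete_message(QueueUrl=DLQ_URL, ReceiptHandle=msg["ReceiptHandle"])

if __name__ == "__main__":
    redrive_dlq()
```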

Presumably the message ended up in the Dead Letter Queue for a reason, after failing several times.
It would not be a good idea to put it back in the main queue because, presumably, it would fail again and you would create an infinite loop.
Initially, dead messages should be examined manually to determine the causes of failure. Then, based on this information, an alternate flow could be developed.

Related

Understanding SQS message receive amount

I have a queue which is supposed to receive the messages sent by a Lambda function. This function is supposed to send each distinct message only once. However, I saw a scarily high receive count in the console.
Since I cannot find any plain-English explanation of the receive count, I need to consult the Stack Overflow community. I have two theories to verify:
1. There are actually not that many messages, and the "receive count" is that high simply because I polled the messages for a long time, so the same messages were received more than once;
2. Since the function that sends the messages to the queue is SQS-triggered, those messages might be processed by multiple consumers. Although I have already set VisibilityTimeout, are messages that have been processed going to be deleted? If they do not remain in the queue, there is no reason for them to be picked up and processed a second time.
Any debugging suggestions will be appreciated!
So, the receive count is basically the number of times the Lambda (or any other consumer) has received the message. A consumer can receive a message more than once (this is by design, and you should handle it in your logic).
That being said, the receive count also increases if your Lambda fails to process the message (or even hits its execution limits). The default is 3 attempts, so if something is wrong with your Lambda, you will have at least 3 receives per message.
Also, when you poll messages via the AWS console, you are increasing the receive count as well.
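For what it's worth, a small boto3 sketch (the queue URL is a placeholder) that asks SQS to return the ApproximateReceiveCount attribute alongside each message; this is the same counter shown in the console, and note that this call itself increments it, just like polling from the console does.

```python
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue"  # placeholder

resp = sqs.receive_message(
    QueueUrl=QUEUE_URL,
    MaxNumberOfMessages=10,
    AttributeNames=["ApproximateReceiveCount"],  # include the receive counter
)

for msg in resp.get("Messages", []):
    count = msg["Attributes"]["ApproximateReceiveCount"]
    print(f"Message {msg['MessageId']} has been received ~{count} times")
```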

SQS queue sometimes freezes

SQS sometimes stops receiving messages or allowing message consumption, then resumes after ~5 mins. Do you know if there is a setting that can produce this behavior? I was playing around with the settings but could not change this behavior.
Note: when I send a message, I get back the message ID and an OK response as if it was received, but the message is not in the queue.
If you are getting an ID back and the message is not in the queue, I believe you are using a FIFO queue, which ignores duplicate messages within the deduplication interval (5 minutes by default). Whatever is feeding the queue needs to use a good deduplication ID if you want messages with identical content to be processed instead of dropped.
Read this
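To illustrate the deduplication behaviour (the queue URL and IDs below are made up), two sends with the same MessageDeduplicationId inside the 5-minute interval both return an ID and an OK response, but only one message actually lands in the queue, which matches the symptom described in the question:

```python
import boto3

sqs = boto3.client("sqs")
FIFO_QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/orders.fifo"  # placeholder

# First send: accepted and enqueued.
sqs.send_message(
    QueueUrl=FIFO_QUEUE_URL,
    MessageBody="order-1001 paid",
    MessageGroupId="order-1001",
    MessageDeduplicationId="order-1001-paid",
)

# Second send within 5 minutes with the same deduplication ID: SQS still
# returns an ID and an OK response, but the message is silently treated as
# a duplicate and never appears in the queue.
sqs.send_message(
    QueueUrl=FIFO_QUEUE_URL,
    MessageBody="order-1001 paid",
    MessageGroupId="order-1001",
    MessageDeduplicationId="order-1001-paid",
)
```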

Why one SQS message doesn't get deleted while others do?

I came across this question in my AWS study:
You create an SQS queue and decide to test it out by creating a simple
application which looks for messages in the queue. When a message is
retrieved, the application is supposed to delete the message. You
create three test messages in your SQS queue and discover that
messages 1 and 3 are quickly deleted but message 2 remains in the
queue. What is a possible cause for this behavior? Choose the 2
correct answers
Options:
A. The order that messages are received in is not guaranteed in SQS
B. Message 2 uses JSON formatting
C. You failed to set the correct permissions on message 2
D. Your application is using short polling
Correct Answer:
A. The order that messages are received in is not guaranteed in SQS
D. Your application is using short polling
Why is A considered one of the answers here? I understand A is correct based on how SQS works, but it does not explain the issue in this question, right? Why is it not a permissions issue?
Am I missing anything?
Thank you.
I think that a justification for A & D is:
Various workers might be pulling messages from the queue
Given that it is not a FIFO queue, message order is not guaranteed (A)
Short-polling will not necessarily check every 'server', it will simply return a message (D)
Message 2 simply hasn't been processed yet
Frankly, I don't think that D is so relevant, because Long Polling returns as soon as it gets a message and it simply means that no worker has requested the message yet.
B is irrelevant because message content has no impact on retrieval.
C is incorrect because there are no permissions on individual messages. Only the queue, or users accessing the queue, have permissions.
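As a side note, the short/long polling difference comes down to the WaitTimeSeconds parameter on ReceiveMessage; a quick boto3 sketch (the queue URL is a placeholder):

```python
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/test-queue"  # placeholder

# Short polling: returns immediately and may only sample a subset of SQS
# servers, so an existing message (like "message 2") can be missed.
short_poll = sqs.receive_message(QueueUrl=QUEUE_URL, WaitTimeSeconds=0)

# Long polling: waits up to 20 seconds and queries all servers, so an
# available message is far less likely to be skipped.
long_poll = sqs.receive_message(QueueUrl=QUEUE_URL, WaitTimeSeconds=20)

print(len(short_poll.get("Messages", [])), len(long_poll.get("Messages", [])))
```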

How to ensure SQS FIFO is blocked while having a message in the corresponding deadletter queue

Imagine the following lifetime of an Order.
Order is Paid
Order is Approved
Order is Completed
We chose to use an SQS FIFO queue to ensure all these messages are processed in the order they are produced, so that, for example, an order's status changes to Approved only after it was Paid, and not after it has been Completed.
But let's say there is an error while trying to Approve an order, and after several attempts the message is moved to the dead letter queue.
The problem we noticed is that the subsequent message, "Order is Completed", is processed even though the previous message, "Order is Approved", is sitting in the dead letter queue.
How should we handle this?
Should we check the contents of deadletter queue for having messages with the same MessageGroupID as the consuming one, assuming we could do this?
Is there a mechanism that we are missing?
Sounds to me like you are using a single queue for multiple types of events, where I would probably recommend (at least) three separate queues:
An order paid event queue
An order approved event queue
An order completed event queue
When an order payment comes in, an event is put into the first queue. Once your system has successfully processed that payment, it removes the item from the first queue (deletes the message) and then inserts an 'Order Approved' event into the second queue.
The process responsible for handling those events only watches that queue and does what it needs to do; once complete, it deletes the message and inserts a third message into the third queue so that yet another process can see and act on that message - process it and then delete it.
If anything fails along the way, the message will eventually end up in a dead letter queue - either the same one, or one per queue, that makes no difference - but nothing that was supposed to happen AFTER the failed event would happen.
It doesn't even sound to me like you need a FIFO queue at all in this case, though there is no real harm (except for the slightly higher cost and lower throughput limits).
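A rough sketch of one hop in that chain with boto3 (the queue URLs, message fields, and handle_payment function are all hypothetical): a worker consumes the 'paid' event, deletes it once handled, and emits the next event for the next worker.

```python
import json
import boto3

sqs = boto3.client("sqs")
# Placeholder queue URLs -- one queue per event type.
PAID_QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/order-paid"
APPROVED_QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/order-approved"

def handle_payment(order):
    """Hypothetical business logic for a paid order; raises on failure."""
    ...

while True:
    resp = sqs.receive_message(
        QueueUrl=PAID_QUEUE_URL, MaxNumberOfMessages=1, WaitTimeSeconds=20
    )
    for msg in resp.get("Messages", []):
        order = json.loads(msg["Body"])
        handle_payment(order)  # if this raises, the message stays and can reach the DLQ
        # Only after success: remove the 'paid' event from this queue...
        sqs.delete_message(QueueUrl=PAID_QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
        # ...and emit the next event in the chain for the next worker.
        sqs.send_message(
            QueueUrl=APPROVED_QUEUE_URL,
            MessageBody=json.dumps({"order_id": order.get("order_id"), "event": "approved"}),
        )
```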
Source from AWS https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-dead-letter-queues.html:
Don't use a dead-letter queue with a FIFO queue if you don't want to break the exact order of messages or operations. For example, don't use a dead-letter queue with instructions in an Edit Decision List (EDL) for a video editing suite, where changing the order of edits changes the context of subsequent edits.

How to prevent other workers from accessing a message which is being currently processed?

I am working on a project that will require multiple workers to access the same queue to get information about a file which they will manipulate. Files range in size from mere megabytes to hundreds of gigabytes. For this reason, a visibility timeout doesn't seem to make sense because I cannot be certain how long processing will take. I have thought of a couple of ways, but if there is a better way, please let me know.
1. The message is deleted from the original queue and put into a 'waiting' queue. When the program finishes processing the file, it deletes the message from the waiting queue; otherwise, the message is deleted from the waiting queue and put back into the original queue.
2. The message ID is checked against a database. If the message ID is found, the message is ignored. Otherwise, the program starts processing the message and inserts the message ID into the database.
Thanks in advance!
Use the default-provided SQS timeout but take advantage of ChangeMessageVisibility.
You can specify the timeout in several ways:
When the queue is created (default timeout)
When the message is retrieved
By having the worker call back to SQS and extend the timeout
If you are worried that you do not know the appropriate processing time, use a default value that is good for most situations, but don't make it so big that things become unnecessarily delayed.
Then, modify your workers to make a ChangeMessageVisibility call to SQS periodically to extend the timeout. If a worker dies, the message stops being extended and it will reappear on the queue to be processed by another worker.
See: MessageVisibility documentation
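A minimal sketch of that heartbeat pattern with boto3 (the queue URL and process_file function are hypothetical): a background thread keeps extending the visibility timeout while the worker is busy, and the message is deleted only after successful processing; if the worker dies or fails, the extensions stop and the message becomes visible again.

```python
import threading
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/files"  # placeholder

def process_file(body):
    """Hypothetical long-running work on the file described by the message."""
    ...

def heartbeat(receipt_handle, stop_event, interval=60, extension=120):
    """Keep extending the visibility timeout until processing finishes."""
    while not stop_event.wait(interval):
        sqs.change_message_visibility(
            QueueUrl=QUEUE_URL,
            ReceiptHandle=receipt_handle,
            VisibilityTimeout=extension,  # hide the message for another 2 minutes
        )

resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=1, WaitTimeSeconds=20)
for msg in resp.get("Messages", []):
    stop = threading.Event()
    threading.Thread(target=heartbeat, args=(msg["ReceiptHandle"], stop), daemon=True).start()
    try:
        process_file(msg["Body"])
        # Delete only after the file has been fully processed.
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
    finally:
        stop.set()  # stop extending; if processing failed, the message reappears
```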