SQS sometimes stops receiving messages or allowing message consumption, then resumes after about 5 minutes. Is there a setting that can produce this behavior? I experimented with the queue settings but could not change it.
Note: when I send a message, I get a message ID and an OK response as if it was received, but the message is not in the queue.
If you are getting an ID but the message is not in the queue, I believe you are using a FIFO queue, which ignores duplicate messages sent within the deduplication interval (5 minutes). Whatever is feeding the queue needs to supply a distinct deduplication ID for each message if you want duplicate messages to be processed.
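As a rough illustration of the deduplication ID point, here is a minimal sketch that builds the arguments for a FIFO send. The helper name `fifo_send_kwargs` and its `allow_repeats` flag are made up for this example; the dictionary keys are the parameters that boto3's `send_message` takes for FIFO queues.

```python
import hashlib
import uuid


def fifo_send_kwargs(queue_url, body, group_id, allow_repeats=False):
    """Build keyword arguments for a send_message call on a FIFO queue.

    Hypothetical helper: only the dictionary keys are real SQS parameters.
    """
    if allow_repeats:
        # A fresh ID per send, so identical bodies are not treated as duplicates.
        dedup_id = str(uuid.uuid4())
    else:
        # Same body gives the same ID, so a resend inside the deduplication
        # interval is accepted (an ID is returned) but not delivered again.
        dedup_id = hashlib.sha256(body.encode("utf-8")).hexdigest()
    return {
        "QueueUrl": queue_url,
        "MessageBody": body,
        "MessageGroupId": group_id,
        "MessageDeduplicationId": dedup_id,
    }
```

Assuming `sqs = boto3.client("sqs")`, the result would be passed as `sqs.send_message(**fifo_send_kwargs(url, body, "group-1", allow_repeats=True))`.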
I have a queue which is supposed to receive the messages sent by a lambda function. This function is supposed to send each different message once only. However, I saw a scary amount of receive count on the console:
Since I cannot find any explanation about receive count in the plain English, I need to consult StackOverflow Community. I have 2 theories to verify:
There are actually not so many messages and the reason why "receive count" is that high is simply because I polled the messages for a looooong time so the messages were captured more than once;
The function that sends the messages to the queue is itself SQS-triggered, so those messages might be processed by multiple processors. I have already set VisibilityTimeout, but are processed messages actually deleted? If they do not remain in the queue, there is no reason for them to be picked up and processed a second time.
Any debugging suggestion will be appreciated!!
So, receive count is basically the amount of times the lambda (or any other consumer) has received the message. It can be that a consumer receives a message more than once (this is by design, and you should handle that in your logic).
That being said, the receive count also increases if your lambda fails to process the message (or even hits the execution limits). The default is 3 times, so if something with your lambda is wrong, you will have at least 3 receives per message.
Also, when you are polling the message, via the AWS console, you are basically increasing the receive count.
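To check the receive count from code rather than the console, a sketch along these lines might help. The function name `receive_counts` is hypothetical, and `sqs` is assumed to be a boto3 SQS client; `ApproximateReceiveCount` is the system attribute SQS returns when asked for it.

```python
def receive_counts(sqs, queue_url, max_messages=10):
    """Return {MessageId: ApproximateReceiveCount} for one receive call.

    `sqs` is assumed to be a boto3 SQS client (or any object exposing the
    same receive_message method).
    """
    response = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=max_messages,
        AttributeNames=["ApproximateReceiveCount"],
    )
    counts = {}
    # "Messages" is absent from the response when nothing was received.
    for message in response.get("Messages", []):
        counts[message["MessageId"]] = int(
            message["Attributes"]["ApproximateReceiveCount"]
        )
    return counts
```

Note that this call is itself a receive, so running it raises the count of every message it returns, the same way polling from the console does.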
I have an Amazon SQS queue and a dead letter queue.
My Python program gets a message from the SQS queue and, if it raises an exception, sends the message to the dead letter queue.
I also have a program that checks the dead letter queue to see whether those messages can still be processed. If a message can, it is sent back to the main SQS queue. In my testing I expected this to produce an infinite loop of sorts, but apparently the message disappears after 2 tries. Why is that?
When I put an extra field with a random value in the message, it somehow does what I expect (an infinite loop of sending back and forth). Is there a mechanism in SQS that prevents this when the message is the same?
def handle_retrieved_messages(self):
    if not self._messages:
        return None

    for message in self._messages:
        try:
            logger.info(
                "Processing Dead Letter message: {}".format(
                    message.get("Body")
                )
            )
            message_body = self._convert_json_to_dict(message.get("Body"))
            reprocessed = self._process_message(
                message_body, None, message_body
            )
        except Exception as e:
            logger.exception(
                "Failed to process the following SQS message:\n"
                "Message Body: {}\n"
                "Error: {}".format(message.get("Body", "<empty body>"), e)
            )
            # Send to error queue
            self._delete_message(message)
            self._sqs_sender.send_message(message_body)
        else:
            self._delete_message(message)
            if not reprocessed:
                # Send to error queue
                self._sqs_sender.send_message(message_body)
self._process_message checks whether message_body has the reprocess flag set to true. If it does, the message is sent back to the main queue.
I deliberately put an error in the message contents so that every time it is processed from the main queue, it goes to the dead letter queue. I expected this to loop forever, but SQS appears to have a mechanism that stops it from happening (which is good).
The question is: what setting is that?
The normal way that an Amazon SQS queue works is:
Messages are sent to the queue
An application calls ReceiveMessage() on the queue to receive a message (or multiple messages). This increments the Receive Count on a message.
This puts the message(s) into an invisible state. This means that the message is still in the queue, but it is not visible if another application tries to receive messages from the queue
Once the application has finished processing the message, it calls DeleteMessage(), providing the receipt handle of the message. This removes the message from the queue.
However, if the application does not delete the message within the invisibility timeout period, then the message appears on the queue again. This is done in case the application has crashed. Instead of losing the message, it is put back on the queue so that another (or the same) application can process it again.
If a message exceeds the invisibility timeout period AND its Receive Count now exceeds the Maximum Receives setting, it is not put back on the queue. Instead, it is placed on the Dead Letter Queue (DLQ).
So, the normal process is that Amazon SQS moves messages to the DLQ after the message has been received more than (in your case) 10 attempted Receives. It is NOT the job of your application to move the message to the Dead Letter Queue!
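For reference, the Maximum Receives setting lives in the queue's redrive policy. The sketch below builds that attribute; the helper name `redrive_attributes` is made up for this example, while `RedrivePolicy`, `deadLetterTargetArn` and `maxReceiveCount` are the names SQS uses.

```python
import json


def redrive_attributes(dlq_arn, max_receives):
    """Build the Attributes dict that attaches a DLQ to a source queue.

    Hypothetical helper: the key names are the ones SQS expects.
    """
    policy = {
        "deadLetterTargetArn": dlq_arn,
        # After this many receives without a delete, SQS moves the
        # message to the dead letter queue on its own.
        "maxReceiveCount": str(max_receives),
    }
    # SQS takes the policy as a JSON string, not a nested dict.
    return {"RedrivePolicy": json.dumps(policy)}
```

Assuming `sqs = boto3.client("sqs")`, this would be applied with `sqs.set_queue_attributes(QueueUrl=main_queue_url, Attributes=redrive_attributes(dlq_arn, 10))`.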
If you want to handle all the 'dead letter' processing yourself (eg moving to different queues), then turn off the DLQ functionality on the queue itself. This is probably causing your messages to disappear or go to the wrong location.
By the way, when deleting a message, you need to provide the ReceiptHandle returned with the message, not the message itself.
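The receive, process, delete cycle described above could be sketched roughly as follows. The function name `consume_once` and the `handler` callback are assumptions for illustration; `sqs` is assumed to be a boto3 SQS client.

```python
def consume_once(sqs, queue_url, handler, wait_seconds=10):
    """Receive up to 10 messages and delete the ones the handler accepts.

    Returns the number of messages deleted. `handler` is a hypothetical
    callback that takes the message body and raises on failure.
    """
    response = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,
        WaitTimeSeconds=wait_seconds,
    )
    deleted = 0
    for message in response.get("Messages", []):
        try:
            handler(message["Body"])
        except Exception:
            # Do nothing: the message becomes visible again after the
            # visibility timeout, and SQS moves it to the DLQ once the
            # receive count exceeds maxReceiveCount.
            continue
        # Delete using the receipt handle from this receive, not the message.
        sqs.delete_message(
            QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"]
        )
        deleted += 1
    return deleted
```

The point of the design is that the consumer never sends anything to the dead letter queue itself; a failed message is simply left undeleted.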
I want to process messages from an Amazon SQS Dead Letter Queue.
What is the best way to process them?
Receive messages from the dead letter queue and process them.
Receive messages from the dead letter queue, put them back in the main queue, and then process them?
I just need to process messages from dead letter queue once in a while.
After careful consideration of various options, I am going with the option 2 "Receive messages from dead letter queue put back in main queue and then process it" you mentioned.
Make sure that no messages are lost while they are being transferred from one queue to the other.
Before moving messages from the DLQ to the main queue, make sure that the errors hit by the main listener (mainly coding errors, if any) and any network issues have been resolved.
The listener of the main queue has already retried the message and will now retry it again. So make sure it either skips the steps that already succeeded when a message is retried, or reverts the successfully processed steps when an error occurs. (This helps with ordinary message retries as well.)
DLQ is meant for unexpected errors. So you may have an on-demand job for doing this.
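A minimal sketch of such an on-demand job, written so that a message is only removed from the DLQ after the send to the main queue has returned. The function name `move_back` is made up, `sqs` is assumed to be a boto3 SQS client, and only the body is copied (message attributes are ignored in this sketch).

```python
def move_back(sqs, dlq_url, main_url, max_messages=10):
    """Move one batch of messages from the DLQ back to the main queue.

    Returns the number of messages moved. If send_message raises, the
    exception propagates and the message stays in the DLQ.
    """
    response = sqs.receive_message(
        QueueUrl=dlq_url,
        MaxNumberOfMessages=max_messages,
        WaitTimeSeconds=1,
    )
    moved = 0
    for message in response.get("Messages", []):
        # Send first ...
        sqs.send_message(QueueUrl=main_url, MessageBody=message["Body"])
        # ... and delete from the DLQ only once the send has succeeded,
        # so a failure in between duplicates a message rather than losing it.
        sqs.delete_message(
            QueueUrl=dlq_url, ReceiptHandle=message["ReceiptHandle"]
        )
        moved += 1
    return moved
```

Because a crash between the two calls can leave the same message in both queues, the main listener still needs to tolerate duplicates.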
Presumably the message ended up in the Dead Letter Queue for a reason, after failing several times.
It would not be a good idea to put it back in the main queue because, presumably, it would fail again and you would create an infinite loop.
Initially, dead messages should be examined manually to determine the causes of failure. Then, based on this information, an alternate flow could be developed.
I am working on a project that will require multiple workers to access the same queue to get information about a file which they will manipulate. Files range in size from mere megabytes to hundreds of gigabytes. For this reason, a visibility timeout doesn't seem to make sense, because I cannot be certain how long processing will take. I have thought of a couple of ways, but if there is a better one, please let me know.
The message is deleted from the original queue and put into a 'waiting' queue. When the program finishes processing the file, it deletes the message; otherwise the message is deleted from the waiting queue and put back into the original queue.
The message id is checked against a database. If the message id is found, the message is ignored. Otherwise the program starts processing the message and inserts the message id into the database.
Thanks in advance!
Use the default-provided SQS timeout but take advantage of ChangeMessageVisibility.
You can specify the timeout in several ways:
When the queue is created (default timeout)
When the message is retrieved
By having the worker call back to SQS and extend the timeout
If you are worried that you do not know the appropriate processing time, use a default value that is good for most situations, but don't make it so big that things become unnecessarily delayed.
Then, modify your workers to make a ChangeMessageVisibility call to SQS periodically to extend the timeout. If a worker dies, the message stops being extended and it will reappear on the queue to be processed by another worker.
See: MessageVisibility documentation
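One way to sketch the periodic extension is a background heartbeat thread. Everything named here (`process_with_heartbeat`, `work`, the default `timeout` and `interval` values) is an assumption for illustration; `sqs` is assumed to be a boto3 SQS client, and `change_message_visibility` is the boto3 method for ChangeMessageVisibility.

```python
import threading


def process_with_heartbeat(sqs, queue_url, receipt_handle, work,
                           timeout=60, interval=30):
    """Run work() while periodically extending the message's visibility.

    `work` is a hypothetical zero-argument callable that does the file
    processing. `interval` should be shorter than `timeout`.
    """
    stop = threading.Event()

    def heartbeat():
        # Event.wait returns False on timeout and True once stop is set.
        while not stop.wait(interval):
            sqs.change_message_visibility(
                QueueUrl=queue_url,
                ReceiptHandle=receipt_handle,
                VisibilityTimeout=timeout,
            )

    thread = threading.Thread(target=heartbeat, daemon=True)
    thread.start()
    try:
        result = work()
    finally:
        # Stop extending whether work() succeeded or raised.
        stop.set()
        thread.join()
    # Reached only on success; on failure the message reappears on the
    # queue once the last extension runs out.
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=receipt_handle)
    return result
```

If the worker process dies outright, the heartbeat dies with it, which gives the behavior described above. Keep in mind that SQS caps a message's total visibility timeout at 12 hours, so extremely long jobs need another approach.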
I'm using Amazon SQS queues in a very simple way. Usually, messages are written, immediately visible, and read. Occasionally, a message is written and remains In-Flight (Not Visible) on the queue for several minutes. I can see it from the console. Receive-message-wait time is 0, and Default Visibility is 5 seconds. It will remain that way for several minutes, or until a new message gets written that somehow releases it. A delay of a few seconds is OK, but more than 60 seconds is not.
There are 8 reader threads that are always long polling, so it's not that nothing is trying to read it; they are.
Edit : To be clear, none of the consumer reads are returning any messages at all and it happens regardless of whether or not the console is open. In this scenario, only one message is involved, and it is just sitting in the queue invisible to the consumers.
Has anyone else seen this behavior and what I can do to improve it?
Here is the sdk for java I am using:
<dependency>
    <groupId>com.amazonaws</groupId>
    <artifactId>aws-java-sdk</artifactId>
    <version>1.5.2</version>
</dependency>
Here is the code that does the reading (max=10,maxwait=0 startup config):
void read(MessageConsumer consumer) {
    List<Message> messages = read(max, maxWait);
    for (Message message : messages) {
        if (tryConsume(consumer, message)) {
            delete(message.getReceiptHandle());
        }
    }
}

private List<Message> read(int max, int maxWait) {
    AmazonSQS sqs = getClient();
    ReceiveMessageRequest rq = new ReceiveMessageRequest(queueUrl);
    rq.setMaxNumberOfMessages(max);
    rq.setWaitTimeSeconds(maxWait);
    List<Message> messages = sqs.receiveMessage(rq).getMessages();
    if (messages.size() > 0) {
        LOG.info("read {} messages from SQS queue", messages.size());
    }
    return messages;
}
The log line for "read .." never appears when this is happening, and that is what prompts me to go into the console and check whether the message is there, and it is.
It sounds like you are misinterpreting what you are seeing.
Messages "in flight" are not pending delivery, they're messages that have already been delivered but not further acted on by the consumer.
Messages are considered to be in flight if they have been sent to a client but have not yet been deleted or have not yet reached the end of their visibility window.
— https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-available-cloudwatch-metrics.html
When a consumer receives a message, it has to -- at some point -- either delete the message, or send a request to increase the timeout for that message; otherwise the message becomes visible again after the timeout expires. If a consumer fails to do one of these things, the message automatically becomes visible again. The visibility timeout is how long the consumer has before one of these things must be done.
Messages should not be "in flight" without something having already received them -- but that "something" can include the console itself, as you'll note on the pop-up you see when you choose "View/Delete Messages" in the console (unless you already checked the "Don't show this again" checkbox):
Messages displayed in the console will not be available to other applications until the console stops polling for messages.
Messages displayed in the console are "in flight" while the console is observing the queue from the "View/Delete Messages" screen.
The part that does not make obvious sense is messages being in flight "for several minutes" if your default visibility timeout is only 5 seconds and nothing in your code is increasing that timeout... however... that could be explained almost perfectly by your consumers not properly disposing of the message, causing it to timeout and immediately be redelivered, giving the impression that a single instance of the message was remaining in-flight, when in fact, the message is briefly transitioning back to visible, only to be claimed almost immediately by another consumer, taking it back to in-flight again.
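To tell visible messages from in-flight ones without opening the console (which itself receives messages), a sketch like the following could be polled from a script. The function name `queue_depth` is hypothetical and `sqs` is assumed to be a boto3 SQS client; the two attribute names are the ones SQS reports.

```python
def queue_depth(sqs, queue_url):
    """Return approximate counts of visible and in-flight messages."""
    response = sqs.get_queue_attributes(
        QueueUrl=queue_url,
        AttributeNames=[
            "ApproximateNumberOfMessages",
            "ApproximateNumberOfMessagesNotVisible",
        ],
    )
    attrs = response["Attributes"]
    return {
        # Waiting to be received.
        "visible": int(attrs["ApproximateNumberOfMessages"]),
        # Received by some consumer but not yet deleted or timed out.
        "in_flight": int(attrs["ApproximateNumberOfMessagesNotVisible"]),
    }
```

Both values are approximate, so a single reading proves little; a message that keeps flipping between the two counts would fit the redelivery explanation above.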
This can happen when you send or lock a message and then request a fresh list of messages within a few seconds. Amazon SQS stores the data on multiple servers and in multiple data centers: http://aws.amazon.com/sqs/faqs/#How_reliably_is_my_data_stored_in_Amazon_SQS.
To avoid these issues, wait a little longer so that the queue has more time to return accurate results.