Deleting message from SQS FIFO queue: The receipt handle has expired - amazon-web-services

I switched to a FIFO queue and I got this error message when I tried to delete a message from the queue:
Value {VALUE} for parameter ReceiptHandle is invalid. Reason: The receipt handle has expired.
It appears that the error happens because I tried to delete the message after the visibility timeout had expired. I changed the default visibility timeout from 0 to the maximum, 12 hours, which partially solved the issue. However, a message can sometimes sit in my queue for longer than 12 hours before I am able to process and then delete it, so I would get the error again. Is there any way to increase the visibility timeout beyond 12 hours, or to work around this error in some other way?

You can delete the message in the AWS Console, but the trick is that you have to do it while polling is still in progress.
For example, if you poll for 10 seconds and up to 10 messages, you need to delete the message within those 10 seconds or before the 10th message arrives, whichever comes first. Once polling stops, your window for deletion is closed and you get the error.
Adjust the polling duration and the message count.
While polling is active, select the message and delete it.
The message is deleted successfully.

TLDR: You want to look into the ChangeMessageVisibility API.
Details
The reason for the visibility timeout is to make sure the process handling the message hasn't unexpectedly died, and to allow the message to be processed by a different worker if it has.
If your process needs to take longer than the configured visibility timeout, it essentially needs to send some signal to SQS that says "I'm still alive and working on this message". That's what ChangeMessageVisibility is for.
If you have wide variability in the time required to consume and process a message, I suggest setting a small-ish default visibility timeout and having your workers emit a "heartbeat" (using ChangeMessageVisibility) to indicate they're still alive and working on the message. That way you can still recover relatively quickly when a worker legitimately fails.
Note there is also ChangeMessageVisibilityBatch for doing this on batches of messages.
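For illustration, here is a minimal heartbeat sketch with the AWS SDK for Java v1. The class name, the doWork placeholder and the 30-second interval are made up for this example; adjust them to your worker.
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import com.amazonaws.services.sqs.AmazonSQS;
import com.amazonaws.services.sqs.model.Message;

public class HeartbeatWorker {
    private static final int VISIBILITY_SECONDS = 60; // small-ish default visibility timeout

    public static void process(AmazonSQS sqs, String queueUrl, Message message) {
        ScheduledExecutorService heartbeat = Executors.newSingleThreadScheduledExecutor();
        // Every 30 seconds, tell SQS "I'm still alive" by resetting the visibility timeout.
        heartbeat.scheduleAtFixedRate(
                () -> sqs.changeMessageVisibility(queueUrl, message.getReceiptHandle(), VISIBILITY_SECONDS),
                30, 30, TimeUnit.SECONDS);
        try {
            doWork(message);                                          // long-running processing
            sqs.deleteMessage(queueUrl, message.getReceiptHandle());  // delete only on success
        } finally {
            heartbeat.shutdownNow(); // stop the heartbeat whether we succeeded or failed
        }
    }

    private static void doWork(Message message) {
        // placeholder for the actual work
    }
}
If the heartbeat stops because the worker died, the visibility timeout simply expires and another worker picks the message up, which is exactly the recovery behaviour described above.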

Try increasing the value of the VisibilityTimeout parameter in sqs.receive_message() for the message you wish to delete using its ReceiptHandle.

Changing VisibilityTimeout: 0 to VisibilityTimeout: 60 made it work for me:
// assuming the AWS SDK for JavaScript v2 and a queueURL defined elsewhere
const AWS = require("aws-sdk");
const sqs = new AWS.SQS({ apiVersion: "2012-11-05" });

const params = {
  AttributeNames: [
    "SentTimestamp"
  ],
  MaxNumberOfMessages: 10,
  MessageAttributeNames: [
    "All"
  ],
  QueueUrl: queueURL,
  VisibilityTimeout: 60,
  WaitTimeSeconds: 0,
};
sqs.receiveMessage(params, function (err, data) {
  console.log(data);
  if (err) {
    console.log("Receive Error", err);
  } else if (data.Messages) {
    let deleteParams = {
      QueueUrl: queueURL,
      ReceiptHandle: data.Messages[0].ReceiptHandle
    };
    sqs.deleteMessage(deleteParams, function (err, data) {
      if (err) {
        console.log("Delete Error", err);
      } else {
        console.log("Message Deleted", data);
      }
    });
  }
});

Setting VisibilityTimeout to a value greater than 0 will work.

Related

Amazon Java SQS Client: How can I selectively delete a message from the queue?

I have a Spring Boot class that receives messages from a (currently) FIFO SQS queue like so:
ReceiveMessageRequest receiveMessageRequest = new ReceiveMessageRequest()
        .withQueueUrl(queueUrl)
        .withMaxNumberOfMessages(numMessages);
Map<String, String> messageMap = new HashMap<>();
try {
    List<Message> messages = sqsClient.receiveMessage(receiveMessageRequest).getMessages();
    if (!messages.isEmpty()) {
        if (messages.size() == 1) {
            Message message = messages.get(0);
            String messageBody = message.getBody();
            String receiptHandle = message.getReceiptHandle();
            // snipped
        }
    }
} catch (Exception e) {
    // snipped
}
I want the ability to "skip around" messages and find only a particular message to remove from this queue. My lead is certain this can be done, but I have doubts. These are my thoughts:
If I change to a Standard Queue, can this be done?
I see you have to receive a message to get the receiptHandle for the DeleteMessageRequest.
But if I receive a message I want processed, not the message to delete, how do I put it back in the queue?
Do I extend the visibilityTimeout to let the message be picked up later?
Yes, exactly as you described: receive the message, extract the receipt handle, and submit a delete message request.
Yes.
By simply not doing anything: the message will automatically pop back up in the queue after its visibility timeout expires. Note that even such a basic receive increases the receive counter and may push the message into a DLQ, depending on your configuration.
No, extending the visibility timeout will only delay further processing even more.
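Putting those answers together, here is a minimal sketch with the same SDK the question already uses. The class name, method name and the targetBody matching rule are hypothetical; match on whatever identifies the message you want to remove.
import java.util.List;
import com.amazonaws.services.sqs.AmazonSQS;
import com.amazonaws.services.sqs.model.DeleteMessageRequest;
import com.amazonaws.services.sqs.model.Message;
import com.amazonaws.services.sqs.model.ReceiveMessageRequest;

public class SelectiveDelete {

    // Receive a batch, delete only the message we are looking for, and leave
    // the rest untouched so they become visible again after the timeout.
    public static void deleteMatching(AmazonSQS sqsClient, String queueUrl, String targetBody) {
        ReceiveMessageRequest receiveMessageRequest = new ReceiveMessageRequest()
                .withQueueUrl(queueUrl)
                .withMaxNumberOfMessages(10);
        List<Message> messages = sqsClient.receiveMessage(receiveMessageRequest).getMessages();
        for (Message message : messages) {
            if (targetBody.equals(message.getBody())) {
                sqsClient.deleteMessage(new DeleteMessageRequest(queueUrl, message.getReceiptHandle()));
            }
            // Messages that are not deleted simply reappear after their visibility
            // timeout expires; note that each receive still increments the receive count.
        }
    }
}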

How can I prevent a DynamoDB Streams handler from infinitely processing a record when I use batchItemFailures

I have a DynamoDB stream which triggers a Lambda handler that looks like this:
let failedRequestId: string
await asyncForEachSerial(event.Records, async (record) => {
  try {
    await handle(record.dynamodb.OldImage, record.dynamodb.NewImage, record, context)
    return true
  } catch (e) {
    failedRequestId = record.dynamodb.SequenceNumber
  }
  return false //break;
})
return {
  batchItemFailures: [{ itemIdentifier: failedRequestId }]
}
I have my Lambda set up with a DestinationConfig.onFailure pointing to a DLQ I configured in SQS. The idea behind the handler is to process a batch of events and stop at the first failure. It then reports the most recent failure in 'batchItemFailures', which tells the stream to continue at that record on the next try. (I pulled the idea from this article.)
My current issue is that if there is a genuine failure of my handle() function on one of those records, then my exit code makes that record the checkpoint for the next handler call. However, the DLQ condition never triggers and I end up processing that record over and over again. I should also note that I am trying to avoid reprocessing records multiple times, since handle() is not idempotent.
How can I elegantly handle errors while maintaining batching, but without triggering my handle() function more than once for well-behaved stream records?
I'm not sure if you have found the answer you were looking for. I'll respond in case someone else comes across this issue.
There are two other parameters you'll want to use to avoid that issue. Quoting the documentation (https://docs.aws.amazon.com/lambda/latest/dg/with-ddb.html):
Retry attempts – The maximum number of times that Lambda retries when the function returns an error. This doesn't apply to service errors or throttles where the batch didn't reach the function.
Maximum age of record – The maximum age of a record that Lambda sends to your function.
Basically, you have to specify how many times failures should be retried and how far back in the stream Lambda should look.
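For illustration, a hedged sketch of setting both values on the stream's event source mapping with the AWS SDK for Java (the mapping UUID is a placeholder, and I'm assuming a recent SDK version that exposes these setters; the same settings are available in the console and in CloudFormation/SAM):
import com.amazonaws.services.lambda.AWSLambda;
import com.amazonaws.services.lambda.AWSLambdaClientBuilder;
import com.amazonaws.services.lambda.model.UpdateEventSourceMappingRequest;

public class ConfigureStreamRetries {
    public static void main(String[] args) {
        AWSLambda lambda = AWSLambdaClientBuilder.defaultClient();
        lambda.updateEventSourceMapping(new UpdateEventSourceMappingRequest()
                .withUUID("<event-source-mapping-uuid>")   // placeholder for your mapping's UUID
                .withMaximumRetryAttempts(5)               // give up on a failing record after 5 retries
                .withMaximumRecordAgeInSeconds(3600));     // discard records older than 1 hour
        // Exhausted or expired records are then sent to the OnFailure destination (your SQS DLQ).
    }
}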

AWS SQS Dead Letter Queue - only in certain cases

My question isn't an easy one, but I believe that with your help I will find a solution to my problem. I have a microservice that reads messages from an AWS SQS queue and saves them in Redis.
On AWS I have two queues:
AQueue (standard queue)
DeadLetterQueue
I'd like to:
Remove messages from my standard queue (AQueue) and move them to DeadLetterQueue when, for example, parsing has failed 5 times.
When, for example, my Redis is temporarily unavailable, I'd like NOT to remove the currently read messages. In this case these messages should be read over and over again until Redis is working.
How can I do this? On AWS I have configured my standard queue (AQueue) to send messages to DeadLetterQueue when a message fails 5 times.
My listener:
@SqsListener(value = "${amazon.sqs.destination}", deletionPolicy = SqsMessageDeletionPolicy.NEVER)
public void receive(String requestJSON, Acknowledgment acknowledgment) {
    try (Jedis jedis = jedisPool.getResource()) {
        if (redisPassword != null && !redisPassword.isEmpty()) {
            jedis.auth(redisPassword);
        }
        long key = jedis.incr("Trace:");
        Trace trace = Trace.fromJSON(requestJSON);
        trace.setTechnicalId(Long.toString(key));
        traceRepository.save(trace);
        acknowledgment.acknowledge();
    } catch (IOException e) {
        log.error("Parse error: " + e.getMessage());
        queueMessagingTemplate.convertAndSend(deadLetterQueue, requestJSON);
        acknowledgment.acknowledge();
    } catch (Exception e) {
        log.error("Problem with NOSQL database Redis: " + e.getMessage());
    }
}
Unfortunately, EVEN WHEN I don't call acknowledgment.acknowledge(), my message is moved to DeadLetterQueue after 5 attempts.
It sounds like you've got the logic reversed - NOT acking the message will mean that it'll be treated like a failure (and retried, and eventually moved to the dead-letter-queue). ACK the message when you want to mark it as consumed, don't ACK it if you want it to retry/DLQ.
(Note - I'm not familiar with the spring tooling involved here, but I'm assuming straightforward mapping to core SQS concepts)
The logic should be:
If Redis is down, do not pull messages from SQS (this will keep them in the queue)
If a message is successfully processed, delete the message using the supplied receipt handle
Configure the Queue to move messages to the Dead Letter Queue after 5 processing attempts
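Applied to the listener in the question, that means acknowledging only after a successful save and otherwise doing nothing, so the queue's redrive policy (maxReceiveCount = 5) moves repeatedly failing messages to the DLQ without any manual convertAndSend. A rough sketch reusing the question's fields (jedisPool, redisPassword, traceRepository, log); note that Redis outages also count toward the receive count unless you stop polling while Redis is down, as the first point suggests:
@SqsListener(value = "${amazon.sqs.destination}", deletionPolicy = SqsMessageDeletionPolicy.NEVER)
public void receive(String requestJSON, Acknowledgment acknowledgment) {
    try (Jedis jedis = jedisPool.getResource()) {
        if (redisPassword != null && !redisPassword.isEmpty()) {
            jedis.auth(redisPassword);
        }
        long key = jedis.incr("Trace:");
        Trace trace = Trace.fromJSON(requestJSON);
        trace.setTechnicalId(Long.toString(key));
        traceRepository.save(trace);
        acknowledgment.acknowledge();          // delete only after a successful save
    } catch (IOException e) {
        log.error("Parse error: " + e.getMessage());
        // Do NOT acknowledge and do NOT forward manually: after 5 failed
        // receives the redrive policy moves the message to the DLQ.
    } catch (Exception e) {
        log.error("Problem with NoSQL database Redis: " + e.getMessage());
        // Do NOT acknowledge: the message becomes visible again after the
        // visibility timeout and will be re-read once Redis is back.
    }
}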

How to avoid receiving messages multiple times from a ServiceBus Queue when using the WebJobs SDK

I have got a WebJob with the following ServiceBus handler using the WebJobs SDK:
[Singleton("{MessageId}")]
public static async Task HandleMessagesAsync([ServiceBusTrigger("%QueueName%")] BrokeredMessage message, [ServiceBus("%QueueName%")] ICollector<BrokeredMessage> queue, TextWriter logger)
{
    using (var scope = Program.Container.BeginLifetimeScope())
    {
        var handler = scope.Resolve<MessageHandlers>();
        logger.WriteLine(AsInvariant($"Handling message with label {message.Label}"));
        // To avoid coupling Microsoft.Azure.WebJobs the return type is IEnumerable<T>
        var outputMessages = await handler.OnMessageAsync(message).ConfigureAwait(false);
        foreach (var outputMessage in outputMessages)
        {
            queue.Add(outputMessage);
        }
    }
}
If the prerequisites for the handler aren't fulfilled, outputMessages contains a BrokeredMessage with the same MessageId, Label and payload as the one we are currently handling, but with a ScheduledEnqueueTimeUtc in the future.
The idea is that we complete the handling of the current message quickly and wait for a retry by scheduling the new message in the future.
Sometimes, especially when there are more messages in the Queue than the SDK peek-locks, I see messages duplicating in the ServiceBus queue. They have the same MessageId, Label and payload, but a different SequenceNumber, EnqueuedTimeUtc and ScheduledEnqueueTimeUtc. They all have a delivery count of 1.
Looking at my handler code, the only way this can happen is if I received the same message multiple times, figure out that I need to wait and create a new message for handling in the future. The handler finishes successfully, so the original message gets completed.
The initial messages are unique. Also I put the SingletonAttribute on the message handler, so that messages for the same MessageId cannot be consumed by different handlers.
Why are multiple handlers triggered with the same message and how can I prevent that from happening?
I am using Microsoft.Azure.WebJobs version 2.1.0.
My handlers take at most 17 s and on average 1 s. The lock duration is 1 minute. Still, my best theory is that something with the message (re)locking doesn't work, so while I'm processing the handler the lock gets lost, the message goes back to the queue and gets consumed another time. If both handlers then see that the critical resource is still occupied, they both enqueue a new message.
After a little bit of experimenting I figured out the root cause and I found a workaround.
If a ServiceBus message is completed, but the peek lock is not abandoned, it will return to the queue in active state after the lock expires.
The ServiceBus QueueClient, apparently, abandons the lock, once it receives the next message (or batch of messages).
So if the QueueClient used by the WebJobs SDK terminates unexpectedly (e.g. because of the process being ended or the Web App being restarted), all messages that have been locked appear back in the Queue, even if they have been completed.
In my handler I am now completing the message manually and also abandoning the lock like this:
public static async Task ProcessQueueMessageAsync([ServiceBusTrigger("%QueueName%")] BrokeredMessage message, [ServiceBus("%QueueName%")] ICollector<BrokeredMessage> queue, TextWriter logger)
{
    using (var scope = Program.Container.BeginLifetimeScope())
    {
        var handler = scope.Resolve<MessageHandlers>();
        logger.WriteLine(AsInvariant($"Handling message with label {message.Label}"));
        // To avoid coupling Microsoft.Azure.WebJobs the return type is IEnumerable<T>
        var outputMessages = await handler.OnMessageAsync(message).ConfigureAwait(false);
        foreach (var outputMessage in outputMessages)
        {
            queue.Add(outputMessage);
        }
        await message.CompleteAsync().ConfigureAwait(false);
        await message.AbandonAsync().ConfigureAwait(false);
    }
}
That way I don't get the messages back into the Queue in the reboot scenario.

How to make sure last AMQP message is published successfully before closing connection?

I have multiple processes working together as a system. One of the processes acts as main process. When the system is shutting down, every process need to send a notification (via RabbitMQ) to the main process and then exit. The program is written in C++ and I am using AMQPCPP library.
The problem is that sometimes the notification is not published successfully. I suspect exiting too soon is the cause of the problem as AMQPCPP library has no chance to send the message out before closing its connection.
The documentation of AMQPCPP says:
Published messages are normally not confirmed by the server, and RabbitMQ will not send a report back to inform you whether the message was successfully published or not. Therefore the publish method does not return a Deferred object.
As long as no error is reported via the Channel::onError() method, you can safely assume that your messages were delivered.
This can of course be a problem when you are publishing many messages. If you get an error halfway through there is no way to know for sure how many messages made it to the broker and how many should be republished. If this is important, you can wrap the publish commands inside a transaction. In this case, if an error occurs, the transaction is automatically rolled back by RabbitMQ and none of the messages are actually published.
Without a confirmation from the RabbitMQ server, it's hard to decide when it is safe to exit the process. Furthermore, using a transaction sounds like overkill for a notification.
Could anyone suggest a simple solution for a graceful shutting down without losing the last notification?
It turns out that I can set up a callback when closing the channel, so that I can safely close the connection once all channels are closed successfully. I am not entirely sure whether this guarantees that all outgoing messages are really published, but from my test results it seems that the problem is solved.
class MyClass
{
    ...
    AMQP::TcpConnection m_tcpConnection;
    AMQP::TcpChannel m_channelA;
    AMQP::TcpChannel m_channelB;
    ...
};

void MyClass::stop(void)
{
    sendTerminateNotification();
    int remainChannel = 2;
    auto closeConnection = [&]() {
        --remainChannel;
        if (remainChannel == 0) {
            // close connection when all channels are closed.
            m_tcpConnection.close();
            ev::get_default_loop().break_loop();
        }
    };
    auto closeChannel = [&](AMQP::TcpChannel & channel) {
        channel.close()
            .onSuccess([&](void) { closeConnection(); })
            .onError([&](const char * msg)
            {
                std::cout << "cannot close channel: "
                          << msg << std::endl;
                // close the connection anyway
                closeConnection();
            });
    };
    closeChannel(m_channelA);
    closeChannel(m_channelB);
}