AWS SQS Dead Letter Queue - only in certain cases

My question isn't easy, but I believe that with your help I will find a solution to my problem. I have a microservice that reads messages from an AWS SQS queue and saves them in Redis.
On AWS I have two queues:
AQueue (standard queue)
DeadLetterQueue
I'd like to:
Remove messages from my standard queue (AQueue) and move them to the DeadLetterQueue when, for example, parsing has failed 5 times
When, for example, my Redis is temporarily unavailable, NOT remove the currently read messages. In this case these messages should be read over and over again until Redis works again.
How can I do this? On AWS I have configured my standard queue (AQueue) to send messages to the DeadLetterQueue when they have failed 5 times.
My listener:
@SqsListener(value = "${amazon.sqs.destination}", deletionPolicy = SqsMessageDeletionPolicy.NEVER)
public void receive(String requestJSON, Acknowledgment acknowledgment) {
    try (Jedis jedis = jedisPool.getResource()) {
        if (redisPassword != null && !redisPassword.isEmpty()) {
            jedis.auth(redisPassword);
        }
        long key = jedis.incr("Trace:");
        Trace trace = Trace.fromJSON(requestJSON);
        trace.setTechnicalId(Long.toString(key));
        traceRepository.save(trace);
        acknowledgment.acknowledge();
    } catch (IOException e) {
        log.error("Parse error: " + e.getMessage());
        queueMessagingTemplate.convertAndSend(deadLetterQueue, requestJSON);
        acknowledgment.acknowledge();
    } catch (Exception e) {
        log.error("Problem with NoSQL database Redis: " + e.getMessage());
    }
}
Unfortunately, even when I don't call acknowledgment.acknowledge(), my message moves to the DeadLetterQueue after 5 attempts.

It sounds like you've got the logic reversed - NOT acking the message means it will be treated as a failure (and retried, and eventually moved to the dead-letter queue). ACK the message when you want to mark it as consumed; don't ACK it if you want it to retry/DLQ.
(Note - I'm not familiar with the spring tooling involved here, but I'm assuming straightforward mapping to core SQS concepts)

The logic should be:
If Redis is down, do not pull messages from SQS (this will keep them in the queue)
If a message is successfully processed, delete the message using the supplied MessageHandle
Configure the Queue to move messages to the Dead Letter Queue after 5 processing attempts
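For completeness, here is a minimal sketch (AWS SDK for Java v2; the queue URL and DLQ ARN are placeholder values) of the third point: attaching a redrive policy with maxReceiveCount = 5 to the source queue, so SQS moves failed messages to the DLQ itself instead of your code forwarding them by hand:

import software.amazon.awssdk.services.sqs.SqsClient;
import software.amazon.awssdk.services.sqs.model.QueueAttributeName;
import java.util.Map;

public class RedrivePolicySetup {
    public static void main(String[] args) {
        try (SqsClient sqs = SqsClient.create()) {
            // After 5 failed receives, SQS moves the message to the DLQ on its own.
            String redrivePolicy = "{\"maxReceiveCount\":\"5\","
                + "\"deadLetterTargetArn\":\"arn:aws:sqs:eu-west-1:123456789012:DeadLetterQueue\"}";
            sqs.setQueueAttributes(b -> b
                .queueUrl("https://sqs.eu-west-1.amazonaws.com/123456789012/AQueue")
                .attributes(Map.of(QueueAttributeName.REDRIVE_POLICY, redrivePolicy)));
        }
    }
}

With this in place the listener only has to acknowledge on success and skip the acknowledgement on failure; no manual convertAndSend to the DLQ is needed.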

Related

Camel AWS SQS and concurrentConsumers

Camel version: camel-aws2-sqs-starter: 3.12.0
I am trying to use and understand concurrentConsumers with an SQS queue:
from("aws2-sqs://queuexxx?concurrentConsumers=5&amazonSQSClient=#sqsClient&waitTimeSeconds=20")
.process(exchange -> {
System.out.println("Message received...");
})
.process(exchange -> {
try {
Thread.sleep(5000);
} catch (InterruptedException e) {
e.printStackTrace();
}});
With the above route, if I send 3 messages at the same time, I have to wait 5 seconds to see the second message ("Message received...") and 5 more seconds to see the third one.
My understanding of concurrentConsumers (also described here) is that with a value of 5 I would see the 3 messages at the same time, since 3 threads would consume them.
If I add the Thread.sleep in a seda route, I do get this behavior (= the 3 messages are read at the same time).
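For reference, a sketch of that seda variant (the endpoint name is my own): the SQS consumer hands each exchange to a seda endpoint with its own thread pool, so the slow processing no longer blocks the next SQS poll:

from("aws2-sqs://queuexxx?amazonSQSClient=#sqsClient&waitTimeSeconds=20")
    .to("seda:slowWork");

from("seda:slowWork?concurrentConsumers=5")
    .process(exchange -> {
        System.out.println("Message received...");
        Thread.sleep(5000); // simulate slow processing off the polling thread
    });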
Turning on the Camel logs, it seems that the next polling is done only after the Delete for the previous message is made (which happens with a delay of 5 s).
I would understand the above behavior with concurrentConsumers=1, but I don't with concurrentConsumers=5. Could someone tell me what I've misunderstood?
Thank you in advance!
I ran into the same issue. I believe it relates to this defect: https://issues.apache.org/jira/browse/CAMEL-17592. According to the JIRA it will be fixed in the next Camel release.

Amazon Java SQS Client: How can I selectively delete a message from the queue?

I have a Spring Boot class that receives messages from a (currently) FIFO SQS queue like so:
ReceiveMessageRequest receiveMessageRequest = new ReceiveMessageRequest()
        .withQueueUrl(queueUrl)
        .withMaxNumberOfMessages(numMessages);
Map<String, String> messageMap = new HashMap<>();
try {
    List<Message> messages = sqsClient.receiveMessage(receiveMessageRequest).getMessages();
    if (!messages.isEmpty()) {
        if (messages.size() == 1) {
            Message message = messages.get(0);
            String messageBody = message.getBody();
            String receiptHandle = message.getReceiptHandle();
            // snipped
        }
    }
}
I want the ability to "skip around" messages and find only a particular message to remove from this queue. My lead is certain this can be done, but I have doubts. These are my thoughts:
If I change to a Standard Queue, can this be done?
I see you have to receive a message to get the receiptHandle for the DeleteMessageRequest. But if I receive a message I want processed, not the message to delete, how do I put it back in the queue?
Do I extend the visibilityTimeout to let the message be picked up later?
Yes, exactly as you described: receive the message, extract the receipt handle, and submit a delete message request.
Yes.
By simply not doing anything; the message will automatically pop back up in the queue after its visibility timeout expires. Note that even such a basic receive increases the receive counter and may push the message into a DLQ, depending on your configuration.
No, extending the visibility timeout will only delay further processing even more.
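To make that flow concrete, a rough sketch (AWS SDK for Java v1, matching the question's client; the targetMarker predicate is made up for illustration): receive a batch, delete only the message you're after, and let the rest reappear.

ReceiveMessageRequest request = new ReceiveMessageRequest()
    .withQueueUrl(queueUrl)
    .withMaxNumberOfMessages(10)
    .withVisibilityTimeout(5); // short, so skipped messages reappear quickly

for (Message message : sqsClient.receiveMessage(request).getMessages()) {
    if (message.getBody().contains(targetMarker)) {
        // This is the one we want gone: delete it via its receipt handle.
        sqsClient.deleteMessage(new DeleteMessageRequest(queueUrl, message.getReceiptHandle()));
    }
    // Everything else is left untouched and becomes visible again when the
    // visibility timeout expires - but note its receive count still goes up.
}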

Processing AWS Lambda messages in Batches

I am wondering something, and I really can't find information about it. Maybe it is not the way to go but, I would just like to know.
It is about Lambda working in batches. I know I can set up Lambda to consume messages in batches. In my Lambda function I iterate over each message, and if one fails, Lambda exits. And the cycle starts again.
I am wondering about a slightly different approach.
Let's assume I have three messages: A, B and C. I also take them in batches. Now if message B fails (e.g. an API call failed), I return message B to SQS and keep processing message C.
Is it possible? If it is, is it a good approach? Because I see that I would need to implement some extra complexity in Lambda and whatnot.
Thanks
There's an excellent article here. The relevant parts for you are...
Using a batchSize of 1, so that messages succeed or fail on their own.
Making sure your processing is idempotent, so reprocessing a message isn't harmful, outside of the extra processing cost.
Handle errors within your function code, perhaps by catching them and sending the message to a dead letter queue for further processing.
Calling the DeleteMessage API manually within your function after successfully processing a message.
The last bullet point is how I've managed to deal with the same problem. Instead of returning errors immediately, store them or note that an error has occurred, but then continue to handle the rest of the messages in the batch. At the end of processing, return or raise an error so that the SQS -> lambda trigger knows not to delete the failed messages. All successful messages will have already been deleted by your lambda handler.
import logging

import boto3

logger = logging.getLogger(__name__)
sqs = boto3.client('sqs')

def handler(event, context):
    failed = False
    for msg in event['Records']:
        try:
            # Do something with the message.
            handle_message(msg)
        except Exception:
            # Ok it failed, but allow the loop to finish.
            logger.exception('Failed to handle message')
            failed = True
        else:
            # The message was handled successfully. We can delete it now.
            sqs.delete_message(
                QueueUrl=<queue_url>,
                ReceiptHandle=msg['receiptHandle'],
            )
    # It doesn't matter what the error is. You just want to raise here
    # to ensure the trigger doesn't delete any of the failed messages.
    if failed:
        raise RuntimeError('Failed to process one or more messages')

def handle_message(msg):
    ...
For Node.js, check out https://www.npmjs.com/package/@middy/sqs-partial-batch-failure.
const middy = require('@middy/core')
const sqsBatch = require('@middy/sqs-partial-batch-failure')

const originalHandler = (event, context, cb) => {
  const recordPromises = event.Records.map(async (record, index) => { /* Custom message processing logic */ })
  return Promise.allSettled(recordPromises)
}

const handler = middy(originalHandler)
  .use(sqsBatch())
Check out https://medium.com/@brettandrews/handling-sqs-partial-batch-failures-in-aws-lambda-d9d6940a17aa for more details.
As of Nov 2019, AWS has introduced the concept of Bisect On Function Error, along with maximum retries. If your function is idempotent, this can be used.
In this approach you should throw an error from the function even if only one item in the batch is failing. AWS will split the batch into two and retry. Now one half of the batch should pass successfully. For the other half, the process continues until the bad record is isolated.
Like all architecture decisions, it depends on your goal and what you are willing to trade for more complexity. Using SQS will allow you to process messages out of order so that retries don't block other messages. Whether or not that is worth the complexity depends on why you are worried about messages getting blocked.
I suggest reading about Lambda retry behavior and Dead Letter Queues.
If you want to retry only the failed messages out of a batch of messages, it is totally doable, but it does add slight complexity.
A possible approach to achieve this is to iterate through the list of your events (e.g. [eventA, eventB, eventC]) and, for each execution, append the event to a list of failed events if it failed. Then have an end case that checks whether the list of failed events has anything in it, and if it does, manually send those messages back to SQS (using SQS SendMessageBatch), as sketched below.
However, you should note that this puts the events at the end of the queue, since you are manually inserting them back.
Anything can be a "good approach" if it solves a problem you are having without much complexity, and in this case, the issue of having to re-execute successful events is definitely a problem that you can solve in this manner.
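A rough sketch of that re-enqueueing approach (AWS SDK for Java v2 with the Lambda Java events library; the class and method names are mine, not from the answer):

import com.amazonaws.services.lambda.runtime.events.SQSEvent;
import software.amazon.awssdk.services.sqs.SqsClient;
import software.amazon.awssdk.services.sqs.model.SendMessageBatchRequestEntry;
import java.util.ArrayList;
import java.util.List;

public class RequeueFailures {
    public void handle(SQSEvent event, SqsClient sqs, String queueUrl) {
        List<SendMessageBatchRequestEntry> failedEvents = new ArrayList<>();
        for (SQSEvent.SQSMessage msg : event.getRecords()) {
            try {
                process(msg); // placeholder for your per-message logic
            } catch (Exception e) {
                // Collect the failure instead of aborting the whole batch.
                failedEvents.add(SendMessageBatchRequestEntry.builder()
                    .id(msg.getMessageId())
                    .messageBody(msg.getBody())
                    .build());
            }
        }
        if (!failedEvents.isEmpty()) {
            // Re-enqueued messages land at the back of the queue, as noted above.
            sqs.sendMessageBatch(b -> b.queueUrl(queueUrl).entries(failedEvents));
        }
    }

    private void process(SQSEvent.SQSMessage msg) { /* ... */ }
}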
SQS/Lambda supports reporting batch failures. How it works: within each batch iteration you catch all exceptions, and if that iteration fails you add its messageId to an SQSBatchResponse. At the end, when all SQS messages have been processed, you return the batch response.
Here is the relevant docs section: https://docs.aws.amazon.com/lambda/latest/dg/with-sqs.html#services-sqs-batchfailurereporting
To use this feature, your function must gracefully handle errors. Have your function logic catch all exceptions and report the messages that result in failure in batchItemFailures in your function response. If your function throws an exception, the entire batch is considered a complete failure.
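A minimal sketch of what that looks like in a Java handler (using the aws-lambda-java-events SQSBatchResponse type; processRecord is a placeholder for your logic):

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.SQSBatchResponse;
import com.amazonaws.services.lambda.runtime.events.SQSEvent;
import java.util.ArrayList;
import java.util.List;

public class BatchHandler implements RequestHandler<SQSEvent, SQSBatchResponse> {
    @Override
    public SQSBatchResponse handleRequest(SQSEvent event, Context context) {
        List<SQSBatchResponse.BatchItemFailure> failures = new ArrayList<>();
        for (SQSEvent.SQSMessage msg : event.getRecords()) {
            try {
                processRecord(msg); // placeholder for your message handling
            } catch (Exception e) {
                // Only this message is retried; the rest of the batch is deleted.
                failures.add(new SQSBatchResponse.BatchItemFailure(msg.getMessageId()));
            }
        }
        return new SQSBatchResponse(failures);
    }

    private void processRecord(SQSEvent.SQSMessage msg) { /* ... */ }
}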
To add to the answer by David:
I implemented this, but a batch of A, B and C, with B failing, would still mark all three as complete. It turns out you need to explicitly configure the Lambda event source mapping to expect a batch failure response. This is done by adding the key FunctionResponseTypes with a value of a list containing ReportBatchItemFailures. Here are the relevant docs: https://docs.aws.amazon.com/lambda/latest/dg/with-sqs.html#services-sqs-batchfailurereporting
My SAM template looks like this after adding it:
Type: SQS
Properties:
  Queue: my-queue-arn
  BatchSize: 10
  Enabled: true
  FunctionResponseTypes:
    - ReportBatchItemFailures

Deleting message from SQS FIFO queue: The receipt handle has expired

I switched to a FIFO queue and got this error message when I tried to delete a message from the queue:
Value {VALUE} for parameter ReceiptHandle is invalid. Reason: The receipt handle has expired.
It appears the error happens because I tried to delete the message after the visibility timeout had expired. I changed the default visibility timeout from 0 to the maximum, 12 hours, which partially solved the issue. But sometimes a message can sit in my queue for longer than 12 hours before I can process and then delete it, so I will get the error again. Is there any way to increase the visibility timeout beyond 12 hours, or to get around this error some other way?
You can do it in the AWS Console, but the trick is that you have to do it while the polling progress is still active.
For example, when you poll for 10 seconds and 10 messages, you need to delete the message within those 10 seconds or before the 10th message arrives, whichever comes first; after the polling has stopped, your window for deletion is closed.
You get the error when polling has stopped.
Adjust the polling duration and message count.
While polling, select the message and delete it.
The message is deleted successfully.
TLDR: You want to look into the ChangeMessageVisibility API.
Details
The reason for visibility timeout is to make sure the process handling the message hasn't unexpectedly died, and allow the message to be processed by a different worker.
If your process needs to take longer than the configured visibility timeout, it essentially needs to send some signal to SQS that says "I'm still alive and working on this message". That's what ChangeMessageVisibility is for.
If you have wide variability in the time required to consume and process a message, I suggest setting a small-ish default visibility timeout and having your workers emit a "heartbeat" (using ChangeMessageVisibility) to indicate they're still alive and working on the message. That way you can still recover relatively quickly when a worker legitimately fails.
Note there is also ChangeMessageVisibilityBatch for doing this on batches of messages.
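To illustrate the heartbeat idea, a sketch (AWS SDK for Java v2; the 30-second interval and 60-second extension are illustrative values, not prescribed ones):

import software.amazon.awssdk.services.sqs.SqsClient;
import software.amazon.awssdk.services.sqs.model.Message;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

public class VisibilityHeartbeat {
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    void processWithHeartbeat(SqsClient sqs, String queueUrl, Message msg) {
        // Every 30s, push the message's visibility another 60s into the future.
        ScheduledFuture<?> heartbeat = scheduler.scheduleAtFixedRate(
            () -> sqs.changeMessageVisibility(b -> b
                .queueUrl(queueUrl)
                .receiptHandle(msg.receiptHandle())
                .visibilityTimeout(60)),
            30, 30, TimeUnit.SECONDS);
        try {
            handle(msg); // placeholder for the long-running work
            sqs.deleteMessage(b -> b.queueUrl(queueUrl).receiptHandle(msg.receiptHandle()));
        } finally {
            heartbeat.cancel(true); // stop extending once we're done (or have died)
        }
    }

    private void handle(Message msg) { /* ... */ }
}

If the worker crashes, the heartbeat stops and the message becomes visible to another worker as soon as the last extension lapses, which is exactly the quick-recovery property described above.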
Try increasing the value of the VisibilityTimeout parameter in sqs.receive_message() for the message you wish to delete using its ReceiptHandle.
Changing VisibilityTimeout: 0 to VisibilityTimeout: 60 makes it work:
const params = {
  AttributeNames: ["SentTimestamp"],
  MaxNumberOfMessages: 10,
  MessageAttributeNames: ["All"],
  QueueUrl: queueURL,
  VisibilityTimeout: 60,
  WaitTimeSeconds: 0,
};

sqs.receiveMessage(params, function (err, data) {
  console.log(data);
  if (err) {
    console.log("Receive Error", err);
  } else if (data.Messages) {
    let deleteParams = {
      QueueUrl: queueURL,
      ReceiptHandle: data.Messages[0].ReceiptHandle
    };
    sqs.deleteMessage(deleteParams, function (err, data) {
      if (err) {
        console.log("Delete Error", err);
      } else {
        console.log("Message Deleted", data);
      }
    });
  }
});
Setting VisibilityTimeout greater than 0 will work.

Message retry and Dead Letter Queue in WSO2 2.2.0 Message Broker

We are evaluating the WSO2 stack, in particular the Message Broker v2.2.0, and are not able to make the message retry limit work.
According to this documentation page, once the client has rejected a message 10 times it will be removed from the queue and placed on the dead letter queue:
https://docs.wso2.com/display/MB220/Maximum+Delivery+Attempts
Our definition of rejection is either:
a) Not sending acknowledgement in the case of using Session.CLIENT_ACKNOWLEDGE or
b) Rolling back the transaction in the case of using a transacted session.
Using the WSO2 example client code, we are unable to observe this behaviour with any combination of client acknowledgement modes or induced failures. The message remains active in the queue and can be taken from it any number of times. Acknowledging it or committing the session removes it from the queue, as you would expect.
Can anyone confirm whether this feature actually works and, if so, show us what a client has to do to trigger it? We have been testing using the WSO2-provided sample client code and an unmodified out-of-the-box server config:
https://docs.wso2.com/display/MB220/Sending+and+Receiving+Messages+Using+Queues
Any help would be appreciated, as we are unable to continue with WSO2 without understanding exactly how this aspect of the system works.
This feature is working as expected. In order to test it you need to make some modifications to the receiver client provided in the sample code:
Add the given system property (AndesAckWaitTimeOut)
Change the acknowledgement mode to CLIENT_ACKNOWLEDGE
Get the message 10 times without sending the ACK to the server
With these changes you can reproduce the behaviour you need.
Here is the modified method in the QueueReceiver class:
public void receiveMessages() throws NamingException, JMSException {
    Properties properties = new Properties();
    System.setProperty("AndesAckWaitTimeOut", "30000");
    properties.put(Context.INITIAL_CONTEXT_FACTORY, QPID_ICF);
    properties.put(CF_NAME_PREFIX + CF_NAME, getTCPConnectionURL(userName, password));
    System.out.println("getTCPConnectionURL(userName,password) = " + getTCPConnectionURL(userName, password));
    InitialContext ctx = new InitialContext(properties);
    // Lookup connection factory
    QueueConnectionFactory connFactory = (QueueConnectionFactory) ctx.lookup(CF_NAME);
    QueueConnection queueConnection = connFactory.createQueueConnection();
    queueConnection.start();
    QueueSession queueSession =
            queueConnection.createQueueSession(false, QueueSession.CLIENT_ACKNOWLEDGE);
    // Receive message
    Queue queue = queueSession.createQueue(queueName);
    MessageConsumer queueReceiver = queueSession.createConsumer(queue);
    int count = 0;
    while (count < 12) {
        TextMessage message = (TextMessage) queueReceiver.receive();
        System.out.println("Got message ==>" + message.getText());
        count++;
    }
    queueReceiver.close();
    queueSession.close();
    queueConnection.stop();
    queueConnection.close();
}
Please note that this modification is only for proving that the feature works.
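For the transacted-session variant of rejection (b), a sketch against the same sample setup (an untested assumption on my part that WSO2 MB counts each rollback as a delivery attempt, per the Maximum Delivery Attempts docs):

QueueSession txSession = queueConnection.createQueueSession(true, Session.SESSION_TRANSACTED);
MessageConsumer consumer = txSession.createConsumer(txSession.createQueue(queueName));
for (int i = 0; i < 10; i++) {
    TextMessage message = (TextMessage) consumer.receive();
    System.out.println("Got message ==>" + message.getText());
    txSession.rollback(); // reject: the message returns to the queue, delivery count +1
}
consumer.close();
txSession.close();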