Camel AWS SQS and concurrentConsumers - concurrency

Camel version: camel-aws2-sqs-starter: 3.12.0
I am trying to use and understand concurrentConsumers with an SQS queue:
from("aws2-sqs://queuexxx?concurrentConsumers=5&amazonSQSClient=#sqsClient&waitTimeSeconds=20")
    .process(exchange -> {
        System.out.println("Message received...");
    })
    .process(exchange -> {
        try {
            Thread.sleep(5000);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    });
With the above route, if I send 3 messages at the same time, I have to wait 5 seconds to see the second message ("Message received...") and 5 more seconds to see the third one.
My understanding of concurrentConsumers (also described here) is that with a value of 5 I would see the 3 messages at the same time, since 3 threads would consume them.
If I move the Thread.sleep into a seda route, I get exactly this behavior (the 3 messages are read at the same time).
With Camel logging turned on, it seems that the next poll is done only after the previous message has been deleted (which happens 5 seconds later).
I would understand this behavior with concurrentConsumers=1, but not with concurrentConsumers=5. Could someone tell me what I've misunderstood?
Thank you in advance!

I ran into the same issue. I believe it relates to this defect: https://issues.apache.org/jira/browse/CAMEL-17592. According to the JIRA issue, it will be fixed in the next Camel release.
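To illustrate what the seda workaround from the question achieves, here is a framework-agnostic sketch in Python (not Camel code; `dispatch_to_workers` is a hypothetical helper). A single poll loop hands each message to a worker pool without waiting for the previous handler to finish, so slow handlers overlap instead of running one after another:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def dispatch_to_workers(messages: Iterable[T], handler: Callable[[T], R],
                        workers: int = 5) -> List[R]:
    """Submit every polled message to a thread pool without waiting in between.

    The loop only enqueues work; each slow handler runs on its own worker
    thread, so up to `workers` messages are processed at the same time.
    Results are returned in the original message order.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(handler, message) for message in messages]
        return [future.result() for future in futures]
```

With `workers=1`, the same three messages would be handled strictly one after another, which matches the 5-second steps described in the question.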

Related

How can I prevent a DynamoDB stream handler from infinitely processing a record when I use batchItemFailures?

I have a DynamoDB stream that triggers a Lambda handler that looks like this:
let failedRequestId: string
await asyncForEachSerial(event.Records, async (record) => {
  try {
    await handle(record.dynamodb.OldImage, record.dynamodb.NewImage, record, context)
    return true
  } catch (e) {
    failedRequestId = record.dynamodb.SequenceNumber
  }
  return false // break
})
return {
  batchItemFailures: [ { itemIdentifier: failedRequestId } ]
}
I have my Lambda function set up with a DestinationConfig.onFailure pointing to a DLQ I configured in SQS. The idea behind the handler is to process a batch of events serially and stop at the first failure. It then reports that failed record in batchItemFailures, which tells the stream to resume from that record on the next try. (I pulled the idea from this article.)
My current issue is that if handle() genuinely fails on one of those records, the response makes that record the checkpoint for the next handler call. However, the DLQ condition never triggers, and I end up processing that record over and over again. I should also note that I am trying to avoid processing records multiple times, since handle() is not idempotent.
How can I elegantly handle errors while maintaining batching, but without triggering my handle() function more than once for well-behaved stream records?
I'm not sure if you have found the answer you were looking for. I'll respond in case someone else comes across this issue.
There are two other parameters you'll want to use to avoid this issue. Quoting the documentation (https://docs.aws.amazon.com/lambda/latest/dg/with-ddb.html):
Retry attempts – The maximum number of times that Lambda retries when the function returns an error. This doesn't apply to service errors or throttles where the batch didn't reach the function.
Maximum age of record – The maximum age of a record that Lambda sends to your function.
Basically, you have to specify how many times failures should be retried, and how old a record can be before Lambda stops sending it to your function.
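For comparison with the TypeScript handler in the question, here is a minimal Python sketch of the handler side (`handle_stream_batch` is a hypothetical helper, not an AWS API). It stops at the first failure and reports only that record's sequence number; the retry cap itself is not in the handler but on the event source mapping (the MaximumRetryAttempts and MaximumRecordAgeInSeconds settings quoted above, with ReportBatchItemFailures in FunctionResponseTypes):

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
Response = Dict[str, List[Dict[str, str]]]


def handle_stream_batch(records: List[Record],
                        handle: Callable[[Record], None]) -> Response:
    """Process DynamoDB stream records in order and stop at the first failure.

    Lambda resumes from the reported record, so the records before it are not
    delivered again; the failed record and everything after it are retried.
    """
    for record in records:
        try:
            handle(record)
        except Exception:
            sequence_number = record["dynamodb"]["SequenceNumber"]
            return {"batchItemFailures": [{"itemIdentifier": sequence_number}]}
    # Nothing failed: an empty list marks the whole batch as successful.
    # (As I read the Lambda docs, an entry with a null or empty itemIdentifier
    # would instead count as a failure of the whole batch.)
    return {"batchItemFailures": []}
```

Once the configured retries or maximum record age are exhausted, Lambda sends the failed batch's details to the on-failure destination and moves on, which is what stops the endless reprocessing.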

MismatchingMessageCorrelationException : Cannot correlate message ‘onEventReceiver’: No process definition or execution matches the parameters

We are facing a MismatchingMessageCorrelationException for the receive task in some cases (less than 5% of them).
The callback that notifies the receive task is:
protected void respondToCallWorker(
        @NonNull final String correlationId,
        final CallWorkerResultKeys result,
        @Nullable final Map<String, Object> variables
) {
    try {
        runtimeService.createMessageCorrelation("callWorkerConsumer")
                .processInstanceId(correlationId)
                .setVariables(variables)
                .setVariable("callStatus", result.toString())
                .correlateWithResult();
    } catch (Exception e) {
        e.printStackTrace();
    }
}
When I check the logs, I find that this query is executed:
select distinct RES.* from ACT_RU_EXECUTION RES
inner join ACT_RE_PROCDEF P on RES.PROC_DEF_ID_ = P.ID_
WHERE RES.PROC_INST_ID_ = 'b2362197-3bea-11eb-a150-9e4bf0efd6d0' and RES.SUSPENSION_STATE_ = '1'
and exists (select ID_ from ACT_RU_EVENT_SUBSCR EVT
where EVT.EXECUTION_ID_ = RES.ID_ and EVT.EVENT_TYPE_ = 'message'
and EVT.EVENT_NAME_ = 'callWorkerConsumer' )
Sometimes, when I look up the process instance in the database, I find it waiting in the receive task:
SELECT DISTINCT * FROM ACT_RU_EXECUTION RES
WHERE id_ = 'b2362197-3bea-11eb-a150-9e4bf0efd6d0'
However, when I check the event subscription, it has not yet been created in the database:
select ID_ from ACT_RU_EVENT_SUBSCR EVT
where EVT.EXECUTION_ID_ = 'b2362197-3bea-11eb-a150-9e4bf0efd6d0'
and EVT.EVENT_TYPE_ = 'message'
and EVT.EVENT_NAME_ = 'callWorkerConsumer'
I think the solution is to persist the receive task (and its message subscription) before respondToCallWorker gets the response, but sadly I can't figure out how.
I tried "async before" on callWorker and on "Message consumer", but it did not work.
I also tried camunda.bpm.database.jdbc-batch-processing=false and got the same results.
I also tried parallel branches, but I get an OptimisticLockingException and a MismatchingMessageCorrelationException.
Maybe I am doing it wrong.
Thanks for your help
This is an interesting problem. As you already found out, the error happens when you try to correlate the result from the "worker" before the main process has committed its transaction, so no message subscription is registered at the time you correlate.
This problem in process orchestration is described and analyzed in this blog post, which is definitely worth reading.
Taken from that post, here is a design that should solve the issue:
You make message send and receive parallel and put an async before the send task.
By doing so, the async continuation job for the send event and the message subscription are written in the same transaction, so when the async message send executes, you already have the subscription waiting.
Although this should work and solve the issue at the BPMN model level, it might be worth considering options that do not require remodeling the process.
First, instead of calling the worker directly from your delegate, you could (assuming you are on Spring Boot) publish a "CallWorkerCommand" (a simple POJO) and use a TransactionalEventListener on a Spring bean to execute the actual call. That way, the BPMN transaction commits first, including the message subscription, and only afterwards does Spring execute your worker call.
Second, you could use a retry mechanism like resilience4j around your message correlation call, so in the rare cases where the result comes too quickly, you fail and retry a second later.
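The ordering idea behind the TransactionalEventListener option (in Spring its default phase is AFTER_COMMIT) can be modeled in a few lines. This is a Python sketch of the concept only; `Transaction` and `after_commit` are hypothetical names, not Spring or Camunda APIs:

```python
from typing import Callable, List


class Transaction:
    """Hypothetical stand-in for a transaction that supports after-commit hooks."""

    def __init__(self) -> None:
        self._after_commit: List[Callable[[], None]] = []
        self.committed = False

    def after_commit(self, callback: Callable[[], None]) -> None:
        # Defer the side effect (the worker call) instead of running it now.
        self._after_commit.append(callback)

    def commit(self) -> None:
        self.committed = True
        # Only now is the message subscription visible, so the worker may answer.
        for callback in self._after_commit:
            callback()
        self._after_commit.clear()

    def rollback(self) -> None:
        # Deferred calls are dropped if the transaction does not commit.
        self._after_commit.clear()
```

The delegate registers the worker call with `after_commit` instead of calling it directly, so a fast worker can no longer correlate before the subscription exists.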
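A minimal sketch of the retry idea in Python. This is not resilience4j's API; `retry` and `MismatchingCorrelationError` are hypothetical names, the latter standing in for Camunda's MismatchingMessageCorrelationException:

```python
import time
from typing import Callable, Tuple, Type, TypeVar

R = TypeVar("R")


class MismatchingCorrelationError(Exception):
    """Hypothetical stand-in for MismatchingMessageCorrelationException."""


def retry(call: Callable[[], R], attempts: int = 3, delay: float = 1.0,
          retry_on: Tuple[Type[BaseException], ...] = (MismatchingCorrelationError,)) -> R:
    """Run `call`, retrying after `delay` seconds when it raises one of `retry_on`."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for _ in range(attempts - 1):
        try:
            return call()
        except retry_on:
            # The main process may not have committed its subscription yet.
            time.sleep(delay)
    return call()  # last attempt: let any exception propagate to the caller
```

Keeping the attempt count small matters: if correlation still fails after a few seconds, the cause is probably not the commit race and should surface as an error.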
Another solution I could think of, since you seem to be using an "external worker" pattern here, is to use an external-task-service task directly, so the send/receive synchronization gets solved by the Camunda external worker API.
So many options to choose from. I would probably prefer the external task, followed by the TransactionalEventListener, but that is a matter of personal preference.

Processing AWS Lambda messages in Batches

I am wondering about something, and I really can't find information about it. Maybe it is not the way to go, but I would just like to know.
It is about Lambda working in batches. I know I can set up Lambda to consume messages in batches. In my Lambda function I iterate over each message, and if one fails, the function exits and the cycle starts again.
I am wondering about a slightly different approach.
Let's assume I have three messages: A, B and C, which I also receive as a batch. If message B fails (e.g. an API call fails), I return message B to SQS and keep processing message C.
Is this possible? If so, is it a good approach? I can see that it means implementing some extra complexity in the Lambda function.
Thanks
There's an excellent article here. The relevant parts for you are...
Using a batchSize of 1, so that messages succeed or fail on their own.
Making sure your processing is idempotent, so reprocessing a message isn't harmful, outside of the extra processing cost.
Handle errors within your function code, perhaps by catching them and sending the message to a dead letter queue for further processing.
Calling the DeleteMessage API manually within your function after successfully processing a message.
The last bullet point is how I've managed to deal with the same problem. Instead of returning errors immediately, store them or note that an error has occurred, but then continue to handle the rest of the messages in the batch. At the end of processing, return or raise an error so that the SQS -> lambda trigger knows not to delete the failed messages. All successful messages will have already been deleted by your lambda handler.
import logging

import boto3

logger = logging.getLogger(__name__)
sqs = boto3.client('sqs')
QUEUE_URL = '<queue_url>'  # placeholder: your queue URL


def handler(event, context):
    failed = False
    for msg in event['Records']:
        try:
            # Do something with the message.
            handle_message(msg)
        except Exception:
            # Ok, it failed, but allow the loop to finish.
            logger.exception('Failed to handle message')
            failed = True
        else:
            # The message was handled successfully. We can delete it now.
            sqs.delete_message(
                QueueUrl=QUEUE_URL,
                ReceiptHandle=msg['receiptHandle'],
            )
    # It doesn't matter what the error is. You just want to raise here
    # to ensure the trigger doesn't delete any of the failed messages.
    if failed:
        raise RuntimeError('Failed to process one or more messages')


def handle_message(msg):
    ...
For Node.js, check out https://www.npmjs.com/package/@middy/sqs-partial-batch-failure.
const middy = require('@middy/core')
const sqsBatch = require('@middy/sqs-partial-batch-failure')

const originalHandler = (event, context, cb) => {
  const recordPromises = event.Records.map(async (record, index) => { /* Custom message processing logic */ })
  return Promise.allSettled(recordPromises)
}

const handler = middy(originalHandler)
  .use(sqsBatch())
Check out https://medium.com/@brettandrews/handling-sqs-partial-batch-failures-in-aws-lambda-d9d6940a17aa for more details.
In November 2019, AWS introduced Bisect On Function Error, along with maximum retry attempts (both apply to Kinesis and DynamoDB stream event sources, not to SQS). If your function is idempotent, this can be used.
In this approach, you throw an error from the function if even one item in the batch fails. AWS will split the batch into two and retry each half. One half should then pass; the failing half is split again until the bad record is isolated.
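A simplified local simulation of the bisecting idea, written in Python. It is not AWS code: `bisect_batch` is a hypothetical helper, and it ignores MaximumRetryAttempts and the fact that Lambda retries each half in a separate invocation. It does show why idempotency is required, because healthy records in a failing half are processed more than once:

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def bisect_batch(batch: Sequence[T],
                 process: Callable[[Sequence[T]], None]) -> Tuple[List[T], List[T]]:
    """Retry a failing batch as two halves until each failing half is one record.

    Returns (processed, isolated_failures).
    """
    try:
        process(batch)
        return list(batch), []
    except Exception:
        if len(batch) <= 1:
            return [], list(batch)  # the bad record is isolated
        mid = len(batch) // 2
        ok_left, bad_left = bisect_batch(batch[:mid], process)
        ok_right, bad_right = bisect_batch(batch[mid:], process)
        return ok_left + ok_right, bad_left + bad_right
```

With a batch [A, B, C, D] where B always fails, A is attempted three times (the full batch, [A, B], then [A] alone) before it finally succeeds.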
Like all architecture decisions, it depends on your goal and what you are willing to trade for more complexity. Using SQS will allow you to process messages out of order so that retries don't block other messages. Whether or not that is worth the complexity depends on why you are worried about messages getting blocked.
I suggest reading about Lambda retry behavior and Dead Letter Queues.
If you want to retry only the failed messages out of a batch of messages it is totally doable, but does add slight complexity.
A possible approach is to iterate through your list of events (e.g. [eventA, eventB, eventC]) and append each event that fails to a list of failed events. At the end, check whether the list of failed events is non-empty and, if it is, manually send those messages back to SQS (using SQS sendMessageBatch).
However, you should note that this puts the events to the end of the queue, since you are manually inserting them back.
Anything can be a "good approach" if it solves a problem you are having without much complexity, and in this case, the issue of having to re-execute successful events is definitely a problem that you can solve in this manner.
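A minimal Python sketch of this collect-and-requeue approach. `process_and_requeue` and its `send_batch` callable are hypothetical; with boto3, `send_batch` could be `lambda entries: sqs.send_message_batch(QueueUrl=queue_url, Entries=entries)`. It assumes a standard queue (a FIFO queue would also need a MessageGroupId per entry) and uses the `body` field of the Lambda SQS event records:

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
MAX_BATCH = 10  # SendMessageBatch accepts at most 10 entries per call


def process_and_requeue(records: List[Record],
                        handle: Callable[[Record], None],
                        send_batch: Callable[[List[Dict[str, str]]], Any]) -> int:
    """Handle every record, then re-send the failed ones in chunks of 10.

    Returns the number of requeued messages. The function itself then succeeds,
    so Lambda deletes the whole original batch, including the re-sent copies.
    """
    failed = []
    for record in records:
        try:
            handle(record)
        except Exception:
            failed.append(record)  # keep going; C still gets processed
    for start in range(0, len(failed), MAX_BATCH):
        chunk = failed[start:start + MAX_BATCH]
        entries = [{"Id": str(i), "MessageBody": r["body"]} for i, r in enumerate(chunk)]
        send_batch(entries)  # with boto3, check the "Failed" list in the response
    return len(failed)
```

Note that a message which always fails will be requeued forever with this approach unless you also track how often it has been retried.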
SQS/Lambda supports reporting batch failures. It works like this: within each batch iteration, you catch all exceptions, and if an iteration fails, you add that messageId to an SQSBatchResponse. At the end, when all SQS messages have been processed, you return the batch response.
Here is the relevant docs section: https://docs.aws.amazon.com/lambda/latest/dg/with-sqs.html#services-sqs-batchfailurereporting
To use this feature, your function must gracefully handle errors. Have your function logic catch all exceptions and report the messages that result in failure in batchItemFailures in your function response. If your function throws an exception, the entire batch is considered a complete failure.
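A minimal Python sketch of such a handler. `make_handler` and the injected `handle` function are hypothetical; the response shape (`batchItemFailures` entries with an `itemIdentifier` holding the `messageId`) follows the documentation linked above:

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


def make_handler(handle: Callable[[Record], None]):
    """Build a Lambda handler that reports failed SQS messages individually."""

    def handler(event: Dict[str, Any], context: Any) -> Dict[str, List[Dict[str, str]]]:
        failures = []
        for record in event["Records"]:
            try:
                handle(record)
            except Exception:
                # Report only this message. With ReportBatchItemFailures enabled
                # on the event source mapping, Lambda deletes the others.
                failures.append({"itemIdentifier": record["messageId"]})
        return {"batchItemFailures": failures}

    return handler
```

Because every exception is caught, the function itself never fails; only the reported messages become visible again on the queue.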
To add to the answer by David:
I implemented this, but a batch of A, B and C, with B failing, would still mark all three as complete. It turns out you need to explicitly configure the Lambda event source mapping to expect a batch failure response. You do this by adding the FunctionResponseTypes key with a list containing ReportBatchItemFailures. Here are the relevant docs: https://docs.aws.amazon.com/lambda/latest/dg/with-sqs.html#services-sqs-batchfailurereporting
My SAM template looks like this after the change:
Type: SQS
Properties:
  Queue: my-queue-arn
  BatchSize: 10
  Enabled: true
  FunctionResponseTypes:
    - ReportBatchItemFailures

AWS SQS Dead Letter Queue - only in certain cases

My question isn't simple, but I hope that with your help I will find a solution to my problem. I have a microservice that reads messages from an AWS SQS queue and saves them in Redis.
On AWS I have two queues:
AQueue (standard queue)
DeadLetterQueue
I'd like to:
Remove messages from my standard queue (AQueue) and move them to DeadLetterQueue when, for example, parsing fails 5 times
When, for example, Redis is temporarily unavailable, I'd like NOT to remove the messages currently being read. In this case, these messages should be read over and over again until Redis works again.
HOW can I do this? On AWS I have set my standard queue (AQueue) to send messages to DeadLetterQueue when a message fails 5 times.
My listener:
@SqsListener(value = "${amazon.sqs.destination}", deletionPolicy = SqsMessageDeletionPolicy.NEVER)
public void receive(String requestJSON, Acknowledgment acknowledgment) {
    try (Jedis jedis = jedisPool.getResource()) {
        if (redisPassword != null && !redisPassword.isEmpty()) {
            jedis.auth(redisPassword);
        }
        long key = jedis.incr("Trace:");
        Trace trace = Trace.fromJSON(requestJSON);
        trace.setTechnicalId(Long.toString(key));
        traceRepository.save(trace);
        acknowledgment.acknowledge();
    } catch (IOException e) {
        log.error("Parse error: " + e.getMessage());
        queueMessagingTemplate.convertAndSend(deadLetterQueue, requestJSON);
        acknowledgment.acknowledge();
    } catch (Exception e) {
        log.error("Problem with NOSQL database Redis: " + e.getMessage());
    }
}
Unfortunately, EVEN WHEN I don't call acknowledgment.acknowledge(), my message is moved to DeadLetterQueue after 5 attempts.
It sounds like you've got the logic reversed - NOT acking the message will mean that it'll be treated like a failure (and retried, and eventually moved to the dead-letter-queue). ACK the message when you want to mark it as consumed, don't ACK it if you want it to retry/DLQ.
(Note - I'm not familiar with the spring tooling involved here, but I'm assuming straightforward mapping to core SQS concepts)
The logic should be:
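If Redis is down, do not pull messages from SQS (this keeps them in the queue without increasing their receive count, which is what the redrive policy uses to decide when to move a message to the Dead Letter Queue)
If a message is successfully processed, delete the message using the supplied ReceiptHandle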
Configure the Queue to move messages to the Dead Letter Queue after 5 processing attempts
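The three rules above could be sketched as one polling cycle in Python. All four callables are hypothetical: `receive` is assumed to return message dicts with a `ReceiptHandle` key (as in boto3's `receive_message()["Messages"]`), and `delete` could wrap `delete_message(QueueUrl=..., ReceiptHandle=...)`. The 5-attempt rule itself lives in the queue's RedrivePolicy (`maxReceiveCount` of 5), not in this code:

```python
from typing import Any, Callable, Dict, List, Tuple

Message = Dict[str, Any]


def poll_once(redis_is_up: Callable[[], bool],
              receive: Callable[[], List[Message]],
              process: Callable[[Message], None],
              delete: Callable[[str], None]) -> Tuple[int, int]:
    """Run one polling cycle. Returns (deleted, left_in_queue)."""
    if not redis_is_up():
        # Don't receive at all: the messages stay in the queue and their
        # receive count is not incremented, so they are not redriven.
        return 0, 0
    deleted = left = 0
    for message in receive():
        try:
            process(message)
        except Exception:
            # Not deleted: it becomes visible again after the visibility
            # timeout, and after maxReceiveCount receives it goes to the DLQ.
            left += 1
            continue
        delete(message["ReceiptHandle"])
        deleted += 1
    return deleted, left
```

With this split, there is no need to send parse failures to the dead-letter queue manually; the redrive policy does it.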

Deleting message from SQS FIFO queue: The receipt handle has expired

I switched to a FIFO queue and got this error message when I tried to delete a message from the queue:
Value {VALUE} for parameter ReceiptHandle is invalid. Reason: The receipt handle has expired.
It appears that the error happens because I tried to delete the message after the visibility timeout had expired. I changed the visibility timeout from 0 to the maximum, 12 hours, which partially solved the issue. Sometimes, however, a message stays in my queue for more than 12 hours before I can process and delete it, so I get the error again. Is there any way to increase the visibility timeout beyond 12 hours, or to work around this error in another way?
You can do it in the AWS Console, but the trick is that you have to do it while polling is still in progress.
For example, when you poll for 10 seconds and up to 10 messages, you need to delete the message within 10 seconds or before the 10th message arrives, whichever comes first. Once polling stops, your window for deletion is closed.
You get an error once polling has stopped.
Adjust the polling duration and message count.
While polling, select the message and delete it.
Message deleted successfully.
TLDR: You want to look into the ChangeMessageVisibility API.
Details
The reason for visibility timeout is to make sure the process handling the message hasn't unexpectedly died, and allow the message to be processed by a different worker.
If your process needs to take longer than the configured visibility timeout, it essentially needs to send some signal to SQS that says "I'm still alive and working on this message". That's what ChangeMessageVisibility is for.
If you have wide variability in the time required to consume and process a message, I suggest setting a small-ish default visibility timeout and having your workers emit a "heartbeat" (using ChangeMessageVisibility) to indicate they're still alive and working on the message. That way you can still recover relatively quickly when a worker legitimately fails.
Note there is also ChangeMessageVisibilityBatch for doing this on batches of messages.
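A minimal heartbeat sketch in Python, assuming a boto3-style SQS client (boto3's `change_message_visibility(QueueUrl=..., ReceiptHandle=..., VisibilityTimeout=...)`); `process_with_heartbeat` and its parameters are hypothetical. A background thread keeps pushing the visibility timeout forward while `work` runs and stops as soon as it returns:

```python
import threading
from typing import Any, Callable, TypeVar

R = TypeVar("R")


def process_with_heartbeat(sqs: Any, queue_url: str, receipt_handle: str,
                           work: Callable[[], R], timeout: int = 60,
                           interval: float = 20.0) -> R:
    """Run `work` while periodically extending the message's visibility timeout."""
    stop = threading.Event()

    def heartbeat() -> None:
        # Event.wait returns False on timeout and True once `stop` is set.
        while not stop.wait(interval):
            sqs.change_message_visibility(
                QueueUrl=queue_url,
                ReceiptHandle=receipt_handle,
                VisibilityTimeout=timeout,
            )

    thread = threading.Thread(target=heartbeat, daemon=True)
    thread.start()
    try:
        return work()
    finally:
        stop.set()
        thread.join()
```

Pick `interval` well below `timeout` so a heartbeat always lands before the message becomes visible again. As far as I know, heartbeats cannot push the total visibility past the 12-hour maximum counted from when the message was received, so this helps with variable processing times rather than with holding a message for longer than 12 hours.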
Try increasing the VisibilityTimeout parameter in sqs.receive_message() when receiving the message you wish to delete with its ReceiptHandle.
Changing VisibilityTimeout: 0 to VisibilityTimeout: 60 made it work:
const params = {
  AttributeNames: [
    "SentTimestamp"
  ],
  MaxNumberOfMessages: 10,
  MessageAttributeNames: [
    "All"
  ],
  QueueUrl: queueURL,
  VisibilityTimeout: 60,
  WaitTimeSeconds: 0,
};
sqs.receiveMessage(params, function (err, data) {
  console.log(data);
  if (err) {
    console.log("Receive Error", err);
  } else if (data.Messages) {
    let deleteParams = {
      QueueUrl: queueURL,
      ReceiptHandle: data.Messages[0].ReceiptHandle
    };
    sqs.deleteMessage(deleteParams, function (err, data) {
      if (err) {
        console.log("Delete Error", err);
      } else {
        console.log("Message Deleted", data);
      }
    });
  }
});
Setting VisibilityTimeout to a value greater than 0 works.