How to avoid receiving messages multiple times from a ServiceBus Queue when using the WebJobs SDK - azure-webjobs

I have got a WebJob with the following ServiceBus handler using the WebJobs SDK:
[Singleton("{MessageId}")]
public static async Task HandleMessagesAsync(
    [ServiceBusTrigger("%QueueName%")] BrokeredMessage message,
    [ServiceBus("%QueueName%")] ICollector<BrokeredMessage> queue,
    TextWriter logger)
{
    using (var scope = Program.Container.BeginLifetimeScope())
    {
        var handler = scope.Resolve<MessageHandlers>();
        logger.WriteLine(AsInvariant($"Handling message with label {message.Label}"));
        // To avoid coupling Microsoft.Azure.WebJobs the return type is IEnumerable<T>
        var outputMessages = await handler.OnMessageAsync(message).ConfigureAwait(false);
        foreach (var outputMessage in outputMessages)
        {
            queue.Add(outputMessage);
        }
    }
}
If the prerequisites for the handler aren't fulfilled, outputMessages contains a BrokeredMessage with the same MessageId, Label and payload as the one we are currently handling, but with a ScheduledEnqueueTimeUtc in the future.
The idea is that we complete the handling of the current message quickly and wait for a retry by scheduling the new message in the future.
Sometimes, especially when there are more messages in the Queue than the SDK peek-locks, I see messages duplicating in the ServiceBus queue. They have the same MessageId, Label and payload, but a different SequenceNumber, EnqueuedTimeUtc and ScheduledEnqueueTimeUtc. They all have a delivery count of 1.
Looking at my handler code, the only way this can happen is if I receive the same message multiple times, each time figure out that I need to wait, and each time create a new message for handling in the future. The handler finishes successfully, so the original message gets completed.
The initial messages are unique. Also I put the SingletonAttribute on the message handler, so that messages for the same MessageId cannot be consumed by different handlers.
Why are multiple handlers triggered with the same message and how can I prevent that from happening?
I am using Microsoft.Azure.WebJobs v2.1.0.
My handlers take at most 17s and 1s on average. The lock duration is 1m. Still, my best theory is that something with the message (re)locking doesn't work, so while I'm processing the handler, the lock gets lost, the message goes back to the queue and gets consumed another time. If both handlers saw that the critical resource was still occupied, they would both enqueue a new message.

After a little bit of experimenting I figured out the root cause and I found a workaround.
If a ServiceBus message is completed, but the peek lock is not abandoned, it will return to the queue in active state after the lock expires.
The ServiceBus QueueClient apparently abandons the lock once it receives the next message (or batch of messages).
So if the QueueClient used by the WebJobs SDK terminates unexpectedly (e.g. because of the process being ended or the Web App being restarted), all messages that have been locked appear back in the Queue, even if they have been completed.
In my handler I am now completing the message manually and also abandoning the lock like this:
public static async Task ProcessQueueMessageAsync(
    [ServiceBusTrigger("%QueueName%")] BrokeredMessage message,
    [ServiceBus("%QueueName%")] ICollector<BrokeredMessage> queue,
    TextWriter logger)
{
    using (var scope = Program.Container.BeginLifetimeScope())
    {
        var handler = scope.Resolve<MessageHandlers>();
        logger.WriteLine(AsInvariant($"Handling message with label {message.Label}"));
        // To avoid coupling Microsoft.Azure.WebJobs the return type is IEnumerable<T>
        var outputMessages = await handler.OnMessageAsync(message).ConfigureAwait(false);
        foreach (var outputMessage in outputMessages)
        {
            queue.Add(outputMessage);
        }
        await message.CompleteAsync().ConfigureAwait(false);
        await message.AbandonAsync().ConfigureAwait(false);
    }
}
That way I don't get the messages back into the Queue in the reboot scenario.

Related

Amazon Java SQS Client: How can I selectively delete a message from the queue?

I have a Spring Boot class that receives messages from a (currently) FIFO SQS queue like so:
ReceiveMessageRequest receiveMessageRequest = new ReceiveMessageRequest()
        .withQueueUrl(queueUrl)
        .withMaxNumberOfMessages(numMessages);
Map<String, String> messageMap = new HashMap<>();
try {
    List<Message> messages = sqsClient.receiveMessage(receiveMessageRequest).getMessages();
    if (!messages.isEmpty()) {
        if (messages.size() == 1) {
            Message message = messages.get(0);
            String messageBody = message.getBody();
            String receiptHandle = message.getReceiptHandle();
            // snipped
        }
    }
}
I want the ability to "skip around" messages and find only a particular message to remove from this queue. My lead is certain this can be done, but I have doubts. These are my thoughts:
1. If I change to a Standard Queue, can this be done?
2. I see you have to receive a message to get the receiptHandle for the DeleteMessageRequest.
3. But if I receive a message I want processed, not the message to delete, how do I put it back in the queue?
4. Do I extend the visibilityTimeout to let the message be picked up later?
1. Yes, exactly as you described: receive the message, extract the receipt handle, and submit a delete message request.
2. Yes.
3. By simply not doing anything. The message will automatically reappear in the queue after its visibility timeout expires. Note that even such a basic receive increases the receive count and may push the message into a dead-letter queue (DLQ), depending on your configuration.
4. No, extending the visibility timeout will only delay further processing even more.
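The third point is easier to see with a toy model. The sketch below is plain Java and does not use the AWS SDK: `VisibilityQueueModel`, its methods and the logical clock passed to `receive` are all hypothetical names, and the dead-letter rule is a simplified stand-in for a redrive policy. It only illustrates that a received-but-not-deleted message comes back by itself, and that every receive counts.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Optional;

/** Toy in-memory model of a queue with a visibility timeout. Not the AWS SDK. */
final class VisibilityQueueModel {

    /** One queued message plus the bookkeeping the model needs. */
    static final class Entry {
        final String body;
        long invisibleUntil = 0; // logical time at which the entry is visible again
        int receiveCount = 0;    // how often the entry has been handed out

        Entry(String body) {
            this.body = body;
        }
    }

    private final List<Entry> entries = new ArrayList<>();
    private final List<String> deadLetters = new ArrayList<>();
    private final long visibilityTimeout;
    private final int maxReceiveCount;

    VisibilityQueueModel(long visibilityTimeout, int maxReceiveCount) {
        this.visibilityTimeout = visibilityTimeout;
        this.maxReceiveCount = maxReceiveCount;
    }

    void send(String body) {
        entries.add(new Entry(body));
    }

    /** Hands out the first visible entry at logical time {@code now} and hides it. */
    Optional<Entry> receive(long now) {
        Iterator<Entry> it = entries.iterator();
        while (it.hasNext()) {
            Entry entry = it.next();
            if (entry.invisibleUntil > now) {
                continue; // still in flight with another receiver
            }
            if (entry.receiveCount >= maxReceiveCount) {
                // Received too often without being deleted: move to the dead-letter list.
                it.remove();
                deadLetters.add(entry.body);
                continue;
            }
            entry.receiveCount++;
            entry.invisibleUntil = now + visibilityTimeout;
            return Optional.of(entry);
        }
        return Optional.empty();
    }

    /** Removes an entry permanently, as a delete request with a receipt handle would. */
    void delete(Entry entry) {
        entries.remove(entry);
    }

    List<String> deadLetters() {
        return deadLetters;
    }
}
```

In this model, "skipping" a message means receiving it and then doing nothing: it becomes visible again once the timeout has passed, but with a higher receive count, so repeatedly skipping the same message eventually sends it to the dead-letter list.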

MismatchingMessageCorrelationException : Cannot correlate message ‘onEventReceiver’: No process definition or execution matches the parameters

We are facing a MismatchingMessageCorrelationException for the receive task in some cases (less than 5%).
The callback that notifies the receive task is done by:
protected void respondToCallWorker(
        @NonNull final String correlationId,
        final CallWorkerResultKeys result,
        @Nullable final Map<String, Object> variables
) {
    try {
        runtimeService.createMessageCorrelation("callWorkerConsumer")
                .processInstanceId(correlationId)
                .setVariables(variables)
                .setVariable("callStatus", result.toString())
                .correlateWithResult();
    } catch (Exception e) {
        e.printStackTrace();
    }
}
When I check the logs, I find that the query executed is this one:
select distinct RES.* from ACT_RU_EXECUTION RES
inner join ACT_RE_PROCDEF P on RES.PROC_DEF_ID_ = P.ID_
WHERE RES.PROC_INST_ID_ = 'b2362197-3bea-11eb-a150-9e4bf0efd6d0' and RES.SUSPENSION_STATE_ = '1'
and exists (select ID_ from ACT_RU_EVENT_SUBSCR EVT
where EVT.EXECUTION_ID_ = RES.ID_ and EVT.EVENT_TYPE_ = 'message'
and EVT.EVENT_NAME_ = 'callWorkerConsumer' )
Sometimes, when I look for the process instance in the database, I find it waiting in the receive task:
SELECT DISTINCT * FROM ACT_RU_EXECUTION RES
WHERE id_ = 'b2362197-3bea-11eb-a150-9e4bf0efd6d0'
However, when I check the event subscription, it has not yet been created in the database:
select ID_ from ACT_RU_EVENT_SUBSCR EVT
where EVT.EXECUTION_ID_ = 'b2362197-3bea-11eb-a150-9e4bf0efd6d0'
and EVT.EVENT_TYPE_ = 'message'
and EVT.EVENT_NAME_ = 'callWorkerConsumer'
I think the solution is to persist the "receive task" before the response for respondToCallWorker arrives, but sadly I can't figure out how.
I tried "async before" on callWorker and on "Message consumer", but it did not work.
I also tried camunda.bpm.database.jdbc-batch-processing=false and got the same results.
I also tried parallel branches, but then I get an OptimisticLockingException and a MismatchingMessageCorrelationException.
Maybe I am doing it wrong.
Thanks for your help.
This is an interesting problem. As you already found out, the error happens when you try to correlate the result from the "worker" before the main process has committed its transaction, so there is no message subscription registered at the time you correlate.
This problem in process orchestration is described and analyzed in this blog post, which is definitely worth reading.
Taken from that post, here is a design that should solve the issue:
You make message send and receive parallel and put an async before the send task.
By doing so, the async continuation job for the send event and the message subscription are written in the same transaction, so when the async message send executes, you already have the subscription waiting.
Although this should work and solve the issue at the BPMN model level, it might be worth considering options that do not require remodeling the process.
First, instead of calling the worker directly from your delegate, you could (assuming you are on Spring Boot) publish a "CallWorkerCommand" (a simple POJO) and use a @TransactionalEventListener on a Spring bean to execute the actual call. By doing so, the BPMN process first finishes its transaction by subscribing to the message, and only afterwards does Spring execute your worker call.
Second, you could use a retry mechanism like resilience4j around your correlate message call, so in the rare cases where the result comes too quickly, you fail and retry a second later.
Another solution I could think of, since you seem to be using an "external worker" pattern here, is to use an external-task-service task directly, so the send/receive synchronization gets solved by the Camunda external worker API.
So many options to choose from. I would possibly prefer the external task, followed by the transactionalEventListener, but that is a matter of personal preference.
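To make the retry option concrete, here is a minimal hand-rolled sketch in plain Java. It is not resilience4j and not part of Camunda: `CorrelationRetry`, `callWithRetry` and its parameters are hypothetical names, and the action passed in stands for whatever performs the `correlateWithResult()` call. It assumes the failure surfaces as a `RuntimeException`; in real code you would probably narrow the catch to the correlation exception only.

```java
import java.util.function.Supplier;

/** Hypothetical retry helper; a stand-in for a library such as resilience4j. */
final class CorrelationRetry {

    /**
     * Runs the action up to maxAttempts times, sleeping delayMillis between
     * failed attempts, and rethrows the last failure if every attempt fails.
     */
    static <T> T callWithRetry(Supplier<T> action, int maxAttempts, long delayMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.get();
            } catch (RuntimeException e) {
                // Assumption: the subscription is not there yet, so waiting may help.
                last = e;
                if (attempt < maxAttempts) {
                    sleepQuietly(delayMillis);
                }
            }
        }
        throw last;
    }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            // Restore the interrupt flag and give up instead of retrying further.
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting to retry", e);
        }
    }
}
```

The body of respondToCallWorker could then be wrapped as `callWithRetry(() -> correlate(...), 3, 1000)`, where `correlate` is a placeholder for the existing correlation chain. Note that the retry only helps if the exception is allowed to propagate out of the action rather than being swallowed inside it.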

Processing Dropped Message In Akka Streams

I have the following source queue definition.
lazy val (processMessageSource, processMessageQueueFuture) =
  peekMatValue(
    Source
      .queue[(ProcessMessageInputData, Promise[ProcessMessageOutputData])](5, OverflowStrategy.dropNew))

def peekMatValue[T, M](src: Source[T, M]): (Source[T, M], Future[M]) = {
  val p = Promise[M]()
  val s = src.mapMaterializedValue { m =>
    p.trySuccess(m)
    m
  }
  (s, p.future)
}
The ProcessMessageInputData class is essentially an artifact that is created when a caller calls a web server endpoint that is hooked up to this stream (i.e. the service endpoint's business logic puts messages into this queue). The Promise of ProcessMessageOutputData is completed downstream in the sink of the application, and the web server has an onComplete callback on its future to return the response.
There are also other sources of ingress into this stream.
Now the buffer may be backed up since the other source may overload the system, thereby triggering stream back pressure. The existing code just drops the new message. But I still want the process message output promise to complete with an exception stating something like "Throttled".
Is there a mechanism to write a custom overflow strategy, or a post processing on the overflowed element that allows me to do this?
According to https://github.com/akka/akka/blob/master/akkastream/src/main/scala/akka/stream/impl/QueueSource.scala#L83
dropNew would work just fine. On the client's end it would look like this:
processMessageQueue.offer((in, pr)).foreach {
  case Enqueued => // Code to handle the case when the element was successfully enqueued.
  case Dropped  => // Code to handle elements that were dropped because the buffer was full.
  case other    => // Failure or QueueClosed: the stream failed or has completed.
}
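The same idea, failing the caller's promise as soon as the offer is rejected, can be sketched without Akka. The following plain Java model is only an illustration: `ThrottlingSubmitter` and `Request` are hypothetical names, `ArrayBlockingQueue.offer` stands in for a bounded source queue with dropNew, and `CompletableFuture` stands in for the Scala Promise.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;

/** Hypothetical stand-in for the bounded source queue; not Akka Streams. */
final class ThrottlingSubmitter<I, O> {

    /** Pairs an input with the promise that downstream code completes. */
    static final class Request<I, O> {
        final I input;
        final CompletableFuture<O> promise;

        Request(I input, CompletableFuture<O> promise) {
            this.input = input;
            this.promise = promise;
        }
    }

    private final BlockingQueue<Request<I, O>> buffer;

    ThrottlingSubmitter(int capacity) {
        this.buffer = new ArrayBlockingQueue<>(capacity);
    }

    /** Enqueues the input, or fails its promise right away if the buffer is full. */
    CompletableFuture<O> submit(I input) {
        CompletableFuture<O> promise = new CompletableFuture<>();
        // offer() returns false instead of blocking when the buffer is full,
        // which plays the role of the Dropped result here.
        if (!buffer.offer(new Request<>(input, promise))) {
            promise.completeExceptionally(new IllegalStateException("Throttled"));
        }
        return promise;
    }

    /** Takes the next request for downstream processing, or null if none is buffered. */
    Request<I, O> poll() {
        return buffer.poll();
    }
}
```

The point of the design is that no custom overflow strategy is needed: the caller still holds the promise when the offer is rejected, so the caller is the natural place to complete it with the "Throttled" failure.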

Deleting message from SQS FIFO queue: The receipt handle has expired

I switched to a FIFO queue and got this error message when I tried to delete a message from the queue:
Value {VALUE} for parameter ReceiptHandle is invalid. Reason: The receipt handle has expired.
It appears that the error happens because I tried to delete the message after the visibility timeout had expired. I changed the default visibility timeout from 0 to the maximum, 12 hours, which partially solved the issue. Sometimes a message may stay in my queue for longer than 12 hours before I can process and then delete it, so I will get the error again. Is there any way to increase the visibility timeout beyond 12 hours, or to work around this error some other way?
You can do it in the AWS Console, but the trick is that you have to do it while polling is still in progress.
For example, when you poll for 10 seconds and 10 messages, you need to delete the message within 10 seconds or before the 10th message arrives, whichever comes first. Once polling has stopped, your window for deletion has closed.
You get error when polling stopped
Adjust polling duration, and message count
While polling, select the message and delete
Message deleted successfully.
TLDR: You want to look into the ChangeMessageVisibility API.
Details
The reason for visibility timeout is to make sure the process handling the message hasn't unexpectedly died, and allow the message to be processed by a different worker.
If your process needs to take longer than the configured visibility timeout, it essentially needs to send some signal to SQS that says "I'm still alive and working on this message". That's what ChangeMessageVisibility is for.
If you have wide variability in the time required to consume and process a message, I suggest setting a small-ish default visibility timeout and having your workers emit a "heartbeat" (using ChangeMessageVisibility) to indicate they're still alive and working on the message. That way you can still recover relatively quickly when a worker legitimately fails.
Note there is also ChangeMessageVisibilityBatch for doing this on batches of messages.
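The heartbeat idea can be sketched as follows. This is plain Java with no AWS SDK calls: `VisibilityHeartbeat` and `runWithHeartbeat` are hypothetical names, and the `extendVisibility` callback is a placeholder for whatever issues the ChangeMessageVisibility request for the message being processed.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical helper that keeps a message "alive" while it is being processed. */
final class VisibilityHeartbeat {

    /**
     * Runs the work on the calling thread while a background thread invokes
     * extendVisibility every periodMillis. The heartbeat stops as soon as the
     * work returns or throws.
     */
    static <T> T runWithHeartbeat(Runnable extendVisibility, long periodMillis, Supplier<T> work) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // First heartbeat after one period, then at a fixed rate. If the callback
        // throws, the executor suppresses all further heartbeats.
        scheduler.scheduleAtFixedRate(extendVisibility, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
        try {
            return work.get();
        } finally {
            // Stop extending the timeout so a crashed or finished worker releases the message.
            scheduler.shutdownNow();
        }
    }
}
```

The period should be comfortably shorter than the visibility timeout each heartbeat sets, so that one delayed heartbeat does not let the message become visible to another worker.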
Try increasing the value of the VisibilityTimeout parameter in sqs.receive_message() for the message you wish to delete using its ReceiptHandle.
Changing VisibilityTimeout:0 to VisibilityTimeout:60 works:
const params = {
  AttributeNames: [
    "SentTimestamp"
  ],
  MaxNumberOfMessages: 10,
  MessageAttributeNames: [
    "All"
  ],
  QueueUrl: queueURL,
  VisibilityTimeout: 60,
  WaitTimeSeconds: 0,
};
sqs.receiveMessage(params, function (err, data) {
  console.log(data);
  if (err) {
    console.log("Receive Error", err);
  } else if (data.Messages) {
    let deleteParams = {
      QueueUrl: queueURL,
      ReceiptHandle: data.Messages[0].ReceiptHandle
    };
    sqs.deleteMessage(deleteParams, function (err, data) {
      if (err) {
        console.log("Delete Error", err);
      } else {
        console.log("Message Deleted", data);
      }
    });
  }
});
Setting VisibilityTimeout to a value greater than 0 will work.

ActiveMQ-cpp Broker URI with PrefetchPolicy has no effect

I am using activemq-cpp 3.7.0 with VS 2010 to build a client; the server is ActiveMQ 5.8. I have created a message consumer using code similar to the following, based on the CMS configurations mentioned here. ConnClass is an ExceptionListener and a MessageListener. I only want to consume a single message before calling cms::Session::commit().
void ConnClass::setup()
{
    // Create a ConnectionFactory
    std::tr1::shared_ptr<ConnectionFactory> connectionFactory(
        ConnectionFactory::createCMSConnectionFactory(
            "tcp://localhost:61616?cms.PrefetchPolicy.queuePrefetch=1"));
    // Create a Connection
    m_connection = std::tr1::shared_ptr<cms::Connection>(
        connectionFactory->createConnection());
    m_connection->start();
    m_connection->setExceptionListener(this);
    // Create a Session
    m_session = std::tr1::shared_ptr<cms::Session>(
        m_connection->createSession(Session::SESSION_TRANSACTED));
    // Create the destination (Queue)
    m_destination = std::tr1::shared_ptr<cms::Destination>(
        m_session->createQueue("myqueue?consumer.prefetchSize=1"));
    // Create a MessageConsumer from the Session to the Queue
    m_consumer = std::tr1::shared_ptr<cms::MessageConsumer>(
        m_session->createConsumer( m_destination.get() ));
    m_consumer->setMessageListener( this );
}

void ConnClass::onMessage( const Message* message )
{
    // read message code ...
    // schedule a processing event for
    // another thread that calls m_session->commit() when done
}
The problem is I am receiving multiple messages instead of one message before calling m_session->commit() -- I know this because the commit() call is triggered by user input. How can I ensure onMessage() is only called once before each call to commit()?
It doesn't work that way. When using async consumers the messages are delivered as fast as the onMessage method completes. If you want to consume one and only one message then use a sync receive call.
For an async consumer, the prefetch allows the broker to buffer up work on the client instead of sending one message at a time, so you generally get better performance. In your case, as the async onMessage call completes, an ack is sent back to the broker and the next message is sent to the client.
Yes, I found this too. However, when I use the Destination URI option ( "consumer.prefetchSize=15" , http://activemq.apache.org/cms/configuring.html#Configuring-DestinationURIParameters ) for the asynchronous consumer, it works well.
BTW, I am using the latest ActiveMQ-CPP v3.9.4 by Tim and ActiveMQ v5.12.1 on CentOS 7.
Thanks!