I'm using Azure WebJobs to process messages from a queue.
I saw that the WebJobs SDK processes any failed message again after 10 minutes, and if it fails 5 times, it moves the message to the poison queue (1).
I can also see the nextVisibleTime of the message in the queue, which is 10 minutes after the insertionTime (2).
I want to use the Azure SDK's error handling for the messages, but I cannot wait 10 minutes for a message to be processed again.
Is there any way I can set this nextVisibleTime to a few seconds?
(1) Create a .NET WebJob in Azure App Service:
If the method fails before completing, the queue message is not deleted; after a 10-minute lease expires, the message is released to be picked up again and processed.
(2) How to use Azure queue storage with the WebJobs SDK:
public static void WriteLog([QueueTrigger("logqueue")] string logMessage,
DateTimeOffset expirationTime,
DateTimeOffset insertionTime,
DateTimeOffset nextVisibleTime,
Note: There are similar questions on Stack Overflow, but none of them has an answer:
QueueTrigger Attribute Visibility Timeout
Azure WebJob QueueTrigger Retry Policy
In the latest v1.1.0 release, you can now control the visibility timeout by registering your own custom QueueProcessor instances via JobHostConfiguration.Queues.QueueProcessorFactory. This allows you to control advanced message processing behavior globally or per queue/function.
For example, to set the visibility for failed messages, you can override ReleaseMessageAsync as follows:
protected override async Task ReleaseMessageAsync(CloudQueueMessage message, FunctionResult result, TimeSpan visibilityTimeout, CancellationToken cancellationToken)
{
// demonstrates how visibility timeout for failed messages can be customized
// the logic here could implement exponential backoff, etc.
visibilityTimeout = TimeSpan.FromSeconds(message.DequeueCount);
await base.ReleaseMessageAsync(message, result, visibilityTimeout, cancellationToken);
}
More details can be found in the release notes here.
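The "exponential backoff" mentioned in the comment above can be sketched independently of the WebJobs SDK. The following is a minimal Java illustration of the delay calculation only; the class name `RetryBackoff`, the method `visibilityDelay`, and the base and cap parameters are hypothetical and not part of any Azure API. The resulting duration is what you would pass on as the visibility timeout in an override like the one above.

```java
import java.time.Duration;

// Hypothetical helper, not part of the WebJobs SDK: computes how long a failed
// message should stay invisible before the next attempt.
final class RetryBackoff {

    // dequeueCount is 1 after the first failed attempt, 2 after the second, etc.
    // The delay doubles with each attempt and never exceeds the cap.
    static Duration visibilityDelay(int dequeueCount, Duration base, Duration cap) {
        if (dequeueCount < 1) {
            throw new IllegalArgumentException("dequeueCount must be >= 1");
        }
        // Clamp the exponent so the shift stays well within a long
        // for realistic base delays (seconds or minutes).
        int exponent = Math.min(dequeueCount - 1, 20);
        long millis = base.toMillis() * (1L << exponent);
        return millis >= cap.toMillis() ? cap : Duration.ofMillis(millis);
    }
}
```

With a base of 1 second and a cap of 30 seconds, the first three failures would wait 1, 2 and 4 seconds, and any attempt whose doubled delay exceeds the cap waits 30 seconds.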
If there is an exception while processing your function, the SDK will put the message back in the queue instantly and the message will be reprocessed. Are you not seeing this behavior?
Related
How do I configure visibility timeout so that a message in SQS can be read again?
I have Amazon SQS as a message queue. Messages are being sent by multiple applications. I am now using Spring listener to read message in queue as below:
public DefaultMessageListenerContainer jmsListenerContainer() {
SQSConnectionFactory sqsConnectionFactory = SQSConnectionFactory.builder()
.withAWSCredentialsProvider(new DefaultAWSCredentialsProviderChain())
.withEndpoint(environment.getProperty("aws_sqs_url"))
.withAWSCredentialsProvider(awsCredentialsProvider)
.withNumberOfMessagesToPrefetch(10).build();
DefaultMessageListenerContainer dmlc = new DefaultMessageListenerContainer();
dmlc.setConnectionFactory(sqsConnectionFactory);
dmlc.setDestinationName(environment.getProperty("aws_sqs_queue"));
dmlc.setMessageListener(queueListener);
return dmlc;
}
The queueListener class implements javax.jms.MessageListener and handles each message in its onMessage() method.
I have also configured a scheduler to read the queue again after a certain period of time. It uses receiveMessage() of com.amazonaws.services.sqs.AmazonSQS.
As soon as a message reaches the queue, the listener reads it. I want to read the message again after a certain period of time, i.e. through the scheduler, but once a message has been read by the listener it does not become visible and cannot be read again. According to Amazon's SQS developer guide the default visibility timeout is 30 seconds, but the message does not become visible even after 30 seconds. I have tried setting a custom visibility timeout in the SQS queue parameters in the console, but that did not work.
For information, nobody is deleting the message from the queue.
I only have a passing familiarity with Amazon SQS, but I can say that typically in messaging use-cases when a consumer receives and acknowledges the message then that message is removed (i.e. deleted) from the queue. Given that your Spring application is receiving the message I would suspect it is also acknowledging the message and therefore removing it from the queue which prevents your scheduler from receiving it later. Note that Spring's DefaultMessageListenerContainer uses JMS' AUTO_ACKNOWLEDGE mode by default.
This documentation from Amazon essentially states that if a message is acknowledged in a JMS context that it is "deleted from the underlying Amazon SQS queue."
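To make the difference between "hidden by the visibility timeout" and "deleted by an acknowledgment" concrete, here is a small in-memory model in Java. It is only a sketch of the semantics described above, not the SQS or JMS API: the class `VisibilityQueueModel` and its methods are hypothetical, time is passed in explicitly, and message bodies are assumed to be unique.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical in-memory model of a queue with a visibility timeout.
final class VisibilityQueueModel {

    private static final class Entry {
        final String body;
        long visibleAtMillis;

        Entry(String body, long visibleAtMillis) {
            this.body = body;
            this.visibleAtMillis = visibleAtMillis;
        }
    }

    private final List<Entry> entries = new ArrayList<>();
    private final long visibilityTimeoutMillis;

    VisibilityQueueModel(long visibilityTimeoutMillis) {
        this.visibilityTimeoutMillis = visibilityTimeoutMillis;
    }

    // A newly sent message is visible immediately.
    void send(String body, long nowMillis) {
        entries.add(new Entry(body, nowMillis));
    }

    // Returns the first visible message and hides it for the visibility timeout.
    // The message stays in the queue; it is only invisible for a while.
    Optional<String> receive(long nowMillis) {
        for (Entry e : entries) {
            if (e.visibleAtMillis <= nowMillis) {
                e.visibleAtMillis = nowMillis + visibilityTimeoutMillis;
                return Optional.of(e.body);
            }
        }
        return Optional.empty();
    }

    // Models an acknowledgment: the message is removed for good
    // and will never become visible again.
    boolean delete(String body) {
        return entries.removeIf(e -> e.body.equals(body));
    }
}
```

In this model a message that is received but never deleted reappears once the timeout has passed, whereas a message that is deleted (as an auto-acknowledging listener would do) is gone no matter how long the scheduler waits.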
I have a system where a Lambda is triggered with an SQS queue as its event source. Each message gets our own internal unique ID to differentiate between requests.
Lambda deletes the message from the queue automatically after the SQS invocation and keeps the message in flight while processing it, so ideally a unique message should never be processed twice.
But when I checked my logs, a message with the same unique ID had been processed twice, within 100 milliseconds of each other.
So it seems that two Lambdas were triggered for one message and that something failed on the AWS side, whether the visibility timeout or something else. I have read online that a few others have run into the same situation.
Can anyone who has been through this explain how they solved it? Or can people running scalable systems without this issue help me understand why I might be having it?
Note: a single message was successfully executed twice; this was not a case of retry on failure.
I faced a similar issue, where a Lambda (let's call it lambda-1) is triggered through a queue and lambda-1 then invokes lambda-2 synchronously (https://docs.aws.amazon.com/lambda/latest/dg/invocation-sync.html). The message goes in flight, returns to the queue after the visibility timeout expires, and triggers lambda-1 again. This goes on in a loop.
As per the link above:
"For functions with a long timeout, your client might be disconnected
during synchronous invocation while it waits for a response. Configure
your HTTP client, SDK, firewall, proxy, or operating system to allow
for long connections with timeout or keep-alive settings."
Making async calls in lambda-1 can resolve this issue. In the case above, invoking lambda-2 with InvocationType='Event' returns immediately, so lambda-1 completes and the item is in turn deleted from the queue.
The official documentation does mention that Google Cloud Pub/Sub resends messages to subscribers until they acknowledge receipt when using the official Cloud Pub/Sub Node.js client.
But it does not explicitly say whether this also applies to background functions that return a callback error. See https://cloud.google.com/functions/docs/writing/background.
If it helps: my background function does not use the official Cloud Pub/Sub Node.js client, since I get all the required info from the event argument itself.
From documentation: https://cloud.google.com/functions/docs/bestpractices/retries
Cloud Functions guarantees at-least-once execution of a background
function for each event emitted by an event source. However, by
default, if a function invocation terminates with an error, the
function will not be invoked again, and the event will be dropped.
When you enable retries on a background function, Cloud Functions will
retry a failed function invocation until it completes successfully, or
the retry window expires.
And, as described further down in the same documentation, you can enable retries on errors:
In any of the above cases, the function stops executing by default and
the event is discarded. If you want to retry the function when an
error occurs, you can change the default retry policy by setting the
"retry on failure" property. This causes the event to be retried
repeatedly for up to multiple days until the function successfully
completes.
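The two behaviours quoted above (drop the event on the first error by default, or retry until success or until the retry window expires) can be modelled in a few lines. This Java sketch is purely illustrative and does not use any Cloud Functions API: `RetryWindowModel`, `Outcome` and the fixed delay between attempts are assumptions, and time is simulated rather than measured.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

// Hypothetical model of "retry on failure" with a bounded retry window.
final class RetryWindowModel {

    static final class Outcome {
        final boolean succeeded;
        final int attempts;

        Outcome(boolean succeeded, int attempts) {
            this.succeeded = succeeded;
            this.attempts = attempts;
        }
    }

    // handler returns true on success and false on error.
    // Without retries the event is dropped after the first failure.
    // With retries, attempts continue (one every `delay` of simulated time)
    // until the handler succeeds or the next attempt would fall outside the window.
    static Outcome deliver(BooleanSupplier handler, boolean retryOnFailure,
                           Duration window, Duration delay) {
        if (delay.isZero() || delay.isNegative()) {
            throw new IllegalArgumentException("delay must be positive");
        }
        int attempts = 0;
        Duration elapsed = Duration.ZERO;
        while (true) {
            attempts++;
            if (handler.getAsBoolean()) {
                return new Outcome(true, attempts);
            }
            if (!retryOnFailure) {
                return new Outcome(false, attempts); // event is discarded
            }
            elapsed = elapsed.plus(delay);
            if (elapsed.compareTo(window) > 0) {
                return new Outcome(false, attempts); // retry window expired
            }
        }
    }
}
```

One consequence worth noting: with retries enabled, a handler that can never succeed keeps being invoked for the whole window, so the function should only signal an error for failures that are worth retrying.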
Listening to an AWS SQS queue using Spring Cloud as follows:
@SqsListener(value = "${queue.name}", deletionPolicy = SqsMessageDeletionPolicy.ON_SUCCESS)
public void queueListener(String message, @Headers Map<String, Object> sqsHeaders) {
// code
}
Spring config:
<aws-messaging:annotation-driven-queue-listener
max-number-of-messages="10" wait-time-out="20" visibility-timeout="3600"
amazon-sqs="awsSqsClient" />
AwsSqsClient:
@Bean
public com.amazonaws.services.sqs.AmazonSQSAsyncClient awsSqsClient() {
ExecutorService executorService = Executors.newFixedThreadPool(10);
return new AmazonSQSAsyncClient(new DefaultAWSCredentialsProviderChain(), executorService);
}
This works fine.
I configured 10 threads in the SQS client to process these messages, as you can see in the code above. This also works fine: at most 10 messages are processed at any point in time.
The issue is that I couldn't figure out a way to control the polling interval. By default, Spring polls only once all threads are free.
For example, consider the following:
Around 3 messages are delivered to the queue.
Spring polls the queue and gets 3 messages.
The 3 messages are processed; each message takes roughly 20 minutes.
In the meantime, around 25 more messages are delivered to the queue. Spring does NOT poll the queue until all 3 messages delivered earlier have completed. Essentially, in the example above, Spring polls again only after 20 minutes, even though 7 threads are still free!
Any idea how we can control this polling? That is, polling should start whenever any thread is free, and should not wait until all threads become free.
Your listener can load messages into your Spring app and submit them to another thread pool along with Acknowledgement and Visibility objects (if you want to control both).
Once messages are submitted to this thread pool, your listener can load more data. You can control the concurrency by adjusting thread pool settings.
Your listener's method signature will be similar to one below:
@SqsListener(value = "${queueName}", deletionPolicy = SqsMessageDeletionPolicy.NEVER)
public void listen(YourCustomPOJO pojo,
        @Headers Map<String, Object> headers,
        Acknowledgment acknowledgment,
        Visibility visibility) throws Exception {
    // ... send pojo (together with acknowledgment and visibility) to a worker thread and return
}
A worker thread will then acknowledge successful processing:
acknowledgment.acknowledge().get();
Make sure your message visibility is set to a value that is greater than your highest processing time (use some timeout to limit execution time).
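The hand-off described in this answer can be sketched without any Spring types. In the following Java example, `HandOffListener` is a hypothetical class: the `Runnable` named `acknowledge` stands in for the `Acknowledgment` object, and the `Consumer<String>` stands in for your processing logic. It only shows the shape of the idea, namely that the listener thread returns at once while a separate pool does the work and acknowledges on success.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.function.Consumer;

// Hypothetical sketch: the listener hands each message to a worker pool
// and returns immediately, so polling is not blocked by long-running work.
final class HandOffListener {

    private final ExecutorService workers;
    private final Consumer<String> processor;

    HandOffListener(ExecutorService workers, Consumer<String> processor) {
        this.workers = workers;
        this.processor = processor;
    }

    // Called on the polling thread. The returned future completes when the
    // worker has finished; the caller does not have to wait for it.
    CompletableFuture<Void> onMessage(String body, Runnable acknowledge) {
        return CompletableFuture.runAsync(() -> {
            processor.accept(body); // if this throws, acknowledge is skipped
            acknowledge.run();      // acknowledge only after successful processing
        }, workers);
    }
}
```

Concurrency is then governed by the size of the `ExecutorService` you pass in, for example a fixed pool of 10 threads. A message whose processing throws is never acknowledged, so it becomes visible again once its visibility timeout expires.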
Scenario: a producer sends a message to the Storage Queue, and a WebJob processes the message via a QueueTrigger. Each message must be processed only once, and there could be multiple WebJob instances.
I've been googling and from what I've read, I need to write the function that processes the message to be idempotent so a message isn't processed twice. I've also read that there is a default lease time of 10 minutes for a message.
My question is, when the QueueTrigger is triggered on one WebJob instance, does it set the lease time on the message so that another WebJob can't pick up the same message? If so why do I need to account for the possibility that the message can be processed twice? Or am I misunderstanding this?
If you are using the built-in queue trigger attributes, the SDK automatically ensures that any given message gets processed once, even when a site scales out to multiple instances. This is stated in the discussion section of the article: https://azure.microsoft.com/en-us/documentation/articles/websites-dotnet-webjobs-sdk-get-started/
In the same article you will find clarification regarding the 10 minute lease. In summary, the QueueTrigger attribute directs the WebJobs SDK to call a method when a new message is received in queue. The message is processed and when the method completes, the queue message is deleted. If the method fails before completing, the queue message is not deleted; after a 10-minute lease expires, the message is released to be picked up again and processed. This sequence won't be repeated indefinitely if a message always causes an exception. After 5 unsuccessful attempts to process a message, the message is moved to the poison queue. The maximum number of attempts is configurable.
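The retry-then-poison sequence described above can be modelled as a simple loop. This Java sketch is not the WebJobs SDK; `PoisonQueueModel` and its members are hypothetical, and the 10-minute lease between attempts is left out so that only the counting logic remains.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical model of "retry up to N times, then move to the poison queue".
final class PoisonQueueModel {

    static final int DEFAULT_MAX_DEQUEUE_COUNT = 5;

    final List<String> completed = new ArrayList<>();
    final List<String> poison = new ArrayList<>();
    private final int maxDequeueCount;

    PoisonQueueModel(int maxDequeueCount) {
        this.maxDequeueCount = maxDequeueCount;
    }

    // Drives one message to its final state and returns the number of attempts.
    // handler returns true on success; false or an exception counts as a failure.
    int process(String message, Predicate<String> handler) {
        for (int dequeueCount = 1; dequeueCount <= maxDequeueCount; dequeueCount++) {
            boolean ok;
            try {
                ok = handler.test(message);
            } catch (RuntimeException e) {
                ok = false;
            }
            if (ok) {
                completed.add(message); // method completed: message is deleted
                return dequeueCount;
            }
            // Failure: the message is not deleted and is picked up again later.
        }
        poison.add(message); // all attempts failed: move to the poison queue
        return maxDequeueCount;
    }
}
```

A handler that always throws ends with the message in `poison` after `maxDequeueCount` attempts, while one that succeeds on a later attempt ends with the message in `completed`.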
Your process needs to be idempotent, for the following reasons.
Facts:
A WebJob leases a message (no other WebJob can get it).
A WebJob deletes a message when its job is done.
If a WebJob crashes while processing a message, its lease times out and another WebJob picks the message up and starts processing it. (The default retry count for a message is 5; after that it goes to the poison queue.)
So if a WebJob crashes after its job is done but before it deletes the message, the message is released after a while and the same job is done again.
Therefore your process needs to be idempotent.
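One common way to get idempotency is to record the ID of every message whose work has completed and to skip redeliveries. The following Java sketch shows the idea under loud assumptions: `IdempotentHandler` is hypothetical, the message ID is assumed to be stable across redeliveries, and the in-memory set only works within a single process. With multiple WebJob instances the set would have to live in a durable shared store, and recording the ID would ideally happen atomically with the side effect.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Hypothetical sketch of an idempotent message handler.
final class IdempotentHandler {

    // Assumption: in-memory only. A real deployment would use a durable,
    // shared store so that all instances see the same set of processed IDs.
    private final Set<String> processedIds = ConcurrentHashMap.newKeySet();
    private final Consumer<String> sideEffect;

    IdempotentHandler(Consumer<String> sideEffect) {
        this.sideEffect = sideEffect;
    }

    // Returns true if the side effect ran, false if the message was a
    // redelivery of something already processed.
    boolean handle(String messageId, String payload) {
        if (!processedIds.add(messageId)) {
            return false; // already handled: safe to delete the message again
        }
        try {
            sideEffect.accept(payload);
            return true;
        } catch (RuntimeException e) {
            processedIds.remove(messageId); // allow a later retry
            throw e;
        }
    }
}
```

With this guard, the crash-after-work-but-before-delete case from the facts above becomes harmless: the redelivered message is recognised by its ID and the work is not repeated.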