Prevent multiple instances of an Azure web job processing the same Queue message - azure-webjobs

I noticed that multiple instances of my Web job are receiving the same message and end up acting on it. This is not the desired behavior. I would like multiple messages to be processed concurrently, however, I do not want the same message being processed by multiple instances of the web job.
My web job is of the continuous running type.
I use a QueueTrigger to receive the message and invoke the function
My function runs for several hours.
I have looked into the JobHostConfiguration.BatchSize and MaxDequeue properties and I am not sure on these. I simply want a single instance processing a message and that it could take several hours to complete.
This is what I see in the web job logs indicating the message is received twice.
[01/24/2017 16:17:30 > 7e0338: INFO] Executing:
'Functions.RunExperiment' - Reason: 'New queue message detected on
'runexperiment'.'
[01/24/2017 16:17:30 > 7e0338: INFO] Executing:
'Functions.RunExperiment' - Reason: 'New queue message detected on
'runexperiment'.'

According to the official document, if we use Azure queue storage in the WebJob on the multiple instance, we no need to write code to prevent multiple instances to processing the same queue message.
The WebJobs SDK queue trigger automatically prevents a function from processing a queue message multiple times; functions do not have to be written to be idempote.
I deployed a WebJob on the 2 instances WebApp, It also works correctly(not execute twice with same queue message). It could run on the 2 instances and there is no duplicate executed message.
So it is very odd that the queue message is executed twice, please have a try to debug it whether there are 2 queue messages that have the same content are triggered.
The following is my debug code. I wrote the message that with the executed time and instance id info into another queue.
public static void ProcessQueueMessage([QueueTrigger("queue")] string message, [Queue("logqueue")] out string newMessage, TextWriter log)
{
string instance = Environment.GetEnvironmentVariable("WEBSITE_INSTANCE_ID");
string newMsg = $"WEBSITE_INSTANCE_ID:{instance}, timestamp:{DateTime.Now}, Message:{message}";
log.WriteLine(newMsg);
Console.WriteLine(newMsg);
newMessage = newMsg;
}
}

I had the same issue of a single message processed multiple times at the same time. The issue disappeared as soon as I have set the MaxPollingInterval property...

Related

AWS Lambda triggered twice for a sigle SQS Message

I have a system where a Lambda is triggered with event source as an SQS Queue.Each message gets our own internal unique id to differentiate between two requests .
Now lambda deletes the message from the queue automatically after sqs invocation and keeps the message in inflight while processing it so duplicate processing of a unique message should never occur ideally.
But when I checked my logs a message with the same unique id was processed within 100 milliseconds of the time frame of each other.
So This seems like two lambdas were triggered for one message and something failed at the end of aws it was either visibility timeout or something else.I have read online that few others have gone through the same situation.
Can anyone who has gone through the same situation explain how did they solve it or people with current scalable systems who don't have this kind of issue can help me out with the reasons why I could be having it ?
Note:- One single message was successfully executed Twice this wasn't the case of retry on failure.
I faced a similar issue, where a lambda (let's call it lambda-1) is triggered through a queue, and lambda-1 further invokes lambda-2 'synchronously' (https://docs.aws.amazon.com/lambda/latest/dg/invocation-sync.html) and the message basically goes to inflight and return back after visibility timeout expiry and triggers lambda-1 again. This goes on in a loop.
As per the link above:
"For functions with a long timeout, your client might be disconnected
during synchronous invocation while it waits for a response. Configure
your HTTP client, SDK, firewall, proxy, or operating system to allow
for long connections with timeout or keep-alive settings."
Making async calls in lambda-1 can resolve this issue. In the case above, invoking lambda-2 with InvocationType='Event' returns back, which in-turn deletes the item from queue.

Is it possible to load the same message from AWS SQS more than once

I have an SQS FIFO queue which we send bunch of ids for processing on the other end. We have 4 workers digesting the message. Once the worker receives the message, it deletes the msg and stores these ids until it hits a limit before performing actions.
What I've noticed is that some ids are received more than once when each id is only sent once. Is it normal?
Your current process appears to be:
A worker pulls (Receives) a message from a queue
It deletes the message
It performs actions on the message
This is not the recommended way to use a queue because the worker might fail after it has deleted the message but before it has completed the action. Thus, the message would be "lost".
The recommended way to use a queue would be:
Pull a message from the queue (makes the message temporarily invisible)
Process the message
Delete the message
This way, if the worker fails while processing the message, it will automatically "reappear" on the queue after the invisibility period. The worker can also send a "still working" signal to keep the message invisible for longer while it is being processed.
Amazon SQS FIFO queues provide exactly-once processing. This means that a message will only be delivered once. (However, if the invisibility period expires before the message is deleted, it will be provided again.)
You say that "some ids are received more than once". I would recommend adding debug code to try and understand the circumstances in which this happens, since it should not be happening if the messages are deleted within the invisibility period.

Azure Web Job reading message from Service Bus doesnt delete message after

The scenario here is that we have a service bus queue and a web job. The web job reads the message from the service bus queue and calls a logic up which then goes on and does other stuff.
The problem we are facing is that after the web job reads the message from the service bus, it occasionally doesn't delete it after, which constantly causes the logic app to be called and flood our database with data.
Here is the message in question as seen from azure management studio:
https://gyazo.com/7f57b460421d1bb4a69fcb8b5a9ff01f
As you can see, there is no lock time on the message. I have tried to play around with the settings to no avail.
When i manually try to delete that message from azure management studio it is also unsuccessful but there is no error message received.
Does anyone know what is going on here? I feel like this is a problem with the queue itself as opposed to a bug in our code since 2-3 tools that i have used are unable to delete this message from the queue.
It looks like the message is only deleted after a specific time (does not go to the dead-letter queue however).
Thanks
So just for information, i figured my own issue out. When the file scraper job runs, it puts a message in the service bus. The webjob now that runs and picks up that file stores the file that it just picked up locally as well as on blob storage.
The problem was that webjob keeps a queue of what it processes locally which was never cleared so every time the webjob run, it was processing all previous files as well.

Azure Storage Queue and multiple WebJobs instances: will QueueTrigger set the message lease time on triggered?

Scenario: producer send a message into the Storage Queue, a WebJobs process the message on QueueTrigger, each message must only be processed once, there could be multiple WebJob instances.
I've been googling and from what I've read, I need to write the function that processes the message to be idempotent so a message isn't processed twice. I've also read that there is a default lease time of 10 minutes for a message.
My question is, when the QueueTrigger is triggered on one WebJob instance, does it set the lease time on the message so that another WebJob can't pick up the same message? If so why do I need to account for the possibility that the message can be processed twice? Or am I misunderstanding this?
If you are using the built-in queue trigger attributes, it will automatically ensure that any given message gets processed once, even when a site scales out to multiple instances. This is posted on the article in the discussion section, https://azure.microsoft.com/en-us/documentation/articles/websites-dotnet-webjobs-sdk-get-started/
In the same article you will find clarification regarding the 10 minute lease. In summary, the QueueTrigger attribute directs the WebJobs SDK to call a method when a new message is received in queue. The message is processed and when the method completes, the queue message is deleted. If the method fails before completing, the queue message is not deleted; after a 10-minute lease expires, the message is released to be picked up again and processed. This sequence won't be repeated indefinitely if a message always causes an exception. After 5 unsuccessful attempts to process a message, the message is moved to the poison queue. The maximum number of attempts is configurable.
Your process need to be idempotent. Because
Facts:
A webjob leases a message (No other webjob can get it).
A webjob deletes a message when its job is done.
If a webjob crashes while processing a message, its lease will time out and another webjob will get and start to process that. (default retry is 5 for a messsage, after that it goes to poison queue)
So if a webjob crashes after its job is done but before it deletes the message, then the message will be released after a while and the same job will be done again.
Therefore your process need to be idempotent.

SQS Messages never gets removed/deleted after script runs

I'm having issues where my SQS Messages are never deleted from the SQS Queue. They are only removed when the lifetime ends, which is 4 days.
So to summarize the app:
Send URL to SQS Queue to wait to be crawled
Send message to Elastic Beanstalk App that crawls the data and store it in database
The script seems to be working in the meaning that it does receive the message, and it does crawl it successfully and store the data successfully in the database. The only issue is that the messages remain in the queue, stuck at "Message Available".
So if I for example load the queue with 800 messages, it will be stuck at ~800 messages for 4 days and then they will all be deleted instantly because of the lifetime value. It seems like a few messages get deleted because the number changes slightly, but a large majority is never removed from the queue.
So question:
Isn't SQS supposed to remove the message as soon as it has been send and received by the script?
Is there a manual way for me to in the script itself, delete the current message? From what I know the message is only sent 1 way. From SQS -> App. So from what I know, I can not do SQS <-> App.
Any ideas?
A web application in a worker environment tier should only listen on
the local host. When the web application in the worker environment
tier returns a 200 OK response to acknowledge that it has received and
successfully processed the request, the daemon sends a DeleteMessage
call to the SQS queue so that the message will be deleted from the
queue. (SQS automatically deletes messages that have been in a queue
for longer than the configured RetentionPeriod.) If the application
returns any response other than 200 OK or there is no response within
the configured InactivityTimeout period, SQS once again makes the
message visible in the queue and available for another attempt at
processing.
http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/using-features-managing-env-tiers.html
So I guess that answers my question. Some messages do not return HTTP 200 and then they are stuck in an infinite loop.
No the messages won't get deleted when you read a Queue Item; it is only hidden for a specific amount of time it is called as Visibility Timeout. The idea behind visibility timeout is to ensure that if there are multiple consumers for a single queue, no two consumer pick the same item and start processing.
The is the change you need to do your app to get the expected behavior
Send URL to SQS Queue to wait to be crawled
Send message to Elastic Beanstalk App that crawl the data and store it in database
On the event of successful crawled status, use the receipt-handle(not the message id) and delete the Queue Item from the Queue.
AWS Documentation - DeleteMessage