Azure Storage Queue and multiple WebJobs instances: does QueueTrigger set the message lease time when triggered? - azure-webjobs

Scenario: a producer sends a message into the Storage Queue, a WebJob processes the message on QueueTrigger, each message must be processed only once, and there could be multiple WebJob instances.
I've been googling, and from what I've read, I need to write the function that processes the message to be idempotent so a message isn't processed twice. I've also read that there is a default lease time of 10 minutes for a message.
My question is: when the QueueTrigger fires on one WebJob instance, does it set the lease time on the message so that another WebJob instance can't pick up the same message? If so, why do I need to account for the possibility that the message can be processed twice? Or am I misunderstanding this?

If you are using the built-in queue trigger attributes, the SDK will automatically ensure that any given message gets processed only once, even when a site scales out to multiple instances. This is stated in the discussion section of the article https://azure.microsoft.com/en-us/documentation/articles/websites-dotnet-webjobs-sdk-get-started/
The same article clarifies the 10-minute lease. In summary, the QueueTrigger attribute directs the WebJobs SDK to call a method when a new message arrives in the queue. The message is processed and, when the method completes, the queue message is deleted. If the method fails before completing, the queue message is not deleted; after the 10-minute lease expires, the message is released to be picked up and processed again. This sequence won't be repeated indefinitely if a message always causes an exception: after 5 unsuccessful attempts to process a message, it is moved to the poison queue. The maximum number of attempts is configurable.
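For reference, the retry count (and the delay before a failed message is retried) can be set on the host configuration. A minimal sketch, assuming the WebJobs SDK 2.x-style JobHostConfiguration (property names differ in newer SDK versions, so treat this as an illustration rather than exact API):

using System;
using Microsoft.Azure.WebJobs;

var config = new JobHostConfiguration();
// Attempts before a failing message is moved to the poison queue (default 5).
config.Queues.MaxDequeueCount = 5;
// How long a failed message stays invisible before it becomes available for retry.
config.Queues.VisibilityTimeout = TimeSpan.FromMinutes(1);
new JobHost(config).RunAndBlock();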

Your process needs to be idempotent, because of these facts:
A WebJob leases a message (no other WebJob can get it).
A WebJob deletes the message when its job is done.
If a WebJob crashes while processing a message, its lease will time out and another WebJob will get the message and start to process it (the default retry count is 5 per message; after that it goes to the poison queue).
So if a WebJob crashes after its job is done but before it deletes the message, the message will be released after a while and the same job will be done again.
Therefore your process needs to be idempotent.
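A minimal sketch of one way to do that is to key every side effect by something unique to the message; the OrderMessage type and OrderStore.SaveIfNotExists helper below are hypothetical placeholders, not part of the WebJobs SDK:

using System;
using System.IO;
using Microsoft.Azure.WebJobs;

public class OrderMessage { public string OrderId; public string Payload; }

public class Functions
{
    public static void ProcessOrder([QueueTrigger("orders")] OrderMessage msg, TextWriter log)
    {
        // Hypothetical conditional insert keyed by the message's own id: running this
        // twice for the same message leaves the data in the same state, so a
        // redelivered message does no extra harm.
        bool created = OrderStore.SaveIfNotExists(msg.OrderId, msg.Payload);
        log.WriteLine(created ? $"Processed {msg.OrderId}" : $"Duplicate {msg.OrderId}, ignored");
    }
}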

Related

AWS Lambda triggered twice for a single SQS Message

I have a system where a Lambda is triggered with an SQS queue as the event source. Each message gets our own internal unique id to differentiate between two requests.
Lambda deletes the message from the queue automatically after a successful invocation and keeps the message in flight while processing it, so duplicate processing of a unique message should ideally never occur.
But when I checked my logs, a message with the same unique id was processed twice, within about 100 milliseconds of each other.
So this looks like two Lambdas were triggered for one message and something failed on the AWS end; it was either the visibility timeout or something else. I have read online that a few others have gone through the same situation.
Can anyone who has gone through the same situation explain how they solved it, or can people with scalable systems that don't have this kind of issue help me out with the reasons why I could be having it?
Note: a single message was successfully executed twice; this wasn't a case of retry on failure.
I faced a similar issue, where a lambda (let's call it lambda-1) is triggered through a queue, and lambda-1 further invokes lambda-2 'synchronously' (https://docs.aws.amazon.com/lambda/latest/dg/invocation-sync.html). The message basically stays in flight, comes back after the visibility timeout expires, and triggers lambda-1 again. This goes on in a loop.
As per the link above:
"For functions with a long timeout, your client might be disconnected
during synchronous invocation while it waits for a response. Configure
your HTTP client, SDK, firewall, proxy, or operating system to allow
for long connections with timeout or keep-alive settings."
Making async calls from lambda-1 can resolve this issue. In the case above, invoking lambda-2 with InvocationType='Event' returns immediately, which in turn lets lambda-1 complete quickly so the item is deleted from the queue.
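A rough sketch of what that asynchronous invocation looks like from .NET, assuming the AWSSDK.Lambda package (the function name and payload are placeholders):

using Amazon.Lambda;
using Amazon.Lambda.Model;

// Inside lambda-1's handler (sketch):
var lambda = new AmazonLambdaClient();
// InvocationType.Event is fire-and-forget: the call returns as soon as Lambda has
// queued the event, so lambda-1 finishes quickly and its queue message gets deleted
// instead of sitting in flight until the visibility timeout.
await lambda.InvokeAsync(new InvokeRequest
{
    FunctionName = "lambda-2",              // placeholder
    InvocationType = InvocationType.Event,
    Payload = "{\"key\":\"value\"}"         // placeholder
});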

Is it possible to load the same message from AWS SQS more than once

I have an SQS FIFO queue to which we send a bunch of ids for processing on the other end. We have 4 workers digesting the messages. Once a worker receives a message, it deletes the message and stores these ids until it hits a limit before performing actions.
What I've noticed is that some ids are received more than once, even though each id is only sent once. Is this normal?
Your current process appears to be:
A worker pulls (Receives) a message from a queue
It deletes the message
It performs actions on the message
This is not the recommended way to use a queue because the worker might fail after it has deleted the message but before it has completed the action. Thus, the message would be "lost".
The recommended way to use a queue would be:
Pull a message from the queue (makes the message temporarily invisible)
Process the message
Delete the message
This way, if the worker fails while processing the message, it will automatically "reappear" on the queue after the invisibility period. The worker can also send a "still working" signal to keep the message invisible for longer while it is being processed.
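A minimal sketch of that receive, process, delete loop, assuming the AWSSDK.SQS package for .NET (the queue URL and ProcessMessage step are placeholders):

using System;
using Amazon.SQS;
using Amazon.SQS.Model;

var sqs = new AmazonSQSClient();
var queueUrl = "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue"; // placeholder

// 1. Receive: the message becomes invisible for the queue's visibility timeout.
var response = await sqs.ReceiveMessageAsync(new ReceiveMessageRequest
{
    QueueUrl = queueUrl,
    MaxNumberOfMessages = 1
});

foreach (var message in response.Messages)
{
    // Optional "still working" heartbeat: extend invisibility for another 5 minutes.
    await sqs.ChangeMessageVisibilityAsync(queueUrl, message.ReceiptHandle, 300);

    ProcessMessage(message.Body); // 2. Process (placeholder for the real work)

    // 3. Delete only after processing has succeeded.
    await sqs.DeleteMessageAsync(queueUrl, message.ReceiptHandle);
}

void ProcessMessage(string body) { /* placeholder */ }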
Amazon SQS FIFO queues provide exactly-once processing. This means that a message will only be delivered once. (However, if the invisibility period expires before the message is deleted, it will be provided again.)
You say that "some ids are received more than once". I would recommend adding debug code to try and understand the circumstances in which this happens, since it should not be happening if the messages are deleted within the invisibility period.

What happens to a SQS message if listener gets killed?

Say I have one SQS queue and a listener process listening to that queue. Say there is one message, and while processing the message, the process gets killed. What will happen to this message? Will it go back to SQS or remain in flight? Or will it go to the DLQ, if configured?
Your messages will have an “invisibility time-out”. During that time window they won’t be returned by subsequent requests for messages.
Once the timeout expires, the message will be returned by another call.
Once you “ack” the message (i.e. delete it), it won’t be processed again.
Generally SQS uses “at least once” semantics. Up to the maximum age of your queue, it will reprocess messages until they are successfully processed.
Absent something really crazy (where you keep failing the message for 14 days), it will ensure that your message is processed.
It does mean the code you run in response to a message needs to be idempotent (needs to run more than once without changing the result).
But if you can do that, and you can respond to bugs in your code faster than 14 days, then you are guaranteed to have your message processed.
Does that make sense?

Prevent multiple instances of an Azure web job processing the same Queue message

I noticed that multiple instances of my Web job are receiving the same message and end up acting on it. This is not the desired behavior. I would like multiple messages to be processed concurrently, however, I do not want the same message being processed by multiple instances of the web job.
My web job is of the continuous running type.
I use a QueueTrigger to receive the message and invoke the function
My function runs for several hours.
I have looked into the JobHostConfiguration.BatchSize and MaxDequeueCount properties and I am not sure about these. I simply want a single instance processing a message, and it could take several hours to complete.
This is what I see in the web job logs indicating the message is received twice.
[01/24/2017 16:17:30 > 7e0338: INFO] Executing: 'Functions.RunExperiment' - Reason: 'New queue message detected on 'runexperiment'.'
[01/24/2017 16:17:30 > 7e0338: INFO] Executing: 'Functions.RunExperiment' - Reason: 'New queue message detected on 'runexperiment'.'
According to the official documentation, if we use Azure Queue storage in a WebJob running on multiple instances, we do not need to write code to prevent multiple instances from processing the same queue message:
The WebJobs SDK queue trigger automatically prevents a function from processing a queue message multiple times; functions do not have to be written to be idempotent.
I deployed a WebJob to a Web App scaled out to 2 instances, and it works correctly (the same queue message is not executed twice). It runs on both instances and there are no duplicate executions.
So it is very odd that the queue message is executed twice. Please try to debug it and check whether 2 queue messages with the same content are being triggered.
The following is my debug code. It writes the message, along with the execution time and instance id, into another queue.
using System;
using System.IO;
using Microsoft.Azure.WebJobs;

public class Functions
{
    // Record which instance handled the message and when, by writing to a second queue.
    public static void ProcessQueueMessage([QueueTrigger("queue")] string message, [Queue("logqueue")] out string newMessage, TextWriter log)
    {
        string instance = Environment.GetEnvironmentVariable("WEBSITE_INSTANCE_ID");
        string newMsg = $"WEBSITE_INSTANCE_ID:{instance}, timestamp:{DateTime.Now}, Message:{message}";
        log.WriteLine(newMsg);
        Console.WriteLine(newMsg);
        newMessage = newMsg;
    }
}
I had the same issue of a single message being processed multiple times at the same time. The issue disappeared as soon as I set the MaxPollingInterval property...
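For reference, a sketch of where those settings live, again assuming the WebJobs SDK 2.x-style JobHostConfiguration (the values shown are just examples, not recommendations):

using System;
using Microsoft.Azure.WebJobs;

var config = new JobHostConfiguration();
// Queue messages each instance picks up per batch (1 = one long-running message at a time).
config.Queues.BatchSize = 1;
// Longest interval between polls of an idle queue (the property mentioned above).
config.Queues.MaxPollingInterval = TimeSpan.FromSeconds(30);
new JobHost(config).RunAndBlock();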

Can I tell if an Amazon SQS message is still in flight?

Given an Amazon SQS message, is there a way to tell if it is still in flight via the API? Or, would I need to note the timestamp when I receive the message, subtract that from the current time, and check if that is less than the visibility timeout?
The normal flow for using Amazon Simple Queueing Service (SQS) is:
A message is pushed onto a queue using SendMessage (it can remain in the queue for up to 14 days)
An application uses ReceiveMessage to retrieve a message from the queue (no guarantee of first-in-first-out)
When the application has finished processing the message, it calls DeleteMessage (it can also call ChangeMessageVisibility to extend the time until the message times out)
If the application does not delete the message within a pre-configured time period, SQS makes the message reappear on the queue
If a message is retrieved from the queue more than a pre-configured number of times, the message can be moved to a Dead Letter queue
It is not possible to obtain information about a specific message. Rather, the application asks for a message (or a batch of messages), upon which the message becomes invisible (or 'in flight'). This also gives access to a ReceiptHandle that can be used with DeleteMessage or ChangeMessageVisibility.
The closest option is to call GetQueueAttributes. The value for ApproximateNumberOfMessagesNotVisible will indicate the number of in-flight messages but it will not give insight into a particular message.
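A small sketch of that call, assuming the AWSSDK.SQS package for .NET (the queue URL is a placeholder):

using System;
using System.Collections.Generic;
using Amazon.SQS;
using Amazon.SQS.Model;

var sqs = new AmazonSQSClient();
var attrs = await sqs.GetQueueAttributesAsync(new GetQueueAttributesRequest
{
    QueueUrl = "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue", // placeholder
    AttributeNames = new List<string> { "ApproximateNumberOfMessagesNotVisible" }
});
// Number of messages currently in flight (received but not yet deleted or timed out).
Console.WriteLine(attrs.ApproximateNumberOfMessagesNotVisible);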