I'm running some tests on my web app which has a WebJob running to handle some backend tasks.
I connect to the queue using Cloud Explorer in Visual Studio and clear all the messages from the queue. When I restart my WebJob, it still finds messages and tries to process them.
Where are these messages coming from? If I clear the queue through Cloud Explorer in Visual Studio, shouldn't the queue be empty? BTW, I also clear the poison queue.
The Clear Queue command in the VS Queue explorer will indeed delete all messages in the queue, including any messages that are currently invisible due to their invisibility timeout. When viewing the queue, if there are any invisible messages you'll see them reflected in the count at the bottom of the window (e.g. "0 of 5 messages").
So if you've executed the Clear command and it shows "0 of 0" messages, the queue is completely empty. If after that your queue-triggered function gets invoked on that queue, you must have some code somewhere that is adding messages to that queue. Not a very satisfying answer perhaps, but neither the WebJobs SDK nor Azure Storage itself is going to be manufacturing any messages in this way :)
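If you want to rule out the tooling, you can clear the queue and check its approximate message count from code. Here's a minimal sketch using the Azure.Storage.Queues SDK; the queue name and the connection-string environment variable are placeholders:

```csharp
using System;
using Azure.Storage.Queues;
using Azure.Storage.Queues.Models;

class Program
{
    static void Main()
    {
        // Placeholder connection string and queue name -- use your own.
        var queue = new QueueClient(
            Environment.GetEnvironmentVariable("AzureWebJobsStorage"),
            "myqueue");

        queue.ClearMessages(); // deletes all messages, visible or invisible

        // ApproximateMessagesCount includes invisible messages, so a
        // value of 0 here means the queue really is empty.
        QueueProperties props = queue.GetProperties().Value;
        Console.WriteLine($"Messages remaining: {props.ApproximateMessagesCount}");
    }
}
```

If the count is 0 and the trigger still fires, the new messages are being enqueued by something else in your system.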
Related
I am running WSO2 MI 4.1 in a cluster with two nodes. After I re-enable all the message forwarding processors in the Dashboard that forward messages from RabbitMQ's message store to an endpoint, each queue says it is running. When I stop the server on one node, wait for a short period of time, start the same node back up, and then repeat this on the second node, the message processors look enabled and all report an enabled state. If I go to RabbitMQ, some of the queues are idle. If I try to send a message to these queues, the message just sits there in the queue. If I stop and start the message processor for the queue, then the queue starts processing messages. This behavior happens with empty queues and with queues that have messages in them. Is this a bug, or is there a better way to do a system restart?
Removing the _meta_MSMP* files in the _system/governance/repository/components/org.wso2.carbon.tasks/definitions/-1234/ESB_TASK/ folder resolved this issue.
Currently we want to pull down an entire FIFO queue, process the contents, and if there are any issues, release the messages back into the queue.
The problem is that AWS only gives us 10 messages at a time (the maximum per request; bulk retrieval in SQS means issuing multiple 10-max-message requests), and won't give us 10 more until we delete or release the first 10.
We need to get more than 10, though. Is this not possible? We understand we can set the group_id to a random string, which allows more messages to be processed, but then the order isn't guaranteed, which defeats the purpose of FIFO.
I managed to reproduce your results -- I could retrieve 10 messages, but then running the same command again would not return another set of messages.
The relevant documentation seems to be:
While messages with a particular MessageGroupId are invisible, no more messages belonging to the same MessageGroupId are returned until the visibility timeout expires. You can still receive messages with another MessageGroupId as long as it is also visible.
I suspect (just a theory!) that this is to preserve the ordering of messages... If a client asked for a set of messages and they are still being processed, there is the chance that the messages might be returned to the queue. Therefore, no further messages are provided until the original messages are deleted or pass their visibility timeout.
This is only a behaviour of FIFO queues.
It seems that you will need to receive and delete all messages to be able to access them all. I would suggest:
Receive one (or more) message.
Process it. If everything worked, delete the message.
If there were problems, push the message to a new queue.
Once the queue is empty, you would need to read from the new queue and send them back to the original queue (which should preserve ordering).
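Here's a minimal sketch of that receive/process/park loop using the AWS SDK for .NET; the queue URLs, the "failed" group ID, and the Process method are placeholders for your own:

```csharp
using System;
using System.Threading.Tasks;
using Amazon.SQS;
using Amazon.SQS.Model;

class Drain
{
    // Placeholder queue URLs.
    const string SourceUrl = "https://sqs.us-east-1.amazonaws.com/123456789012/orders.fifo";
    const string HoldingUrl = "https://sqs.us-east-1.amazonaws.com/123456789012/orders-failed.fifo";

    static async Task Main()
    {
        var sqs = new AmazonSQSClient();

        while (true)
        {
            var response = await sqs.ReceiveMessageAsync(new ReceiveMessageRequest
            {
                QueueUrl = SourceUrl,
                MaxNumberOfMessages = 10, // the per-request maximum
                WaitTimeSeconds = 5
            });

            if (response.Messages.Count == 0)
                break; // queue drained

            foreach (var message in response.Messages)
            {
                try
                {
                    Process(message);
                }
                catch (Exception)
                {
                    // Park the failed message; FIFO sends require a
                    // MessageGroupId (and a deduplication ID unless the
                    // queue uses content-based deduplication).
                    await sqs.SendMessageAsync(new SendMessageRequest
                    {
                        QueueUrl = HoldingUrl,
                        MessageBody = message.Body,
                        MessageGroupId = "failed",
                        MessageDeduplicationId = message.MessageId
                    });
                }

                // Deleting (or parking) the message is what frees its
                // MessageGroupId so the next batch becomes visible.
                await sqs.DeleteMessageAsync(SourceUrl, message.ReceiptHandle);
            }
        }
    }

    static void Process(Message message)
    {
        // Placeholder for your real processing logic.
        Console.WriteLine(message.Body);
    }
}
```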
If you frequently require more capabilities than Amazon SQS provides, you could consider Amazon MQ, a managed message broker service for ActiveMQ. It has many more capabilities (but is accordingly less 'simple').
If you set another MessageGroupId, you can get another 10 messages, even if you don't release or delete the previous ones.
https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/using-messagegroupid-property.html
The scenario here is that we have a Service Bus queue and a WebJob. The WebJob reads messages from the Service Bus queue and calls a logic app, which then goes on and does other stuff.
The problem we are facing is that after the WebJob reads a message from the Service Bus, it occasionally doesn't delete it afterwards, which causes the logic app to be called repeatedly and flood our database with data.
Here is the message in question as seen from Azure Management Studio:
https://gyazo.com/7f57b460421d1bb4a69fcb8b5a9ff01f
As you can see, there is no lock time on the message. I have tried to play around with the settings, to no avail.
When I manually try to delete that message from Azure Management Studio, it is also unsuccessful, but no error message is received.
Does anyone know what is going on here? I feel like this is a problem with the queue itself as opposed to a bug in our code, since the 2-3 tools that I have used are all unable to delete this message from the queue.
It looks like the message is only deleted after a specific time (it does not go to the dead-letter queue, however).
Thanks
So just for information, I figured my own issue out. When the file scraper job runs, it puts a message in the Service Bus. The WebJob that then runs and picks up that message stores the file it just picked up locally as well as in blob storage.
The problem was that the WebJob keeps a local queue of what it has processed, which was never cleared, so every time the WebJob ran, it processed all the previous files as well.
Scenario: a producer sends a message to a Storage Queue, a WebJob processes the message on QueueTrigger, each message must only be processed once, and there can be multiple WebJob instances.
I've been googling and from what I've read, I need to write the function that processes the message to be idempotent so a message isn't processed twice. I've also read that there is a default lease time of 10 minutes for a message.
My question is, when the QueueTrigger is triggered on one WebJob instance, does it set the lease time on the message so that another WebJob can't pick up the same message? If so why do I need to account for the possibility that the message can be processed twice? Or am I misunderstanding this?
If you are using the built-in queue trigger attributes, the SDK will automatically ensure that any given message gets processed once, even when a site scales out to multiple instances. This is covered in the discussion section of this article: https://azure.microsoft.com/en-us/documentation/articles/websites-dotnet-webjobs-sdk-get-started/
In the same article you will find clarification regarding the 10 minute lease. In summary, the QueueTrigger attribute directs the WebJobs SDK to call a method when a new message is received in queue. The message is processed and when the method completes, the queue message is deleted. If the method fails before completing, the queue message is not deleted; after a 10-minute lease expires, the message is released to be picked up again and processed. This sequence won't be repeated indefinitely if a message always causes an exception. After 5 unsuccessful attempts to process a message, the message is moved to the poison queue. The maximum number of attempts is configurable.
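To make that concrete, here is a minimal sketch of a WebJobs SDK host; the queue name "orders" is a placeholder, and MaxDequeueCount is the setting referred to above as configurable:

```csharp
using System.IO;
using Microsoft.Azure.WebJobs;

class Program
{
    static void Main()
    {
        var config = new JobHostConfiguration();
        // After this many failed dequeues the message is moved to the
        // poison queue ("orders-poison"); 5 is the default.
        config.Queues.MaxDequeueCount = 5;
        new JobHost(config).RunAndBlock();
    }
}

public class Functions
{
    // The SDK leases the message before calling this method and deletes
    // it only if the method returns without throwing. If the method
    // fails, the lease eventually expires and the message becomes
    // visible again for another attempt.
    public static void ProcessQueueMessage(
        [QueueTrigger("orders")] string message, TextWriter log)
    {
        log.WriteLine($"Processing: {message}");
    }
}
```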
Your process needs to be idempotent. Here is why.
Facts:
A webjob leases a message (No other webjob can get it).
A webjob deletes a message when its job is done.
If a webjob crashes while processing a message, its lease will time out and another webjob will get the message and start to process it. (The default retry count for a message is 5; after that it goes to the poison queue.)
So if a webjob crashes after its job is done but before it deletes the message, then the message will be released after a while and the same job will be done again.
Therefore your process needs to be idempotent.
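As a sketch of what an idempotency guard can look like: an atomic "insert if absent" check keyed by message ID. The in-memory dictionary below is only illustrative; in production the check must hit durable storage shared by all WebJob instances (e.g. a SQL unique constraint or an Azure Table insert):

```csharp
using System;
using System.Collections.Concurrent;

public static class IdempotentHandler
{
    // Illustrative only: a real store must be durable and shared across
    // instances; this dictionary only works within a single process.
    private static readonly ConcurrentDictionary<string, bool> Processed =
        new ConcurrentDictionary<string, bool>();

    public static void Handle(string messageId, Action doWork)
    {
        // TryAdd succeeds exactly once per ID, so a redelivered copy of
        // a message whose work already completed is silently skipped.
        if (!Processed.TryAdd(messageId, true))
            return;

        doWork();
    }
}
```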
I have an SQS worker tier Beanstalk application listening to a queue. If we encounter any issues, for example a database crash, is there a way for us to temporarily stop the worker tier from working that queue without having to terminate the environment and rebuild it again when we want to resume?
One hack I guess would be for us to point it to an empty queue, but I'd rather avoid that type of thing.
Thanks
For anybody who is in the same boat as me, I just want to post my own, inelegant solution.
We have created another SQS Queue, and whenever we want to turn off the processing of messages, we just update the worker tier app to point to this new queue. It isn't clean, but it does what we need.
Another option is to just leave it as is. In the case of a database crash, or any other error, your application will return, for example, a 500 instead of a 200, and the message will be returned to the queue for future processing.
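As a sketch of that idea with an ASP.NET Core minimal API (the /worker path and the DatabaseIsUp check are placeholders): the worker daemon POSTs each SQS message to your app and requeues the message on any non-2xx response.

```csharp
using System.IO;
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Http;

var app = WebApplication.CreateBuilder(args).Build();

// Placeholder health check -- replace with a real database ping.
bool DatabaseIsUp() => true;

app.MapPost("/worker", async (HttpRequest request) =>
{
    if (!DatabaseIsUp())
        return Results.StatusCode(500); // message goes back to the queue

    using var reader = new StreamReader(request.Body);
    string messageBody = await reader.ReadToEndAsync();
    // ... process messageBody ...

    return Results.Ok(); // 200 tells the daemon to delete the message
});

app.Run();
```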
Not sure if this helps, but you can add a delivery delay to the SQS queue: right-click the queue -> Configure Queue -> set Delivery Delay to up to 15 minutes. Any newly sent message will only become receivable after this delay. This allows me to "pause" the queue for up to 15 minutes.
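The same attribute can also be set programmatically, which makes the pause scriptable. A sketch with the AWS SDK for .NET; the queue URL is a placeholder, and note the delay applies only to messages sent after the change:

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using Amazon.SQS;
using Amazon.SQS.Model;

class Pause
{
    static async Task Main()
    {
        var sqs = new AmazonSQSClient();

        // 900 seconds = 15 minutes, the maximum delivery delay.
        await sqs.SetQueueAttributesAsync(new SetQueueAttributesRequest
        {
            QueueUrl = "https://sqs.us-east-1.amazonaws.com/123456789012/work-queue",
            Attributes = new Dictionary<string, string>
            {
                ["DelaySeconds"] = "900" // set back to "0" to resume
            }
        });
    }
}
```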
You can also terminate the environment and recreate it. If you do not have a way to recreate the same environment with just one command, take a look at: https://github.com/ThoughtWorksStudios/eb_deployer