Azure Batch Job sequential execution not working - azure-webjobs

We are using an Azure WebJob for batch processing; the job is triggered when a message arrives in the storage queue.
We have configured the job to process the messages one by one:
JobHostConfiguration config = new JobHostConfiguration();
config.Queues.BatchSize = 1;
config.Queues.MaxDequeueCount = 1;
Even so, the job takes multiple messages from the storage queue and processes them in parallel.
Please help.

"taking multiple messages from the storage queue and executing parallelly"
How did you determine that it takes multiple messages and executes them in parallel? Do you have multiple instances running?
I tested the code in different situations.
1) Default settings, BatchSize not set: it pulls all the messages from the queue. I think they still run one by one, although judging from the output a run does not wait for the previous one to finish completely.
2) BatchSize set to 1: if you debug the code or refresh the queue frequently, you will see that it pulls one message at a time.
3) BatchSize set to 3, while debugging: only the number of messages pulled changes. It pulls 3 messages each time and then runs as in the default case. I also found that if you just run without debugging, the order shown in the console is very organized.
So if you don't have another instance running, I think this is working in sequential mode.
If this doesn't match your requirements or you still have questions, please let me know.

Camunda external task messages are being deprioritised

We use the Node.js library camunda-external-task-client-js to handle Camunda external tasks.
The client configuration is as follows:
"topic_name": "app-ext-task",
"maxTasks": 5,
"maxParallelExecutions": 5,
"interval": 500,
"usePriority": true,
"lockDuration":2100000,
"workerId": "app-ext-task-worker"
We receive the external task details and are able to process them, but sometimes we see some tasks getting deprioritised.
We do not set a priority on any external task; by default all tasks are assigned priority 0.
We expect all tasks to be executed sequentially. We accept that some tasks take longer than subsequent ones, so task-1 may take more time than task-2.
Example: if a queue contains 10 tasks [task-1, task-2, task-3, task-4, task-5, ... task-10],
all the tasks should be executed sequentially, as they all have the same priority:
1st:task-1,
2nd:task-2
3rd: task-3
Problem:
We see some tasks getting deprioritised, meaning that later messages take priority over earlier ones:
1st:task-1,
2nd:task-2
3rd: task-4
4th: task-5
5th: task-6
6th: task-7
7th: task-8
8th: task-3
I see two places where the problem could be:
1) While producing the message, Camunda might not have posted the message to the queue.
2) While reading the queue, the Camunda external tasks are not processed properly.
I didn't find much documentation on this, and I don't know how to debug it.
For me this is an intermittent issue, and I have not been able to find the root cause.
Is my expectation of Camunda queues wrong?
External tasks do not form a "queue". They are instances in a pool of available tasks; your worker fetches "some" tasks, which may or may not be in order. You could prioritise the tasks, but even then, if the pool holds 10 tasks of the "highest" priority and the worker fetches 5, you cannot determine which ones are chosen.
But you have a process engine at hand. If keeping the sequence is essential for your process, why do you start all tasks at once and rely on the external worker to keep the order? Why not just create one task at a time and continue when it is finished?
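The "one task at a time" idea can be illustrated outside Camunda with a small plain-Java sketch. It is only a toy model: the class OneAtATimeSketch and the method runOneAtATime are hypothetical names, and nothing here uses the Camunda API. Each step is started only after the previous one has completed, so the completion order stays fixed even when several worker threads are available.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Toy model only: hypothetical names, no Camunda API involved.
class OneAtATimeSketch {

    // Starts task i+1 only after task i has completed, even though
    // several worker threads are available in the pool.
    static List<Integer> runOneAtATime(int tasks, ExecutorService workers) {
        List<Integer> completed = Collections.synchronizedList(new ArrayList<>());
        CompletableFuture<Void> chain = CompletableFuture.completedFuture(null);
        for (int i = 1; i <= tasks; i++) {
            final int id = i;
            // The next stage is scheduled only when the previous stage is done.
            chain = chain.thenRunAsync(() -> completed.add(id), workers);
        }
        chain.join(); // wait for the last task
        return new ArrayList<>(completed);
    }

    public static void main(String[] args) {
        ExecutorService workers = Executors.newFixedThreadPool(5);
        try {
            System.out.println(runOneAtATime(10, workers));
        } finally {
            workers.shutdown();
        }
    }
}
```

In a real process the equivalent would be modelling the steps so that the engine creates the next external task only after the previous one has been completed, rather than relying on the worker's fetch order.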

Dataflow job stuck and not reading messages from PubSub

I have a Dataflow job which reads JSON from 3 PubSub topics, flattens them into one, applies some transformations, and saves the result to BigQuery.
I'm using a GlobalWindow with the following configuration.
.apply(Window.<PubsubMessage>into(new GlobalWindows())
    .triggering(AfterWatermark.pastEndOfWindow()
        .withEarlyFirings(AfterFirst.of(
            AfterPane.elementCountAtLeast(20000),
            AfterProcessingTime.pastFirstElementInPane().plusDelayOf(durations))))
    .discardingFiredPanes());
The job is running with following configuration
Max Workers : 20
Disk Size: 10GB
Machine Type : n1-standard-4
Autoscaling Algo: Throughput Based
The problem I'm facing is that after processing a few messages (approximately 80k) the job stops reading messages from PubSub. There is a backlog of close to 10 million messages in one of those topics, and yet the Dataflow job is neither reading the messages nor autoscaling.
I also checked the CPU usage of each worker, and it hovers in the single digits after an initial burst.
I've tried changing the machine type and the max worker configuration, but nothing seems to work.
How should I approach this problem?
I suspect the windowing function is the culprit. GlobalWindow isn't suited to streaming jobs (which I assume this job is, due to the use of PubSub), because it won't fire the window until all elements are present, which never happens in a streaming context.
In your situation, it looks like the window will fire early once, when it hits either that element count or duration, but after that the window will get stuck waiting for all the elements to finally arrive. A quick fix to check if this is the case is to wrap the early firings in a Repeatedly.forever trigger, like so:
withEarlyFirings(
    Repeatedly.forever(
        AfterFirst.of(
            AfterPane.elementCountAtLeast(20000),
            AfterProcessingTime.pastFirstElementInPane().plusDelayOf(durations))))
This should allow the early firing to fire repeatedly, preventing the window from getting stuck.
However for a more permanent solution I recommend moving away from using GlobalWindow in streaming pipelines. Using fixed-time windows with early firings based on element count would give you the same behavior, but without risk of getting stuck.

Dataflow discarding massive amount of events due to Window object or inner processing

I have recently been developing a Dataflow consumer that reads from a PubSub subscription and writes to Parquet files the combination of all the objects grouped within the same window.
While I was testing this without a heavy load, everything seemed to work fine.
However, after some heavy testing I can see that of 1,000,000 events sent to that PubSub queue, only 1,000 make it to Parquet!
According to the wall times of the different stages, the stage that parses the events before the window is applied takes 58 minutes. The last stage, which writes to Parquet files, takes 1 hour and 32 minutes.
Below are the most relevant parts of the code. I hope you can shed some light on whether this is due to the logic that comes before the Window definition or to the Window definition itself.
pipeline
    .apply("Reading PubSub Events",
        PubsubIO.readMessagesWithAttributes()
            .fromSubscription(options.getSubscription()))
    .apply("Map to AvroSchemaRecord (GenericRecord)",
        ParDo.of(new PubsubMessageToGenericRecord()))
    .setCoder(AvroCoder.of(AVRO_SCHEMA))
    .apply("15m window",
        Window.<GenericRecord>into(FixedWindows.of(Duration.standardMinutes(15)))
            .triggering(AfterProcessingTime
                .pastFirstElementInPane()
                .plusDelayOf(Duration.standardSeconds(1)))
            .withAllowedLateness(Duration.ZERO)
            .accumulatingFiredPanes()
    )
Also note that I'm running Beam 2.9.0.
Could the logic inside the second stage be so heavy that messages arrive too late and get discarded by the window? The logic basically consists of reading the payload and parsing it into a POJO (reading inner Map attributes, filtering, and so on).
However, if I send a million events to PubSub, all million events make it to the Parquet write stage, yet the Parquet files contain only some of them. Does that make sense?
I need the trigger to consume all those events regardless of the delay.
Citing from an answer on the Apache Beam mailing list:
This is an unfortunate usability problem with triggers where you can accidentally close the window and drop all data. I think instead, you probably want this trigger:
Repeatedly.forever(
    AfterProcessingTime
        .pastFirstElementInPane()
        .plusDelayOf(Duration.standardSeconds(1)))
The way I recommend to express this trigger is:
AfterWatermark.pastEndOfWindow().withEarlyFirings(
    AfterProcessingTime
        .pastFirstElementInPane()
        .plusDelayOf(Duration.standardSeconds(1)))
In the second case it is impossible to accidentally "close" the window and drop all data.
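The difference between a trigger that fires once and one that repeats can be illustrated with a small plain-Java toy model. This is only a sketch under stated assumptions: ToyTrigger, onceAfterCount, repeatedly and countDropped are hypothetical names and are not the Apache Beam API, the trigger is based on element count rather than processing time to keep the example deterministic, and pane handling is simplified.

```java
// Toy model only: these types are hypothetical and are not the Apache Beam API.
class TriggerSketch {

    interface ToyTrigger {
        boolean shouldFire(int bufferedCount);
        boolean finishedAfterFiring();
    }

    // Fires a single time once n elements are buffered, then is finished.
    static ToyTrigger onceAfterCount(final int n) {
        return new ToyTrigger() {
            public boolean shouldFire(int bufferedCount) { return bufferedCount >= n; }
            public boolean finishedAfterFiring() { return true; }
        };
    }

    // Wraps a trigger so that it never finishes and can fire again and again.
    static ToyTrigger repeatedly(final ToyTrigger inner) {
        return new ToyTrigger() {
            public boolean shouldFire(int bufferedCount) { return inner.shouldFire(bufferedCount); }
            public boolean finishedAfterFiring() { return false; }
        };
    }

    // Feeds totalElements into one window and counts the elements that arrive
    // after the trigger has finished, i.e. after the window is "closed".
    static int countDropped(ToyTrigger trigger, int totalElements) {
        int buffered = 0;
        int dropped = 0;
        boolean closed = false;
        for (int i = 0; i < totalElements; i++) {
            if (closed) {
                dropped++;
                continue;
            }
            buffered++;
            if (trigger.shouldFire(buffered)) {
                buffered = 0; // pane emitted, buffer reset (simplified)
                closed = trigger.finishedAfterFiring();
            }
        }
        return dropped;
    }

    public static void main(String[] args) {
        System.out.println(countDropped(onceAfterCount(3), 10));             // prints 7
        System.out.println(countDropped(repeatedly(onceAfterCount(3)), 10)); // prints 0
    }
}
```

In the toy model, the one-shot trigger emits the first pane and everything after it is dropped, while the repeating wrapper keeps the window open for later elements.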

Spring Integration Multiple consumers not processing concurrently

I am using Spring Integration with ActiveMQ. I defined a DefaultMessageListenerContainer with maxConcurrentConsumers = 5. It is referenced in a . After an int-xml:validating-filter and an int-xml:unmarshalling-transformer, I defined a queue channel actionInstructionTransformed. And I have got a poller for this queue channel. When I start my application, in the ActiveMQ console, I can see that a connection is created and inside five sessions.
Now, I have a @MessageEndpoint with a method annotated
@ServiceActivator(inputChannel = "actionInstructionTransformed", poller = @Poller(value = "customPoller")).
I have a log statement at the method entrance. Processing each message takes a long time (several minutes). In my logs, I can see that thread-1 starts the processing, and then I can only see thread-1 output. Only when thread-1 has finished processing one message do I see thread-2 start processing the next message, and so on. I do NOT have any synchronized block inside my class annotated with @MessageEndpoint. I have not managed to get thread-1, thread-2, etc. to process messages concurrently.
Has anybody experienced something similar?
Look, you say:
After an int-xml:validating-filter and an int-xml:unmarshalling-transformer, I defined a queue channel actionInstructionTransformed.
Now let's look at the QueueChannel and PollingConsumer definitions:
On the other hand, a channel adapter connected to a channel that implements the org.springframework.messaging.PollableChannel interface (e.g. a QueueChannel) will produce an instance of PollingConsumer.
Also note that @Poller (PollerMetadata) has a taskExecutor option.
By default, the TaskScheduler asks the QueueChannel for data periodically, according to the trigger configuration. If that is a PeriodicTrigger with default options such as fixedRate = false, the next poll happens only after the previous one has finished. That's why you see only one thread working at a time.
So try configuring a taskExecutor, and the messages from that queue will be processed in parallel.
The concurrency setting on the DefaultMessageListenerContainer has no effect here, because in the end all those messages are placed into the QueueChannel, where a new threading model takes over, based on the @Poller configuration.
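The effect of handing polled messages to an executor can be sketched with plain java.util.concurrent. This is a toy model, not Spring Integration code: PollerSketch and maxConcurrentHandlers are hypothetical names, and the "handler" only waits briefly to stand in for slow work. Without an executor the polling thread handles each message itself, so only one handler is ever active; with an executor the polling loop hands the work off and moves on to the next message.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Toy model only: plain java.util.concurrent, not Spring Integration code.
class PollerSketch {

    // Polls the queue one message at a time and returns the highest number
    // of handlers that were observed running at the same moment.
    static int maxConcurrentHandlers(int messages, ExecutorService executor) {
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
        for (int i = 0; i < messages; i++) {
            queue.add(i);
        }
        final AtomicInteger active = new AtomicInteger();
        final AtomicInteger maxActive = new AtomicInteger();
        final CountDownLatch allStarted = new CountDownLatch(messages);
        final CountDownLatch allDone = new CountDownLatch(messages);

        Runnable handler = () -> {
            maxActive.accumulateAndGet(active.incrementAndGet(), Math::max);
            allStarted.countDown();
            try {
                // Stand-in for slow work: wait briefly so other handlers can overlap.
                allStarted.await(500, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                active.decrementAndGet();
                allDone.countDown();
            }
        };

        while (queue.poll() != null) {
            if (executor == null) {
                handler.run();             // handled on the polling thread itself
            } else {
                executor.execute(handler); // handed off, polling continues at once
            }
        }
        try {
            allDone.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return maxActive.get();
    }

    public static void main(String[] args) {
        System.out.println("no executor:   " + maxConcurrentHandlers(3, null));
        ExecutorService pool = Executors.newFixedThreadPool(3);
        try {
            System.out.println("with executor: " + maxConcurrentHandlers(3, pool));
        } finally {
            pool.shutdown();
        }
    }
}
```

Without an executor the result is always 1. With a pool of three threads the handlers overlap, which mirrors what configuring a taskExecutor on the poller is meant to achieve.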

What happens to running processes on a continuous Azure WebJob when website is redeployed?

I've read about graceful shutdowns here using the WEBJOBS_SHUTDOWN_FILE and here using Cancellation Tokens, so I understand the premise of graceful shutdowns, however I'm not sure how they will affect WebJobs that are in the middle of processing a queue message.
So here's the scenario:
I have a WebJob with functions listening to queues.
Message is added to Queue and job begins processing.
While processing, someone pushes to develop, triggering a redeploy.
Assuming I have my WebJobs hooked up to deploy on git pushes, this deploy will also trigger the WebJobs to be updated, which (as far as I understand) will kick off some sort of shutdown workflow in the jobs. So I have a few questions stemming from that.
1) Will jobs in the middle of processing a queue message finish processing the message before the job quits? Or is any shutdown notification essentially treated as "this bitch is about to shutdown. If you don't have anything to handle it, you're SOL."
2) If we are SOL, is our best option for handling shutdowns essentially to wrap anything you're doing in the equivalent of DB transactions and implement your shutdown handler in such a way that all changes are rolled back on shutdown?
3) If a queue message is in the middle of being processed and the WebJob shuts down, will that message be requeued? If not, does that mean that my shutdown handler needs to handle requeuing that message?
4) Is it possible for functions listening to queues to grab any more queue messages after the Job has been notified that it needs to shutdown?
Any guidance here is greatly appreciated! Also, if anyone has any other useful links on how to handle job shutdowns besides the ones I mentioned, it would be great if you could share those.
After no small amount of testing, I think I've found the answers to my questions and I hope someone else can gain some insight from my experience.
NOTE: All of these scenarios were tested using .NET Console Apps and Azure queues, so I'm not sure how blobs or table storage, or different types of Job file types, would handle these different scenarios.
1) After a job has been marked to exit, the triggered functions that are running have a configured amount of time (the grace period; 5 seconds by default, which I believe is configurable through the stopping_wait_time setting in a settings.job file) to finish before they are terminated. If they do not finish within the grace period, the function quits. Main() (or whichever file you called host.RunAndBlock() in), however, will keep running any code after host.RunAndBlock() for up to the time remaining in the grace period (I'm not sure how that would work if you used an infinite loop instead of RunAndBlock). As for handling the quit in your functions, you can essentially "listen" to the CancellationToken that you can pass in to your triggered functions, check IsCancellationRequested, and handle it accordingly. Also, you are not SOL if you don't handle the quits yourself. Huzzah! See point #3.
2) While you are not SOL if you don't handle the quit (see point #3), I do think it is a good idea to wrap all of your jobs in transactions that you won't commit until you're absolutely sure the job has run its course. This way, if your function exits mid-process, you'll be less likely to have to worry about corrupted data. I can think of a couple of scenarios where you might want to commit transactions as they pass (batch jobs, for instance); however, you would need to structure your data or logic so that previously processed entities aren't reprocessed after the job restarts.
3) You are not in trouble if you don't handle job quits yourself. My understanding of what's going on under the covers is virtually non-existent, however I am quite sure of the results. If a function is in the middle of processing a queue message and is forced to quit before it can finish, HAVE NO FEAR! When the job grabs the message to process, it will essentially hide it on the queue for a certain amount of time. If your function quits while processing the message, that message will "become visible" again after x amount of time, and it will be re-grabbed and run against the potentially updated code that was just deployed.
4) I have about 90% confidence in my findings for #4. I say that because testing it involved quick-switching between windows while not being totally sure what was going on with certain pieces. But here's what I found: on the off chance that a queue has a new message added to it in the grace period before a job quits, I THINK one of two things can happen. If the function doesn't poll that queue before the job quits, the message will stay on the queue and be grabbed when the job restarts. However, if the function DOES grab the message, it will be treated the same as any other message that was interrupted: it will "become visible" on the queue again and be rerun when the job restarts.
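The hide-and-reappear behaviour described above can be modelled with a small in-memory queue. This is only a sketch under stated assumptions: VisibilityQueueSketch and its methods are hypothetical names, the 30-second timeout is an arbitrary illustrative value, time is passed in explicitly as a number of seconds, and none of this is the Azure Storage SDK.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: hypothetical names, not the Azure Storage SDK.
class VisibilityQueueSketch {

    static final class Message {
        final String body;
        long visibleAt;   // time (in seconds) from which the message may be dequeued
        int dequeueCount; // how many times the message has been handed to a worker

        Message(String body) {
            this.body = body;
        }
    }

    private final List<Message> messages = new ArrayList<>();
    private final long visibilityTimeoutSeconds;

    VisibilityQueueSketch(long visibilityTimeoutSeconds) {
        this.visibilityTimeoutSeconds = visibilityTimeoutSeconds;
    }

    void add(String body) {
        messages.add(new Message(body));
    }

    // Returns the first visible message and hides it until now + timeout,
    // or returns null when no message is currently visible.
    Message dequeue(long now) {
        for (Message m : messages) {
            if (m.visibleAt <= now) {
                m.visibleAt = now + visibilityTimeoutSeconds;
                m.dequeueCount++;
                return m;
            }
        }
        return null;
    }

    // Called by a worker that finished successfully; an interrupted worker
    // never gets here, so its message simply becomes visible again later.
    void delete(Message m) {
        messages.remove(m);
    }

    public static void main(String[] args) {
        VisibilityQueueSketch queue = new VisibilityQueueSketch(30);
        queue.add("order-1");
        queue.dequeue(0);                       // worker picks it up, then is shut down
        System.out.println(queue.dequeue(10));  // prints null: message is still hidden
        Message again = queue.dequeue(30);      // visible again after the timeout
        System.out.println(again.body + " attempt " + again.dequeueCount); // prints order-1 attempt 2
        queue.delete(again);                    // processed successfully this time
    }
}
```

Because an interrupted message comes back, the function may see the same message more than once, which is one more reason to make the processing safe to repeat, as suggested in the point about transactions.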
That pretty much sums it up. I hope other people will find this useful. Let me know if you want any of this expounded on and I'll be happy to try. Or if I'm full of it and you have lots of corrections, those are probably more welcome!