Can I run an Azure WebJob after other WebJobs finish? - azure-webjobs

Say I have this:
Step 1: An Azure WebJob triggered by a timer creates 1000 messages and puts them in a queue.
Step 2: Another Azure WebJob, triggered by that message queue, processes these messages.
Step 3: The final WebJob should be triggered only when all messages have been processed by step 2.
It looks like Azure Queue doesn't support ordering and the only way is to use Service Bus. Is that really the only way?
What I am thinking of is this kind of process:
Put all these messages into an Azure table, with a GUID as the primary key and the status set to 0.
After step 2 finishes a message, change that message's status to 1 (i.e. finished), and trigger step 3 once every message is done.
Will it work? Or are there NuGet packages that I can use to achieve what I want?

The simplest way, I think, is a combination of Azure Logic Apps and Azure Functions.
A Logic App is an automated, scalable workflow; you can trigger it with a timer, an HTTP request, and so on. Azure Functions is a serverless compute service that lets you run code on demand without having to explicitly provision or manage infrastructure.
A Logic App supports adding code through Functions, and using a Function is similar to using a WebJob. So I think you could create a Logic App with three Functions, and they will run one after another.
As for the WebJob: yes, the QueueTrigger doesn't support ordering. The Service Bus you mentioned does meet some of your requirements thanks to its FIFO feature. However, you need to make sure step 3 is triggered only after step 1 has run, because the queue is already empty before step 1 creates any messages.
Hope my answer helps.

Related

How to limit number of concurrent workflows running?

The title is pretty much the question. Is there some way to limit the number of concurrent workflows running at any given time?
Some background:
I'm using Eventarc to dispatch a workflow once a message has been sent to a Pub/Sub topic. The workflow will be used to start some long-running operation (LRO) but, for reasons I won't go into, I don't want more than 3 instances of this workflow running at a given time.
Is there some way to do this? - primarily from some type of configuration rather than using another compute resource.
There is no configuration to limit running processes that specifically targets sessions that are executed by a Workflow enabled for concurrent execution.
The existing process limit applies to all sessions without differentiating between those from non-concurrent or concurrent enabled Workflows.
Synchronization enables users to limit the parallel execution of certain workflows or templates within a workflow without having to restrict others.
Users can create multiple synchronization configurations in the ConfigMap that can be referred to from a workflow or template within a workflow. Alternatively, users can configure a mutex to prevent concurrent execution of templates or workflows using the same mutex.
Refer to this link for more information.
Summarizing your requirements:
Trigger workflow executions with Pub/Sub messages
Execute at most 3 workflow executions concurrently
Queue up waiting Pub/Sub messages
(Unspecified) Do you need messages processed in the order delivered?
There is no out-of-the-box capability to achieve this. For fun, below is a solution that doesn't need secondary compute (and therefore is still fully managed).
The key to making this work is likely starting new executions for every message, but waiting in that execution if needed. Workflows does not provide a global concurrency construct, so you'll need to use some external storage, such as Firestore. An algorithm like this could work:
Create a callback
Push the callback into a FIFO queue
Atomically increment a counter (which returns the new value)
If the returned value is <= 3, pop the last callback and call it
Wait on the callback
-- MAIN WORKFLOW HERE --
Atomically decrement the counter
If the returned value is < 3, pop the last callback and call it
To keep things cleaner, you could put the above steps in the triggered workflow and the main logic in a separate workflow that is called as needed.
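The bookkeeping in these steps can be checked with a small in-memory simulation. This is a minimal Python sketch, not Workflows syntax: threads stand in for workflow executions, a threading.Event stands in for the callback, and a lock-protected deque and integer stand in for the FIFO queue and counter that would live in external storage such as Firestore. The names ConcurrencyGate and run_demo are hypothetical. One assumption to note: the counter here tracks running plus waiting executions, so the oldest waiter is released when the count after decrementing is still at least the limit.

```python
import threading
import time
from collections import deque


class ConcurrencyGate:
    """In-memory stand-in for the external counter and FIFO queue (hypothetical)."""

    def __init__(self, limit):
        self.limit = limit
        self._lock = threading.Lock()   # makes counter and queue updates atomic
        self._count = 0                 # running + waiting executions (assumption)
        self._waiters = deque()         # FIFO queue of "callbacks"

    def acquire(self):
        callback = threading.Event()            # create a callback
        with self._lock:
            self._waiters.append(callback)      # push it into the queue
            self._count += 1                    # atomic increment
            if self._count <= self.limit:
                self._waiters.popleft().set()   # free slot: call the oldest callback
        callback.wait()                         # wait on the callback

    def release(self):
        with self._lock:
            self._count -= 1                    # atomic decrement
            if self._count >= self.limit:       # at least one execution is waiting
                self._waiters.popleft().set()


def run_demo(n_jobs=10, limit=3):
    """Run n_jobs simulated executions and return (finished, peak concurrency)."""
    gate = ConcurrencyGate(limit)
    stats = {"running": 0, "peak": 0, "done": 0}
    stats_lock = threading.Lock()

    def execution():
        gate.acquire()
        with stats_lock:
            stats["running"] += 1
            stats["peak"] = max(stats["peak"], stats["running"])
        time.sleep(0.01)                        # -- MAIN WORKFLOW HERE --
        with stats_lock:
            stats["running"] -= 1
            stats["done"] += 1
        gate.release()

    threads = [threading.Thread(target=execution) for _ in range(n_jobs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return stats["done"], stats["peak"]
```

Calling run_demo(10, 3) should finish all ten simulated executions while never running more than three at once. In a real deployment the push, increment, and pop would each be calls to the external store, so their atomicity would need to come from that store (for example, a transaction) rather than from a local lock.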

How to complete a service task using camunda rest api

I am using Camunda workflows to automate various processes. I have come across a scenario where the process is not moving from a service task. Usually, we call the task/{taskid}/complete to complete the task, but since the process is stuck on a service task, I am not able to complete that task. Can anybody help me find a way to complete the service task?
You are using a service task. That basically means "a machine should do something". The "normal" implementation is to provide code (a java Delegate or a connector endpoint) that is called by the process engine to execute this task.
The alternative is to use the "external task" pattern. Think of external tasks as "user tasks for computers". So the process waits, tells subscribed clients that a job is to be done and waits for their completion.
I suppose your process uses the second option? (you can check in the modeler under "Implementation"). So completion can be done through the external task API, see docs.
/external-task/{id}/complete
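As a rough illustration, completing an external task is a POST to that endpoint with the ID of the worker that holds the lock. The Python sketch below only builds the request and does not send it. The base URL http://localhost:8080/engine-rest, the task ID, the worker ID, the variable, and the helper name build_complete_request are all assumptions for illustration; the task normally has to be fetched and locked by that same worker before it can be completed.

```python
import json
import urllib.request


def build_complete_request(base_url, task_id, worker_id, variables=None):
    """Build a POST to /external-task/{id}/complete (hypothetical helper)."""
    body = {"workerId": worker_id}
    if variables:
        # Each process variable is sent as an object with a "value" field.
        body["variables"] = {name: {"value": value} for name, value in variables.items()}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/external-task/{task_id}/complete",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumed values, for illustration only.
request = build_complete_request(
    "http://localhost:8080/engine-rest", "some-task-id", "my-worker", {"approved": True}
)
# To actually send it: urllib.request.urlopen(request)
```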
If it is a connector then you likely will see when checking the log that retries have occurred and that the transaction rolled back. After addressing the underlying issue the service task (email) should be sent without explicitly triggering the service task and the following user task (Approval) should be created.

How can I track the progress/status of an asynchronous AWS Lambda invocation?

I have an API which I use to trigger AWS Lambda jobs. Upon request, the API invokes an AWS Lambda job with InvocationType='Event'. Hereafter, I want to periodically poll if the AWS Lambda job has finished.
The way that would fit best to my architecture, is to store an identifier of the Lambda job in a database and periodically check if the job is finished and what its output is. However, I was not able to find how I can do this.
How can I periodically poll for the result of an AWS Lambda job, and view the output once it has finished?
I have looked into using InvocationType='RequestResponse', but this requires me to store a future, which I cannot do in a database.
There's no built-in way to check for the status of an asynchronous Lambda invocation.
Asynchronous Lambda invocation, using the event invocation type, is meant to be a fire and forget job. As such, there's no 'progress' or 'status' to get or poll for.
As you don't want to wait for the Lambda to complete, synchronous Lambda invocation is out of the picture. In this case, you need to write your own logic to keep track of the status.
One way you could do this is to store a (job) item in a DynamoDB jobs table with 2 attributes:
jobId UUID (String attribute, set as the partition key)
completed boolean flag (Boolean attribute)
Workflow is then as follows:
Within your API, create & store a new job with completed defaulting to 'false'
Pass the newly-created jobId to the Lambda being invoked in the payload
When the Lambda finishes, look up the job associated with the passed-in jobId in the jobs table & set the completed attribute of the job to true
You can then periodically poll for the result of the job within the DynamoDB table.
Or take a look at using DynamoDB Streams as a way to know when a job finishes in near-real time without polling.
As to viewing the 'output', AWS Lambda just returns a success response without additional information. There is no 'output'. Store any output you might need in persistent storage - maybe an extra output attribute as a String with each job? - & later retrieve it.
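The flow described above can be sketched in Python as follows. This is a minimal sketch under assumptions: the function names (create_job, finish_job, poll_job) and the output attribute are hypothetical, and FakeJobsTable is an in-memory stand-in so the example runs without AWS. It mimics only the put_item and get_item methods of a boto3 Table resource; with real DynamoDB you would pass something like boto3.resource("dynamodb").Table("jobs") instead, where the table name is also an assumption.

```python
import uuid


class FakeJobsTable:
    """In-memory stand-in exposing the two boto3 Table methods used below."""

    def __init__(self):
        self._items = {}

    def put_item(self, Item):
        self._items[Item["jobId"]] = dict(Item)

    def get_item(self, Key):
        item = self._items.get(Key["jobId"])
        # Like DynamoDB, omit the "Item" key when nothing is found.
        return {"Item": dict(item)} if item else {}


def create_job(table):
    """Called by the API before invoking the Lambda asynchronously."""
    job_id = str(uuid.uuid4())
    table.put_item(Item={"jobId": job_id, "completed": False})
    return job_id  # pass this to the Lambda in the invocation payload


def finish_job(table, job_id, output):
    """Called at the end of the Lambda handler; overwrites the job item."""
    table.put_item(Item={"jobId": job_id, "completed": True, "output": output})


def poll_job(table, job_id):
    """Called periodically by the API; returns (completed, output)."""
    item = table.get_item(Key={"jobId": job_id}).get("Item")
    if item is None:
        raise KeyError(job_id)
    return item["completed"], item.get("output")


table = FakeJobsTable()
job_id = create_job(table)
before = poll_job(table, job_id)                  # job exists but is not finished
finish_job(table, job_id, "42 rows processed")    # what the Lambda would do
after = poll_job(table, job_id)
```

Overwriting the whole item with put_item keeps the sketch short; if the job item carried more attributes, an update_item call that sets only completed and output would avoid clobbering them.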
@Ermiya Eskandary's answer is absolutely right.
I am a DynamoDB subject matter expert and have implemented this status-tracking pattern (along with error handling, retries, and error logging) for many of my customers.
You could check out the pynamodb_mate library; it has the status tracker pattern implemented, and you can enable it with around 15 lines of code.
In general, when you say you want status tracking, you are talking about the following:
Each task should be handled by only one worker, so you want a concurrency lock mechanism to avoid double consumption. (A lot of people aren't aware of this; it is called idempotency.)
For tasks that succeeded, store additional information such as the output of the task, and log the success time.
For tasks that failed, log the error message for debugging, so you can fix the bug and rerun the task.
For tasks that failed, you want to get all of them with one simple query and rerun them with the updated business logic.
For tasks that have failed too many times, you don't want to retry them anymore and want to ignore them. (A lot of people run into an endless loop when they deploy to production and only then realize that this is a necessary feature.)
Run custom queries based on task status for analytics purposes.
You can read this Jupyter notebook example.
Basically, with pynamodb_mate your Lambda application code becomes:
# this is your lambda application code
def lambda_handler(...):
    ...
# your new code should be:
with tracker.start_job():
    lambda_handler()
If your application code is not Python, then you have two options:
Create another Lambda function that invokes the original one in sync mode. However, you pay more money to run the "caller" Lambda function.
Suppose your Lambda code is in Node.js; then add an additional Lambda runtime as a layer and wrap your Node.js code with a Python function. In short, you are using Python to call Node.js.

Using AWS Lambda to periodically monitor state of a remote resource

I need to implement the following feature in the backend on AWS:
- API endpoint which allows a user to start a particular long running "process" in a remote system
- the process status in this remote system should be monitored periodically (every few seconds) and, when status == complete, an action should be triggered (the remote system does not support sending/triggering notifications or callbacks)
We use primarily lambda functions so I'm thinking about approaching it in the following way:
- my endpoint which is triggered by the user would call remote system to start the process, store record in internal DB and generate a message to SQS (with a delivery delay of X seconds)
- there would be a second lambda that would read messages from SQS & check the status of the process in this remote system. When status == complete, trigger an action; when status != complete, generate another message to SQS, which the same lambda would pick up again after X seconds of delay to repeat the check, and so on
I'm wondering if there is a better solution/tool to implement this kind of monitoring/notification pattern in AWS, since I'm not that familiar with all the services that AWS provides.
Would anyone comment on this approach and perhaps suggest an alternative if there is one?
Take a look at AWS Step Functions which I think is the best fit for your use case.
All you need to do is, instead of generating an SQS message, initiate an execution of a state machine in Step Functions.
The following tutorial explains an iterator loop with a counter, but you can use the same logic to check the status and keep looping until status == complete:
https://docs.aws.amazon.com/step-functions/latest/dg/tutorial-create-iterate-pattern-section.html
Another useful resource, which I think is very close to your use case:
https://docs.aws.amazon.com/step-functions/latest/dg/sample-project-job-poller.html
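To give a feel for what such a polling loop looks like, here is a minimal sketch that builds an Amazon States Language definition as a Python dict and serializes it to JSON. It is not taken from the linked tutorials. The Lambda ARN, the state names, the 10-second wait, and the assumption that the status-check Lambda returns an object with a status field are all hypothetical.

```python
import json

# Hypothetical ARN of a Lambda that returns {"status": "..."} for the remote process.
CHECK_STATUS_ARN = "arn:aws:lambda:us-east-1:123456789012:function:check-status"


def build_poller_definition(check_arn, wait_seconds=10):
    """Return a state machine definition for a wait / check / branch loop."""
    return {
        "StartAt": "Wait",
        "States": {
            # Pause before each status check.
            "Wait": {"Type": "Wait", "Seconds": wait_seconds, "Next": "CheckStatus"},
            # Call the Lambda and store its result under $.check.
            "CheckStatus": {
                "Type": "Task",
                "Resource": check_arn,
                "ResultPath": "$.check",
                "Next": "IsComplete",
            },
            # Loop back to Wait until the remote process reports completion.
            "IsComplete": {
                "Type": "Choice",
                "Choices": [
                    {"Variable": "$.check.status", "StringEquals": "complete", "Next": "Done"}
                ],
                "Default": "Wait",
            },
            # A Task state that triggers the follow-up action could go here instead.
            "Done": {"Type": "Succeed"},
        },
    }


definition_json = json.dumps(build_poller_definition(CHECK_STATUS_ARN), indent=2)
```

The API endpoint would then start one execution of this state machine per remote process instead of sending the delayed SQS message.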

How to implement SWF exponential retries using the aws sdk

I'm trying to implement a jruby SWF activity worker using AWS SDK v2.
I cannot use the aws-flow-ruby framework since it's not compatible with jruby(forking), so I wrote a worker that uses threading.
https://github.com/djpate/jflow if people are interested.
Anyway, the framework implements retries, and it seems that it actually schedules the same activity again later if an activity failed.
I looked everywhere in the AWS docs and cannot find how to send that signal back to SWF using the SDK: http://docs.aws.amazon.com/sdkforruby/api/Aws/SWF/Client.html
Anyone know where I should look?
From the question, I believe you are somewhat confused about what SWF is / how it works.
Activities don't run and are not retried in isolation. Everything happens in the context of a workflow. The workflow definition tells you when to retry and how to behave if activities fail, time out, etc.
The worker that processes the workflow definition and schedules the next thing that needs to happen is referred to as a decider (you will see decider and workflow used interchangeably). It's called a decider because, based on the current state, it decides which activity needs to be scheduled next. The decider normally takes the workflow history as input when making this decision.
In Flow, for example, the retry is encoded in the workflow logic. Basically, if the activity fails you can just schedule it again.
So to finally answer your question: if your target is to only implement the activity workers you don't need to implement any retry logic as that happens at the decider level. You should make sure that the activities are compatible with the decider (you need to make sure the history and the input/output convention are the same).
If your target is to implement your own framework on top of SWF you need to actually do the hard work needed to make the decider work.
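For the decider side, one possible way to get exponential retries is to answer a failed activity with a StartTimer decision whose delay doubles on each attempt, and to schedule the activity again when the timer fires. The sketch below is in Python and only builds the decision payload in the camelCase shape used by boto3; the Ruby SDK's respond_decision_task_completed should accept the equivalent snake_case keys, but treat that as an assumption to verify. The helper names, the timer ID scheme, the base delay, and the cap are all hypothetical.

```python
def backoff_seconds(attempt, base=2, cap=300):
    """Delay before retry number `attempt` (1-based): base * 2**(attempt - 1), capped."""
    return min(cap, base * 2 ** (attempt - 1))


def retry_timer_decision(activity_id, attempt):
    """Build a StartTimer decision; the decider reschedules the activity when it fires."""
    return {
        "decisionType": "StartTimer",
        "startTimerDecisionAttributes": {
            "timerId": f"retry-{activity_id}-{attempt}",          # hypothetical naming scheme
            "startToFireTimeout": str(backoff_seconds(attempt)),  # seconds, sent as a string
            "control": activity_id,  # lets the decider find what to reschedule
        },
    }


# Decisions for the first three failures of a hypothetical "resize-image" activity.
decisions = [retry_timer_decision("resize-image", n) for n in (1, 2, 3)]
delays = [d["startTimerDecisionAttributes"]["startToFireTimeout"] for d in decisions]
```

The decider would count earlier failures of the activity in the workflow history to work out the attempt number, and stop retrying once some maximum is reached.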