How to send TaskSuccess to the right activity with AWS Step Functions?

So, I'm working on a state machine. It can have up to 20 or 30 executions running at the same time, with different parameters.
One of its states is an activity worker: it needs to wait for input from another Step Functions execution, which is started from one of its states through a Lambda function (since you can't directly start a new execution from within a state machine).
I know how to send a "Task Success" for an activity, but how can I make sure it's sent to the right execution?

A pub/sub service such as MQTT would be useful here:
Generate a UUID in the Lambda that spawns the new execution.
Pass the UUID to the new execution and return it to the activity worker.
Once the new execution is done, it writes the UUID and its result to the queue.
The activity worker reads from the queue and uses the UUID to find the right message.
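A minimal sketch of that correlation pattern, using SQS as the queue for concreteness (the suggestion above is MQTT; any pub/sub or queue you can filter on works) and hypothetical names for the queue URL and input fields:

import json
import uuid

import boto3

sfn = boto3.client("stepfunctions")
sqs = boto3.client("sqs")

RESULT_QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/results"  # hypothetical

def spawn_child_execution(event, context):
    # Generate the correlation UUID and pass it into the child execution's input.
    correlation_id = str(uuid.uuid4())
    sfn.start_execution(
        stateMachineArn=event["childStateMachineArn"],
        input=json.dumps({"correlationId": correlation_id, "params": event["params"]}),
    )
    # Returned to the activity worker so it knows which message to wait for.
    return {"correlationId": correlation_id}

def publish_child_result(event, context):
    # Called from the child execution's final state: write UUID + result to the queue.
    sqs.send_message(
        QueueUrl=RESULT_QUEUE_URL,
        MessageBody=json.dumps({"correlationId": event["correlationId"],
                                "result": event["result"]}),
    )

The activity worker then polls the queue and matches messages on correlationId; with a plain queue it has to put back (or ignore) messages meant for other executions, which is why a pub/sub service with topic or message filtering is the more natural fit.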

Depending on the design of your state machine, you may also be able to pass the current activity's taskToken as an input parameter when your activity creates a new Step Functions execution. The last state in the sub-execution can then call Task Success for the waiting state in the parent execution using the taskToken passed in, returning any result data as the result for that state. (Don't forget that the last state also has to call Task Success for itself.)
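If you go the taskToken route, the child's final step might look like the following sketch (the input field names taskToken and result are assumptions about how the parent passes its token down):

import json

import boto3

sfn = boto3.client("stepfunctions")

def lambda_handler(event, context):
    # The parent execution put its task token into the child's input;
    # completing the token resumes exactly that waiting state.
    sfn.send_task_success(
        taskToken=event["taskToken"],
        output=json.dumps(event.get("result", {})),
    )

Because each task token identifies exactly one waiting state in one execution, there is no ambiguity about which of the 20-30 concurrent executions receives the result.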

Related

How can I track the progress/status of an asynchronous AWS Lambda invocation?

I have an API which I use to trigger AWS Lambda jobs. Upon request, the API invokes an AWS Lambda job with InvocationType='Event'. Afterwards, I want to periodically poll to check whether the AWS Lambda job has finished.
The approach that would fit best into my architecture is to store an identifier of the Lambda job in a database and periodically check whether the job is finished and what its output is. However, I was not able to find out how to do this.
How can I periodically poll for the result of an AWS Lambda job, and view the output once it has finished?
I have looked into using InvocationType='RequestResponse', but this requires me to store a future, which I cannot do in a database.
There's no built-in way to check for the status of an asynchronous Lambda invocation.
Asynchronous Lambda invocation, using the event invocation type, is meant to be a fire-and-forget job. As such, there's no 'progress' or 'status' to get or poll for.
As you don't want to wait for the Lambda to complete, synchronous Lambda invocation is out of the picture. In this case, you need to write your own logic to keep track of the status.
One way you could do this is to store a (job) item in a DynamoDB jobs table with 2 attributes:
jobId UUID (String attribute, set as the partition key)
completed boolean flag (Boolean attribute)
Workflow is then as follows:
Within your API, create & store a new job with completed defaulting to false
Pass the newly created jobId to the Lambda being invoked in the payload
When the Lambda finishes, look up the job associated with the passed-in jobId in the jobs table & set its completed attribute to true
You can then periodically poll for the result of the job within the DynamoDB table.
Or take a look at using DynamoDB Streams as a way to know when a job finishes in near-real time without polling.
As to viewing the 'output', AWS Lambda just returns a success response without additional information. There is no 'output'. Store any output you might need in persistent storage - maybe an extra output attribute as a String with each job? - & later retrieve it.
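A sketch of that workflow with boto3, assuming a table named jobs (partition key jobId) and a worker function named my-worker (both names are illustrative):

import json
import uuid

import boto3

dynamodb = boto3.resource("dynamodb")
jobs = dynamodb.Table("jobs")
lambda_client = boto3.client("lambda")

def start_job(payload):
    # Create the job record, then pass jobId to the asynchronous Lambda.
    job_id = str(uuid.uuid4())
    jobs.put_item(Item={"jobId": job_id, "completed": False})
    lambda_client.invoke(
        FunctionName="my-worker",
        InvocationType="Event",  # asynchronous, fire-and-forget
        Payload=json.dumps({"jobId": job_id, "payload": payload}),
    )
    return job_id

def poll_job(job_id):
    # Periodic polling: read the job item back.
    item = jobs.get_item(Key={"jobId": job_id}).get("Item", {})
    return item.get("completed", False), item.get("output")

def mark_done(job_id, output_str):
    # Called at the end of the worker Lambda; attribute names are aliased
    # to stay clear of DynamoDB's reserved words.
    jobs.update_item(
        Key={"jobId": job_id},
        UpdateExpression="SET #c = :c, #o = :o",
        ExpressionAttributeNames={"#c": "completed", "#o": "output"},
        ExpressionAttributeValues={":c": True, ":o": output_str},
    )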
@Ermiya Eskandary's answer is absolutely right.
I am a DynamoDB subject matter expert and have implemented this status tracking pattern (along with error handling, retries, and error logging) for many of my customers.
You could check out the pynamodb_mate library; it has the status tracker pattern implemented, and you can enable it with around 15 lines of code.
In general, when you say you want status tracking, you are talking about the following:
Each task should be handled by only one worker; you want a concurrency lock mechanism to avoid double consumption. (A lot of people aren't aware of this; it is related to idempotency.)
For succeeded tasks, store additional information such as the output of the task, and log the success time.
For failed tasks, log the error message for debugging, so you can fix the bug and rerun the task.
For failed tasks, you want to get all of them with one simple query and rerun them with the updated business logic.
For tasks that have failed too many times, you don't want to retry them anymore and want to ignore them. (A lot of people run into an endless loop when they deploy to production, then realize this is a necessary feature.)
Run custom queries based on task status for analytics purposes.
You can read this Jupyter notebook example.
Basically, with pynamodb_mate your Lambda job application code becomes:
# this is your lambda application code
def lambda_handler(...):
    ...

# your new code should be:
with tracker.start_job():
    lambda_handler()
If your application code is not Python, then you have two options:
Create another Lambda function that invokes the original one in sync mode; however, you pay extra to run the "caller" Lambda function.
Suppose your Lambda code is in Node.js: add the Python runtime as an additional Lambda layer and wrap your Node.js handler in a Python function. In short, you are using Python to call Node.js.
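A sketch of the first option (the function name is hypothetical; the with tracker.start_job(): wrapper from the snippet above would go around the invoke call):

import json

import boto3

lambda_client = boto3.client("lambda")

def lambda_handler(event, context):
    # Synchronous invocation: this Python "caller" only returns once the
    # original (e.g. Node.js) function has finished, so a status tracker
    # wrapped around this call can record success or failure reliably.
    resp = lambda_client.invoke(
        FunctionName="my-nodejs-worker",
        InvocationType="RequestResponse",
        Payload=json.dumps(event),
    )
    return json.loads(resp["Payload"].read())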

Camunda process versioning using "Process Instance Modification": migrating call activities

In our project we have a problem with Camunda process versioning.
We have read some guides and decided to use Process Instance Modification over Process Instance Migration due to the limitations of the latter approach.
As we see it, Process Instance Migration does not allow us to change current variables (based on their previous values and the wait point we are currently at); sometimes we only want to change variables because we changed the delegate execution code and we know the business model (BPMN) hasn't changed.
So currently I am trying to develop a migration framework based on Process Instance Modification.
The first issue I encountered is:
How do I properly migrate a process instance that is currently waiting at a wait point inside a Call Activity?
For example, I have a process. I start it; one execution waits at the Message 1 event, and another gets into the Call Activity and waits there at Message 3 and Message 4.
Using Process Instance Modification, I stop the executions in the Call Activity and then start them again (changing the variables and switching to the latest BPMN model). How can I attach them back to the parent process instance that called the Call Activity in the first place, so that the flow returns to the parent and proceeds with processing (executing Task 6)? And what if I want to migrate the parent process as well?
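For reference, the stop-and-restart step described above looks roughly like this against Camunda 7's REST API (a sketch only: the base URL and activity ID are illustrative, variables use Camunda's {"name": {"value": ..., "type": ...}} format, and this alone does not answer the re-attachment-to-parent question):

import requests

ENGINE = "http://localhost:8080/engine-rest"  # assumed engine-rest base URL

def restart_at_activity(process_instance_id, activity_id, new_variables):
    # Cancel the token at the old wait point and start before the same
    # activity again, injecting updated variables.
    requests.post(
        f"{ENGINE}/process-instance/{process_instance_id}/modification",
        json={
            "instructions": [
                {"type": "cancel", "activityId": activity_id},
                {"type": "startBeforeActivity",
                 "activityId": activity_id,
                 "variables": new_variables},
            ]
        },
    ).raise_for_status()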

What happens when we trigger the SWF Flows #Execute method multiple times?

We have a use case where we start a workflow (by invoking the #Execute method) and then schedule a timer for a subsequent activity. Now, this triggering of the workflow is based on an API call, which can be made multiple times by a client.
I wanted to know how SWF Flow handles multiple invocations of the #Execute method.
Does it create multiple executions?
Or would there be multiple timer clocks scheduled for the same workflow execution?
SWF allows only one open workflow execution per ID. So if the workflow is still running, calling the Execute method again is going to return WorkflowExecutionAlreadyStartedFault.
Note that if a workflow has completed, a new workflow is going to start even for the same ID.
Temporal (temporal.io), which is an open-source version of SWF, has an additional WorkflowIdReusePolicy which specifies what should be done if there are already completed workflows.
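In boto3 terms, the behavior looks like this (the domain and workflow type names are hypothetical):

import boto3

swf = boto3.client("swf")

def start_once(workflow_id):
    try:
        swf.start_workflow_execution(
            domain="my-domain",
            workflowId=workflow_id,  # at most one OPEN execution per ID
            workflowType={"name": "MyWorkflow", "version": "1.0"},
            input="{}",
        )
    except swf.exceptions.WorkflowExecutionAlreadyStartedFault:
        # The workflow is still open: no second execution is created and
        # no extra timer is scheduled on the existing execution.
        pass

So repeated API calls while the first execution is open simply raise the fault rather than piling up timers on the same execution.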

How to update MultiInstance User Task to add/delete Tasks?

We have a business scenario where we would like to have the ability to INCREASE or DELETE tasks within a multi-instance context.
I've managed to successfully create a multi-instance User Task based on a collection workPartnerList.
If a process is working on the multi-instance stage of the workflow, how can I increase or decrease the multi-instance state based on the count/values of workPartnerList, which can grow or shrink based on updates from the API call? (We need to do this prior to the overall task completion.)
I assume you are referring to a parallel multi-instance task.
https://docs.camunda.org/manual/latest/reference/bpmn20/tasks/task-markers/
Another way to define the number of instances is to specify the name of a process variable which is a collection, using the loopDataInputRef child element. For each item in the collection, an instance will be created.
The creation of the instances happens at the point in time when the execution reaches the parallel multi-instance activity. The number of instances created is determined by the size of the collection at this specific point in time. (A BPMN2 process engine will not automatically keep the task instances in sync with the collection.)
To "delete" task instance you can complete or cancel them (e.g. via an attached boundary event) or us the completion condition.
A multi-instance activity ends when all instances are finished. However, it is possible to specify an expression that is evaluated every time one instance ends. When this expression evaluates to true, all remaining instances are destroyed and the multi-instance activity ends, continuing the process. Such an expression must be defined in the completionCondition child element.
To add additional task instances to a running process instance dynamically you can use for instance event sub processes or attach a boundary event to the task.
https://docs.camunda.org/manual/7.13/reference/bpmn20/events/message-events/#message-boundary-event
Boundary events are catching events that are attached to an activity. This means that while the activity is running, the message boundary event is listening for a named message. When this is caught, two things might happen, depending on the configuration of the boundary event:
Interrupting boundary event: The activity is interrupted and the sequence flow going out of the event is followed.
Non-interrupting boundary event: One token stays in the activity and an additional token is created which follows the sequence flow going out of the event.
If you are willing to approach this at the API level, the TaskService allows you to create a new task (with a user-defined task id).
Example:
https://github.com/rob2universe/cam-multi-instance/blob/25f524be6a112deb1b4ae3bb4f28a35422e428e0/src/test/java/org/camunda/bpm/example/ProcessJUnitTest.java#L79
The modification API would even allow you to add additional instances to the already created set of task instances - see: https://docs.camunda.org/manual/latest/user-guide/process-engine/process-instance-modification/#modify-multi-instance-activity-instances
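At the REST level, adding one more instance to a running parallel multi-instance task can be sketched as follows (the base URL and activity ID are illustrative; per the linked docs, starting before the inner activity, not the #multiInstanceBody, adds an instance to the running set):

import requests

ENGINE = "http://localhost:8080/engine-rest"  # assumed engine-rest base URL

def add_task_instance(process_instance_id, variables):
    # variables use Camunda's {"name": {"value": ..., "type": ...}} format
    requests.post(
        f"{ENGINE}/process-instance/{process_instance_id}/modification",
        json={
            "instructions": [
                {"type": "startBeforeActivity",
                 "activityId": "userTask",  # ID of the inner activity
                 "variables": variables},
            ]
        },
    ).raise_for_status()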

How to prevent concurrent runs of a state machine in AWS Step Functions?

Is there a way to prevent concurrent executions of an AWS Step Functions state machine? For example, if I start a state machine and then start it again before the first execution has finished, I want to get an exception.
You can add a step (say, with a Lambda function) which would check if the same state machine is already being executed (and in which state). If this is the case, the lambda and the step would fail.
Depending on what you want to achieve, you can additionally configure a Retry so that the execution will continue once the old state machine has finished.
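A sketch of such a guard step (the input fields stateMachineArn and executionArn are assumptions about what the state passes into the Lambda):

import boto3

sfn = boto3.client("stepfunctions")

def lambda_handler(event, context):
    running = sfn.list_executions(
        stateMachineArn=event["stateMachineArn"],
        statusFilter="RUNNING",
    )["executions"]
    # Fail if any execution other than the current one is still running;
    # a Retry on this state can then back off until the older run finishes.
    others = [e for e in running if e["executionArn"] != event.get("executionArn")]
    if others:
        raise RuntimeError("Another execution of this state machine is still running")
    return event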
I don't think it is possible, according to the StartExecution API documentation:
StartExecution is idempotent. If StartExecution is called with the same name and input as a running execution, the call will succeed and return the same response as the original request. If the execution is closed or if the input is different, it will return a 400 ExecutionAlreadyExists error. Names can be reused after 90 days.
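In boto3, that documented behavior surfaces like this (the state machine ARN and execution name are illustrative):

import json

import boto3

sfn = boto3.client("stepfunctions")

try:
    sfn.start_execution(
        stateMachineArn="arn:aws:states:eu-west-1:123456789012:stateMachine:my-machine",
        name="run-2024-01-01",  # caller-chosen execution name
        input=json.dumps({"key": "value"}),
    )
except sfn.exceptions.ExecutionAlreadyExists:
    # Raised when this name was already used with different input, or the
    # execution with this name closed within the last 90 days.
    pass

Note that this only deduplicates identical (name, input) pairs; it does not stop a second execution started under a different name, so a guard step like the one in the previous answer is still needed for true mutual exclusion.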