I currently have a SageMaker pipeline that is executed using Step Functions. I am able to start the execution, but I cannot make the step wait for the pipeline to finish before moving on to the next step. How should I set it up within the Step Function so that it waits for the pipeline to complete before executing the next step?
StartPipelineExecution is an asynchronous API that returns immediately with the execution ARN. You have several options to wait for pipeline completion with a Step Function. Here are two:
Option 1: Polling
One option is to poll for execution completion within a single State Machine. Polling could be implemented with a Wait - Lambda - Choice task loop, or with a single Lambda function using a callback pattern (in which case the looping is the Lambda's job). In both cases, the Lambda checks the status of the pipeline execution with an SDK call.
Visually, the State Machine looks like this:
SM #1
[x x x S P x x x x]
where S = SageMaker StartPipelineExecution task, P = Poller tasks, x = Other tasks.
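As a rough sketch of the Wait - Lambda - Choice loop, the definition could be assembled like this. The state names, the `$.pipeline` / `$.status` paths, the 60-second interval and the poller function are all my own placeholders, and I'm assuming the start step goes through the AWS SDK service integration with a generated `ClientRequestToken`; treat it as a starting point rather than a tested definition.

```python
import json


def build_polling_definition(pipeline_name, poller_lambda_arn, wait_seconds=60):
    """Return a state machine definition (as a dict) with a Wait -> Lambda -> Choice loop."""
    return {
        "StartAt": "StartPipeline",
        "States": {
            "StartPipeline": {
                "Type": "Task",
                # Assumption: SDK service integration for StartPipelineExecution.
                "Resource": "arn:aws:states:::aws-sdk:sagemaker:startPipelineExecution",
                "Parameters": {
                    "PipelineName": pipeline_name,
                    "ClientRequestToken.$": "States.UUID()",
                },
                "ResultPath": "$.pipeline",
                "Next": "WaitBeforePoll",
            },
            "WaitBeforePoll": {"Type": "Wait", "Seconds": wait_seconds, "Next": "CheckStatus"},
            "CheckStatus": {
                "Type": "Task",
                "Resource": poller_lambda_arn,  # hypothetical poller Lambda (see below)
                "ResultPath": "$.status",
                "Next": "IsPipelineDone",
            },
            "IsPipelineDone": {
                "Type": "Choice",
                "Choices": [
                    {"Variable": "$.status", "StringEquals": "Succeeded", "Next": "NextStep"},
                    {"Variable": "$.status", "StringEquals": "Failed", "Next": "PipelineFailed"},
                    {"Variable": "$.status", "StringEquals": "Stopped", "Next": "PipelineFailed"},
                ],
                # Still executing (or stopping): go around the loop again.
                "Default": "WaitBeforePoll",
            },
            "PipelineFailed": {"Type": "Fail", "Error": "PipelineDidNotSucceed"},
            "NextStep": {"Type": "Pass", "End": True},  # stand-in for the remaining tasks
        },
    }


def poller_handler(event, context, sagemaker_client=None):
    """Poller Lambda: return the pipeline execution status as a plain string."""
    if sagemaker_client is None:
        import boto3  # imported lazily so the sketch can be exercised without AWS access

        sagemaker_client = boto3.client("sagemaker")
    response = sagemaker_client.describe_pipeline_execution(
        PipelineExecutionArn=event["pipeline"]["PipelineExecutionArn"]
    )
    return response["PipelineExecutionStatus"]


if __name__ == "__main__":
    definition = build_polling_definition(
        "my-pipeline", "arn:aws:lambda:us-east-1:123456789012:function:poller"
    )
    print(json.dumps(definition, indent=2))
```

The Choice state's `Default` is what closes the loop: any status that is not terminal sends the execution back to the Wait state.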
Option 2: Event-driven
A second option avoids polling. Instead, split your State Machine in two, the first one ending with the StartPipelineExecution task. Add an EventBridge rule that triggers the second half of your tasks when a Pipeline execution state change event with "currentPipelineExecutionStatus": "Succeeded" is emitted.
SM #1 SM #2
[x x x S] -> pipeline success event -> [x x x x]
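For the rule itself, the event pattern might look something like the sketch below. The `source` and `detail-type` values are what I'd expect SageMaker to emit for pipeline status changes, but please verify them against a real event before relying on this. The `matches_success_pattern` helper is purely a local illustration of what the rule would match and is not part of any AWS API.

```python
import json

# Assumed event pattern for the EventBridge rule that starts SM #2.
PIPELINE_SUCCEEDED_PATTERN = {
    "source": ["aws.sagemaker"],
    "detail-type": ["SageMaker Model Building Pipeline Execution Status Change"],
    "detail": {"currentPipelineExecutionStatus": ["Succeeded"]},
}


def matches_success_pattern(event):
    """Local, simplified check mirroring the pattern above (illustration only)."""
    detail = event.get("detail", {})
    return (
        event.get("source") in PIPELINE_SUCCEEDED_PATTERN["source"]
        and event.get("detail-type") in PIPELINE_SUCCEEDED_PATTERN["detail-type"]
        and detail.get("currentPipelineExecutionStatus")
        in PIPELINE_SUCCEEDED_PATTERN["detail"]["currentPipelineExecutionStatus"]
    )


if __name__ == "__main__":
    # The JSON string is what you would supply as the rule's event pattern.
    print(json.dumps(PIPELINE_SUCCEEDED_PATTERN, indent=2))
```

The rule's target would then be the second State Machine, so SM #2 only starts once the success event arrives.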
These patterns apply more generally to orchestrating asynchronous tasks. See this related question for another example.
Related
I'm a developer who is new to AWS.
While configuring Step Functions, I found there are some scenarios where multiple instances of a step function could be executing at once because of a timer I've set.
The step function I configured follows the process below:
wait 2 minute
execute lambda function
Since there is a timer in my step function, there could be cases where the step function is invoked multiple times at once.
The thing is, I want to guarantee that only one step function execution is running at a time.
So if another step function gets invoked while one is already running (waiting on the timer), I want to terminate the one that was just invoked. Is there any way to list the step functions that are executing?
You can't prevent an execution from starting, but you can list the executions at the start of your Step Function, and exit early if a running execution is found.
The ListExecutions API lists the executions for a given state machine ARN. Call it in a Task, setting the statusFilter to RUNNING to return only in-progress executions. You'll get back a list of matching execution items. Because the execution making the call is itself RUNNING, what you care about is whether the list contains any execution other than the current one.
Finally, insert a Choice state. If there are other running items, exit. If there are none, continue with the execution.
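If you do the check in a Lambda task, a minimal sketch could look like this. The function name and the idea of passing in the current execution ARN (for example from `$$.Execution.Id` in the task's parameters) are my assumptions; the client is expected to be a boto3 Step Functions client. It only looks at the first page of results, which should be enough for a yes/no answer in most cases.

```python
def other_running_executions(sfn_client, state_machine_arn, current_execution_arn):
    """Return ARNs of RUNNING executions other than the one doing the check."""
    response = sfn_client.list_executions(
        stateMachineArn=state_machine_arn,
        statusFilter="RUNNING",
    )
    return [
        item["executionArn"]
        for item in response["executions"]
        # The calling execution is RUNNING too, so leave it out of the count.
        if item["executionArn"] != current_execution_arn
    ]


def should_exit_early(sfn_client, state_machine_arn, current_execution_arn):
    """Value for the Choice state: True means another execution is already in progress."""
    others = other_running_executions(sfn_client, state_machine_arn, current_execution_arn)
    return len(others) > 0
```

The Choice state can then branch on the boolean this returns.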
Scenario
I'm looking for a way to create an instance of a step function that waits for me to start it. Pseudo code would look like this.
StateMachine myStateMachine = new();
string executionArn = myStateMachine.ExecutionArn;
myStateMachine.Start();
Use Case
We need a way to reliably store the Execution ARN of a step function to a database. If we fail to write the Execution ARN to the database, we won't call the Start method and the step function should timeout. If the starting of the step function fails, the database operation would be rolled back.
These are the steps we plan to take
A local transaction is started
The step function instance is created, but not started
The ExecutionArn of the created step function instance is recorded in a database
The step function is started
The local transaction is committed
Is there a simple way to start a step function like this?
Below is the result of some research I've done on this so far.
Manual Callbacks
Following information in this article https://aws.amazon.com/blogs/compute/implementing-serverless-manual-approval-steps-in-aws-step-functions-and-amazon-api-gateway/,
I created an empty activity, then used this activity as the first step in the step function and added a timeout of 30 seconds to the activity step. The expectation was that if I didn't send a success to that activity task in the step function, the step would time out and the workflow would fail, but it isn't doing that. Even though I set the timeout to 30 seconds, the step is not timing out. I'm guessing the timeout is about how long it waits for the step function to be able to schedule the activity, not how long it waits for the step function to move on from the activity step.
I've also considered using an SQS SendMessage step with Wait for callback checked and with a similar timeout, but that would require I create a throw-away SQS queue just to contain messages I never intend to read, plus I'm guessing the timeout functionality would work the same here as in an activity.
Wait State
There may be something I can do with a Wait state and parallel branches by following the accepted answer in this SO article: Does AWS Step Functions have a timeout feature?, but before I go down that route I want to see if something simpler can be done.
Global Timeout
I have found that step functions have a global timeout, and that is useful in this case if I use it in conjunction with a step that pauses until my application explicitly resumes it, but the global timeout is only useful if it can be reasonably low (like 20 minutes) and still have the step function viable for all use cases. For instance, if the maximum time it should take to run the step function is 2 or 3 minutes, then all is fine. But if I have another step in the step function that can take longer than 20 minutes then I can't use the global timer anymore or I have to start setting it to something very high, which I don't want to do.
Is there anything I can do here easily that I'm overlooking?
Thanks
Two-phase initialization of a step function cannot be done. We've worked around this by:
Our Application: Write a row in our DB to indicate the intent to start a step function
Our Application: Start the step function
Our Application: Record the ExecutionArn of the step function instance in the created row
Step Function: Have the step function wait indefinitely at its first step, an SQS step
Our Application: Poll the SQS queue and either abort the step function or allow it to proceed to the next step by sending a callback to the SQS step. (This is the 2nd phase)
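To make the second phase more concrete, here is roughly what the SQS step and the application-side callback could look like. The queue URL, state names, message shape and the `resolve_pending_start` helper are all made up for illustration; the client is assumed to be a boto3 Step Functions client.

```python
import json

# First state of the step function: send the task token to SQS and pause.
# The queue URL and the "FirstRealStep" name are placeholders.
WAIT_FOR_GO_AHEAD_STATE = {
    "Type": "Task",
    "Resource": "arn:aws:states:::sqs:sendMessage.waitForTaskToken",
    "Parameters": {
        "QueueUrl": "https://sqs.us-east-1.amazonaws.com/123456789012/pending-starts",
        "MessageBody": {"TaskToken.$": "$$.Task.Token"},
    },
    "Next": "FirstRealStep",
}


def resolve_pending_start(sfn_client, message_body, execution_arn_recorded):
    """Second phase: let the execution proceed, or fail it, via the task token."""
    task_token = json.loads(message_body)["TaskToken"]
    if execution_arn_recorded:
        sfn_client.send_task_success(
            taskToken=task_token,
            output=json.dumps({"started": True}),
        )
        return "proceed"
    sfn_client.send_task_failure(
        taskToken=task_token,
        error="StartAborted",
        cause="ExecutionArn was not recorded in the database",
    )
    return "abort"
```

The application would call `resolve_pending_start` for each message it reads from the queue, after checking whether the corresponding DB row was committed.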
My scenario is as follows:
When an event occurs, several processes need to be executed in parallel.
A step function seems ideal for this:
e.g.
Event received:
- Parallel 1 - Send an SMS message
- Parallel 2 - Send an email
- Parallel 3 - Write data to a database
Gather all completed results to determine success / failure.
However, depending on the type of event, I may want to execute only a few of the processes.
An example would be: the incoming customer data does not contain an email address so we want to skip the email process.
Is there a way for the step function to enable or disable certain parallel processes depending on the type of initial event?
Here are two ways to apply yes-no decision logic to each parallel branch:
Add a two-branch Choice State at the start of each parallel branch. For instance, payloads matching {"Variable": "$.email", "IsPresent": true} continue to the Parallel 2 branch, but otherwise bypass it.
Perform the parallel tasks in Lambdas using the SDKs. Use branching logic at the start of each Lambda to decide whether to process the payload.
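A sketch of the first approach, building each branch with a leading Choice state. The helper name, the state names and the Lambda ARNs are invented for the example; only the `IsPresent` comparison comes from the answer above.

```python
import json


def optional_branch(task_name, condition_variable, task_resource):
    """Build one Parallel branch that runs its task only if condition_variable is present."""
    check_name = f"Check{task_name}"
    skip_name = f"Skip{task_name}"
    return {
        "StartAt": check_name,
        "States": {
            check_name: {
                "Type": "Choice",
                "Choices": [
                    {"Variable": condition_variable, "IsPresent": True, "Next": task_name}
                ],
                # Field missing from the payload: bypass the task.
                "Default": skip_name,
            },
            task_name: {"Type": "Task", "Resource": task_resource, "End": True},
            skip_name: {"Type": "Pass", "Result": {"skipped": True}, "End": True},
        },
    }


# Placeholder ARNs; the variable paths assume the event carries $.phone and $.email.
PARALLEL_STATE = {
    "Type": "Parallel",
    "Branches": [
        optional_branch("SendSms", "$.phone", "arn:aws:lambda:us-east-1:123456789012:function:send-sms"),
        optional_branch("SendEmail", "$.email", "arn:aws:lambda:us-east-1:123456789012:function:send-email"),
    ],
    "End": True,
}

if __name__ == "__main__":
    print(json.dumps(PARALLEL_STATE, indent=2))
```

Skipped branches still return a result (`{"skipped": true}` here), so the step that gathers the outputs can tell a skipped process apart from a failed one.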
I have nearly 1000 items in my DB. I have to run the same operation on each item. The issue is that this is a third party service that has a 1 second rate limit for each operation. Until now, I was able to do the entire thing inside a lambda function. It is now getting close to the 15 minute (900 second) timeout limit.
I was wondering what the best way for splitting this would be. Can I dump each item (or batches of items) into SQS and have a lambda function process them sequentially? But from what I understood, this isn't the recommended way to do this as I can't delay invocations sufficiently long. Or I would have to call lambda within a lambda, which also sounds weird.
Is AWS Step Functions the way to go here? I haven't used that service yet, so I was wondering if there are other options too. I am also using the serverless framework for doing this if it is of any significance.
Both methods you mentioned are options that would work. Within lambda you could add a delay (sleep) after one item has been processed and then trigger another lambda invocation following the delay. You'll be paying for that dead time, of course, if you use this approach, so step functions may be a more elegant solution. One lambda can certainly invoke another--even invoking itself. If you invoke the next lambda asynchronously, then the initial function will finish while the newly-invoked function starts to run. This article on Asynchronous invocation will be useful for that approach. Essentially, each lambda invocation would be responsible for processing one item, delaying sufficiently to accommodate the service limit, and then invoking the next item.
If anything goes wrong you'd want to build appropriate exception handling so a problem with one item either halts the rest or allows the rest of the chain to continue, depending on what is appropriate for your use case.
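For what it's worth, the self-invoking chain could be sketched like this. The event shape (`{"items": [...]}`) and the extra keyword arguments are my own additions so the function can be tried out locally; in Lambda only `event` and `context` are passed in, and the whole remaining list has to fit within the asynchronous invocation payload limit.

```python
import json
import time


def handler(event, context, lambda_client=None, process_item=print, delay_seconds=1.0):
    """Process one item, wait out the rate limit, then hand the rest to a new invocation."""
    items = event.get("items", [])
    if not items:
        return {"done": True}

    # process_item stands in for the call to the third-party service.
    process_item(items[0])
    remaining = items[1:]
    if not remaining:
        return {"done": True}

    time.sleep(delay_seconds)  # this idle time is billed
    if lambda_client is None:
        import boto3  # imported lazily so the sketch can be exercised without AWS access

        lambda_client = boto3.client("lambda")
    lambda_client.invoke(
        FunctionName=context.function_name,
        InvocationType="Event",  # asynchronous: this invocation can finish right away
        Payload=json.dumps({"items": remaining}).encode("utf-8"),
    )
    return {"done": False, "remaining": len(remaining)}
```

An exception raised by `process_item` stops the chain at that item, which matches the "halt the rest" behaviour; wrap the call in try/except if you would rather continue.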
Step Functions would also work well to handle this use case. With options like Wait and using a loop you could achieve the same result. For example, your step function flow could invoke one lambda that processes an item and returns the next item, then it could next run a wait step, then process the next item and so on until you reach the end. You could use a Map that runs a lambda task and a wait task:
The Map state ("Type": "Map") can be used to run a set of steps for
each element of an input array. While the Parallel state executes
multiple branches of steps using the same input, a Map state will
execute the same steps for multiple entries of an array in the state
input.
This article on Iterating a Loop Using Lambda is also useful.
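As an illustration of the Map idea, a state along these lines would process the array one element at a time with a pause after each. I'm assuming the items arrive under `$.items` and that `MaxConcurrency` of 1 is acceptable for the throughput you need; the state names and the Lambda ARN are placeholders.

```python
import json


def rate_limited_map(process_lambda_arn, items_path="$.items", wait_seconds=1):
    """Build a Map state that runs a Lambda task plus a Wait for each array element."""
    return {
        "Type": "Map",
        "ItemsPath": items_path,
        "MaxConcurrency": 1,  # one iteration at a time, so the waits do not overlap
        "ItemProcessor": {
            "ProcessorConfig": {"Mode": "INLINE"},
            "StartAt": "ProcessItem",
            "States": {
                "ProcessItem": {
                    "Type": "Task",
                    "Resource": process_lambda_arn,
                    "Next": "RateLimitWait",
                },
                "RateLimitWait": {"Type": "Wait", "Seconds": wait_seconds, "End": True},
            },
        },
        "End": True,
    }


if __name__ == "__main__":
    state = rate_limited_map("arn:aws:lambda:us-east-1:123456789012:function:process-item")
    print(json.dumps(state, indent=2))
```

Each Lambda invocation now handles a single item, so the 15-minute function timeout no longer applies to the run as a whole.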
If you want the messages to be processed serially and are happy to dump the messages to SQS, set both the concurrency of the Lambda and the batchSize property of the SQS event that triggers the function to 1.
Make it a FIFO queue so that messages don't potentially get processed more than once, if that is important.
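If you were wiring this up with boto3 rather than through the Serverless Framework config, the two settings might be applied roughly as below. The helper name is hypothetical, and the client is assumed to be a boto3 Lambda client with permission to change both settings.

```python
def configure_serial_processing(lambda_client, function_name, queue_arn):
    """Limit the function to one concurrent execution and one SQS message per invocation."""
    lambda_client.put_function_concurrency(
        FunctionName=function_name,
        ReservedConcurrentExecutions=1,
    )
    return lambda_client.create_event_source_mapping(
        EventSourceArn=queue_arn,
        FunctionName=function_name,
        BatchSize=1,
    )
```

The function itself would still need to sleep for the remainder of the second after each call to stay under the third-party rate limit.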
I have a step function with three states currently.
Pass state -> Wait for 9 hours -> x Lambda Task. - (a)
I want to update the state machine adding another task at the end, so effectively the machine would look like this:
Pass state -> Wait for 9 hours -> x Lambda Task -> y Lambda task. - (b)
Is there a way in which I can edit (a) to (b) so that all the running executions get updated with it?
Or is the ideal way to abort all the (a) executions and supply the same data to run (b)? If so, what would be the correct and possibly the easiest way to do this using SFN tools?
As mentioned in the Step Functions docs:
Running executions will continue to use the previous definition and roleArn
However, if you change your Lambda function, executions (even those started before the Lambda function was changed) will run the new Lambda code when they reach that state, assuming you are not using a versioned ARN of the Lambda function in your state machine. During the migration phase you can change Lambda x to call Lambda y, which ensures all running executions also run y.
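A rough sketch of what the temporary version of x could look like during the migration. `do_x_work`, the `y-function` name and the extra keyword arguments are placeholders of mine; the real x would keep its existing logic and simply gain the extra synchronous call at the end.

```python
import json


def do_x_work(event):
    """Placeholder for whatever Lambda x already does."""
    return {"x_result": event.get("value")}


def x_handler(event, context, lambda_client=None, y_function_name="y-function"):
    """Temporary migration version of x: run x's logic, then call y synchronously."""
    x_output = do_x_work(event)
    if lambda_client is None:
        import boto3  # imported lazily so the sketch can be exercised without AWS access

        lambda_client = boto3.client("lambda")
    response = lambda_client.invoke(
        FunctionName=y_function_name,
        InvocationType="RequestResponse",  # wait for y to finish before returning
        Payload=json.dumps(x_output).encode("utf-8"),
    )
    return json.loads(response["Payload"].read())
```

Once the old (a) executions have drained, x can go back to its original code, since new executions of (b) call y through the state machine.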