I have a state machine (AWS Step function). I invoke it from java code either to start or stop. How do I pause a state machine and resume it back.
To pause the state machine you can add a manual approval step with API Gateway and call a GetActivityTask when you're ready to unpause. See more details in this tutorial https://aws.amazon.com/blogs/compute/implementing-serverless-manual-approval-steps-in-aws-step-functions-and-amazon-api-gateway/
Alternatively, if the java code where you need to pause the step function sends logs to CloudWatch and unpause is not required to be done immediately after your code complete (can wait 5 minutes), you can trigger the lambda steps to proceed after some event in the logs. For more details see https://docs.aws.amazon.com/step-functions/latest/dg/tutorial-cloudwatch-events-target.html
Related
I have a state machine with multiple steps.
I want to send notification whenever a step run successfully or give error.
One solution is to add a SNS step after each step so whenever a step is successful then next step will run which is to send notification but what if a step fails then How can I send email ?
Is there any solution to this problem ?
I know we can set cloud watch rules but it send notification when a complete state machine fails but here I want to get notifications at lower level i.e. at every step of state machine.
Thanks for your question.
In the AWS Step Functions Console, there's a sample project called Callback pattern example which should illustrate what you're trying to achieve:
Step Functions offers error catching functionality which allows you to transition to a specific state based on the error thrown. For more information see this helpful documentation on error handling: Error Handling in Step Functions - AWS Step Functions
I am have simple python code which subscribes to a service bus subscription. I have containerized this and deployed as part of ACI on Azure.
If message arrives on service bus subscription, the code is executed, executes it logic and then waits indefinitely for another message from appear.
The code is what Azure has provided in its documentation for python sdk here
Since ACI is serverless and bills/second, just wanted a confirmation if I'll get billed even if it is not executing my code and waiting for message for appear on topic/subscription (event-based) ?
Yes, of course. It will cost if there is anyone container instance in the running state. Until you stop all the container instance, then the cost will stop. So even if your code is waiting, but the instance is running.
I need to implement the following feature in the backend on AWS:
- API endpoint which allows a user to start a particular long running "process" in a remote system
- the process status in this remote system should be monitored periodically (every few-several seconds) for status and when status == complete, trigger an action (the remote system does not support sending/triggering notifications or callbacks)
We use primarily lambda functions so I'm thinking about approaching it in the following way:
- my endpoint which is triggered by the user would call remote system to start the process, store record in internal DB and generate a message to SQS (with a delivery delay of X seconds)
- there would be a second lambda that would read messages from SQS & check status of the process in this remote system. When status == complete, trigger an action, when status != complete, generate another message to SQS which would again the same lambda would pick up after X seconds of delay and repeat the check and so on
I'm wondering if there is a better solution/tools to implement this kind of monitoring/notification pattern in the AWS since I'm not that familiar with all the services that AWS provides.
Would anyone comment on this approach and perhaps suggest an alternative if there is one?
Take a look at AWS Step Functions which I think is the best fit for your use case.
All you need to do is, instead of generating a SQS message, initiate an execution of a StateMachine in StepFunctions.
The following tutorial explains a iterator loop with a counter. But you can use the same logic to check the status and keep looping until status == complete
https://docs.aws.amazon.com/step-functions/latest/dg/tutorial-create-iterate-pattern-section.html
Another useful resource which I think very close to your use case
https://docs.aws.amazon.com/step-functions/latest/dg/sample-project-job-poller.html
I am using Amazon SWF service to automate some recurring tasks.
I am trying to use signals to execute some commands on remote machines. After the commands are finished executing, I'd like to send a signal back to the workflow to indicate success or failure.
The question is how can I find the workflow execution id programmatically? This is required for the remote machines to send a signal.
Thanks
Per http://docs.aws.amazon.com/AWSRubySDK/latest/AWS/SimpleWorkflow/WorkflowExecution.html, shouldn't
your_workflow_execution_variable.run_id
get you exactly what you're looking for?
Is there a way to kill / re-start a long running task in AWS SWF? Sometimes some of our tasks run for a longer duration and we would like to manually kill a certain task (either via UI or programmatically) and re-start the task if possible. How to achieve this?
Console is option to manually kill workflow.
You can also set timeouts to whole workflow execution time or to individual activities. This can be set when you register your activity or when you start your activity (defaultTaskStartToCloseTimeoutSecond).
It's not clear what language you're using.
If you're using java, then you should look into Exponential Retry in Flow Framework. This make SDK restart your activity if it fails.
Long running activity is expected to heartbeat using RecordActivityTaskHeartbeat. It leads to timeout failure after short hearbeat interval instead of long task execution timeout if the activity process hangs or crashes.
The workflow code (decider) can always request activity cancellation through RequestCancelActivityTask decision. The cancellation request is returned as output of the RecordActivityTaskHeartbeat call. Activity implementation should cancel itself and report back to the service using RespondActivityTaskCanceled API call.
See Error Handling section of AWS Flow Framework Developer Guide for the AWS Flow Framework way of cancelling activities.
Sometimes activity implementation cannot support heartbeating and self cancellation. The solution is to execute another kill activity that terminates the first activity execution. For example under Unix such kill activity could emit "kill -9" command for the process that implements the first one.