Failing AWS Step Functions after Catching


I have 3 stages in my AWS Step Function:
Stage 1 - Lambda
Stage 2 - AWS Batch
Stage 3 - AWS Batch (Mandatory Cleanup)
Error handling works: if Stage 1 or Stage 2 fails, execution moves to the Cleanup stage. However, because the Cleanup stage always succeeds, the Step Function's final result is always a pass. If Stage 1 or 2 fails, I still need Cleanup to run, but the final result of the Step Function should be a failure.
Options investigated:
One way to solve this is to keep a flag in a cache that records whether an error occurred, but I was wondering whether there is a built-in way to do this.
Another option is to use the ResultPath to check for an error, but I am not sure how to access this result from an AWS Batch job.
Appreciate any advice on this, thanks.
I have added the following Catch block in Stage 1 and 2:
"Catch": [
{
"ErrorEquals": [
"States.ALL"
],
"Next": "Cleanup"
}
]
The Cleanup stage is as follows:
"Cleanup": {
"Type": "Task",
"Resource": "arn:aws:states:::batch:submitJob.sync",
"Parameters": {
"JobDefinition": "arn:aws:batch:<region>:<account>:job-definition/MyCleanupJob",
"JobName": "cleanup",
"JobQueue": "arn:aws:batch:<region>:<account>:job-queue/MyCleanupQueue",
"ContainerOverrides": {
"Command": [
"java",
"-jar",
"cleanup.jar" ############ need to specify if an error occurred as a command line parameter ###########
]
}
},
"End": true
}

I used the mechanism below; credit to @LRutten for pointing me down this path.
For every stage that succeeds, write the response to its own ResultPath; otherwise the previous results are overwritten.
On an exception, write the error to the ResultPath "$.error".
Use a Choice state to decide whether the Step Function should fail, based on the presence of the error element.
Here is the end result:
"MyLambda": {
"Type": "Task",
"Resource": "arn:aws:lambda:<region>:<account>:function:MyLambda",
"ResultPath": "$.mylambda", #### All results from the lambda are added to "mylambda" in the JSON
"Catch": [
{
"ErrorEquals": [
"States.ALL"
],
"ResultPath": "$.error", #### If an error occurs it is appended to the result path as an "error" element
"Next": "Cleanup"
}
],
"Next": "MyBatch"
},
"MyBatch": {
"Type": "Task",
"Resource": "arn:aws:states:::batch:submitJob.sync",
"Parameters": {
"JobDefinition": "arn:aws:batch:<region>:<account>:job-definition/MyBatchJob",
"JobName": "mybatch",
"JobQueue": "arn:aws:batch:<region>:<account>:job-queue/MyBatchQueue",
"ContainerOverrides": {
"Command": [
"java",
"-jar",
"mybatch.jar"
]
}
},
"ResultPath": "$.mybatch",
"Catch": [
{
"ErrorEquals": [
"States.ALL"
],
"ResultPath": "$.error",
"Next": "Cleanup"
}
],
"Next": "Cleanup"
},
"Cleanup": {
"Type": "Task",
"ResultPath": "$.cleanup",
"Resource": "arn:aws:states:::batch:submitJob.sync",
"Parameters": {
"JobDefinition": "arn:aws:batch:<region>:<account>:job-definition/MyCleanupJob",
"JobName": "cleanup",
"JobQueue": "arn:aws:batch:<region>:<account>:job-queue/MyCleanupQueue",
"ContainerOverrides": {
"Command": [
"java",
"-jar",
"cleanup.jar"
]
}
},
"Next": "Should Fail"
},
"Should Fail" :{
"Type" : "Choice",
"Choices" : [
{
"Variable" : "$.error", #### If an error element is present it means it is a Failure
"IsPresent": true,
"Next" : "Fail"
}
],
"Default" : "Pass"
},
"Fail" : {
"Type" : "Fail",
"Cause": "Step function failed"
},
"Pass" : {
"Type" : "Pass",
"Result": "Step function passed",
"End" : true
}
}
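As a rough local analogy for the flow above (not AWS code), the same pattern can be sketched in plain Python: each stage stores its result under its own key, a caught error is recorded under "error", cleanup always runs, and the final status depends only on whether "error" is present. The function and stage names here are hypothetical and exist only for illustration.

```python
from typing import Any, Callable, Dict, List


def run_with_cleanup(
    stages: List[Callable[[Dict[str, Any]], Any]],
    cleanup: Callable[[Dict[str, Any]], Any],
) -> Dict[str, Any]:
    """Mimic the state machine: stages, mandatory cleanup, then pass/fail."""
    state: Dict[str, Any] = {}
    for stage in stages:
        try:
            # Like "ResultPath": "$.<stage>": keep each result under its own key.
            state[stage.__name__] = stage(state)
        except Exception as exc:
            # Like the Catch block with "ResultPath": "$.error".
            state["error"] = {"Error": type(exc).__name__, "Cause": str(exc)}
            break  # The Catch jumps straight to Cleanup, skipping later stages.
    state["cleanup"] = cleanup(state)  # Cleanup always runs.
    # Like the "Should Fail" Choice state: fail only if "error" is present.
    state["status"] = "FAILED" if "error" in state else "SUCCEEDED"
    return state


def my_lambda(state: Dict[str, Any]) -> Dict[str, bool]:
    return {"ok": True}


def my_batch(state: Dict[str, Any]) -> None:
    raise RuntimeError("batch job failed")


def do_cleanup(state: Dict[str, Any]) -> str:
    return "cleaned"


result = run_with_cleanup([my_lambda, my_batch], do_cleanup)
print(result["status"])  # FAILED
```

Because the cleanup result is written under its own key, the earlier "error" entry survives, which is what lets the final check see it.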

Related

How to catch exception from lambda in state machine?

I am using a state machine and raising a custom error in my Lambda, but I am not able to catch that exception in the state machine.
Below are the Lambda snippet and the state machine definition. Instead of going to the Catch block and the error task, it throws an error at the ResultSelector attribute, as below:
the JSONPath '$.Payload.tables' specified for the field 'tables.$' could not be found in the input
How can I ignore the ResultSelector attribute during an exception?
My Lambda code snippet:
if schema is None:
    raise Exception("schema is not configured")
My state machine:
"ResultSelector": {
"tables.$": "$.Payload.tables"
},
"ResultPath": "$.export_tables",
"Catch": [
{
"ErrorEquals": [
"States.Runtime"
],
"ErrorEquals": [
"States.ALL"
],
"ResultPath": "$.error",
"Next": "error state"
}
],
"Next": "Export Tables"
},
"error state": {
"Type": "Fail"
},
"Export Tables": {
"Type": "Map",
"End": true,
"ItemsPath": "$.export.tables",
"Parameters": {
"product.$": "$.product",
"table_export_def.$": "$$.Map.Item.Value"
},

You can catch custom errors by specifying the ErrorEquals and Next attributes, with Next pointing to a fallback step, as described in the AWS docs.
For example, if you want to catch a runtime error:
"Catch": [ {
"ErrorEquals": ["States.Runtime"],
"Next": "CustomErrorFallback"
},
...
]
Then define the custom fallback step to handle the error with a custom error message:
"CustomErrorFallback": {
"Type": "Pass",
"Result": "schema is not configured",
"End": true
},
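On the Lambda side, a minimal sketch of raising a dedicated error type might look like the following. For Python Lambda functions, Step Functions reports the exception class name as the error name, so a Catch entry with "ErrorEquals": ["SchemaNotConfiguredError"] could target it specifically. The class name and the shape of the event are assumptions made for illustration; they are not taken from the original question.

```python
class SchemaNotConfiguredError(Exception):
    """Hypothetical custom error; its class name is what a Catch would match."""


def lambda_handler(event, context):
    # Assumed event shape: {"schema": {"tables": [...]}}
    schema = event.get("schema")
    if schema is None:
        raise SchemaNotConfiguredError("schema is not configured")
    return {"tables": schema.get("tables", [])}
```

A named exception class makes it possible to route this failure to its own fallback state while leaving a broader "States.ALL" catcher for everything else.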

Step function with Redshift cluster

I am building a Step Function to orchestrate an ETL pipeline but keep getting an error. Here is my code; I am following the AWS docs below.
https://docs.aws.amazon.com/step-functions/latest/dg/sample-etl-orchestration.html
"GetStateOfCluster": {
"Type": "Task",
"Resource": "lambda",
"TimeoutSeconds": 180,
"HeartbeatSeconds": 60,
"Next": "IsClusterAvailable",
"InputPath": "$",
"ResultPath": "$.clusterStatus"
},
"IsClusterAvailable": {
"Type": "Choice",
"Choices": [
{
"Variable": "$.clusterStatus",
"StringEquals": "available",
"Next": "runetljobs"
},
{
"Variable": "$.clusterStatus",
"StringEquals": "unavailable",
"Next": "ClusterUnavailable"
},
{
"Variable": "$.clusterStatus",
"StringEquals": "paused",
"Next": "InitializeResumeCluster"
},
{
"Variable": "$.clusterStatus",
"StringEquals": "resuming",
"Next": "ClusterWait"
}
],
"Default": "DefaultState"
},
"DefaultState": {
"Type": "Fail",
"Error": "DefaultStateError",
"Cause": "No Matches!"
},
"ClusterUnavailable": {
"Type": "Fail",
"Cause": "Redshift cluster is not available",
"Error": "Error"
},
"ClusterWait": {
"Type": "Wait",
"Seconds": 900,
"Next": "InitializeCheckCluster"
},
"InitializeResumeCluster": {
"Type": "Pass",
"Next": "ResumeCluster",
"Result": {
"input": {
"redshift_cluster_id": "redshift cluster id",
"operation": "resume"
}
}
},
"ResumeCluster": {
"Type": "Task",
"Resource": "lambda",
"TimeoutSeconds": 180,
"HeartbeatSeconds": 60,
"Next": "ClusterWait",
"InputPath": "$",
"ResultPath": "$"
},
It goes directly to the default state even when the cluster status shows 'available', whereas it should go to the runetljobs stage. The docs don't define a Default; if we don't add one, the error is:
"cause": "An error occurred while executing the state 'IsClusterAvailable' (entered at the event id #14). Failed to transition out of the state. The state does not point to a next state."

I don't see the state "runetljobs" defined in your state machine definition.
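A quick way to spot this kind of problem is to scan the definition for transition targets that are never defined. The helper below is a minimal sketch with a hypothetical name; it only looks at top-level "Next", "Default", "Choices" and "Catch" entries and does not descend into Parallel branches or Map iterators.

```python
from typing import Any, Dict, Set


def undefined_targets(states: Dict[str, Dict[str, Any]]) -> Set[str]:
    """Return state names that are referenced but never defined."""
    referenced: Set[str] = set()
    for state in states.values():
        for key in ("Next", "Default"):
            if key in state:
                referenced.add(state[key])
        # Choice rules and Catch handlers carry their own "Next".
        for rule in state.get("Choices", []) + state.get("Catch", []):
            if "Next" in rule:
                referenced.add(rule["Next"])
    return referenced - set(states)


states = {
    "IsClusterAvailable": {
        "Type": "Choice",
        "Choices": [
            {
                "Variable": "$.clusterStatus",
                "StringEquals": "available",
                "Next": "runetljobs",
            }
        ],
        "Default": "DefaultState",
    },
    "DefaultState": {"Type": "Fail"},
}
print(undefined_targets(states))  # {'runetljobs'}
```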

Cannot pass array to next task in AWS StepFunction

I am working on an AWS Step Function that gets an array of dates from a Lambda call, then passes it to a Task that should take that array as a parameter to pass into a Lambda.
The Get Date Range task works fine and outputs the date array:
{
"rng": [
"2019-05-07",
"2019-05-09"
]
}
...and the array gets passed into the ProcessDateRange task, but I cannot assign the array to the range parameter.
It literally tries to pass the string "$.rng" instead of this:
[
"2019-05-07",
"2019-05-09"
]
Here's the StateMachine:
{
"StartAt": "Try",
"States": {
"Try": {
"Type": "Parallel",
"Branches": [{
"StartAt": "Get Date Range",
"States": {
"Get Date Range": {
"Type": "Task",
"Resource": "arn:aws:lambda:us-east-1:123456789:function:get-date-range",
"Parameters": {
"name": "thename",
"date_query": "SELECT date from sch.tbl_dates;",
"database": "the_db"
}
,
"ResultPath": "$.rng",
"TimeoutSeconds": 900,
"Next": "ProcessDateRange"
},
"ProcessDateRange": {
"Type": "Task",
"Resource": "arn:aws:lambda:us-east-1:123456789:function:process-date-range",
"Parameters": {
"range": "$.rng"
},
"ResultPath": "$",
"Next": "Exit"
},
"Exit": {
"Type": "Succeed"
}
}
}],
"Catch": [{
"ErrorEquals": ["States.ALL"],
"ResultPath": "$.Error",
"Next": "Failed"
}],
"Next": "Succeeded"
},
"Failed": {
"Type": "Fail",
"Cause": "There was an error. Please review the logs.",
"Error": "error"
},
"Succeeded": {
"Type": "Succeed"
}
}
}

This is because you are using the wrong syntax for Lambda tasks. To specify the input you need to set the InputPath key, for example:
"ProcessDateRange": {
"Type": "Task",
"Resource": "arn:aws:lambda:us-east-1:123456789:function:process-date-range",
"InputPath": "$.rng",
"ResultPath": "$",
"Next": "Exit"
},

If you want a parameter to be interpreted as a JSON path instead of a literal string, add ".$" to the end of the parameter name. To modify your example:
"ProcessDateRange": {
"Type": "Task",
"Resource": "arn:aws:lambda:us-east-1:123456789:function:process-date-range",
"Parameters": {
"range.$": "$.rng"
},
"ResultPath": "$",
"Next": "Exit"
},
Relevant docs here: https://docs.aws.amazon.com/step-functions/latest/dg/connectors-parameters.html#connectors-parameters-path
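To make the difference between "range" and "range.$" concrete, here is a simplified local simulation of how Parameters could be resolved. It is only a sketch: the helper names are hypothetical, and the path lookup handles plain dotted paths such as "$.a.b", not the full JSONPath syntax that Step Functions supports.

```python
from typing import Any, Dict


def lookup(path: str, data: Any) -> Any:
    """Resolve a simple dotted path such as "$.a.b" (no filters or indexes)."""
    if path == "$":
        return data
    for key in path[2:].split("."):  # drop the leading "$."
        data = data[key]
    return data


def resolve_parameters(parameters: Dict[str, Any], state_input: Any) -> Dict[str, Any]:
    """Treat keys ending in ".$" as paths into the input; keep others literal."""
    resolved: Dict[str, Any] = {}
    for name, value in parameters.items():
        if name.endswith(".$"):
            resolved[name[:-2]] = lookup(value, state_input)
        else:
            resolved[name] = value
    return resolved


state_input = {"rng": ["2019-05-07", "2019-05-09"]}
print(resolve_parameters({"range": "$.rng"}, state_input))
print(resolve_parameters({"range.$": "$.rng"}, state_input))
```

The first call leaves the literal string "$.rng" in place, which matches the behaviour described in the question; the second replaces it with the array from the input.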

How to capture output of a parallel state machine in AWS Step function

I need to run an AWS Step Function containing a Parallel state that runs, say, two state machines. My requirement is to check the final execution status of the Parallel state and, if there is any failure, invoke an SNS service to send out an email. Pretty standard stuff, but for the life of me I can't figure out how to capture the combined error of a Parallel state. This sample state machine runs:
A "passtask" that is just a simple Lambda pass function,
and
a "failtask" that has a sleep timer and is supposed to fail after 5 seconds.
If I execute this machine, it shows passtask as succeeded, failtask as cancelled, the overall Parallel Task as succeeded (?????), the Notify Failure task as cancelled, and the overall execution of the state machine as "failed".
I'd like to see passtask as succeeded, failtask as failed, the overall Parallel Task as failed, and the Notify Failure task as succeeded.
{
"Comment": "Parallel Example",
"StartAt": "Parallel Task",
"TimeoutSeconds": 120,
"States": {
"Parallel Task": {
"Type": "Parallel",
"Branches": [
{
"StartAt": "passtask",
"States": {
"passtask": {
"Type": "Task",
"Resource":"arn:xxxxxxxxxxxxxxx:function:passfunction",
"End": true
}
}
},
{
"StartAt": "failtask",
"States": {
"failtask": {
"Type": "Task",
"Resource":"arn:xxxxxxxxxxxxxxx:function:failfunction",
"End": true
}
}
}
],
"ResultPath": "$.status",
"Catch": [
{
"ErrorEquals": ["States.ALL"],
"Next": "Notify Failure"
}
],
"Next": "Notify Success"
},
"Notify Failure": {
"Type": "Pass",
"InputPath": "$.input.Cause",
"End": true
},
"Notify Success": {
"Type": "Pass",
"Result": "This is a fallback from a task success",
"End": true
}
}
}

From your requirement ("My requirement is to check the final execution status of the parallel machine and if there is any failure, invoke an SNS service to send out an email."), I understand that "failtask" is just for debugging purposes and that in the future it won't necessarily fail.
The problem is that the moment Step Functions detects a failure in one branch, all other branches are terminated and their outputs discarded; only the failed branch's output is used. So if you want to preserve the output of each branch and check whether a failure has occurred, you need to handle the errors inside each branch and not report the whole branch as failed. Additionally, you need to add an output field to each branch that says whether there was a failure (a Choice state raises an error if a field it checks does not exist).
Also remember that the output of a Parallel state is an array containing the output of each branch. For example, this state machine should let each branch finish execution and handle the errors correctly:
{
"Comment": "Parallel Example",
"StartAt": "Parallel Task",
"TimeoutSeconds": 120,
"States": {
"Parallel Task": {
"Type": "Parallel",
"Branches": [{
"StartAt": "passtask",
"States": {
"passtask": {
"Type": "Task",
"Resource": "arn:aws:lambda:us-east-1:XXXXXXXXXXXXXXXXX",
"Next": "SuccessBranch1",
"Catch": [{
"ErrorEquals": ["States.ALL"],
"Next": "FailBranch1"
}]
},
"SuccessBranch1": {
"Type": "Pass",
"Result": {
"Error": false
},
"ResultPath": "$.Status",
"End": true
},
"FailBranch1": {
"Type": "Pass",
"Result": {
"Error": true
},
"ResultPath": "$.Status",
"End": true
}
}
},
{
"StartAt": "failtask",
"States": {
"failtask": {
"Type": "Task",
"Resource": "arn:aws:lambda:us-east-1:XXXXXXXXXXXXXXXXX",
"Next": "SuccessBranch2",
"Catch": [{
"ErrorEquals": ["States.ALL"],
"Next": "FailBranch2"
}]
},
"SuccessBranch2": {
"Type": "Pass",
"Result": {
"Error": false
},
"ResultPath": "$.Status",
"End": true
},
"FailBranch2": {
"Type": "Pass",
"Result": {
"Error": true
},
"ResultPath": "$.Status",
"End": true
}
}
}
],
"ResultPath": "$.ParralelOutput",
"Catch": [{
"Comment": "This catch should never catch any errors, as the error handling is done in the individual Branches",
"ErrorEquals": ["States.ALL"],
"ResultPath": "$.ParralelOutput",
"Next": "ChoiceStateX"
}],
"Next": "ChoiceStateX"
},
"ChoiceStateX": {
"Type": "Choice",
"Choices": [{
"Or": [{
"Variable": "$.ParralelOutput[0].Status.Error",
"BooleanEquals": true
},
{
"Variable": "$.ParralelOutput[1].Status.Error",
"BooleanEquals": true
}
],
"Next": "Notify Failure"
}],
"Default": "Notify Success"
},
"Notify Failure": {
"Type": "Pass",
"End": true
},
"Notify Success": {
"Type": "Pass",
"Result": "This is a fallback from a task success",
"End": true
}
}
}
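To illustrate the shape that ChoiceStateX inspects, the check can be written out in Python. This is only a sketch of the logic: the function name is hypothetical, and the sample data assumes each branch ends with a "Status" object as in the definition above.

```python
from typing import Any, Dict, List


def any_branch_failed(parallel_output: List[Dict[str, Any]]) -> bool:
    """True if at least one branch reported Status.Error == true."""
    return any(
        branch.get("Status", {}).get("Error") is True
        for branch in parallel_output
    )


# The Parallel state output is an array with one element per branch,
# in the same order as the "Branches" list.
parallel_output = [
    {"Status": {"Error": False}},  # passtask branch
    {"Status": {"Error": True}},   # failtask branch
]
print(any_branch_failed(parallel_output))  # True
```

Unlike the Choice state's "Or" rule, this version does not need one entry per branch index, which is the limitation the hardcoded approach runs into as branches are added.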
For a more general (although more complex) version of the above, as asked by Nisman in the comments: instead of hardcoding the Choice state to check every branch, we can add a Pass state with some JSONPath tricks to check for conditions that are not currently possible with a Choice state alone.
Inside this Pass state we use Parameters to restructure our data so that, when we apply a JSONPath filter expression to it using OutputPath, we are left with an array of either 2 elements (if no branches failed) or 3 elements (if some branches failed). The first element always contains the original input data, and the second and third each contain at least one key with the same name, to be used by the Choice state. Here's the state machine JSON:
{
"Comment": "Parallel Example",
"StartAt": "Parallel Task",
"States": {
"Parallel Task": {
"Type": "Parallel",
"Branches": [
{
"StartAt": "passtask",
"States": {
"passtask": {
"Type": "Task",
"Resource": "<TASK RESOURCE>",
"End": true,
"Catch": [
{
"ErrorEquals": [
"States.ALL"
],
"ResultPath": "$.error-info",
"Next": "FailBranch1"
}
]
},
"FailBranch1": {
"Type": "Pass",
"Parameters": {
"BranchOutput.$": "$",
"BranchError": true
},
"End": true
}
}
},
{
"StartAt": "failtask",
"States": {
"failtask": {
"Type": "Task",
"Resource": "<TASK RESOURCE>",
"End": true,
"Catch": [
{
"ErrorEquals": [
"States.ALL"
],
"ResultPath": "$.error-info",
"Next": "FailBranch2"
}
]
},
"FailBranch2": {
"Type": "Pass",
"Parameters": {
"BranchOutput.$": "$",
"BranchError": true
},
"End": true
}
}
}
],
"ResultPath": "$.ParralelOutput",
"Next": "Pre-Process"
},
"Pre-Process": {
"Type": "Pass",
"Parameters": {
"OrderedArray": [
{
"OriginalData": {
"Input.$": "$",
"ShouldFilterData": false
}
},
{
"ValuesToCheck": {
"ListBranchErrors.$": "$.ParralelOutput[?(@.BranchError==true)].BranchError",
"BranchFailures": true
}
},
{
"DefaultAlwaysFalse": {
"ShouldFilterData": false,
"BranchFailures": false
}
}
]
},
"OutputPath": "$..[?(@.ShouldFilterData == false || @.ListBranchErrors[0] == true)]",
"Next": "ChoiceStateX"
},
"ChoiceStateX": {
"Type": "Choice",
"OutputPath": "$.[0].Input",
"Choices": [
{
"Variable": "$[1].BranchFailures",
"BooleanEquals": true,
"Next": "NotifyFailure"
},
{
"Variable": "$[1].BranchFailures",
"BooleanEquals": false,
"Next": "NotifySuccess"
}
],
"Default": "NotifyFailure"
},
"NotifyFailure": {
"Type": "Pass",
"End": true
},
"NotifySuccess": {
"Type": "Pass",
"Result": "This is a fallback from a task success",
"End": true
}
}
}

Send Input as Output on error for AWS Step Function

I'd like my state machine to continue execution even if some state fails early on. Most of my Lambda functions output the same thing they take as input, so I'd like to pass the input of the Lambda that encountered the error on as output to the next state. I tried
{
"DeleteStuff": {
"Type": "Task",
"Resource": "MY_ARN",
"Catch": [ {
"ErrorEquals": ["States.ALL"],
"ResultPath": "$InputPath",
"Next": "FailedState"
}],
"Next": "checkStuff"
}, ...
without any luck. Has anyone done this, or can anyone offer some assistance?
Thanks!

So the solution is to set ResultPath to null. Changing my state machine to
{
"DeleteStuff": {
"Type": "Task",
"Resource": "MY_ARN",
"Catch": [ {
"ErrorEquals": ["States.ALL"],
"ResultPath": null,
"Next": "FailedState"
}],
"Next": "checkStuff"
}, ...
gave me the desired behaviour.

If you just add a new path as the ResultPath, the error is added to the input:
{
"ErrorEquals": ["States.ALL"],
"ResultPath": "$.error",
"Next": "Catch All Error Handler"
}
so if your input was:
{
"data_a" : "aaa",
"data_b" : "bbb"
}
output will be:
{
"data_a" : "aaa",
"data_b" : "bbb",
"error" : "<error description>"
}
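The two behaviours discussed in these answers (ResultPath set to null versus a path such as "$.error") can be compared with a small local simulation. This is a simplified sketch, not the Step Functions implementation: the function name is hypothetical and only plain dotted paths are handled.

```python
import copy
from typing import Any, Optional


def apply_result_path(state_input: dict, result: Any, result_path: Optional[str]) -> Any:
    """Combine a state's input and result the way ResultPath describes."""
    if result_path is None:   # "ResultPath": null -> discard result, keep input
        return copy.deepcopy(state_input)
    if result_path == "$":    # default -> the result replaces the input
        return result
    output = copy.deepcopy(state_input)
    node = output
    keys = result_path[2:].split(".")  # drop the leading "$."
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = result   # insert the result at the given path
    return output


state_input = {"data_a": "aaa", "data_b": "bbb"}
error = {"Error": "SomeError", "Cause": "<error description>"}

print(apply_result_path(state_input, error, None))
print(apply_result_path(state_input, error, "$.error"))
```

With None the next state sees exactly the original input; with "$.error" it sees the original input plus an "error" entry, so a later state can still tell that something went wrong.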
