aws step function parallel with input parameters - amazon-web-services

I am trying to use AWS step functions to create parallel branches of execution.
One of the parallel branches starts another step function invocation, how can we pass input from this parallel branch to next step function execution
{
"Comment": "Parallel Example.",
"StartAt": "FunWithMath",
"States": {
"FunWithMath": {
"Type": "Parallel",
"End": true,
"Branches": [
{
"StartAt": "Add", /// This receives some json object here input {}
"States": {
"Add": {
"Type": "Task", ***//How to pass the received input to the following arn as input?***
"Resource": ""arn:aws:states:::states:startExecution",
Parameters: {
"StateMachineArn": "anotherstepfunctionarnpath"
}
"End": true
}
}
},
{
"StartAt": "Subtract",
"States": {
"Subtract": {
"Type": "Task",
"Resource": "some lambda arn here,
"End": true
}
}
}
]
}
}
}
anotherstepfunctionarnpath :
{
"Comment": "Second state machine",
"StartAt": "stage1",
"Resource": "arn:aws:states:::glue:startJobRun.sync",
"Parameters":{
"Arguments":{
"Variable1" :"???" / how to access the value of the input passed to here
}
}
}

You can use Input to pass output from one SFN to other one:
First SFN(It will call second SFN)
{
"Comment": "My first SFN",
"StartAt": "First SFN",
"States": {
"First SFN": {
"Type": "Task",
"ResultPath": "$.to_pass",
"Resource": "arn:aws:lambda:us-east-1:807278658150:function:test-lambda",
"Next": "Trigger Next SFN"
},
"Trigger Next SFN": {
"Type": "Task",
"Resource": "arn:aws:states:::states:startExecution",
"Parameters": {
"Input": {
"Comment.$": "$"
},
"StateMachineArn": "arn:aws:states:us-east-1:807278658150:stateMachine:MyStateMachine2"
},
"End": true
}
}
}
Second SFN (MyStateMachine2)
{
"Comment": "A Hello World example of the Amazon States Language using Pass states",
"StartAt": "Hello",
"States": {
"Hello": {
"Type": "Pass",
"Result": "Hello",
"Next": "World"
},
"World": {
"Type": "Pass",
"Result": "World",
"End": true
}
}
}
First SFN's Execution
Second SFN's Execution
Explanation
The Lambda test-lambda is returning:
{
"user": "stackoverflow",
"id": "100"
}
Which is stored in "ResultPath": "$.to_pass" here in to_pass variable. I am passing the same output to next state machine MyStateMachine2 which is done by
"Input": {
"Comment.$": "$"
}
In the next State Machine's execution you see that same data is received as input which was created by first Lambda.
You can read more about it here.

Related

Passing parameters to Glue Job using Step Function

I have a Step function that enables my glue jobs to
synchronously run by passing multiple parameters from event bridge which contains the job that will be running and its arguments but when I look to my glue they are running at the same time.
{
"Comment": "A description of my state machine",
"StartAt": "Pass",
"States": {
"Pass": {
"Type": "Pass",
"Next": "Map"
},
"Map": {
"Type": "Map",
"Iterator": {
"StartAt": "Glue StartJobRun_1",
"States": {
"Glue StartJobRun_1": {
"Type": "Task",
"Resource": "arn:aws:states:::glue:startJobRun.sync",
"Parameters": {
"JobName.$": "$.job_name",
"Arguments.$": "$.Arguments"
},
"End": true
}
}
},
"ItemsPath": "$.detail.config",
"End": true
}
}
}
The first glue job should finish first before I proceed with another job. Can you suggest what I can do to run them synchronously in sequence
{
"config": [
{
"job_name": "dev_1",
"Arguments": {
"--environment": "dev"
}
},
{
"job_name": "dev_2",
"Arguments": {
"--environment": "dev"
}
}
]
}
The Map state in your Step Functions workflow takes the input array and executes your states in the iterator in parallel (default 40 concurrent iterations).
To execute the Glue jobs in sequence, add "MaxConcurrency": 1 to the Map state. This will process items in the array synchronously and sequentially in the order of appearance.
Here's the modified Step Functions workflow definition
{
"Comment": "A description of my state machine",
"StartAt": "Pass",
"States": {
"Pass": {
"Type": "Pass",
"Next": "Map"
},
"Map": {
"Type": "Map",
"Iterator": {
"StartAt": "Glue StartJobRun_1",
"States": {
"Glue StartJobRun_1": {
"Type": "Task",
"Resource": "arn:aws:states:::glue:startJobRun.sync",
"Parameters": {
"JobName.$": "$.job_name",
"Arguments.$": "$.Arguments"
},
"End": true
}
}
},
"ItemsPath": "$.detail.config",
"End": true,
"MaxConcurrency": 1
}
}
}

State Machine Will Not Accept Input Path

I'm sure someone will point me to an immediate solution, but I've been at this for hours, so I'm just going to ask.
I cannot get a State Machine to accept an initial input. The intent is to set up an EventBridge trigger pointed at the State Machine with a static JSON passed to the SM to initiate with the proper parameters. In development, I'm just using Step Functions option to pass a JSON as the initial input when you select "New Execution".
This is the input:
{"event":{
"country": "countryA",
"landing_bucket": "aws-glue-countryA-inputs",
"landing_key": "countryA-Bucket/prefix/filename.csv",
"forecast_bucket": "aws-forecast-countryA",
"forecast_key": "inputs/",
"date_start": "2018-01-01",
"validation": "False",
"validation_size": 90
}
}
When looking at what is passed at the ExecutionStarted log entry:
{
"input": {
"country": "countryA",
"landing_bucket": "aws-glue-countryA-inputs",
"landing_key": "countryA-Bucket/prefix/filename.csv",
"forecast_bucket": "aws-forecast-countryA",
"forecast_key": "inputs/",
"date_start": "2018-01-01",
"validation": "False",
"validation_size": 90
}
,
"inputDetails": {
"truncated": false
},
"roleArn": "arn:aws:iam::a-valid-service-role"
}
This is the State Machine:
"Comment": "A pipeline!",
"StartAt": "Invoke Preprocessor",
"States": {
"Invoke Preprocessor": {
"Type": "Task",
"Resource": "arn:aws:states:::lambda:invoke",
"InputPath": "$.input",
"Parameters": {
"FunctionName": "arn:aws:lambda:my-lambda-arn:$LATEST"
},
"Next": "EndSM"
},
"EndSM": {
"Type": "Pass",
"Result": "Ended",
"End": true
}
}
}
I've tried nearly anything I can think of from changing the InputPath to assigning the "input" dictionary directly to a variable:
"$.event":"$.input"
To drilling down to the individual variables and assigning those directly like:
"$.country:"$.country". I've also used the new Step Functions Data Flow Simulator and can't get anywhere. If anyone has any thoughts, I'd really appreciate it.
Thanks!
Edited for correct solution:
You need to set the Payload.$ parameter to $. That will pass in the entire input object to the lambda.
{
"Comment": "A pipeline!",
"StartAt": "Invoke Preprocessor",
"States": {
"Invoke Preprocessor": {
"Type": "Task",
"Resource": "arn:aws:states:::lambda:invoke",
"Parameters": {
"FunctionName": "arn:aws:lambda:my-lambda-arn:$LATEST",
"Payload.$": "$"
},
"Next": "EndSM"
},
"EndSM": {
"Type": "Pass",
"Result": "Ended",
"End": true
}
}
}
Another thing you could do is specify the input in the parameters, this will allow you to specify only all/certain parts of the json to pass in.
{
"Comment": "A pipeline!",
"StartAt": "Invoke Preprocessor",
"States": {
"Invoke Preprocessor": {
"Type": "Task",
"Resource": "arn:aws:states:::lambda:invoke",
"InputPath": "$",
"Parameters": {
"FunctionName": "arn:aws:lambda:my-lambda-arn:$LATEST",
"input_event.$": "$.event"
},
"Next": "EndSM"
},
"EndSM": {
"Type": "Pass",
"Result": "Ended",
"End": true
}
}
}
From the code perspective you could just reference it like so (python):
input = event['input_event']

AWS Step function error : There are Amazon States Language errors in your state machine definition. Fix the errors to continue

I'm new to AWS step functions.
Trying to create a basic ETL flow of glue jobs. Upon completion of state machine definition im able to see the graph being generated , but getting a generic error "There are Amazon States Language errors in your state machine definition. Fix the errors to continue",
error message
that is not allowing me to proceed.
Here is the code and graph :
{
"Comment": "DRC downstream glue jobs execution step function:slf_aws_can_dbisdel_everyone_drc_amp",
"StartAt": "startFlow",
"States": {
"Comment": "various state types of the Amazon States Language",
"startFlow": {
"Comment": "Pass states are useful when constructing and debugging state machines.",
"Type": "Pass",
"Next": "stg_ods"
},
"stg_ods": {
"Type": "Task",
"Resource": "arn:aws:states:::glue:startJobRun.sync",
"Parameters": {
"JobName": "stage_job_name"
},
"Next": "ods_job"
},
"ods_job": {
"Type": "Task",
"Resource": "arn:aws:states:::glue:startJobRun.sync",
"Parameters": {
"JobName": "main_job_name"
},
"Next": "Wait 3 sec"
},
"Wait 3 sec": {
"Comment": "A Wait state delays the state machine from continuing for a specified time.",
"Type": "Wait",
"Seconds": 3,
"Next": "parallel_stg_adr"
},
"parallel_stg_adr": {
"Comment": "A Parallel state can be used to create parallel branches of execution in your state machine.",
"Type": "Parallel",
"Branches": [
{
"StartAt": "stg_job1",
"States": {
"stg_job1": {
"Type": "Task",
"Resource": "arn:aws:states:::glue:startJobRun.sync",
"Parameters": {
"JobName": "stg_job_name1"
},
"End": true
}
}
},
{
"StartAt": "stg_job2",
"States": {
"stg_job2": {
"Type": "Task",
"Resource": "arn:aws:states:::glue:startJobRun.sync",
"Parameters": {
"JobName": "stg_job_name2"
},
"End": true
}
}
}
],
"Next": "parallel_adr_job"
},
"parallel_adr_job": {
"Comment": "A Parallel state can be used to create parallel branches of execution in your state machine.",
"Type": "Parallel",
"Branches": [
{
"StartAt": "job1",
"States": {
"job1": {
"Type": "Task",
"Resource": "arn:aws:states:::glue:startJobRun.sync",
"Parameters": {
"JobName": "some_glue_job",
"Arguments": {
"--target_table": "some_string_table",
"--calendar_key": "some_string"
}
},
"End": true
}
}
},
{
"StartAt": "job2",
"States": {
"job2": {
"Type": "Task",
"Resource": "arn:aws:states:::glue:startJobRun.sync",
"Parameters": {
"JobName": "some_glue_job",
"Arguments": {
"--target_table": "some_string_table",
"--calendar_key": "some_string"
}
},
"End": true
}
}
}
],
"Next": "end_job"
},
"end_job": {
"Type": "Pass",
"End": true
}
}
}
Step function graph
"Comment": "various state types of the Amazon States Language",
This one at Line 5 seems to be incorrect. "States" map cannot have a "Comment" key. Remove it and then try. Rest of the config looks correct.
Edit 1
If the type of Step Function is Express, ".sync" functions won't work. Try changing the ARN to
"Resource": "arn:aws:states:::glue:startJobRun"
and you should be able to save your Step Function. You will then have to figure out how to setup a different Glue task.

AWS Step-Function: pass a specific value from one AWS lambda to another in step function parallel state

I have the below state machine. The requirement is to have a lambda to query DB and get all the ids. Next I have a parallel state call that calls more than five lambdas at once. Instead of passing all the ids fetched to all the lambdas, I need to pass the respective ids to each lambda.
In the below state language, first call is DB_CALL, lets say it returns {id1, id2, id3, id4, id5, id6}, I want to pass only id1 to First_Lambda and id2 to Second_Lambda etc...
The entire id object should get passed to all lambdas. Please suggest a way to achieve this.
{
"Comment": "Concurrent Lambda calls",
"StartAt": "StarterLambda",
"States": {
"StarterLambda": {
"Type": "Task",
"Resource": "arn:aws:lambda:us-east-1:123456789012:function:DB_CALL",
"Next": "ParallelCall"
},
"State": {
"ParallelCall": {
"Type": "Parallel",
"End": true,
"Branches": [
{
"StartAt": "First",
"States": {
"First": {
"Type": "Task",
"Resource": "arn:aws:lambda:us-east-1:123456789012:function:First_Lambda",
"TimeoutSeconds": 120,
"End": true
}
}
},
{
"StartAt": "Second",
"States": {
"Second": {
"Type": "Task",
"Resource": "arn:aws:lambda:us-east-1:123456789012:function:Second_Lambda",
"Retry": [ {
"ErrorEquals": ["States.TaskFailed"],
"IntervalSeconds": 1,
"MaxAttempts": 2,
"BackoffRate": 2.0
} ],
"End": true
}
}
},
{
"StartAt": "Third",
"States": {
"Third": {
"Type": "Task",
"Resource": "arn:aws:lambda:us-east-1:123456789012:function:Third_Lambda",
"Catch": [ {
"ErrorEquals": ["States.TaskFailed"],
"Next": "CatchHandler"
} ],
"End": true
},
"CatchHandler": {
"Type": "Pass",
"Resource": "arn:aws:lambda:us-east-1:123456789012:function:CATCH_HANDLER",
"End": true
}
}
},
{
"StartAt": "Fourth",
"States": {
"Fourth": {
"Type": "Task",
"Resource": "arn:aws:lambda:us-east-1:123456789012:function:Fourth_Lambda",
"TimeoutSeconds": 120,
"End": true
}
}
},
{
"StartAt": "Fifth",
"States": {
"Fifth": {
"Type": "Task",
"Resource": "arn:aws:lambda:us-east-1:123456789012:function:Fifth_Lambda",
"TimeoutSeconds": 120,
"End": true
}
}
},
{
"StartAt": "Sixth",
"States": {
"Sixth": {
"Type": "Task",
"Resource": "arn:aws:lambda:us-east-1:123456789012:function:Sixth_Lambda",
"TimeoutSeconds": 120,
"End": true
}
}
}
}
]
}
}
}
}
You can use Step Function parameter option.
This would allow you to send specific value or json to next lambda.
"Parameters": {
"toprocess.$": "$.MetaData.CorrelationId"
},
So input to this lambda would be smaller dto than compared to you first lambda. So while returning value from this lambda avoid assigning it back to Step function result.
"OutputPath": "$",
"ResultPath": "$.PartialResutl",
What you are looking for is the Map State. With this state, you pass in the iterator, in your case the path to the ids. The map state will run once for each item in the list. Within the map state, you have a full state machine, so you can call a Lambda or any other state. It has controls to limit how many are running at once if that is needed.

Parallel States Merge the output in Step Function

Is it possible to have following kind of Step Function graph, i.e. from 2 parallel state output, one combined state:
If yes, what would json for this looks like? If not, why?
A parallel task always outputs an array (containing one entry per branch).
You can tell AWS step functions to append the output into new (or existing) property in the original input with "ResultPath": "$.ParallelOut" in your parallel state definition, but this is not what you seem to be trying to achieve.
To merge the output of parallel task, you can leverage the "Type": "Pass" state to define transformations to apply to the JSON document.
For example, in the state machine below, I'm transforming a JSON array...
[
{
"One": 1,
"Two": 2
},
{
"Foo": "Bar",
"Hello": "World"
}
]
...into a few properties
{
"Hello": "World",
"One": 1,
"Foo": "Bar",
"Two": 2
}
{
"Comment": "How to convert an array into properties",
"StartAt": "warm-up",
"States": {
"warm-up": {
"Type": "Parallel",
"Next": "array-to-properties",
"Branches": [
{
"StartAt": "numbers",
"States": {
"numbers": {
"Type": "Pass",
"Result": {
"One": 1,
"Two" : 2
},
"End": true
}
}
},
{
"StartAt": "words",
"States": {
"words": {
"Type": "Pass",
"Result": {
"Foo": "Bar",
"Hello": "World"
},
"End": true
}
}
}
]
},
"array-to-properties": {
"Type": "Pass",
"Parameters": {
"One.$": "$[0].One",
"Two.$": "$[0].Two",
"Foo.$": "$[1].Foo",
"Hello.$": "$[1].Hello"
},
"End": true
}
}
}
It is possible as opposed diagram below
The parallel state should look like this
"MyParallelState": {
"Type": "Parallel",
"InputPath": "$",
"OutputPath": "$",
"ResultPath": "$.ParallelResultPath",
"Next": "SetCartCompleteStatusState",
"Branches": [
{
"StartAt": "UpdateMonthlyUsageState",
"States": {
"UpdateMonthlyUsageState": {
"Type": "Task",
"InputPath": "$",
"OutputPath": "$",
"ResultPath": "$.UpdateMonthlyUsageResultPath",
"Resource": "LambdaARN",
"End": true
}
}
},
{
"StartAt": "QueueTaxInvoiceState",
"States": {
"QueueTaxInvoiceState": {
"Type": "Task",
"InputPath": "$",
"OutputPath": "$",
"ResultPath": "$.QueueTaxInvoiceResultPath",
"Resource": "LambdaARN",
"End": true
}
}
}
The output of MyParallelState will be populated as in array, from each state in the Parallel state. They are populated within ParallelResultPath object and will be passed into the Next state
{
"ParallelResultPath": [
{
"UpdateMonthlyUsageResultPath": Some Output
},
{
"QueueTaxInvoiceResultPath": Some Output
}
]
}
We can use ResultSelector and Result Path to combine the result into one object
We have a parallel state like:
{
"StartAt": "ParallelBranch",
"States": {
"ParallelBranch": {
"Type": "Parallel",
"ResultPath": "$",
"InputPath": "$",
"OutputPath": "$",
"ResultSelector": {
"UsersResult.$": "$[1].UsersUpload",
"CustomersResult.$": "$[0].customersDataUpload"
},
"Branches": [
{
"StartAt": "customersDataUpload",
"States": {
"customersDataUpload": {
"Type": "Pass",
"ResultPath": "$.customersDataUpload.Output",
"Result": {
"CompletionStatus": "success",
"CompletionDetails": null
},
"Next": "Wait2"
},
"Wait2": {
"Comment": "A Wait state delays the state machine from continuing for a specified time.",
"Type": "Wait",
"Seconds": 2,
"End": true
}
}
},
{
"StartAt": "UsersUpload",
"States": {
"UsersUpload": {
"Type": "Pass",
"Result": {
"CompletionStatus": "success",
"CompletionDetails": null
},
"ResultPath": "$.UsersUpload.Output",
"Next": "Wait1"
},
"Wait1": {
"Comment": "A Wait state delays the state machine from continuing for a specified time.",
"Type": "Wait",
"Seconds": 1,
"End": true
}
}
}
],
"End": true
}
},
"TimeoutSeconds": 129600,
"Version": "1.0"
}
enter image description here
And the output will be like:
{
"UsersResult": {
"Output": {
"CompletionStatus": "success",
"CompletionDetails": null
}
},
"CustomersResult": {
"Output": {
"CompletionStatus": "success",
"CompletionDetails": null
}
}
}
Your diagram is technically wrong because no state can set multiple states to its Next task. You cannot start State Machine as StartAt by providing multiple State names. Also, even if it was possible I don't see any point why would you want to run two parallel states as opposed to one parallel state with all the sub states that you would split into two.
This worked for me
"Transform And Freeze": {
"Type": "Parallel",
"InputPath": "$",
"Branches": [
{
"StartAt": "Transform Status",
"States": {
"Transform Status": {
"Type": "Map",
"ItemsPath": "$",
"MaxConcurrency": 25,
"Iterator": {
"StartAt": "Transform",
"States": {
"Transform": {
"Type": "Task",
"Resource": "${TransformFunction}",
"End": true
}
}
},
"End": true
}
}
},
{
"StartAt": "Freeze Status",
"States": {
"Freeze Status": {
"Type": "Map",
"MaxConcurrency": 25,
"Iterator": {
"StartAt": "Freeze",
"States": {
"Freeze Transactions": {
"Type": "Task",
"Resource": "${FreezeFunction}",
"End": true
}
}
},
"End": true
}
}
}
],
"ResultPath" : "$.parts",
"Next": "SetParallelOutput",
"Catch": [
{
"ErrorEquals": [
"States.ALL"
],
"ResultPath": "$.exception",
"Next": "Error Handler"
}
]
},
"SetParallelOutput": {
"Type": "Pass",
"Parameters": {
"foo.$": "$.foo",
"bar.$": "$.bar",
"parts.$": "$.parts[0]"
},
"Next": "Target Type"
},