Step Function nested Map step in Map step - amazon-web-services

I have been working with Step Functions for a couple of weeks now. I am using a Map state in my step function to iterate over an array. The array has an inner array as well, so I wanted to use an additional Map state inside the "outer" Map state. The AWS documentation does not go into this level of detail (as of now), so I wanted to share that I managed to make it work.

This is how I managed to nest map steps:
"OuterMapState": {
"Type": "Map",
"ItemsPath": "$.shipped",
"MaxConcurrency": 0,
"Iterator": {
"StartAt": "Validate",
"States": {
"Validate": {
"Type": "Task",
"Resource": "arn:aws:lambda:us-east-1:123456789012:function:ship-val",
"Next": "InnerMapState"
},
"InnerMapState": {
"Type": "Map",
"ItemsPath": "$.shipped.innerArray",
"MaxConcurrency": 0,
"Iterator": {
"StartAt": "doSomething",
"States": {
"doSomething": {
"Type": "Pass",
"End": true
}
}
},
"End": true
}
}
},
"End": true
}
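One thing to check when adapting this: inside the outer iterator, "$" refers to the current state's input, not to the original execution input. Because Validate calls the Lambda ARN directly and has no ResultPath, the function's return value becomes the input to InnerMapState. So for "$.shipped.innerArray" to resolve, ship-val would have to return something shaped like the sketch below (the "sku" fields and their values are hypothetical). If it returned the array element unchanged, the inner path would instead be "$.innerArray".

```json
{
  "shipped": {
    "innerArray": [
      { "sku": "example-1" },
      { "sku": "example-2" }
    ]
  }
}
```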
Good luck using AWS Step Functions.

Related

Can I access the TaskToken from a Map state with ItemSelector where the iteration step uses lambda:invoke.waitForTaskToken?

I am using an AWS Step Function to iterate over a list in an input document, and for each iteration I need to invoke an external service. So I want to iterate over each item, run a step using lambda:invoke.waitForTaskToken, and pass the TaskToken into the execution of each iteration.
The problem I'm running into is how to use both an ItemSelector at the Map state level but also inject the TaskToken during the internal step. I need to use an ItemSelector because I want each item to also contain information from the input to Map state. The AWS Docs state:
The ItemSelector field replaces the Parameters field within the Map state. If you use the Parameters field in your Map state definitions to create custom input, we highly recommend that you replace them with ItemSelector.
But they also say:
During an execution, the context object is populated with relevant data for the Parameters field from where it is accessed. The value for a Task field is null if the Parameters field is outside of a task state.
These two statements seem to imply that what I'm trying to do is impossible.
So, what I want is something like:
{
"StartAt": "ExampleMapState",
"States": {
"ExampleMapState": {
"Type": "Map",
"ItemsPath": "$.items",
"ItemSelector": {
"dynamic.$": "$.dynamic",
"ContextIndex.$": "$$.Map.Item.Index",
"ContextValue.$": "$$.Map.Item.Value"
},
"ItemProcessor": {
"ProcessorConfig": {
"Mode": "INLINE"
},
"StartAt": "TestPass",
"States": {
"TestPass": {
"Type": "Task",
"Parameters": {
"FunctionName": "arn:aws:lambda:us-west-2:123456789012:function:echo-lambda",
"Payload": {
"item.$": "$",
"token.$": "$$.Task.Token"
}
},
"Resource": "arn:aws:states:::lambda:invoke.waitForTaskToken",
"End": true
}
}
},
"End": true
}
}
}
But this doesn't work because the ItemSelector overrides the Payload of the internal TestPass state. Is there a way to get this to work?
ETA: I figured I would try putting $$.Task.Token in ItemSelector just in case it would magically work but it ended up throwing an error because $$.Task does not exist in the context object at that level.
Example with this (invalid) configuration:
{
"StartAt": "ExampleMapState",
"States": {
"ExampleMapState": {
"Type": "Map",
"ItemsPath": "$.items",
"ItemSelector": {
"dynamic.$": "$.dynamic",
"ContextIndex.$": "$$.Map.Item.Index",
"ContextValue.$": "$$.Map.Item.Value",
"token.$": "$$.Task.Token"
},
"ItemProcessor": {
"ProcessorConfig": {
"Mode": "INLINE"
},
"StartAt": "TestPass",
"States": {
"TestPass": {
"Type": "Task",
"Parameters": {
"FunctionName": "arn:aws:lambda:us-west-2:123456789012:function:echo-lambda"
},
"Resource": "arn:aws:states:::lambda:invoke.waitForTaskToken",
"End": true
}
}
},
"End": true
}
}
}
Based on my research, I don't think what I'm trying to do is possible, so I ended up implementing a workaround. I modified the function that provides input to this step function so that it puts the dynamic info I need into every item in the list I am iterating over. My step function definition now looks something like this:
{
"StartAt": "ExampleMapState",
"States": {
"ExampleMapState": {
"Type": "Map",
"ItemsPath": "$.items",
"ItemProcessor": {
"ProcessorConfig": {
"Mode": "INLINE"
},
"StartAt": "TestPass",
"States": {
"TestPass": {
"Type": "Task",
"Parameters": {
"FunctionName": "arn:aws:lambda:us-west-2:123456789012:function:echo-lambda",
"Payload": {
"item.$": "$",
"token.$": "$$.Task.Token"
}
},
"Resource": "arn:aws:states:::lambda:invoke.waitForTaskToken",
"End": true
}
}
},
"End": true
}
}
}
And an example input to this step function looks like:
{
"dynamic": "info",
"items": [
{
"dynamic": "info",
"resize": "true",
"format": "jpg"
},
{
"dynamic": "info",
"resize": "false",
"format": "png"
},
{
"dynamic": "info",
"resize": "true",
"format": "jpg"
}
]
}
It's not great because I have to repeat the info in every item ahead of time, but it works.
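To make the workaround concrete: with the input above, the Lambda invoked for the first item would receive an event shaped roughly like this. The token string is a placeholder, not a real value. The function (or the external service it calls) hands it back through SendTaskSuccess or SendTaskFailure to resume that iteration.

```json
{
  "item": {
    "dynamic": "info",
    "resize": "true",
    "format": "jpg"
  },
  "token": "<task token generated by Step Functions>"
}
```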

Passing parameters to Glue Job using Step Function

I have a Step Function that is supposed to run my Glue jobs synchronously. EventBridge passes in multiple parameters that name each job to run and its arguments, but when I look at Glue, the jobs are running at the same time.
{
"Comment": "A description of my state machine",
"StartAt": "Pass",
"States": {
"Pass": {
"Type": "Pass",
"Next": "Map"
},
"Map": {
"Type": "Map",
"Iterator": {
"StartAt": "Glue StartJobRun_1",
"States": {
"Glue StartJobRun_1": {
"Type": "Task",
"Resource": "arn:aws:states:::glue:startJobRun.sync",
"Parameters": {
"JobName.$": "$.job_name",
"Arguments.$": "$.Arguments"
},
"End": true
}
}
},
"ItemsPath": "$.detail.config",
"End": true
}
}
}
The first Glue job should finish before the next one starts. Can you suggest how I can run them synchronously, in sequence? The config passed in looks like this:
{
"config": [
{
"job_name": "dev_1",
"Arguments": {
"--environment": "dev"
}
},
{
"job_name": "dev_2",
"Arguments": {
"--environment": "dev"
}
}
]
}
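Note that the Map state reads its array from "$.detail.config", so the config above is presumably nested under the "detail" field of the EventBridge event. A minimal sketch of the input the state machine would then see, assuming the other event fields do not matter here:

```json
{
  "detail": {
    "config": [
      { "job_name": "dev_1", "Arguments": { "--environment": "dev" } },
      { "job_name": "dev_2", "Arguments": { "--environment": "dev" } }
    ]
  }
}
```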
The Map state in your Step Functions workflow takes the input array and runs the states in the iterator in parallel. With the default MaxConcurrency of 0 there is no explicit limit, and an inline Map state runs up to 40 iterations concurrently.
To execute the Glue jobs in sequence, add "MaxConcurrency": 1 to the Map state. The Map state then processes the array items one at a time, in order of appearance, and does not start an iteration until the previous one has finished.
Here's the modified Step Functions workflow definition:
{
"Comment": "A description of my state machine",
"StartAt": "Pass",
"States": {
"Pass": {
"Type": "Pass",
"Next": "Map"
},
"Map": {
"Type": "Map",
"Iterator": {
"StartAt": "Glue StartJobRun_1",
"States": {
"Glue StartJobRun_1": {
"Type": "Task",
"Resource": "arn:aws:states:::glue:startJobRun.sync",
"Parameters": {
"JobName.$": "$.job_name",
"Arguments.$": "$.Arguments"
},
"End": true
}
}
},
"ItemsPath": "$.detail.config",
"End": true,
"MaxConcurrency": 1
}
}
}

AWS step function map task parameters

I have a step function with a Map state. As far as I know, the Map state has to work on an array taken from ItemsPath. How can I pass the whole input to the Lambda, and not only the array?
{
"StartAt": "Find",
"States": {
"Find": {
"Type": "Map",
"MaxConcurrency": 0,
"InputPath": "$",
"ItemsPath": "$.Payload.contacts",
"Iterator": {
"StartAt": "func",
"States": {
"func": {
"Type": "Task",
"Resource": "arn:aws:lambda:us-east-1:....",
"Parameters": {
"Input": {
"Payload":{
"contact.$": "$"
}
}
},
"End": true
}
}
},
"ResultPath": "$.Input",
"End": true
}
}
}
I want the whole input to be passed in the event parameter.
If you use Iterator, it will pass each value from ItemsPath as the input to the Lambda. You can use a Parameters block on the Map state itself to build each iteration's input: take the current item from $$.Map.Item.Value and add the whole Map input alongside it with "$". I haven't tried it myself but I'm pretty sure that should do it.
https://docs.aws.amazon.com/step-functions/latest/dg/amazon-states-language-intrinsic-functions.html
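A minimal sketch of that idea, applied to the Find state from the question. The key names "contact" and "input" are illustrative choices, not anything the service requires, and the truncated Lambda ARN is left as it appeared. Each iteration's Lambda would receive an object holding the current contact plus the full input of the Map state:

```json
"Find": {
  "Type": "Map",
  "MaxConcurrency": 0,
  "ItemsPath": "$.Payload.contacts",
  "Parameters": {
    "contact.$": "$$.Map.Item.Value",
    "input.$": "$"
  },
  "Iterator": {
    "StartAt": "func",
    "States": {
      "func": {
        "Type": "Task",
        "Resource": "arn:aws:lambda:us-east-1:....",
        "End": true
      }
    }
  },
  "ResultPath": "$.Input",
  "End": true
}
```

Since "input" carries the whole Map input (including the contacts array) into every iteration, it may be worth selecting only the fields the Lambda needs if the input is large.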

AWS Step Function where Map input comes from Execution Input

I'm attempting to create an AWS step function with a Map state whose input (that is, the array to iterate over) comes from the Execution Input. A reduced example of the step function JSON looks like:
{
"StartAt": "pass",
"States": {
"pass": {
"Type": "Pass",
"Next": "map-sleep"
},
"map-sleep": {
"MaxConcurrency": 5,
"InputPath": "$$.Execution.Input['data']",
"Iterator": {
"StartAt": "wait",
"States": {
"wait": {
"SecondsPath": "$['length']",
"Type": "Wait",
"End": true
}
}
},
"Type": "Map",
"Next": "final-wait"
},
"final-wait": {
"Seconds": 10,
"Type": "Wait",
"End": true
}
}
}
However, when I attempt to create this, I'm greeted by the error:
An error occurred (InvalidDefinition) when calling the CreateStateMachine operation: Invalid State Machine Definition: 'SCHEMA_VALIDATION_FAILED: Value must be a valid JSONPath. at /States/map-sleep/InputPath'
I infer from this that the InputPath is wrong, but I don't quite understand why, or what the correct way to express what I'm trying to do is. (This code was generated using the Python Step Functions SDK, and if it's helpful I can share this code, but I figured reducing it to the JSON would make it easier to consider).
Well, I strongly suspect this is not the optimal answer, but it looks like by combining a Pass node using Parameters with a Map node, you can get the desired outcome:
"map-sleep-pass": {
"Parameters": {
"items.$": "$$.Execution.Input['data']"
},
"Type": "Pass",
"Next": "map-sleep"
},
"map-sleep": {
"MaxConcurrency": 5,
"InputPath": "$.items",
"Iterator": {
"StartAt": "wait",
"States": {
"wait": {
"SecondsPath": "$['length']",
"Type": "Wait",
"End": true
}
}
},
"Type": "Map",
"Next": "final-wait"
},
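A possibly shorter variant, offered as an untested sketch: the AWS documentation lists ItemsPath among the fields that can read the context object, so the Map state may be able to point at the execution input directly, without the extra Pass state. This assumes a service version that accepts context-object paths there, uses dot notation rather than the bracket form from the question, and assumes an execution input such as {"data": [{"length": 1}, {"length": 3}]}.

```json
"map-sleep": {
  "Type": "Map",
  "MaxConcurrency": 5,
  "ItemsPath": "$$.Execution.Input.data",
  "Iterator": {
    "StartAt": "wait",
    "States": {
      "wait": {
        "Type": "Wait",
        "SecondsPath": "$['length']",
        "End": true
      }
    }
  },
  "Next": "final-wait"
}
```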

Can AWS Step Function describe this kind of dataflow?

It cannot be described with a Parallel state in AWS Step Functions.
B and C should be in parallel.
C sends messages to both D and E.
D and E should be in parallel.
{
"StartAt": "A",
"States": {
"A": {
"Type": "Pass",
"Next": "Parallel State 1"
},
"Parallel State 1": {
"Type": "Parallel",
"Branches": [{
"StartAt": "B",
"States": {
"B": {
"Type": "Pass",
"End": true
}
}
},
{
"StartAt": "C",
"States": {
"C": {
"Type": "Pass",
"End": true
}
}
}
],
"Next": "Parallel State 2"
},
"Parallel State 2": {
"Type": "Parallel",
"Branches": [{
"StartAt": "D",
"States": {
"D": {
"Type": "Pass",
"End": true
}
}
},
{
"StartAt": "E",
"States": {
"E": {
"Type": "Pass",
"End": true
}
}
}
],
"Next": "F"
},
"F": {
"Type": "Pass",
"End": true
}
}
}
The answer is no: inside a step function, no state can name multiple states in its Next field (that is, invoke both successors). Likewise, a state machine cannot be started by providing multiple state names in StartAt.
You can tweak your logic and use the Parallel state to achieve the same result. If you share your use case, it may be easier to help solve the problem.
How to specify multiple result path values in AWS Step Functions
A Parallel state provides each branch with a copy of its own input
data (subject to modification by the InputPath field). It generates
output that is an array with one element for each branch, containing
the output from that branch.
https://aws.amazon.com/blogs/aws/new-step-functions-support-for-dynamic-parallelism/
Example of a state machine definition:
{
"Comment": "An example of the Amazon States Language using a choice state.",
"StartAt": "FirstState",
"States": {
"FirstState": {
"Type": "Task",
"Resource": "arn:aws:lambda:REGION:ACCOUNT_ID:function:FUNCTION_NAME",
"Next": "ChoiceState"
},
"ChoiceState": {
"Type" : "Choice",
"Choices": [
{
"Variable": "$.foo",
"NumericEquals": 1,
"Next": "FirstMatchState"
},
{
"Variable": "$.foo",
"NumericEquals": 2,
"Next": "SecondMatchState"
}
],
"Default": "DefaultState"
},
"FirstMatchState": {
"Type" : "Task",
"Resource": "arn:aws:lambda:REGION:ACCOUNT_ID:function:OnFirstMatch",
"Next": "NextState"
},
"SecondMatchState": {
"Type" : "Task",
"Resource": "arn:aws:lambda:REGION:ACCOUNT_ID:function:OnSecondMatch",
"Next": "NextState"
},
"DefaultState": {
"Type": "Fail",
"Error": "DefaultStateError",
"Cause": "No Matches!"
},
"NextState": {
"Type": "Task",
"Resource": "arn:aws:lambda:REGION:ACCOUNT_ID:function:FUNCTION_NAME",
"End": true
}
}
}
https://docs.aws.amazon.com/step-functions/latest/dg/connect-to-resource.html#connect-wait-example
https://sachabarbs.wordpress.com/2018/10/30/aws-step-functions/
As I answered in How to simplify complex parallel branch interdependencies for Step Functions, what you are asking for is better modeled as a DAG than as a state machine.
Depending on your use case, you might be able to work around it (as in #horatiu-jeflea's answer), but it is still a workaround, not the straightforward way.
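One way such a workaround could look, as a sketch only. It assumes that D and E depend on C alone and that nothing downstream of B has to wait for C, which the question does not confirm. The state names "B and C" and "D and E" are made up for illustration. C's branch continues into a nested Parallel state, so D and E each receive a copy of C's output and run concurrently while B runs in its own branch:

```json
{
  "StartAt": "A",
  "States": {
    "A": { "Type": "Pass", "Next": "B and C" },
    "B and C": {
      "Type": "Parallel",
      "Branches": [
        {
          "StartAt": "B",
          "States": { "B": { "Type": "Pass", "End": true } }
        },
        {
          "StartAt": "C",
          "States": {
            "C": { "Type": "Pass", "Next": "D and E" },
            "D and E": {
              "Type": "Parallel",
              "Branches": [
                { "StartAt": "D", "States": { "D": { "Type": "Pass", "End": true } } },
                { "StartAt": "E", "States": { "E": { "Type": "Pass", "End": true } } }
              ],
              "End": true
            }
          }
        }
      ],
      "Next": "F"
    },
    "F": { "Type": "Pass", "End": true }
  }
}
```

If D or E also needs B's output, this shape no longer fits, which is the point of the DAG remark above.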