Step Functions - Access State from previous Map Iteration - amazon-web-services

How can I get the results from previous Map Iterations in the next iteration when using MaxConcurrency: 1 in Amazon Step Functions?
Here's an example of the code I have
{
"StartAt": "UploadUsers",
"States": {
"UploadUsers": {
"Type": "Map",
"MaxConcurrency": 1,
"ItemsPath": "$.data.users",
"Parameters": {
"data.$": "$$.Map.Item.Value.data",
"friends.$": "$.?????? Get created users ids"
},
"Iterator": {
"StartAt": "UploadUser",
"States": {
"UploadUser": {
"End": true,
"Parameters": {
"FunctionName": "${FnUploadUser}",
"Payload": {
"data.$": "$.user_data",
"friends.$": "$.??????"
}
},
"Resource": "arn:aws:states:::lambda:invoke.waitForTaskToken",
"ResultPath": "$.data. ???",
"Type": "Task"
}
}
},
"End": true,
"ResultPath": "$.data.UploadUsers",
"ResultSelector": {
"result.$": "$"
}
}
}
}
Suppose FnUploadUser is a lambda that returns the id of the created user.
And I want to get the ids of the previously created users and use that value for the next user I'm about to create.

You can't. Map State iterations don't share state. Two workarounds:
(1) Manage the shared state externally: Each Map iteration writes and reads from, say, a DynamoDB table.
(2) Refactor to a "for" loop and keep the shared state in the execution output.
Instead of using Map, insert a Choice State (after UploadUser) that checks for a "done" condition. If "done", finish, else loop back to UploadUser.
UploadUser accepts the user_data array as input. It appends its output to, say, the uploaded output array.
Each UploadUser iteration identifies the next user_data item by comparing it to the uploaded array. The iteration that processes the last item can also output done: true to signal to Choice that work is done.
The Choice State loops back to UploadUser while there are more to process (i.e. while done is not present).
There are other ways to build steps 2-3. For instance, you could add next_item and total_items keys on the output to keep track of progress. The important point is that Choice loops until an exit condition is met.

Related

AWS Step Function - Dynamic parallelism MaxConcurrency Field

We are using Step functions dynamic parallelism with Map state to achieve concurrency.
Is it possible to pass the values to "MaxConcurrency" field from its upstream task(lambda or read from file ) in map state stepfunctions .
Current code:
"Type": "Map",
"InputPath": "$.detail",
"ItemsPath": "$.shipped",
"MaxConcurrency": 3,
"ResultPath": "$.detail.shipped",
Expectation(pass input to MaxConcurrency from lambda task or read file task ):
"Type": "Map",
"InputPath": "$.detail",
"ItemsPath": "$.shipped",
"MaxConcurrency": "$.input",
"ResultPath": "$.detail.shipped"
Getting error as it supports only integer.
You can set maxConcurrency at state machine definition-time, but not at execution-time. As you experienced, Map's maxConcurrency expects a number, but the state machine language uses strings to pass variables dynamically.
(Note: Step Functions are concurrent by default. Docs: maxConcurrency's "default value is 0, which places no quota on parallelism and iterations are invoked as concurrently as possible".)
Option 1: Choice + Map (Difficulty: Low)
A workaround that limits concurrency dynamically at execution-time is a Choice state that branches to discrete map states based on an input variable. Each branch's Map has a different maxConcurrency, but is otherwise identical. Add whichever discrete maxConcurrency choices you need. Choice also accepts a default case to catch unmatched choice input.
// execution input
{
"concurrency": 5,
"jobs": [ { "jobId": 1 }, { "jobId": 2 }, { "jobId": 3 }, { "jobId": 4}]
}
// state machine definition (partial)
"States": {
"Max-Concurrency-Choice": {
"Type": "Choice",
"Choices": [
{
"Variable": "$.concurrency",
"NumericEquals": 5,
"Next": "MapState-MaxConcurrency-5" // maxConcurrency in this branch is set at 5
},
{
"Variable": "$.concurrency",
"NumericEquals": 10,
"Next": "MapState-MaxConcurrency-10" // maxConcurrency in this branch is set at 10
}
],
"Default": "MapState-MaxConcurrency-1" // maxConcurrency in the default branch is set at 1
},
Option 2: Nested Sfn + API Call (Difficulty: High)
Nest your Sfn in a new Sfn. The new, parent Sfn takes a maxConcurrency in the input. It has two tasks:
In a Lambda Task, call the UpdateStateMachine API with new stringified JSON state machine definition for your current, child Sfn.
Invoke your current State Machine. The Sfn will have the new maxConurrency.

Is it possible to iterate through a DynamoDB table within a step function's map state?

Just what the title says, basically. I have read through the documentation:
https://docs.aws.amazon.com/step-functions/latest/dg/connect-ddb.html
This describes how to get a single item of information out of a DynamoDB table from a step function. What I would like to do is iterate through the entire table and start execution of another state machine for each item. Each new state machine would have an individual item as input. I have attempted the following code, which unfortunately is not functional:
{
"StartAt": "OuterFunction",
"States": {
"OuterFunction": {
"Type": "Map",
"Iterator": {
"StartAt": "InnerFunction",
"States": {
"InnerFunction": {
"Type": "Task",
"Resource": "arn:aws:states:::dynamodb:getItem.sync",
"Parameters": {
"StateMachineArn":"other-state-machine-arn",
"TableName": "TestTable"
},
"End": true
}
}
},
"End": true
}
}
}
Is it actually possible to iterate through a DynamoDB table in this way?
You are now able to call DynamoDB directly from step functions. This includes the query and scan operations. With the result, you can then iterate through the items. The one less convenient, caveat is that it does not use the document client, so the results are in the dynamodb json format.
https://docs.aws.amazon.com/step-functions/latest/dg/connect-ddb.html
No, getItem is designed to fetch particular DynamoDB document. You need to write custom Lambda that will .query() or .scan() your table and then use Map step to iterate over results (most likely you won't need getItem at that time, because you can load all data with the query/scan operation).

Is there a way to interpolate OutputPath's JsonPath using state's input in AWS step function?

Basically, i have the following input:
{
"name": "abc",
"choice": "choice1"
}
My dynamoDB table has the following structure:
Partition key - "name"
Complex json with choices:
{
"choices":
{
"choice1": ......,
"choice2": ......
}
}
I want to directly read from dynamodb, and get a subitem under the relevant choice:
{
"StartAt": "Read Next Message from DynamoDB",
"States": {
"Read Next Message from DynamoDB": {
"Type": "Task",
"Resource": "arn:aws:states:::dynamodb:getItem",
"Parameters": {
"TableName": "my_table",
"Key": {
"customerName": {"S.$": "$.name"}
}
},
"OutputPath": "$.Item.choices.M.choice1.M.myvalue.S",
"Next": "World"
},
"World": {
"Type": "Pass",
"End": true
}
}
}
basically i want to do something like "$.Item.choices.M.{$.choice}.M.myvalue.S", and take one of the output's keys from the input. is this possible?
I think what you're looking for is JsonPath interpolation, but that is not supported as per this thread on AWS forums.
As far as I know Step Functions allow only path reference through $, . and [] operators (Reference Path).
I don't know how much control you have on the DynamoDB table's data but I think your problem can be solved easily if your choice types are modeled in following way
{
"choices": [{
"choiceType": "choice1",
........
},
{
"choiceType": "choice2",
........
}]
}
Now you can use the map state to iterate over the choices array. Note that don't forget to pass the expected choiceType to each iteration.
First state of the map iterator can be a choice state which compares choiceType and moves to appropriate next state. So, basically your rest of the workflow is modeled as iterator of the map state in step 1.
Now, if you don't have the control over DynamoDB table, then you can process the query result in an AWS Lambda.

Pass multiple inputs into Map State in AWS Step Function

I am trying to use AWS Step Functions to trigger operations many S3 files via Lambda. To do this I am invoking a step function with an input that has a base S3 key of the file and part numbers each file (each parallel iteration would operate on a different S3 file). The input looks something like
{
"job-spec": {
"base_file_name": "some_s3_key-",
"part_array": [
"part-0000.tsv",
"part-0001.tsv",
"part-0002.tsv", ...
]
}
}
My Step function is very simple, takes that input and maps it out, however I can't seem to get both the file and the array as input to my lambda. Here is my step function definition
{
"Comment": "An example of the Amazon States Language using a map state to process elements of an array with a max concurrency of 2.",
"StartAt": "Map",
"States": {
"Map": {
"Type": "Map",
"ItemsPath": "$.job-spec",
"ResultPath": "$.part_array",
"MaxConcurrency": 2,
"Next": "Final State",
"Iterator": {
"StartAt": "My Stage",
"States": {
"My Stage": {
"Type": "Task",
"Resource": "arn:aws:states:::lambda:invoke",
"Parameters": {
"FunctionName": "arn:aws:lambda:us-east-1:<>:function:some-lambda:$LATEST",
"Payload": {
"Input.$": "$.part_array"
}
},
"End": true
}
}
}
},
"Final State": {
"Type": "Pass",
"End": true
}
}
}
As written above it complains that that job-spec is not an array for the ItemsPath. If I change that to $.job-spec.array I get the array I'm looking for in my lambda but the base key is missing.
Essentially I want each python lambda to get the base file key and one entry from the array to stitch together the complete file name. I can't just put the complete file names in the array due to the limit limit of how much data I can pass around in Step Functions and that also seems like a waste of data
It looks like the Parameters value can be used for this but I can't quite get the syntax right
Was able to finally get the syntax right.
"ItemsPath": "$.job-spec.part_array",
"Parameters": {
"part_name.$": "$$.Map.Item.Value",
"base_file_name.$": "$.job-spec.base_file_name"
},
It seems that Parameters can be used to create custom inputs for each stage. The $$ is accessing the context of the stage and not the actual input. It appears that ItemsPath takes the array and puts it into a context which can be used later.
UPDATE Here is some AWS Documentation showing this being used from the comments below

How does the MaxConcurrency attribute work for the Map Task in AWS Step Functions?

Update: Creating a step function from the Map State step template and running that also throws an error. This is strong evidence that the MaxConcurrency attribute together with the Parameters value is not working.
I am not able to use the MaxConcurrency attribute successfully in the step function definition.
This can be demonstrated by using the example provided in the documentation for the Map Task (new as of 18 sept 2019):
{
"StartAt": "ExampleMapState",
"States": {
"ExampleMapState": {
"Type": "Map",
"MaxConcurrency": 2,
"Parameters": {
"ContextIndex.$": "$$.Map.Item.Index",
"ContextValue.$": "$$.Map.Item.Value"
},
"Iterator": {
"StartAt": "TestPass",
"States": {
"TestPass": {
"Type": "Pass",
"End": true
}
}
},
"End": true
}
}
}
By executing the step function with the following input:
[
{
"who": "bob"
},
{
"who": "meg"
},
{
"who": "joe"
}
]
We can observe in the Execution event history that we get:
ExecutionStarted
MapStateEntered
MapStateStarted
MapIterationStarted (index 0)
MapIterationStarted (index 1)
PassStateEntered (index 0)
PassStateExited (index 0)
MapIterationSucceeded (index 0)
ExecutionFailed
The step function fails.
The ExecutionFailed step has the following output (execution id omitted):
{
"error": "States.Runtime",
"cause": "Internal Error (omitted)"
}
Trying to catch the error with a Catch step has no effect.
What am I doing wrong here? Is this a bug?
Response to a private ticket submitted to AWS this morning;
Thank you for contacting AWS Premium Support. My name is Akanksha and
I will be assisting you with this case.
I understand that you have been working with the new Map state feature
of step functions and have noticed that when we use Parameters along
with MaxConcurrency set to lower value than the number of iterations
(with only first iteration successful) it fails with ‘States.Runtime’
and looks like a bug with the functionality.
Thank you for providing the details. It helped me during
troubleshooting. In order to confirm the behavior, I used the below
state machine example with Pass:
{
"StartAt": "Map State",
"TimeoutSeconds": 3600,
"States": {
"Map State": {
"Type": "Map”,
"Parameters": {
“ContextValue.$”: "$$.Map.Item.Value"
},
"MaxConcurrency": 1,
"Iterator": {
"StartAt": "Run Task",
"States": {
"Run Task": {
"Type": "Pass",
"End": true
}
}
},
"Next": "Final State"
},
"Final State": {
"Type": "Pass",
"End": true
}
} }
I tested with multiple input lists and MaxConcurrency values and below
are my observations:
Input size list: 4 MaxConcurrency:1/2/3 - Fails and MaxConcurrency:0/4/5 or above - Works
Input size list: 3 MaxConcurrency: 1/2 - Fails and MaxConcurrency:0/3/4 or above - Works
Similarly, I performed tests by removing the parameters from state machine as well and could see that it works as expected with different
MaxConcurrency values.
I also tested the same by changing the Task type of “Pass” with “Lambda” and observed the same behavior.
Hence, I can confirm that the state machine fails when we have
parameters in the code and specify MaxConcurrency value as anything
other than zero or the number greater than or equal to the list size.
After doing some research regarding this behavior to check if this is
intended, I could not find much information regarding the same as this
is a new feature. So, I will be reaching out to the internal team with
all the details and the example state machine that you have provided.
Thank you for bringing this to our notice. I will get back to you as
soon as I have an update from the internal team. Please be assured
that I will regularly follow up with the team and work with them to
investigate further.
Meanwhile, if you have any other queries or concerns, please do let me
know.
Have a great day ahead!
I will update here when I get more information.