Merging JSON outputs of parallel states in Step Function

I have a parallel block in a Step Function. The input to the parallel block looks like
{ "a": null, "b": null }
The 2 parallel states are populateA and populateB. The output from populateA is
{ "a": "A", "b": null }
And the output from populateB is
{ "a": null, "b": "B" }
The output from the parallel block is an array
[ { "a": "A", "b": null }, { "a": null, "b": "B" } ]
How do I merge the two into
{ "a": "A", "b": "B" }

If populateA only ever populates the value of a and populateB only the value of b, you can add a Pass state after your Parallel state and use its Parameters field (instead of the Result field) to combine the branch outputs. Parameters let you transform your input by specifying key-value pairs whose values are selected with JSONPath.
Here is an example State Machine:
{
"StartAt": "Parallel",
"States": {
"Parallel": {
"Type": "Parallel",
"ResultPath": "$.CombinedOutput",
"Next": "MergeOutputs",
"Branches": [{
"StartAt": "populateA",
"States": {
"populateA": {
"Type": "Pass",
"Result": {
"a": "A",
"b": null
},
"End": true
}
}
},
{
"StartAt": "populateB",
"States": {
"populateB": {
"Type": "Pass",
"Result": {
"a": null,
"b": "B"
},
"End": true
}
}
}
]
},
"MergeOutputs": {
"Type": "Pass",
"Parameters": {
"a.$": "$.CombinedOutput[0].a",
"b.$": "$.CombinedOutput[1].b"
},
"Next": "EndState"
},
"EndState": {
"Type": "Pass",
"End": true
}
}
}
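To see how the .$ keys in MergeOutputs pick values out of the Parallel result, here is a small local sketch in Python that mimics Parameters for simple paths. It is an approximation, not an AWS API: resolve_path and apply_parameters are hypothetical helpers, and they support only "$", ".field" and "[index]" steps, with no intrinsic functions and no $$ context object.

```python
import re

# One step of a simple JSONPath: ".field" or "[index]".
_TOKEN = re.compile(r"\.([A-Za-z_][A-Za-z0-9_]*)|\[(\d+)\]")


def resolve_path(data, path):
    """Resolve a simple JSONPath such as "$", "$.a.b" or "$.a[0].b" against data."""
    if not path.startswith("$") or path.startswith("$$"):
        raise ValueError(f"unsupported path: {path}")
    rest, current, pos = path[1:], data, 0
    for match in _TOKEN.finditer(rest):
        if match.start() != pos:  # reject anything the tokenizer skipped over
            raise ValueError(f"unsupported path syntax: {path}")
        name, index = match.groups()
        current = current[int(index)] if index is not None else current[name]
        pos = match.end()
    if pos != len(rest):
        raise ValueError(f"unsupported path syntax: {path}")
    return current


def apply_parameters(template, state_input):
    """Build a state's effective input from a Parameters template.

    Keys ending in ".$" take the value selected by their path, other keys are
    copied literally, and nested objects are processed recursively.
    """
    result = {}
    for key, value in template.items():
        if key.endswith(".$"):
            result[key[:-2]] = resolve_path(state_input, value)
        elif isinstance(value, dict):
            result[key] = apply_parameters(value, state_input)
        else:
            result[key] = value
    return result


# Input of MergeOutputs: the original input plus the Parallel result at $.CombinedOutput.
merge_input = {
    "a": None,
    "b": None,
    "CombinedOutput": [{"a": "A", "b": None}, {"a": None, "b": "B"}],
}
print(apply_parameters({"a.$": "$.CombinedOutput[0].a", "b.$": "$.CombinedOutput[1].b"}, merge_input))
# {'a': 'A', 'b': 'B'}
```

Because each key names a fixed array position, this only works while the branch order (and which branch fills which field) stays the same.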

I'm afraid it is not (currently) possible to do this without an additional state. You can add a Task state after the Parallel state that merges the array elements.
The code of the additional task would look like this (Python):
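def merge_dicts(a, b):
    # Start from a copy so the caller's dict is not mutated.
    result = a.copy()
    for key, value in b.items():
        # Take b's value unless it is None and a already has this key,
        # so a populated value is never overwritten by a null.
        if value is not None or key not in result:
            result[key] = value
    return result

def handler(event, context):
    # event is the array produced by the Parallel state, one dict per branch.
    result = {}
    for item in event:
        result = merge_dicts(result, item)
    return result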
handler takes the output array of the Parallel state and merges it into a single dictionary. This Task state should come directly after the Parallel state.
Keep in mind that this code handles only flat dictionaries (not nested ones), so it will work in simple cases but not in complex ones (I've added it here only to show the general idea).
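If your branch outputs are nested, a recursive variant of the same idea might look like the sketch below. deep_merge is a hypothetical helper, not part of any AWS SDK; it assumes that None means "not populated by this branch", merges nested dicts key by key, and lets a later non-None value (including a list) replace an earlier one.

```python
def deep_merge(base, update):
    """Return a copy of base with update merged in, recursing into nested dicts."""
    result = dict(base)
    for key, value in update.items():
        existing = result.get(key)
        if isinstance(value, dict):
            # Merge into the existing dict if there is one, else into a fresh dict
            # (which also copies value instead of aliasing it).
            result[key] = deep_merge(existing if isinstance(existing, dict) else {}, value)
        elif value is not None or key not in result:
            # Keep populated values; a None only fills a key that is still absent.
            result[key] = value
    return result


def handler(event, context):
    # event is the Parallel state's output array, one dict per branch.
    result = {}
    for item in event:
        result = deep_merge(result, item)
    return result


print(handler([{"a": "A", "b": None, "n": {"x": 1, "y": None}},
               {"a": None, "b": "B", "n": {"x": None, "y": 2}}], None))
# {'a': 'A', 'b': 'B', 'n': {'x': 1, 'y': 2}}
```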

Related

Rendering Values in a Step Function: extract YYYY-MM from 2022-10-20

I am trying to get a date prefix from a date in an AWS Step Function. When I try to reference a variable from a previous step, that variable doesn't render. Here is the Step Function code:
{
"Comment": "Set and use a new variable in Step Functions",
"StartAt": "Set Date Prefix",
"States": {
"Set Date Prefix": {
"Type": "Pass",
"Result": {
"date_prefix": "${$$.Execution.Input.date.substr(0,7)}"
},
"ResultPath": "$.date_prefix",
"Next": "Use Date Prefix"
},
"Use Date Prefix": {
"Type": "Pass",
"Result": {
"date_prefix_used": "$.date_prefix"
},
"End": true
}
}
}
When I pass the following input:
{
"date": "2022-10-20"
}
I get the following as output:
{
"date_prefix_used": "$.date_prefix"
}
when I should have gotten:
{
"date_prefix": "2022-10",
"date_prefix_used": "2022-10"
}
What am I doing wrong?
You can extract the YYYY-MM from a date string input like 2022-10-20 with intrinsic functions in two Pass states. A Pass state can apply substitutions and States Language intrinsic functions, but only in the Parameters field, not in the Result field. Remember the .$ suffix on keys whose values contain substitutions; otherwise, the values are treated as literals.
{
"Comment": "Set and use a new variable in Step Functions",
"StartAt": "SplitDate",
"States": {
"SplitDate": {
"Type": "Pass",
"Parameters": {
"date_components.$": "States.StringSplit($.date, '-')"
},
"Next": "GetPrefix"
},
"GetPrefix": {
"Type": "Pass",
"InputPath": "$.date_components",
"Parameters": {
"date_prefix.$": "States.Format('{}-{}',States.ArrayGetItem($, 0), States.ArrayGetItem($, 1))"
},
"End": true
}
}
}
The SplitDate state splits the date with States.StringSplit:
{
"date_components": [ "2022", "10", "20" ]
}
The GetPrefix state reassembles the year and month with States.Format and States.ArrayGetItem:
{
"date_prefix": "2022-10"
}
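If you want to check the expected data flow locally before deploying, the two Pass states correspond roughly to the plain Python below. This is only a sketch: split_date and get_prefix are hypothetical names, and Python's str.split stands in for States.StringSplit only for a simple input like this one.

```python
def split_date(state_input):
    # Like SplitDate: "date_components.$": "States.StringSplit($.date, '-')"
    return {"date_components": state_input["date"].split("-")}


def get_prefix(state_input):
    # Like GetPrefix: InputPath "$.date_components" selects the array, then
    # States.Format('{}-{}', item 0, item 1) joins the year and month.
    components = state_input["date_components"]
    return {"date_prefix": "{}-{}".format(components[0], components[1])}


print(get_prefix(split_date({"date": "2022-10-20"})))
# {'date_prefix': '2022-10'}
```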

Passing Input Parameters from one Step Function to another

I have a Step Function A, which executes a Lambda and pulls some results.
The same state machine has a Map state that iterates over the results and should call another Step Function from within the Map state.
When calling Step Function B from the Map state, I am not able to pass the parameter (that one record) as input to Step Function B.
How can I pass input to the second Step Function?
Below is the example I am using. orderServiceResponse has a list of orders, which I need to iterate over, passing each order to the next Step Function.
"Validate-All" : {
"Type" : "Map",
"InputPath" : "$.orderServiceResponse",
"ItemsPath" : "$.orders",
"MaxConcurrency" : 5,
"ResultPath" : "$.orders",
"Iterator" : {
"StartAt" : "Validate" ,
"States" :{
"Validate" : {
"Type" : "Task",
"Resource" : "arn:aws:states:::states:startExecution.sync:2",
"Parameters" : {
"Input" : "$.orders",
"StateMachineArn" : "{arn of Step Function B }"
},
"End" : true
}
}
}
}
TL;DR Use Parameters with Map Context to add the full input object to each Map element iteration.
You have an array of data you want to process element by element in a Map state. By default, Map passes only the array element's data to the Map iterator, but we can add additional context to each iteration.
Here is an example; the important bits are commented (remove the // comments before deploying, since JSON does not allow comments):
{
"StartAt": "MapState",
"States": {
"MapState": {
"Type": "Map",
"ResultPath": "$.MapResult",
"Next": "Success",
// each Map iteration receives the following input:
"Parameters": {
"Index.$": "$$.Map.Item.Index", // the array element's index 0,1,2...
"Order.$": "$$.Map.Item.Value", // the array element's data (we only get this by default)
"FullInput.$": "$" // a copy of the full input object <-- this is what you were looking for
},
"Catch": [{ "ErrorEquals": ["States.ALL"], "Next": "Fail" }],
// substitute your iterator:
"Iterator": {
"StartAt": "MockTask",
"States": {
"MockTask": {
"End": true,
"Type": "Task",
"Resource": "arn:aws:lambda:us-east-1:xxxxxxxxxxxx",
"Parameters": {
"expression": "`Order ${$.Order.OrderID} was ordered by ${$.FullInput.CustomerName}`",
"expressionAttributeValues": {
"$.Order.OrderID.$": "$.Order.OrderID",
"$.FullInput.CustomerName.$": "$.FullInput.CustomerName"
}
}
}
}
},
"ItemsPath": "$.Orders"
},
"Success": { "Type": "Succeed" },
"Fail": { "Type": "Fail" }
}
}
Execution Input, 3 Orders:
{
"CustomerID": 1,
"CustomerName": "Morgan Freeman",
"OtherInfo": { "Foo": "Bar" },
"Orders": [{ "OrderID": "A", "Status": "Fulfilled" }, { "OrderID": "B", "Status": "Pending" }, { "OrderID": "C", "Status": "Cancelled" }]
}
Map Iteration 0 Input:
{
"Order": { "OrderID": "A", "Status": "Fulfilled" },
"Index": 0,
"FullInput": { "CustomerID": 1, "CustomerName": "Morgan Freeman", "OtherInfo": { "Foo": "Bar" }, "Orders": [{...
Execution output (MapResult key):
{
"MapResult": [
"Order A was ordered by Morgan Freeman",
"Order B was ordered by Morgan Freeman",
"Order C was ordered by Morgan Freeman"
]
...
}
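The per-iteration inputs that these Map Parameters produce can be mimicked locally with the Python sketch below. build_iteration_inputs and mock_task are hypothetical helpers (mock_task stands in for the MockTask Lambda, whose code is not shown); the sketch only illustrates that "$" inside the Map's Parameters refers to the Map state's whole input, while $$.Map.Item.Index and $$.Map.Item.Value refer to the current element.

```python
def build_iteration_inputs(state_input, items_key):
    """Mimic the Map Parameters above for every element of state_input[items_key]."""
    return [
        {
            "Index": index,            # "$$.Map.Item.Index": position in the array
            "Order": value,            # "$$.Map.Item.Value": the element itself
            "FullInput": state_input,  # "$": the Map state's whole input
        }
        for index, value in enumerate(state_input[items_key])  # ItemsPath "$.Orders"
    ]


def mock_task(iteration_input):
    # Hypothetical stand-in for MockTask: renders the message its "expression" describes.
    order_id = iteration_input["Order"]["OrderID"]
    customer = iteration_input["FullInput"]["CustomerName"]
    return f"Order {order_id} was ordered by {customer}"


execution_input = {
    "CustomerID": 1,
    "CustomerName": "Morgan Freeman",
    "OtherInfo": {"Foo": "Bar"},
    "Orders": [
        {"OrderID": "A", "Status": "Fulfilled"},
        {"OrderID": "B", "Status": "Pending"},
        {"OrderID": "C", "Status": "Cancelled"},
    ],
}
map_result = [mock_task(item) for item in build_iteration_inputs(execution_input, "Orders")]
print(map_result[0])  # Order A was ordered by Morgan Freeman
```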

AWS StepFunctions - Merge and flatten the task output combined with the original input

How do we use Parameters, ResultPath and ResultSelector to combine the results of a Task with the original input at the same JSON level?
I checked the AWS documentation, but it seems that ResultSelector always creates a new dictionary, which puts the result one level below.
Example input:
{
"status": "PENDING",
"uuid": "00000000-0000-0000-0000-000000000000",
"first_name": "John",
"last_name": "Doe",
"email": "john.doe@email.com",
"orders": [
{
"item_uuid": "11111111-1111-1111-1111-111111111111",
"quantities": 2,
"price": 2.38,
"created_at": 16049331038000
}
]
}
State Machine definition
"Review": {
"Type": "Task",
"Resource": "arn:aws:states:us-east-1:123456789012:activity:Review",
"ResultPath": null,
"Next": "Processing",
"Parameters": {
"task_name": "REVIEW_REQUIRED",
"uuid.$": "$.uuid"
}
},
Example output from Review Activity
{
"review_status": "APPROVED"
}
Question
How do I update the State Machine definition to combine the result of the Review Activity with the original input into something like this?
{
"status": "PENDING",
"uuid": "00000000-0000-0000-0000-000000000000",
"first_name": "John",
"last_name": "Doe",
"email": "john.doe@email.com",
"orders": [
{
"item_uuid": "11111111-1111-1111-1111-111111111111",
"quantities": 2,
"price": 2.38,
"created_at": 16049331038000
}
],
"review_status": "APPROVED"
}
NOTE
I don't have access to the Activity code, just the definition file.
I recommend NOT listing each key explicitly in Parameters, as you will drop all data that you do not include. That is not a long-term approach; you can do it more easily like this:
Step Input
{
"a": "a_value",
"b": "b_value",
"c": {
"c": "c_value"
}
}
In your state-machine.json
"Flatten And Keep All Other Keys": {
"Type": "Pass",
"InputPath": "$.c.c",
"ResultPath": "$.c",
"Next": "Some Other State"
}
Step Output
{
"a": "a_value",
"b": "b_value",
"c": "c_value"
}
While Step Functions does not let you do this directly, you can work around it with a Pass state that flattens the input.
Example Input:
{
"name": "John Doe",
"lambdaResult": {
"age": "35",
"location": "Eastern Europe"
}
}
Amazon States Language:
"Flatten": {
"Type": "Pass",
"Parameters": {
"name.$" : "$.name",
"age.$" : "$.lambdaResult.age",
"location.$": "$.lambdaResult.location"
},
"Next": "MyNextState"
}
Output:
{
"name": "John Doe",
"age": "35",
"location": "Eastern Europe"
}
It's tedious, but it gets the job done.
Thanks for your question.
It looks like you don't necessarily need to manipulate the output in any way, and are looking for a way to combine the state's output with its input before passing it on to the next state. The ResultPath field allows you to combine a task result with task input, or to select one of these. The path you provide to ResultPath controls what information passes to the output.
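A rough Python model of the three common ResultPath forms may make this concrete. apply_result_path is a hypothetical helper, not an AWS API, and it simplifies things: it supports "$", null (None) and dotted paths, and assumes parent objects on a dotted path already exist. It also shows why the Review state in the question, which uses "ResultPath": null, passes its original input through unchanged and never exposes review_status.

```python
import copy


def apply_result_path(state_input, result, result_path):
    """Combine a state's result with its input the way ResultPath does (simplified)."""
    if result_path == "$":
        return result                      # default: the result replaces the input
    if result_path is None:
        return copy.deepcopy(state_input)  # null: the result is discarded
    if not result_path.startswith("$."):
        raise ValueError(f"unsupported ResultPath: {result_path}")
    output = copy.deepcopy(state_input)
    *parents, leaf = result_path[2:].split(".")
    target = output
    for name in parents:                   # parents must already exist in this sketch
        target = target[name]
    target[leaf] = result                  # insert (or overwrite) the result
    return output


# Shortened version of the question's input and the activity's output.
review_input = {"status": "PENDING", "uuid": "00000000-0000-0000-0000-000000000000"}
print(apply_result_path(review_input, {"review_status": "APPROVED"}, "$.review"))

# "Flatten And Keep All Other Keys": a Pass state without Result returns its
# InputPath-filtered input ($.c.c) as its result, and ResultPath "$.c" writes it back.
step_input = {"a": "a_value", "b": "b_value", "c": {"c": "c_value"}}
print(apply_result_path(step_input, step_input["c"]["c"], "$.c"))
```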

AWS Step Function: check for null

The Step Function is defined like this:
{
"StartAt": "Decision_Maker",
"States": {
"Decision_Maker":{
"Type": "Choice",
"Choices": [
{
"Variable": "$.body.MyData",
"StringEquals": "null", //that doesn't work :(
"Next": "Run_Task1"
}],
"Default": "Run_Task2"
},
"Run_Task1": {
"Type": "Task",
"Resource": "url_1",
"Next": "Run_Task2"
},
"Run_Task2": {
"Type": "Task",
"Resource": "url_2",
"End": true
}
}
}
Basically it's a choice between 2 tasks.
Input data is like this:
{
"body": {
"prop1": "value1",
"myData": {
"otherProp": "value"
}
}
}
The problem is that sometimes there's no myData value in the JSON, so the input may look like this:
{
"body": {
"prop1": "value1",
"myData": null
}
}
How do I check whether or not myData is null?
As of August 2020, the Amazon States Language has IsNull and IsPresent Choice Rules. Using these, you can natively check inside a Choice state whether a value is null or whether a key exists in the state input.
Example:
{ "Variable": "$.possiblyNullValue", "IsNull": true }
https://docs.aws.amazon.com/step-functions/latest/dg/amazon-states-language-choice-state.html#amazon-states-language-choice-state-rules
The order matters: put the "IsPresent": false check first, then "IsNull": true, and the scalar comparison last.
"Check MyValue": {
"Comment": "Check MyValue",
"Type": "Choice",
"Default": "ContinueWithMyValue",
"Choices": [
{
"Or": [
{
"Variable": "$.MyValue",
"IsPresent": false
},
{
"Variable": "$.MyValue",
"IsNull": true
},
{
"Variable": "$.MyValue",
"BooleanEquals": false
}
],
"Next": "HaltProcessing"
},
{
"Variable": "$.MyValue",
"BooleanEquals": true,
"Next": "ContinueWithMyValue"
}
]
},
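To reason about which inputs end up where, here is a rough local Python model of the Check MyValue rules. route_check_my_value is a hypothetical helper, not an AWS API; it models only a top-level MyValue, checks the Or conditions in the listed order, and does not try to model how Step Functions treats non-boolean values in BooleanEquals.

```python
_MISSING = object()  # sentinel that distinguishes "key absent" from "value is null"


def route_check_my_value(state_input):
    """Return the name of the next state chosen by the Check MyValue rules (approximate)."""
    value = state_input.get("MyValue", _MISSING)
    # First choice: an Or whose conditions are checked here in the listed order.
    if value is _MISSING:   # {"Variable": "$.MyValue", "IsPresent": false}
        return "HaltProcessing"
    if value is None:       # {"Variable": "$.MyValue", "IsNull": true}
        return "HaltProcessing"
    if value is False:      # {"Variable": "$.MyValue", "BooleanEquals": false}
        return "HaltProcessing"
    # Second choice (BooleanEquals true) and Default both lead to ContinueWithMyValue.
    return "ContinueWithMyValue"


for sample in ({}, {"MyValue": None}, {"MyValue": False}, {"MyValue": True}):
    print(sample, "->", route_check_my_value(sample))
```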
In my experience, the Choice state can't deal with nulls. The best way may be to pre-process your input with a Lambda in the very first state and return the event with the value formatted as the string "null". The code snippet below might help.
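def lambda_handler(event, context):
    # Normalize a missing, null, or otherwise falsy MyData to the string "null"
    # so that the Choice rule "StringEquals": "null" can match it.
    # The key must match the path used in the Choice state ($.body.MyData).
    body = event.setdefault('body', {})
    if not body.get('MyData'):
        body['MyData'] = "null"
    return event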
Note: this also converts an empty string (or any other falsy value) to "null".

Can AWS Step Function describe this kind of dataflow?

It cannot be described with a Parallel state in AWS Step Functions.
B and C should be in parallel.
C sends messages to both D and E.
D and E should be in parallel.
{
"StartAt": "A",
"States": {
"A": {
"Type": "Pass",
"Next": "Parallel State 1"
},
"Parallel State 1": {
"Type": "Parallel",
"Branches": [{
"StartAt": "B",
"States": {
"B": {
"Type": "Pass",
"End": true
}
}
},
{
"StartAt": "C",
"States": {
"C": {
"Type": "Pass",
"End": true
}
}
}
],
"Next": "Parallel State 2"
},
"Parallel State 2": {
"Type": "Parallel",
"Branches": [{
"StartAt": "D",
"States": {
"D": {
"Type": "Pass",
"End": true
}
}
},
{
"StartAt": "E",
"States": {
"E": {
"Type": "Pass",
"End": true
}
}
}
],
"Next": "F"
},
"F": {
"Type": "Pass",
"End": true
}
}
}
The answer is no. Inside a Step Function, no state can set multiple states as its Next (that is, invoke both successors), and a state machine cannot be started by providing multiple state names in StartAt.
You can adjust your logic and use the Parallel state to achieve the same result. If you share your use case, we may be able to help solve the problem.
How to specify multiple result path values in AWS Step Functions
A Parallel state provides each branch with a copy of its own input
data (subject to modification by the InputPath field). It generates
output that is an array with one element for each branch, containing
the output from that branch.
https://aws.amazon.com/blogs/aws/new-step-functions-support-for-dynamic-parallelism/
Example state machine:
{
"Comment": "An example of the Amazon States Language using a choice state.",
"StartAt": "FirstState",
"States": {
"FirstState": {
"Type": "Task",
"Resource": "arn:aws:lambda:REGION:ACCOUNT_ID:function:FUNCTION_NAME",
"Next": "ChoiceState"
},
"ChoiceState": {
"Type" : "Choice",
"Choices": [
{
"Variable": "$.foo",
"NumericEquals": 1,
"Next": "FirstMatchState"
},
{
"Variable": "$.foo",
"NumericEquals": 2,
"Next": "SecondMatchState"
}
],
"Default": "DefaultState"
},
"FirstMatchState": {
"Type" : "Task",
"Resource": "arn:aws:lambda:REGION:ACCOUNT_ID:function:OnFirstMatch",
"Next": "NextState"
},
"SecondMatchState": {
"Type" : "Task",
"Resource": "arn:aws:lambda:REGION:ACCOUNT_ID:function:OnSecondMatch",
"Next": "NextState"
},
"DefaultState": {
"Type": "Fail",
"Error": "DefaultStateError",
"Cause": "No Matches!"
},
"NextState": {
"Type": "Task",
"Resource": "arn:aws:lambda:REGION:ACCOUNT_ID:function:FUNCTION_NAME",
"End": true
}
}
}
https://docs.aws.amazon.com/step-functions/latest/dg/connect-to-resource.html#connect-wait-example
https://sachabarbs.wordpress.com/2018/10/30/aws-step-functions/
As I answered in How to simplify complex parallel branch interdependencies for Step Functions, what you are asking for is better modeled as a DAG than as a state machine.
Depending on your use case, you might be able to work around it (as in @horatiu-jeflea's answer), but it is still a workaround rather than the straightforward way.
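To make the DAG point concrete, here is a Python sketch using the standard library's graphlib.TopologicalSorter (Python 3.9+). The edges are my reading of the question and are an assumption: A runs before B and C, C feeds D and E, and F waits for B, D and E. runnable_after is a hypothetical helper that reports which nodes become runnable as each node finishes.

```python
from graphlib import TopologicalSorter

# Assumed dataflow from the question: each node maps to the nodes it waits for.
DATAFLOW = {
    "B": {"A"},
    "C": {"A"},
    "D": {"C"},
    "E": {"C"},
    "F": {"B", "D", "E"},
}


def runnable_after(finish_order):
    """Return the nodes that become runnable initially and after each completion."""
    sorter = TopologicalSorter(DATAFLOW)
    sorter.prepare()
    steps = [set(sorter.get_ready())]  # nodes with no predecessors
    for node in finish_order:
        sorter.done(node)              # mark node as finished
        steps.append(set(sorter.get_ready()))
    return steps


# Suppose C finishes before B does.
labels = ["(start)", "A", "C", "B", "D", "E"]
for finished, ready in zip(labels, runnable_after(["A", "C", "B", "D", "E"])):
    print(f"after {finished}: {sorted(ready)}")
```

In this run, D and E become runnable as soon as C finishes, while B is still running. Chaining Parallel State 1 and Parallel State 2 instead makes D and E wait for B as well, which is the extra coupling a state machine of nested Parallel states introduces.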