Is there a way to interpolate OutputPath's JsonPath using the state's input in AWS Step Functions?

Basically, I have the following input:
{
  "name": "abc",
  "choice": "choice1"
}
My DynamoDB table has the following structure:
Partition key: "name"
Complex JSON with choices:
{
  "choices": {
    "choice1": ......,
    "choice2": ......
  }
}
I want to read directly from DynamoDB and get a subitem under the relevant choice:
{
  "StartAt": "Read Next Message from DynamoDB",
  "States": {
    "Read Next Message from DynamoDB": {
      "Type": "Task",
      "Resource": "arn:aws:states:::dynamodb:getItem",
      "Parameters": {
        "TableName": "my_table",
        "Key": {
          "customerName": {"S.$": "$.name"}
        }
      },
      "OutputPath": "$.Item.choices.M.choice1.M.myvalue.S",
      "Next": "World"
    },
    "World": {
      "Type": "Pass",
      "End": true
    }
  }
}
Basically, I want to do something like "$.Item.choices.M.{$.choice}.M.myvalue.S" and take one of the output's keys from the input. Is this possible?

I think what you're looking for is JsonPath interpolation, but that is not supported, as noted in a thread on the AWS forums.
As far as I know, Step Functions allow path references only through the $, ., and [] operators (see Reference Paths).
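To illustrate with the paths from the question: a reference path built only from those operators is fine, but a segment cannot be computed from the input.
Valid reference path:
$.Item.choices.M.choice1.M.myvalue.S
Not valid (no interpolation of path segments):
$.Item.choices.M.{$.choice}.M.myvalue.S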
I don't know how much control you have over the DynamoDB table's data, but I think your problem can be solved easily if your choice types are modeled in the following way:
{
  "choices": [
    {
      "choiceType": "choice1",
      ........
    },
    {
      "choiceType": "choice2",
      ........
    }
  ]
}
Now you can use the Map state to iterate over the choices array. Don't forget to pass the expected choiceType to each iteration.
The first state of the Map iterator can be a Choice state that compares choiceType and moves to the appropriate next state. So the rest of your workflow is modeled as the iterator of that Map state, as sketched below.
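A minimal sketch of what that could look like, assuming the getItem result sits at $.Item, the original choice field is still present on the state input (e.g. via ResultPath on the getItem step), and choices is a DynamoDB list attribute; the state names and Pass placeholders are illustrative:
{
  "StartAt": "IterateChoices",
  "States": {
    "IterateChoices": {
      "Type": "Map",
      "ItemsPath": "$.Item.choices.L",
      "Parameters": {
        "item.$": "$$.Map.Item.Value",
        "expectedChoiceType.$": "$.choice"
      },
      "Iterator": {
        "StartAt": "MatchChoice",
        "States": {
          "MatchChoice": {
            "Type": "Choice",
            "Choices": [
              {
                "Variable": "$.item.M.choiceType.S",
                "StringEqualsPath": "$.expectedChoiceType",
                "Next": "ProcessChoice"
              }
            ],
            "Default": "SkipChoice"
          },
          "ProcessChoice": {"Type": "Pass", "End": true},
          "SkipChoice": {"Type": "Pass", "End": true}
        }
      },
      "End": true
    }
  }
}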
If you don't have control over the DynamoDB table, then you can process the query result in an AWS Lambda instead.


Step Functions - Access State from previous Map Iteration

How can I get the results from previous Map iterations in the next iteration when using MaxConcurrency: 1 in Amazon Step Functions?
Here's an example of the code I have:
{
  "StartAt": "UploadUsers",
  "States": {
    "UploadUsers": {
      "Type": "Map",
      "MaxConcurrency": 1,
      "ItemsPath": "$.data.users",
      "Parameters": {
        "data.$": "$$.Map.Item.Value.data",
        "friends.$": "$.?????? Get created users ids"
      },
      "Iterator": {
        "StartAt": "UploadUser",
        "States": {
          "UploadUser": {
            "End": true,
            "Parameters": {
              "FunctionName": "${FnUploadUser}",
              "Payload": {
                "data.$": "$.user_data",
                "friends.$": "$.??????"
              }
            },
            "Resource": "arn:aws:states:::lambda:invoke.waitForTaskToken",
            "ResultPath": "$.data. ???",
            "Type": "Task"
          }
        }
      },
      "End": true,
      "ResultPath": "$.data.UploadUsers",
      "ResultSelector": {
        "result.$": "$"
      }
    }
  }
}
Suppose FnUploadUser is a Lambda that returns the ID of the created user.
I want to get the IDs of the previously created users and use them for the next user I'm about to create.
You can't. Map State iterations don't share state. Two workarounds:
(1) Manage the shared state externally: each Map iteration writes to and reads from, say, a DynamoDB table.
(2) Refactor to a "for" loop and keep the shared state in the execution output.
To build the "for" loop:
1. Instead of using Map, insert a Choice state (after UploadUser) that checks for a "done" condition. If "done", finish; otherwise loop back to UploadUser.
2. UploadUser accepts the user_data array as input and appends its output to, say, an uploaded output array.
3. Each UploadUser iteration identifies the next user_data item by comparing it against the uploaded array. The iteration that processes the last item can also output done: true to signal to Choice that the work is done.
4. The Choice state loops back to UploadUser while there are more items to process (i.e. while done is not present).
There are other ways to build steps 2-3. For instance, you could add next_item and total_items keys to the output to keep track of progress. The important point is that Choice loops until an exit condition is met.
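A rough sketch of workaround (2), assuming FnUploadUser returns the updated uploaded array and adds a done flag once the last item has been processed; state and field names are illustrative:
{
  "StartAt": "UploadUser",
  "States": {
    "UploadUser": {
      "Type": "Task",
      "Resource": "arn:aws:states:::lambda:invoke",
      "Parameters": {
        "FunctionName": "${FnUploadUser}",
        "Payload.$": "$"
      },
      "ResultPath": "$.progress",
      "Next": "AllDone?"
    },
    "AllDone?": {
      "Type": "Choice",
      "Choices": [
        {
          "Variable": "$.progress.Payload.done",
          "IsPresent": true,
          "Next": "Finish"
        }
      ],
      "Default": "UploadUser"
    },
    "Finish": {"Type": "Succeed"}
  }
}
Each pass through UploadUser sees the previous iteration's result under $.progress, which is how the shared state survives the loop.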

Why do I get 'Item is required' when updating an item in DynamoDB using AWS Step Functions?

I'm trying to push a DynamoDB record through Step Functions while setting up a conditional expression, but for some reason I'm getting the error:
There are Amazon States Language errors in your state machine definition. Fix the errors to continue.
The field 'Item' is required but was missing (at /States/MyStep/Parameters)
I don't want to push an Item. I want to use an update expression.
Here's my code:
{
  "StartAt": "MyStep",
  "States": {
    "MyStep": {
      "Type": "Task",
      "Resource": "arn:aws:states:::dynamodb:putItem",
      "Parameters": {
        "TableName.$": "$.table_name",
        "Key": {
          "test_id_path_method_executor_id": {"S.$": "$.update_key.test_id_path_method_executor_id"},
          "result_timestamp": {"S.$": "$.update_key.result_timestamp"}
        },
        "ConditionExpression": "#max_db < :max_values",
        "ExpressionAttributeValues": {
          ":max_values": {"N.$": "$.result_value"}
        },
        "ExpressionAttributeNames": {
          "#max_db": "max"
        },
        "UpdateExpression": "SET #max_db = :max_values"
      },
      "Next": "EndSuccess"
    },
    "EndSuccess": {
      "Type": "Succeed"
    }
  }
}
What's the issue?
There are 2 main DynamoDB APIs for modifying items:
PutItem
UpdateItem
In a nutshell, PutItem 'updates' the item by replacing it, and because of this it requires the replacement item to be passed. This is the API call you're using, and it is why you're getting The field 'Item' is required but was missing. The message is correct: Item is required when using PutItem.
Instead, you need to use UpdateItem, which does not require the new item in full and will modify the attributes of an item based on an update expression (which is what you have).
In your step function definition, replace:
"Resource": "arn:aws:states:::dynamodb:putItem",
With:
"Resource": "arn:aws:states:::dynamodb:updateItem",

Is it possible to iterate through a DynamoDB table within a step function's map state?

Just what the title says, basically. I have read through the documentation:
https://docs.aws.amazon.com/step-functions/latest/dg/connect-ddb.html
This describes how to get a single item out of a DynamoDB table from a Step Function. What I would like to do is iterate through the entire table and start an execution of another state machine for each item. Each new state machine would have an individual item as input. I have attempted the following code, which unfortunately is not functional:
{
  "StartAt": "OuterFunction",
  "States": {
    "OuterFunction": {
      "Type": "Map",
      "Iterator": {
        "StartAt": "InnerFunction",
        "States": {
          "InnerFunction": {
            "Type": "Task",
            "Resource": "arn:aws:states:::dynamodb:getItem.sync",
            "Parameters": {
              "StateMachineArn": "other-state-machine-arn",
              "TableName": "TestTable"
            },
            "End": true
          }
        }
      },
      "End": true
    }
  }
}
Is it actually possible to iterate through a DynamoDB table in this way?
You are now able to call DynamoDB directly from Step Functions, including the Query and Scan operations. With the result, you can then iterate through the items. The one less convenient caveat is that it does not use the document client, so the results are in the DynamoDB JSON format.
https://docs.aws.amazon.com/step-functions/latest/dg/connect-ddb.html
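A minimal sketch of that pattern, reusing the question's TestTable and other-state-machine-arn (the AWS SDK integration for Scan and the startExecution integration are the assumed service calls); note that each mapped item arrives in DynamoDB JSON format, e.g. {"name": {"S": "abc"}} rather than {"name": "abc"}:
{
  "StartAt": "ScanTable",
  "States": {
    "ScanTable": {
      "Type": "Task",
      "Resource": "arn:aws:states:::aws-sdk:dynamodb:scan",
      "Parameters": {
        "TableName": "TestTable"
      },
      "Next": "ForEachItem"
    },
    "ForEachItem": {
      "Type": "Map",
      "ItemsPath": "$.Items",
      "Iterator": {
        "StartAt": "StartExecution",
        "States": {
          "StartExecution": {
            "Type": "Task",
            "Resource": "arn:aws:states:::states:startExecution.sync",
            "Parameters": {
              "StateMachineArn": "other-state-machine-arn",
              "Input.$": "$"
            },
            "End": true
          }
        }
      },
      "End": true
    }
  }
}
Keep in mind that a single Scan call returns at most 1 MB of data, so very large tables would still need pagination handling.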
No, getItem is designed to fetch a particular DynamoDB document. You need to write a custom Lambda that will .query() or .scan() your table, and then use a Map state to iterate over the results (most likely you won't need getItem at that point, because you can load all the data with the query/scan operation).

Pass multiple inputs into Map State in AWS Step Function

I am trying to use AWS Step Functions to trigger operations on many S3 files via Lambda. To do this I am invoking a Step Function with an input that has the base S3 key of the file and the part numbers of each file (each parallel iteration would operate on a different S3 file). The input looks something like:
{
  "job-spec": {
    "base_file_name": "some_s3_key-",
    "part_array": [
      "part-0000.tsv",
      "part-0001.tsv",
      "part-0002.tsv", ...
    ]
  }
}
My Step Function is very simple: it takes that input and maps over it. However, I can't seem to get both the base key and an array entry as input to my Lambda. Here is my Step Function definition:
{
  "Comment": "An example of the Amazon States Language using a map state to process elements of an array with a max concurrency of 2.",
  "StartAt": "Map",
  "States": {
    "Map": {
      "Type": "Map",
      "ItemsPath": "$.job-spec",
      "ResultPath": "$.part_array",
      "MaxConcurrency": 2,
      "Next": "Final State",
      "Iterator": {
        "StartAt": "My Stage",
        "States": {
          "My Stage": {
            "Type": "Task",
            "Resource": "arn:aws:states:::lambda:invoke",
            "Parameters": {
              "FunctionName": "arn:aws:lambda:us-east-1:<>:function:some-lambda:$LATEST",
              "Payload": {
                "Input.$": "$.part_array"
              }
            },
            "End": true
          }
        }
      }
    },
    "Final State": {
      "Type": "Pass",
      "End": true
    }
  }
}
As written above, it complains that job-spec is not an array for the ItemsPath. If I change it to $.job-spec.part_array, I get the array I'm looking for in my Lambda, but the base key is missing.
Essentially, I want each Python Lambda to get the base file key and one entry from the array to stitch together the complete file name. I can't just put the complete file names in the array due to the limit on how much data I can pass around in Step Functions, and that also seems like a waste of data.
It looks like the Parameters value can be used for this, but I can't quite get the syntax right.
I was finally able to get the syntax right:
"ItemsPath": "$.job-spec.part_array",
"Parameters": {
"part_name.$": "$$.Map.Item.Value",
"base_file_name.$": "$.job-spec.base_file_name"
},
It seems that Parameters can be used to build a custom input for each Map iteration. The $$ prefix accesses the Context Object rather than the state's input, and $$.Map.Item.Value is the current element of the array selected by ItemsPath.
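With that in place, each Map iteration receives an input like the following (values taken from the example input above), and the iterator's Lambda Payload can reference $.part_name and $.base_file_name to stitch the full file name together:
{
  "part_name": "part-0000.tsv",
  "base_file_name": "some_s3_key-"
}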
UPDATE: Here is some AWS documentation (from the comments below) showing this being used.

Passthrough input to output in AWS Step Functions

How can I pass the input of a Task state in AWS Step Functions through to its output?
After reading the Input and Output Processing page in the AWS docs, I have played with various combinations of InputPath, ResultPath and OutputPath.
State definition:
"First State": {
"Type": "Task",
"Resource": "[My Lambda ARN]",
"Next": "Second State",
"InputPath": "$.someKey",
"OutputPath": "$"
}
Input:
{
  "someKey": "someValue"
}
Expected Result
I would like the output of the First State (and thus the input of Second State) to be
{
  "someKey": "someValue"
}
Actual Result
[empty]
What if the input is more complicated, e.g.
{
  "firstKey": "firstValue",
  "secondKey": "secondValue"
}
I would like to forward all of it without worrying about (sub) paths.
In the Amazon States Language spec it is stated that:
If the value of ResultPath is null, that means that the state’s own raw output is discarded and its raw input becomes its result.
Consequently, I updated my state definition to:
"First State": {
  "Type": "Task",
  "Resource": "[My Lambda ARN]",
  "Next": "Second State",
  "ResultPath": null
}
As a result, the task's input payload is copied to its output, even for rich objects like:
{
  "firstKey": "firstValue",
  "secondKey": "secondValue"
}
For those who find themselves here using CDK, the solution is to use the explicit aws_stepfunctions.JsonPath.DISCARD enum rather than None/null.
from aws_cdk import (
    aws_stepfunctions,
    aws_stepfunctions_tasks,
)

aws_stepfunctions_tasks.LambdaInvoke(
    self,
    "my_function",
    lambda_function=lambda_function,
    result_path=aws_stepfunctions.JsonPath.DISCARD,
)
https://docs.aws.amazon.com/cdk/api/latest/docs/#aws-cdk_aws-stepfunctions.JsonPath.html#static-discard
I was looking for a solution for passing input from one parallel state to another parallel state, and the above option worked really well.
For example, my step function is like this: task1 -> parallel task2 -> parallel task3 -> task4. When it started parallel task3, the input values were wiped out, so task3 was failing. With the above option, I was able to pass the same input from task2 to task3.