Is it possible to iterate through a DynamoDB table within a step function's map state? - amazon-web-services

Just what the title says, basically. I have read through the documentation:
https://docs.aws.amazon.com/step-functions/latest/dg/connect-ddb.html
This describes how to get a single item of information out of a DynamoDB table from a step function. What I would like to do is iterate through the entire table and start execution of another state machine for each item. Each new state machine would have an individual item as input. I have attempted the following code, which unfortunately is not functional:
{
"StartAt": "OuterFunction",
"States": {
"OuterFunction": {
"Type": "Map",
"Iterator": {
"StartAt": "InnerFunction",
"States": {
"InnerFunction": {
"Type": "Task",
"Resource": "arn:aws:states:::dynamodb:getItem.sync",
"Parameters": {
"StateMachineArn":"other-state-machine-arn",
"TableName": "TestTable"
},
"End": true
}
}
},
"End": true
}
}
}
Is it actually possible to iterate through a DynamoDB table in this way?

You are now able to call DynamoDB directly from step functions. This includes the query and scan operations. With the result, you can then iterate through the items. The one less convenient, caveat is that it does not use the document client, so the results are in the dynamodb json format.
https://docs.aws.amazon.com/step-functions/latest/dg/connect-ddb.html

No, getItem is designed to fetch particular DynamoDB document. You need to write custom Lambda that will .query() or .scan() your table and then use Map step to iterate over results (most likely you won't need getItem at that time, because you can load all data with the query/scan operation).

Related

Step Functions - Access State from previous Map Iteration

How can I get the results from previous Map Iterations in the next iteration when using MaxConcurrency: 1 in Amazon Step Functions?
Here's an example of the code I have
{
"StartAt": "UploadUsers",
"States": {
"UploadUsers": {
"Type": "Map",
"MaxConcurrency": 1,
"ItemsPath": "$.data.users",
"Parameters": {
"data.$": "$$.Map.Item.Value.data",
"friends.$": "$.?????? Get created users ids"
},
"Iterator": {
"StartAt": "UploadUser",
"States": {
"UploadUser": {
"End": true,
"Parameters": {
"FunctionName": "${FnUploadUser}",
"Payload": {
"data.$": "$.user_data",
"friends.$": "$.??????"
}
},
"Resource": "arn:aws:states:::lambda:invoke.waitForTaskToken",
"ResultPath": "$.data. ???",
"Type": "Task"
}
}
},
"End": true,
"ResultPath": "$.data.UploadUsers",
"ResultSelector": {
"result.$": "$"
}
}
}
}
Suppose FnUploadUser is a lambda that returns the id of the created user.
And I want to get the ids of the previously created users and use that value for the next user I'm about to create.
You can't. Map State iterations don't share state. Two workarounds:
(1) Manage the shared state externally: Each Map iteration writes and reads from, say, a DynamoDB table.
(2) Refactor to a "for" loop and keep the shared state in the execution output.
Instead of using Map, insert a Choice State (after UploadUser) that checks for a "done" condition. If "done", finish, else loop back to UploadUser.
UploadUser accepts the user_data array as input. It appends its output to, say, the uploaded output array.
Each UploadUser iteration identifies the next user_data item by comparing it to the uploaded array. The iteration that processes the last item can also output done: true to signal to Choice that work is done.
The Choice State loops back to UploadUser while there are more to process (i.e. while done is not present).
There are other ways to build steps 2-3. For instance, you could add next_item and total_items keys on the output to keep track of progress. The important point is that Choice loops until an exit condition is met.

Why do I get 'Item is required' when updating an item in DynamoDB using AWS Step Functions?

I'm trying to push a DynamoDB record through Step Functions while setting up a conditional expression but for some reason, I'm getting the error:
There are Amazon States Language errors in your state machine definition. Fix the errors to continue.
The field 'Item' is required but was missing (at /States/MyStep/Parameters)
I don't want to push an Item. I want to use an update expression.
Here's my code:
{
"StartAt": "MyStep",
"States": {
"MyStep": {
"Type": "Task",
"Resource": "arn:aws:states:::dynamodb:putItem",
"Parameters": {
"TableName.$": "$.table_name",
"Key": {
"test_id_path_method_executor_id": {"S.$": "$.update_key.test_id_path_method_executor_id"},
"result_timestamp": {"S.$": "$.update_key.result_timestamp"}
},
"ConditionExpression": "#max_db < :max_values",
"ExpressionAttributeValues": {
":max_values": {"N.$": "$.result_value"}
},
"ExpressionAttributeNames": {
"#max_db": "max"
},
"UpdateExpression": "SET #max_db = :max_values"
},
"Next": "EndSuccess"
},
"EndSuccess": {
"Type":"Succeed"
}
}
}
What's the issue?
There are 2 main DynamoDB APIs for modifying items:
PutItem
UpdateItem
In a nutshell, PutItem 'updates' the item by replacing it & because of this, it requires the replacement item to be passed. This is the API call you're using and is why you're getting The field 'Item' is required but was missing. It's correct, Item is required when using PutItem.
Instead, you need to use UpdateItem which does not require the new item in full & will modify the attributes of an item based on an update expression (which is what you have).
In your step function definition, replace:
"Resource": "arn:aws:states:::dynamodb:putItem",
With:
"Resource": "arn:aws:states:::dynamodb:updateItem",

AWS Step Function get all Range keys with a single primary key in DynamoDB

I'm building AWS Step Function state machines. My goal is to read all Items from a DynamoDB table with a specific value for the Hash key (username) and any (wildcard) Sort keys (order_id).
Basically something that would be done from SQL with:
SELECT username,order_id FROM UserOrdersTable
WHERE username = 'daffyduck'
I'm using Step Functions to get configurations for AWS Glue jobs from a DynamoDB table and then spawn a Glue job via step function's Map for each dynamodb item (with parameters read from the database).
"Read Items from DynamoDB": {
"Type": "Task",
"Resource": "arn:aws:states:::dynamodb:getItem",
"Parameters": {
"TableName": "UserOrdersTable",
"Key": {
"username": {
"S": "daffyduck"
},
"order_id": {
"S": "*"
}
}
},
"ResultPath": "$",
"Next": "Invoke Glue jobs"
}
But I can't bring the state machine to read all order_id's for the user daffyduck in the step function task above. No output is displayed using the code above, apart from http stats.
Is there a wildcard for order_id ? Is there another way of getting all order_ids? The query customization seems to be rather limited inside step functions:
https://docs.amazonaws.cn/amazondynamodb/latest/APIReference/API_GetItem.html#API_GetItem_RequestSyntax
Basically I'm trying to accomplish what can be done from the command line like so:
$ aws dynamodb query \
--table-name UserOrdersTable \
--key-condition-expression "Username = :username" \
--expression-attribute-values '{
":username": { "S": "daffyduck" }
}'
Any ideas? Thanks
I don't think that is possible with Step functions Dynamodb Service yet.
currently supports get, put, delete & update Item, not query or scan.
For GetItem we need to pass entire KEY (Partition + Range Key)
For the primary key, you must provide all of the attributes. For
example, with a simple primary key, you only need to provide a value
for the partition key. For a composite primary key, you must provide
values for both the partition key and the sort key.
We need to write a Lambda function to query Dynamo and return a map and invoke the lambda function from step.

Is there a way to interpolate OutputPath's JsonPath using state's input in AWS step function?

Basically, i have the following input:
{
"name": "abc",
"choice": "choice1"
}
My dynamoDB table has the following structure:
Partition key - "name"
Complex json with choices:
{
"choices":
{
"choice1": ......,
"choice2": ......
}
}
I want to directly read from dynamodb, and get a subitem under the relevant choice:
{
"StartAt": "Read Next Message from DynamoDB",
"States": {
"Read Next Message from DynamoDB": {
"Type": "Task",
"Resource": "arn:aws:states:::dynamodb:getItem",
"Parameters": {
"TableName": "my_table",
"Key": {
"customerName": {"S.$": "$.name"}
}
},
"OutputPath": "$.Item.choices.M.choice1.M.myvalue.S",
"Next": "World"
},
"World": {
"Type": "Pass",
"End": true
}
}
}
basically i want to do something like "$.Item.choices.M.{$.choice}.M.myvalue.S", and take one of the output's keys from the input. is this possible?
I think what you're looking for is JsonPath interpolation, but that is not supported as per this thread on AWS forums.
As far as I know Step Functions allow only path reference through $, . and [] operators (Reference Path).
I don't know how much control you have on the DynamoDB table's data but I think your problem can be solved easily if your choice types are modeled in following way
{
"choices": [{
"choiceType": "choice1",
........
},
{
"choiceType": "choice2",
........
}]
}
Now you can use the map state to iterate over the choices array. Note that don't forget to pass the expected choiceType to each iteration.
First state of the map iterator can be a choice state which compares choiceType and moves to appropriate next state. So, basically your rest of the workflow is modeled as iterator of the map state in step 1.
Now, if you don't have the control over DynamoDB table, then you can process the query result in an AWS Lambda.

Pass multiple inputs into Map State in AWS Step Function

I am trying to use AWS Step Functions to trigger operations many S3 files via Lambda. To do this I am invoking a step function with an input that has a base S3 key of the file and part numbers each file (each parallel iteration would operate on a different S3 file). The input looks something like
{
"job-spec": {
"base_file_name": "some_s3_key-",
"part_array": [
"part-0000.tsv",
"part-0001.tsv",
"part-0002.tsv", ...
]
}
}
My Step function is very simple, takes that input and maps it out, however I can't seem to get both the file and the array as input to my lambda. Here is my step function definition
{
"Comment": "An example of the Amazon States Language using a map state to process elements of an array with a max concurrency of 2.",
"StartAt": "Map",
"States": {
"Map": {
"Type": "Map",
"ItemsPath": "$.job-spec",
"ResultPath": "$.part_array",
"MaxConcurrency": 2,
"Next": "Final State",
"Iterator": {
"StartAt": "My Stage",
"States": {
"My Stage": {
"Type": "Task",
"Resource": "arn:aws:states:::lambda:invoke",
"Parameters": {
"FunctionName": "arn:aws:lambda:us-east-1:<>:function:some-lambda:$LATEST",
"Payload": {
"Input.$": "$.part_array"
}
},
"End": true
}
}
}
},
"Final State": {
"Type": "Pass",
"End": true
}
}
}
As written above it complains that that job-spec is not an array for the ItemsPath. If I change that to $.job-spec.array I get the array I'm looking for in my lambda but the base key is missing.
Essentially I want each python lambda to get the base file key and one entry from the array to stitch together the complete file name. I can't just put the complete file names in the array due to the limit limit of how much data I can pass around in Step Functions and that also seems like a waste of data
It looks like the Parameters value can be used for this but I can't quite get the syntax right
Was able to finally get the syntax right.
"ItemsPath": "$.job-spec.part_array",
"Parameters": {
"part_name.$": "$$.Map.Item.Value",
"base_file_name.$": "$.job-spec.base_file_name"
},
It seems that Parameters can be used to create custom inputs for each stage. The $$ is accessing the context of the stage and not the actual input. It appears that ItemsPath takes the array and puts it into a context which can be used later.
UPDATE Here is some AWS Documentation showing this being used from the comments below