Why do I get 'Item is required' when updating an item in DynamoDB using AWS Step Functions?

I'm trying to push a DynamoDB record through Step Functions with a conditional expression, but for some reason I'm getting the error:
There are Amazon States Language errors in your state machine definition. Fix the errors to continue.
The field 'Item' is required but was missing (at /States/MyStep/Parameters)
I don't want to push an Item. I want to use an update expression.
Here's my code:
{
  "StartAt": "MyStep",
  "States": {
    "MyStep": {
      "Type": "Task",
      "Resource": "arn:aws:states:::dynamodb:putItem",
      "Parameters": {
        "TableName.$": "$.table_name",
        "Key": {
          "test_id_path_method_executor_id": {"S.$": "$.update_key.test_id_path_method_executor_id"},
          "result_timestamp": {"S.$": "$.update_key.result_timestamp"}
        },
        "ConditionExpression": "#max_db < :max_values",
        "ExpressionAttributeValues": {
          ":max_values": {"N.$": "$.result_value"}
        },
        "ExpressionAttributeNames": {
          "#max_db": "max"
        },
        "UpdateExpression": "SET #max_db = :max_values"
      },
      "Next": "EndSuccess"
    },
    "EndSuccess": {
      "Type": "Succeed"
    }
  }
}
What's the issue?

There are 2 main DynamoDB APIs for modifying items:
PutItem
UpdateItem
In a nutshell, PutItem 'updates' an item by replacing it entirely, and because of this it requires the replacement item to be passed in. This is the API call you're using, which is why you're getting "The field 'Item' is required but was missing": Item really is required when using PutItem.
Instead, you need to use UpdateItem, which does not require the new item in full and will modify the attributes of an existing item based on an update expression (which is what you already have).
In your step function definition, replace:
"Resource": "arn:aws:states:::dynamodb:putItem",
With:
"Resource": "arn:aws:states:::dynamodb:updateItem",

Related

Is it possible to iterate through a DynamoDB table within a step function's map state?

Just what the title says, basically. I have read through the documentation:
https://docs.aws.amazon.com/step-functions/latest/dg/connect-ddb.html
This describes how to get a single item of information out of a DynamoDB table from a step function. What I would like to do is iterate through the entire table and start execution of another state machine for each item. Each new state machine would have an individual item as input. I have attempted the following code, which unfortunately is not functional:
{
  "StartAt": "OuterFunction",
  "States": {
    "OuterFunction": {
      "Type": "Map",
      "Iterator": {
        "StartAt": "InnerFunction",
        "States": {
          "InnerFunction": {
            "Type": "Task",
            "Resource": "arn:aws:states:::dynamodb:getItem.sync",
            "Parameters": {
              "StateMachineArn": "other-state-machine-arn",
              "TableName": "TestTable"
            },
            "End": true
          }
        }
      },
      "End": true
    }
  }
}
Is it actually possible to iterate through a DynamoDB table in this way?
You are now able to call DynamoDB directly from Step Functions, including the query and scan operations. With the result, you can then iterate through the items. The one less-convenient caveat is that the integration does not use the document client, so the results come back in the DynamoDB JSON format (typed attribute values).
https://docs.aws.amazon.com/step-functions/latest/dg/connect-ddb.html
No, getItem is designed to fetch a particular DynamoDB item. You would need to write a custom Lambda that calls .query() or .scan() on your table and then use a Map state to iterate over the results (most likely you won't need getItem at that point, because you can load all the data with the query/scan operation).
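To illustrate the scan-then-Map approach from the first answer, here is a minimal, untested sketch that scans the TestTable from the question and starts one execution of another state machine per returned item (the child state machine ARN is the same placeholder used in the question). It relies on the AWS SDK service integration for Scan and the optimized startExecution integration; note that a single Scan call returns at most 1 MB of data, so a real workflow would also need to follow LastEvaluatedKey, and the items arrive in DynamoDB's typed JSON format:
{
  "StartAt": "ScanTable",
  "States": {
    "ScanTable": {
      "Type": "Task",
      "Resource": "arn:aws:states:::aws-sdk:dynamodb:scan",
      "Parameters": {
        "TableName": "TestTable"
      },
      "Next": "ForEachItem"
    },
    "ForEachItem": {
      "Type": "Map",
      "ItemsPath": "$.Items",
      "Iterator": {
        "StartAt": "StartChildExecution",
        "States": {
          "StartChildExecution": {
            "Type": "Task",
            "Resource": "arn:aws:states:::states:startExecution",
            "Parameters": {
              "StateMachineArn": "other-state-machine-arn",
              "Input.$": "$"
            },
            "End": true
          }
        }
      },
      "End": true
    }
  }
}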

Is there a way to interpolate OutputPath's JsonPath using state's input in AWS step function?

Basically, I have the following input:
{
  "name": "abc",
  "choice": "choice1"
}
My DynamoDB table has the following structure:
Partition key: "name"
Complex JSON with choices:
{
  "choices": {
    "choice1": ......,
    "choice2": ......
  }
}
I want to read directly from DynamoDB and get a sub-item under the relevant choice:
{
  "StartAt": "Read Next Message from DynamoDB",
  "States": {
    "Read Next Message from DynamoDB": {
      "Type": "Task",
      "Resource": "arn:aws:states:::dynamodb:getItem",
      "Parameters": {
        "TableName": "my_table",
        "Key": {
          "customerName": {"S.$": "$.name"}
        }
      },
      "OutputPath": "$.Item.choices.M.choice1.M.myvalue.S",
      "Next": "World"
    },
    "World": {
      "Type": "Pass",
      "End": true
    }
  }
}
Basically, I want to do something like "$.Item.choices.M.{$.choice}.M.myvalue.S", taking one of the output's keys from the input. Is this possible?
I think what you're looking for is JsonPath interpolation, but that is not supported as per this thread on AWS forums.
As far as I know Step Functions allow only path reference through $, . and [] operators (Reference Path).
I don't know how much control you have over the DynamoDB table's data, but I think your problem can be solved easily if your choice types are modeled in the following way:
{
  "choices": [
    {
      "choiceType": "choice1",
      ........
    },
    {
      "choiceType": "choice2",
      ........
    }
  ]
}
Now you can use a Map state to iterate over the choices array. Don't forget to pass the expected choiceType to each iteration.
The first state of the Map iterator can be a Choice state that compares choiceType and moves to the appropriate next state, so the rest of your workflow is modeled as the Map state's iterator (see the sketch below).
If you don't have control over the DynamoDB table, you can instead process the query result in an AWS Lambda.
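A minimal sketch of that Map-plus-Choice layout, assuming the choices array and the requested choice name are both already on the state input; the state names and the two Pass placeholders (HandleChoice / SkipChoice) are invented for illustration, and HandleChoice is where the real work for the matching choice would go:
{
  "StartAt": "ForEachChoice",
  "States": {
    "ForEachChoice": {
      "Type": "Map",
      "ItemsPath": "$.choices",
      "Parameters": {
        "candidate.$": "$$.Map.Item.Value",
        "requestedChoice.$": "$.choice"
      },
      "Iterator": {
        "StartAt": "IsRequestedChoice",
        "States": {
          "IsRequestedChoice": {
            "Type": "Choice",
            "Choices": [
              {
                "Variable": "$.candidate.choiceType",
                "StringEqualsPath": "$.requestedChoice",
                "Next": "HandleChoice"
              }
            ],
            "Default": "SkipChoice"
          },
          "HandleChoice": {
            "Type": "Pass",
            "End": true
          },
          "SkipChoice": {
            "Type": "Pass",
            "End": true
          }
        }
      },
      "End": true
    }
  }
}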

Pass multiple inputs into Map State in AWS Step Function

I am trying to use AWS Step Functions to trigger operations on many S3 files via Lambda. To do this I am invoking a step function with an input that has the base S3 key of the file and the part names of each file (each parallel iteration would operate on a different S3 file). The input looks something like:
{
  "job-spec": {
    "base_file_name": "some_s3_key-",
    "part_array": [
      "part-0000.tsv",
      "part-0001.tsv",
      "part-0002.tsv", ...
    ]
  }
}
My step function is very simple: it takes that input and maps over it. However, I can't seem to get both the base file name and the array as input to my Lambda. Here is my step function definition:
{
  "Comment": "An example of the Amazon States Language using a map state to process elements of an array with a max concurrency of 2.",
  "StartAt": "Map",
  "States": {
    "Map": {
      "Type": "Map",
      "ItemsPath": "$.job-spec",
      "ResultPath": "$.part_array",
      "MaxConcurrency": 2,
      "Next": "Final State",
      "Iterator": {
        "StartAt": "My Stage",
        "States": {
          "My Stage": {
            "Type": "Task",
            "Resource": "arn:aws:states:::lambda:invoke",
            "Parameters": {
              "FunctionName": "arn:aws:lambda:us-east-1:<>:function:some-lambda:$LATEST",
              "Payload": {
                "Input.$": "$.part_array"
              }
            },
            "End": true
          }
        }
      }
    },
    "Final State": {
      "Type": "Pass",
      "End": true
    }
  }
}
As written above, it complains that job-spec is not an array for the ItemsPath. If I change that to $.job-spec.part_array I get the array I'm looking for in my Lambda, but the base key is missing.
Essentially, I want each Python Lambda to get the base file key and one entry from the array so it can stitch together the complete file name. I can't just put the complete file names in the array because of the limit on how much data I can pass around in Step Functions, and that also seems like a waste of data.
It looks like the Parameters value can be used for this, but I can't quite get the syntax right.
I was finally able to get the syntax right:
"ItemsPath": "$.job-spec.part_array",
"Parameters": {
"part_name.$": "$$.Map.Item.Value",
"base_file_name.$": "$.job-spec.base_file_name"
},
It seems that Parameters can be used to build a custom input for each iteration. The $$ accesses the state's context object rather than the actual input, and ItemsPath selects the array whose elements are then exposed through that context as $$.Map.Item.Value.
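With that ItemsPath and Parameters block, each iteration of the Map state receives a small custom input combining one array element with the shared base file name; for the sample job-spec above, the first iteration's input would be:
{
  "part_name": "part-0000.tsv",
  "base_file_name": "some_s3_key-"
}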
UPDATE: Here is some AWS documentation showing this being used, from the comments below.

How does the MaxConcurrency attribute work for the Map Task in AWS Step Functions?

Update: Creating a step function from the Map State step template and running that also throws an error. This is strong evidence that the MaxConcurrency attribute together with the Parameters value is not working.
I am not able to use the MaxConcurrency attribute successfully in the step function definition.
This can be demonstrated by using the example provided in the documentation for the Map task (new as of 18 September 2019):
{
  "StartAt": "ExampleMapState",
  "States": {
    "ExampleMapState": {
      "Type": "Map",
      "MaxConcurrency": 2,
      "Parameters": {
        "ContextIndex.$": "$$.Map.Item.Index",
        "ContextValue.$": "$$.Map.Item.Value"
      },
      "Iterator": {
        "StartAt": "TestPass",
        "States": {
          "TestPass": {
            "Type": "Pass",
            "End": true
          }
        }
      },
      "End": true
    }
  }
}
By executing the step function with the following input:
[
  {
    "who": "bob"
  },
  {
    "who": "meg"
  },
  {
    "who": "joe"
  }
]
We can observe in the Execution event history that we get:
ExecutionStarted
MapStateEntered
MapStateStarted
MapIterationStarted (index 0)
MapIterationStarted (index 1)
PassStateEntered (index 0)
PassStateExited (index 0)
MapIterationSucceeded (index 0)
ExecutionFailed
The step function fails.
The ExecutionFailed step has the following output (execution id omitted):
{
  "error": "States.Runtime",
  "cause": "Internal Error (omitted)"
}
Trying to catch the error with a Catch step has no effect.
What am I doing wrong here? Is this a bug?
Response to a private ticket submitted to AWS this morning:
Thank you for contacting AWS Premium Support. My name is Akanksha and I will be assisting you with this case.
I understand that you have been working with the new Map state feature of step functions and have noticed that when we use Parameters along with MaxConcurrency set to lower value than the number of iterations (with only first iteration successful) it fails with 'States.Runtime' and looks like a bug with the functionality.
Thank you for providing the details. It helped me during troubleshooting. In order to confirm the behavior, I used the below state machine example with Pass:
{
  "StartAt": "Map State",
  "TimeoutSeconds": 3600,
  "States": {
    "Map State": {
      "Type": "Map",
      "Parameters": {
        "ContextValue.$": "$$.Map.Item.Value"
      },
      "MaxConcurrency": 1,
      "Iterator": {
        "StartAt": "Run Task",
        "States": {
          "Run Task": {
            "Type": "Pass",
            "End": true
          }
        }
      },
      "Next": "Final State"
    },
    "Final State": {
      "Type": "Pass",
      "End": true
    }
  }
}
I tested with multiple input lists and MaxConcurrency values, and below are my observations:
Input list of size 4: MaxConcurrency 1/2/3 - Fails; MaxConcurrency 0/4/5 or above - Works
Input list of size 3: MaxConcurrency 1/2 - Fails; MaxConcurrency 0/3/4 or above - Works
Similarly, I performed tests after removing the Parameters from the state machine as well, and could see that it works as expected with different MaxConcurrency values.
I also tested the same by changing the Task type from "Pass" to "Lambda" and observed the same behavior.
Hence, I can confirm that the state machine fails when we have Parameters in the code and specify a MaxConcurrency value as anything other than zero or a number greater than or equal to the list size.
After doing some research regarding this behavior to check if this is intended, I could not find much information regarding the same as this is a new feature. So, I will be reaching out to the internal team with all the details and the example state machine that you have provided.
Thank you for bringing this to our notice. I will get back to you as soon as I have an update from the internal team. Please be assured that I will regularly follow up with the team and work with them to investigate further.
Meanwhile, if you have any other queries or concerns, please do let me know.
Have a great day ahead!
I will update here when I get more information.

How can I update a FireHose delivery stream's DataFormatConversionConfiguration using the AWS SDK?

Does anyone have a working example of using firehose.update_destination to set an S3 destination's DataFormatConversionConfiguration? I'm following the guidance in "Is it possible to specify data format conversion in AWS Cloudformation?", using boto3 (the AWS Python SDK), but I've not been successful. When I include a DataFormatConversionConfiguration (DFCC) in an ExtendedS3DestinationConfiguration argument, it fails with the following error:
Exception during processing: An error occurred (InvalidArgumentException) when calling the UpdateDestination operation: RoleArn must not be null or empty
If I pass the original destination configuration (as returned by describe_delivery_stream) unchanged, the update succeeds. I can also change other config options, e.g. BufferingHints. The only time it fails is when DataFormatConversionConfiguration is non-null.
For example, passing this works:
{
  "RoleARN": "arn:aws:iam::1234567:role/MyExecutionRole",
  "BucketARN": "arn:aws:s3:::my-bucket",
  "Prefix": "databases/tables/requests/",
  "BufferingHints": {
    "SizeInMBs": 64,
    "IntervalInSeconds": 120
  },
  "CompressionFormat": "UNCOMPRESSED",
  "EncryptionConfiguration": {
    "NoEncryptionConfig": "NoEncryption"
  },
  "CloudWatchLoggingOptions": {
    "Enabled": false
  },
  "S3BackupMode": "Disabled"
}
but passing this fails:
{
  "RoleARN": "arn:aws:iam::1234567:role/MyExecutionRole",
  "BucketARN": "arn:aws:s3:::my-bucket",
  "Prefix": "databases/tables/requests/",
  "BufferingHints": {
    "SizeInMBs": 64,
    "IntervalInSeconds": 120
  },
  "CompressionFormat": "UNCOMPRESSED",
  "EncryptionConfiguration": {
    "NoEncryptionConfig": "NoEncryption"
  },
  "CloudWatchLoggingOptions": {
    "Enabled": false
  },
  "S3BackupMode": "Disabled",
  "DataFormatConversionConfiguration": {
    "InputFormatConfiguration": {
      "Deserializer": {
        "OpenXJsonSerDe": {}
      }
    },
    "SchemaConfiguration": {
      "TableName": "requests",
      "DatabaseName": "mydb"
    },
    "OutputFormatConfiguration": {
      "Serializer": {
        "OrcSerDe": {}
      }
    }
  }
}
The only difference is the DataFormatConversionConfiguration element.
Am I overlooking something obvious? Perhaps the DFCC element is malformed? I've not been able to find any working examples, so I'm going purely from documentation.
I'm also rather surprised by the use of RoleARN and BucketARN in the input element, versus the usual convention of RoleArn and BucketArn, but I'm not sure whether it's germane.
As you suspected, your DataFormatConversionConfiguration is malformed.
Perhaps confusingly, the RoleArn it's complaining about as missing is DataFormatConversionConfiguration.SchemaConfiguration.RoleARN, not the top-level one (which you are already supplying).
I'm not going to copy it all here, but I find looking at the service documentation is the best way to find deeper information about the types used by the SDK: https://docs.aws.amazon.com/firehose/latest/APIReference/API_DataFormatConversionConfiguration.html
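For illustration, here is roughly what the DataFormatConversionConfiguration from the question would look like with that field added; the ARN below simply reuses the question's existing delivery role as a placeholder, and whatever role you pass here needs read access to the Glue table named in SchemaConfiguration:
"DataFormatConversionConfiguration": {
  "InputFormatConfiguration": {
    "Deserializer": {
      "OpenXJsonSerDe": {}
    }
  },
  "SchemaConfiguration": {
    "RoleARN": "arn:aws:iam::1234567:role/MyExecutionRole",
    "DatabaseName": "mydb",
    "TableName": "requests"
  },
  "OutputFormatConfiguration": {
    "Serializer": {
      "OrcSerDe": {}
    }
  }
}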