AWS State Machine - Update DynamoDB table is replacing the ID with "$.id" - amazon-web-services

I have a step where I want to update an object in a DynamoDB table.
Everything works, except it's creating a new object with the ID value "$.id" instead of updating the object whose ID I pass in.
This is my first state machine attempt, so what have I done wrong here?
"update-table-processing": {
"Type": "Task",
"Resource": "arn:aws:states:::dynamodb:updateItem",
"ResultPath": "$.updateResult",
"Parameters": {
"TableName": "Projects",
"Key": {
"id": {
"S": "$.id"
}
},
"UpdateExpression": "SET step = :updateRef",
"ExpressionAttributeValues": {
":updateRef": {
"S": "processing"
}
},
"ReturnValues": "ALL_NEW"
},
"Next": "create-project"
},
Do I somehow need to tell DynamoDB to evaluate "$.id" rather than treating it as an "S", or is this happening because I have not mapped the input correctly, so that the "$.id" value is empty?
My input looks like:
{
"id": "f8185735-c90d-4d4e-8689-cec68a48b1bc"
}

To take a value from your input, the key name in the key-value pair must end in ".$". So to fix this you need to change it to:
"Key": {
"id": {
"S.$": "$.id"
}
},
Using the above it should correctly resolve to the value from your input instead of the string value "$.id".
References - https://docs.aws.amazon.com/step-functions/latest/dg/input-output-inputpath-params.html#input-output-parameters
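To make the difference concrete, here is a minimal Python sketch of how the ".$" suffix changes the treatment of a value. It is a simplified model for illustration only, not the Step Functions implementation: resolve_parameters and lookup are hypothetical helpers, and only simple "$.a.b" paths are handled.

```python
def lookup(path, state_input):
    """Resolve a simple '$.a.b' style path against the state input."""
    value = state_input
    for part in path.lstrip("$").strip(".").split("."):
        if part:
            value = value[part]
    return value


def resolve_parameters(params, state_input):
    """Copy params, replacing every 'key.$' entry with the value its path selects."""
    resolved = {}
    for key, value in params.items():
        if isinstance(value, dict):
            resolved[key] = resolve_parameters(value, state_input)
        elif key.endswith(".$"):
            # Key ends in '.$': the value is a path and the suffix is dropped.
            resolved[key[:-2]] = lookup(value, state_input)
        else:
            # Plain key: the value is passed through as a literal string.
            resolved[key] = value
    return resolved


state_input = {"id": "f8185735-c90d-4d4e-8689-cec68a48b1bc"}
literal = resolve_parameters({"id": {"S": "$.id"}}, state_input)
dynamic = resolve_parameters({"id": {"S.$": "$.id"}}, state_input)
print(literal)  # the key stays the literal text "$.id"
print(dynamic)  # the key is taken from the input
```

In this model the first call keeps "$.id" as plain text, which matches the symptom in the question, while the second substitutes the ID from the input.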

Related

What's the best practice for unmarshalling data returned from a dynamo operation in aws step functions?

I am running a state machine running a dynamodb query (called using CallAwsService). The format returned looks like this:
{
Items: [
{
"string" : {
"B": blob,
"BOOL": boolean,
"BS": [ blob ],
"L": [
"AttributeValue"
],
"M": {
"string" : "AttributeValue"
},
"N": "string",
"NS": [ "string" ],
"NULL": boolean,
"S": "string",
"SS": [ "string" ]
}
}
]
}
I would like to unmarshall this data efficiently, and I would like to avoid using a Lambda call for this.
The CDK code we're currently using for the query is below:
interface FindItemsStepFunctionProps {
table: Table
id: string
}
export const FindItemsStepFunction = (scope: Construct, props: FindItemsStepFunctionProps): StateMachine => {
const { table, id } = props
const definition = new CallAwsService(scope, 'Query', {
service: 'dynamoDb',
action: 'query',
parameters: {
TableName: table.tableName,
IndexName: 'exampleIndexName',
KeyConditionExpression: 'id = :id',
ExpressionAttributeValues: {
':id': {
'S.$': '$.path.id',
},
},
},
iamResources: ['*'],
})
return new StateMachine(scope, id, {
logs: {
destination: new LogGroup(scope, `${id}LogGroup`, {
logGroupName: `${id}LogGroup`,
removalPolicy: RemovalPolicy.DESTROY,
retention: RetentionDays.ONE_WEEK,
}),
level: LogLevel.ALL,
},
definition,
stateMachineType: StateMachineType.EXPRESS,
stateMachineName: id,
timeout: Duration.minutes(5),
})
}
Can you unmarshall the data downstream? I'm not too well versed on StepFunctions, do you have the ability to import utilities?
Unmarshalling DDB JSON is as simple as calling the unmarshall function from DynamoDB utility:
https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/modules/_aws_sdk_util_dynamodb.html
You may need to do so downstream as StepFunctions seems to implement the low level client.
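If the downstream consumer happens to be written in Python, the same conversion can be sketched by hand. This is only an illustration under stated assumptions: unmarshall and unmarshall_value are hypothetical helper names (not an AWS API), numbers are converted to Decimal by choice, and binary values are passed through unchanged.

```python
from decimal import Decimal


def unmarshall_value(attr):
    """Convert one DynamoDB AttributeValue, e.g. {"S": "x"}, to a plain Python value."""
    (type_tag, value), = attr.items()  # an AttributeValue has exactly one type key
    if type_tag in ("S", "B", "BOOL"):
        return value
    if type_tag == "NULL":
        return None
    if type_tag == "N":
        return Decimal(value)  # numbers arrive as strings
    if type_tag in ("SS", "BS"):
        return set(value)
    if type_tag == "NS":
        return {Decimal(n) for n in value}
    if type_tag == "L":
        return [unmarshall_value(v) for v in value]
    if type_tag == "M":
        return {k: unmarshall_value(v) for k, v in value.items()}
    raise ValueError("Unknown DynamoDB type tag: %s" % type_tag)


def unmarshall(item):
    """Convert a whole item (attribute name -> AttributeValue) to a plain dict."""
    return {name: unmarshall_value(attr) for name, attr in item.items()}


item = {
    "Status": {"S": "unzipped"},
    "Count": {"N": "3"},
    "Tags": {"L": [{"S": "a"}, {"BOOL": True}]},
    "Meta": {"M": {"owner": {"S": "me"}}},
}
result = unmarshall(item)
print(result)
```

The recursion on "L" and "M" is what handles arbitrarily nested records, which is the part that is awkward to express inside a state machine definition.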
Step Functions still doesn't make it easy to call DynamoDB directly from a step in a state machine without using a Lambda function. The main missing parts are handling the different cases of a query finding zero, one, or more records, and unmarshalling the slightly complicated format of DynamoDB records. Sadly, the $utils library is still not supported in Step Functions.
You will need to implement these two parts as specific steps in the graph.
We use the following steps as a DynamoDB query template:
The first step provides the parameters for the query. It can be omitted if you define the parameters directly in the query step:
"Set Query Parameters": {
"Type": "Pass",
"Next": "DynamoDB Query ...",
"Result": {
"tableName": "<TABLE_NAME>",
"key_value": "<QUERY_KEY>",
"attribute_value": "<ATTRIBUTE_VALUE>"
}
}
The next step is the actual query to DynamoDB. You can also use GetItem instead of Query if you have the record keys.
"DynamoDB Query ...": {
"Type": "Task",
"Parameters": {
"TableName.$": "$.tableName",
"IndexName": "<INDEX_NAME_IF_NEEDED>",
"KeyConditionExpression": "#n1 = :v1",
"FilterExpression": "#n2.#n3 = :v2",
"ExpressionAttributeNames": {
"#n1": "<KEY_NAME>",
"#n2": "<ATTRIBUTE_NAME>",
"#n3": "<NESTED_ATTRIBUTE_NAME>"
},
"ExpressionAttributeValues": {
":v1": {
"S.$": "$.key_value"
},
":v2": {
"S.$": "$.attribute_value"
}
},
"ScanIndexForward": false
},
"Resource": "arn:aws:states:::aws-sdk:dynamodb:query",
"ResultPath": "$.ddb_record",
"ResultSelector": {
"result.$": "$.Items[0]"
},
"Next": "Check for DDB Object"
}
The above example seems a bit complicated, using both ExpressionAttributeNames and ExpressionAttributeValues. However, it makes it possible to query on nested attributes such as item.id.
In this example, we only take the first item response with $.Items[0]. However, you can take all the results if you need more than one.
The next step is to check if the query returned a record or not.
"Check for DDB Object": {
"Type": "Choice",
"Choices": [
{
"Variable": "$.ddb_record.result",
"IsNull": false,
"Comment": "Found Context Object",
"Next": "Parse DDB Object"
}
],
"Default": "Do Nothing"
}
And lastly, to answer your original question, we can parse the query result if we have one:
"Parse DDB Object": {
"Type": "Pass",
"Parameters": {
"string_object.$": "$.ddb_record.result.string_object.S",
"bool_object.$": "$.ddb_record.result.bool_object.BOOL",
"dict_object": {
"nested_dict_object.$": "$.ddb_record.result.item.M.name.S"
},
"dict_object_full.$": "States.StringToJson($.ddb_record.result.JSON_object.S)"
},
"ResultPath": "$.parsed_ddb_record",
"End": true
}
Please note that:
Simple strings are converted with "string_object.$": "$.ddb_record.result.string_object.S"
Numbers and booleans work the same way, using their type key (N or BOOL), for example "bool_object.$": "$.ddb_record.result.bool_object.BOOL"
Nested objects are read through the map type ("nested_dict_object.$": "$.ddb_record.result.item.M.name.S", for example)
A JSON object stored as a string can be turned into an object with States.StringToJson
The parsed object is added as a new entry in the state output using "ResultPath": "$.parsed_ddb_record"
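To check locally what the Choice and Parse steps above are expected to produce, the same logic can be sketched in Python. This is a rough model, not Step Functions code: parse_ddb_query_output is a hypothetical helper, and the attribute names (string_object, bool_object, item, JSON_object) are the placeholders used in this answer.

```python
import json


def parse_ddb_query_output(query_output):
    """Return the parsed first item, or None when the query found nothing."""
    items = query_output.get("Items", [])
    if not items:
        return None  # corresponds to the "Do Nothing" default branch
    record = items[0]  # same as taking $.Items[0]
    return {
        "string_object": record["string_object"]["S"],
        "bool_object": record["bool_object"]["BOOL"],
        "dict_object": {
            "nested_dict_object": record["item"]["M"]["name"]["S"],
        },
        # Plays the role of States.StringToJson on a string attribute.
        "dict_object_full": json.loads(record["JSON_object"]["S"]),
    }


sample_output = {
    "Items": [
        {
            "string_object": {"S": "hello"},
            "bool_object": {"BOOL": True},
            "item": {"M": {"name": {"S": "widget"}}},
            "JSON_object": {"S": "{\"a\": 1}"},
        }
    ]
}
parsed = parse_ddb_query_output(sample_output)
empty = parse_ddb_query_output({"Items": []})
print(parsed)
print(empty)
```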

Access an Array Item by index in AWS Dynamodb Query Results "Items" in Step Function

I have this dynamodb:Query in my step function:
{
"Type": "Task",
"Resource": "arn:aws:states:::aws-sdk:dynamodb:query",
"Next": "If nothing returned by query Or Study not yet Zipped",
"Parameters": {
"TableName": "TEST-StudyProcessingTable",
"ScanIndexForward": false,
"Limit": 1,
"KeyConditionExpression": "OrderID = :OrderID",
"FilterExpression": "StudyID = :StudyID",
"ExpressionAttributeValues": {
":OrderID": {
"S.$": "$.body.order_id"
},
":StudyID": {
"S.$": "$.body.study_id"
}
}
},
"ResultPath": "$.processed_files"
}
The results come in as an array called Items, which is nested under my ResultPath as processed_files.Items:
{
"body": {
"order_id": "1001",
"study_id": "1"
},
"processed_files": {
"Count": 1,
"Items": [
{
"Status": {
"S": "unzipped"
},
"StudyID": {
"S": "1"
},
"ZipFileS3Key": {
"S": "path/to/the/file"
},
"UploadSet": {
"S": "4"
},
"OrderID": {
"S": "1001"
},
"UploadSet#StudyID": {
"S": "4#1"
}
}
],
"LastEvaluatedKey": {
"OrderID": {
"S": "1001"
},
"UploadSet#StudyID": {
"S": "4#1"
}
},
"ScannedCount": 1
}
}
My question is: how do I access the items inside this array from a Choice state in a step function?
I need to run the query and then decide something based on the results, by checking the item in a condition in a Choice state.
The problem is that, since this is an array, I can't access it using regular JsonPath (like with Items.item), and in my next step the choice condition does NOT accept an index like processed_files.Items['0'].Status
OK, so the answer was simple: all you need to do is use a number instead of a string for the array index, like this:
processed_files.Items[0].Status
I was originally misled by an error I received, which said that it expected a ' or '[' after the first '['. I mistakenly thought this meant it only accepts strings.
I was wrong, it works like any other array.
I hope this helps somebody one day.
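The same distinction shows up in most languages once the JSON is parsed. As a small Python analogy (not Step Functions code), using a trimmed copy of the state shown in the question:

```python
import json

state = json.loads("""
{
  "processed_files": {
    "Count": 1,
    "Items": [
      {"Status": {"S": "unzipped"}, "StudyID": {"S": "1"}}
    ]
  }
}
""")

# Items is a list, so it is indexed with the number 0.
status = state["processed_files"]["Items"][0]["Status"]["S"]
print(status)

# A string index is only valid for objects, so this lookup fails on a list.
try:
    state["processed_files"]["Items"]["0"]
    string_index_worked = True
except TypeError:
    string_index_worked = False
print(string_index_worked)
```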

AWS StepFunctions - Merge and flatten the task output combined with the original input

How do we use Parameters, ResultPath and ResultSelector to combine the results of a Task with the original input at the same JSON level?
I checked the AWS documentation, but it seems that ResultSelector always creates a new dictionary, which places the result one level below the original input.
Example input
{
"status": "PENDING",
"uuid": "00000000-0000-0000-0000-000000000000",
"first_name": "John",
"last_name": "Doe",
"email": "john.doe#email.com",
"orders": [
{
"item_uuid": "11111111-1111-1111-1111-111111111111",
"quantities": 2,
"price": 2.38,
"created_at": 16049331038000
}
]
}
State Machine definition
"Review": {
"Type": "Task",
"Resource": "arn:aws:states:us-east-1:123456789012:activity:Review",
"ResultPath": null,
"Next": "Processing",
"Parameters": {
"task_name": "REVIEW_REQUIRED",
"uuid.$": "$.uuid"
}
},
Example output from Review Activity
{
"review_status": "APPROVED"
}
Question
How do I update the State Machine definition to combine the result of the Review Activity and the original input into something like the output below?
{
"status": "PENDING",
"uuid": "00000000-0000-0000-0000-000000000000",
"first_name": "John",
"last_name": "Doe",
"email": "john.doe#email.com",
"orders": [
{
"item_uuid": "11111111-1111-1111-1111-111111111111",
"quantities": 2,
"price": 2.38,
"created_at": 16049331038000
}
],
"review_status": "APPROVED"
}
NOTE
I don't have access to the Activity code, just the definition file.
I recommend NOT doing it the way suggested above, as you will drop all data that you do not include. It's not a long-term approach; you can do it more easily like this:
Step Input
{
"a": "a_value",
"b": "b_value",
"c": {
"c": "c_value"
}
}
In your state-machine.json
"Flatten And Keep All Other Keys": {
"Type": "Pass",
"InputPath": "$.c.c",
"ResultPath": "$.c",
"Next": "Some Other State"
}
Step Output
{
"a": "a_value",
"b": "b_value",
"c": "c_value"
}
While Step Functions does not allow you to do this directly, you can create a Pass state that flattens the input as a workaround.
Example Input:
{
"name": "John Doe",
"lambdaResult": {
"age": "35",
"location": "Eastern Europe"
}
}
Amazon States Language:
"Flatten": {
"Type": "Pass",
"Parameters": {
"name.$" : "$.name",
"age.$" : "$.lambdaResult.age",
"location.$": "$.lambdaResult.location"
},
"Next": "MyNextState"
}
Output:
{
"name": "John Doe",
"age": "35",
"location": "Eastern Europe"
}
It's tedious, but it gets the job done.
Thanks for your question.
It looks like you don't necessarily need to manipulate the output in any way, and are looking for a way to combine the state's output with its input before passing it on to the next state. The ResultPath field allows you to combine a task result with task input, or to select one of these. The path you provide to ResultPath controls what information passes to the output.
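As a rough illustration of what ResultPath does, here is a simplified Python model. It is a sketch under stated assumptions: apply_result_path is a hypothetical helper, it only supports null, "$", and a single top-level key such as "$.review", and the "review" key name is made up for the example.

```python
import copy


def apply_result_path(state_input, result, result_path):
    """Simplified model of ResultPath for null, '$', or one top-level key."""
    if result_path is None:
        # null: the result is discarded and the input passes through.
        return copy.deepcopy(state_input)
    if result_path == "$":
        # '$': the result replaces the whole input.
        return copy.deepcopy(result)
    key = result_path[2:]  # strip the leading '$.'
    output = copy.deepcopy(state_input)
    output[key] = copy.deepcopy(result)  # the result is nested under the key
    return output


state_input = {"status": "PENDING", "first_name": "John", "last_name": "Doe"}
task_result = {"review_status": "APPROVED"}

kept_input = apply_result_path(state_input, task_result, None)
nested = apply_result_path(state_input, task_result, "$.review")
print(kept_input)
print(nested)
```

In this model the result always lands under the chosen key, one level below the original fields, so reaching the fully flat shape from the question still needs a further step such as a flattening Pass state.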

Extract specific users comments from a list using Wikipedia API and Python 2.7

I am using the Wikipedia API through the wikitools package to extract some data from Wikipedia. I get output in the format shown below, and now I want to extract the timestamp and the comment for revisions made by a specific user across several pages. Let's say I just want the comments made by TechBot; I figured that I can do something like:
for revision in res["query"]["pages"]["7940378"]["revisions"]:
    if revision["user"] == "TechBot":
        do.something()
But the problem is ["7940378"], because this is a unique page ID that changes for every page, and I don't know how to get the page ID. Is there another way of doing this?
[{
"query": {
"pages": {
"7940378": {
"ns": 0,
"pageid": 7940378,
"revisions": [
{
"comment": "robot Modifying: [[az:T\u00fcrk Tarixi]]",
"timestamp": "2009-01-03T19:47:11Z",
"user": "TechBot"
},
{
"comment": "",
"timestamp": "2009-02-14T02:07:49Z",
"anon": "",
"user": "88.231.237.130"
},
{
"comment": "fixing recent deletion by merging it with the next paragraph",
"timestamp": "2009-04-03T14:49:27Z",
"user": "Soap"
},
{
"comment": "robot Modifying: [[az:T\u00fcrk tarixi]]",
"timestamp": "2009-04-09T14:35:19Z",
"user": "RibotBOT"
},
{
"comment": "Repairing link to disambiguation page - [[Wikipedia:Disambiguation pages with links|You can help!]]",
"timestamp": "2009-06-12T23:55:55Z",
"user": "J04n"
}
],
"title": "History of the Turkic peoples"
}
}
},
"continue": {
"rvcontinue": "20090807172715|306635892",
"continue": "||"
},
"warnings": {
"main": {
"*": "Unrecognized parameter: 'user'"
}
}
}]
Instead of using a single for loop, you can split it into two loops: the outer loop iterates over the pages, and the inner loop goes through each page's revisions.
for pageid, pagedetails in res["query"]["pages"].iteritems():
    for revision in pagedetails["revisions"]:
        if revision["user"] == "TechBot":
            do.something()
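Building on the two-loop idea, here is a small self-contained sketch that collects the timestamp and comment for one user across all pages. comments_by_user is a hypothetical helper name, the sample response is trimmed from the question, and .values() is used instead of .iteritems() so the same code also runs on Python 3.

```python
def comments_by_user(response, user):
    """Collect (title, timestamp, comment) for every revision made by `user`."""
    found = []
    # The page ID keys are not needed, so iterate over the page details only.
    for page in response["query"]["pages"].values():
        for revision in page.get("revisions", []):
            if revision.get("user") == user:
                found.append(
                    (page.get("title"), revision["timestamp"], revision["comment"])
                )
    return found


res = {
    "query": {
        "pages": {
            "7940378": {
                "pageid": 7940378,
                "title": "History of the Turkic peoples",
                "revisions": [
                    {
                        "comment": "robot Modifying: [[az:T\u00fcrk Tarixi]]",
                        "timestamp": "2009-01-03T19:47:11Z",
                        "user": "TechBot",
                    },
                    {
                        "comment": "fixing recent deletion by merging it with the next paragraph",
                        "timestamp": "2009-04-03T14:49:27Z",
                        "user": "Soap",
                    },
                ],
            }
        }
    }
}
techbot_comments = comments_by_user(res, "TechBot")
print(techbot_comments)
```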

Elasticsearch : es.index() changes the Mapping when message is pushed

I am trying to push some messages like this to elasticsearch
id=1
list=asd,bcv mnmn,kjkj, pop asd dgf
So each message has an id field, which is a string, and a list field that contains a list of string values.
When I push this into Elasticsearch and try to create charts in Kibana, the default analyzer kicks in and splits my list by the space character, which breaks up my values. I tried to create a mapping for my index as:
mapping='''
{
"test":
{
"properties": {
"DocumentID": {
"type": "string"
},
"Tags":{
"type" : "string",
"index" : "not_analyzed"
}
}
}
}'''
es = Elasticsearch([{'host': server, 'port': port}])
indexName = "testindex"
es.indices.create(index=indexName, body=mapping)
So this should create the index with the mapping I defined. Now I push the messages by simply calling:
es.index(indexName, docType, messageBody)
But even now, Kibana breaks up my values! Why was the mapping not applied?
And when I do
GET /testindex/_mapping/test
I get
{
"testindex": {
"mappings": {
"test": {
"properties": {
"DocumentID": {
"type": "string"
},
"Tags": {
"type": "string"
}
}
}
}
}
}
Why did the mapping change? How can I specify the mapping type when I do
es.index()
You were very close. You need to provide the root "mappings" object when creating the index, but you don't need it when using the _mapping endpoint. That is why put_mapping worked and create did not. You can see this in the API documentation.
mapping = '''
{
"mappings": {
"test": {
"properties": {
"DocumentID": {
"type": "string"
},
"Tags": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
'''
Now this will work as expected
es.indices.create(index=indexName, body=mapping)
Hope this helps
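To avoid forgetting the wrapper, the body can be built by a small helper. This is a minimal sketch, assuming the same Elasticsearch version as in the question (one that still uses mapping types, "string" and "not_analyzed"); build_index_body is a hypothetical name, and the create call is left commented out so the snippet runs without a cluster.

```python
import json


def build_index_body(doc_type, properties):
    """Wrap the type's properties in the root "mappings" object used at index creation."""
    return json.dumps({"mappings": {doc_type: {"properties": properties}}})


body = build_index_body(
    "test",
    {
        "DocumentID": {"type": "string"},
        "Tags": {"type": "string", "index": "not_analyzed"},
    },
)
print(body)

# With a client and index name as in the question:
# es.indices.create(index=indexName, body=body)
```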
I was able to get the correct mapping to work with:
es.indices.create(index=indexName)
es.indices.put_mapping(docType, mapping, indexName)
I don't understand why
es.indices.create(index=indexName, body=mapping)
did not work. This should have worked as per the API.