DynamoDB Update Multiple Return Values - amazon-web-services

Is there a way to make a Dynamodb Update return Old and New values?
something like:
updateItemSpec
    .withPrimaryKey("id", id)
    .withUpdateExpression(myUpdateExpression)
    .withNameMap(nameMap)
    .withValueMap(valueMap)
    .withReturnValues("UPDATED_NEW, UPDATED_OLD");

There isn't.
It should be easy for you to simulate this by returning UPDATED_OLD: you already have the new values because you set them in the update, so request the old values of the updated attributes and take the new values straight from your value map.
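For illustration, a minimal boto3 sketch of that workaround (table, key, and attribute names are placeholders, not from the question; the same idea applies to the Java Document API above):

import boto3

table = boto3.resource("dynamodb").Table("<your-table>")

value_map = {":count": 5}  # the new values you are about to write

response = table.update_item(
    Key={"id": "<your-id>"},
    UpdateExpression="SET #count = :count",
    ExpressionAttributeNames={"#count": "count"},
    ExpressionAttributeValues=value_map,
    ReturnValues="UPDATED_OLD",  # ask DynamoDB for the previous values of the updated attributes
)

old_values = response.get("Attributes", {})  # e.g. {"count": Decimal("3")}
new_values = {"count": value_map[":count"]}  # already known from the value map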

Depending on where you want to use the data, if you don't need it in the body of code where you update the DynamoDB record, you can capture table activity using DynamoDB Streams. Configure an AWS Lambda trigger on the table so the Lambda is invoked when a specified event occurs, with the event (in our case, the stream record) passed to it. From that, depending on how you have set up the stream, you can access both the old and new versions of the record.
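A rough sketch of such a stream-triggered handler, assuming the stream is configured with the NEW_AND_OLD_IMAGES view type (names are illustrative):

def lambda_handler(event, context):
    for record in event["Records"]:
        if record["eventName"] == "MODIFY":
            # Both images arrive in DynamoDB JSON form, e.g. {"count": {"N": "3"}}
            old_image = record["dynamodb"].get("OldImage", {})  # item before the update
            new_image = record["dynamodb"].get("NewImage", {})  # item after the update
            print(old_image, new_image)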

Related

Update an attribute in DynamoDB if another attribute exists

In DynamoDB, is it possible to accomplish the following via an UpdateItem operation?
set attributeA to valueX if the attribute does not exist
set attributeB to valueX if attributeA exists
Both of these must be in the same UpdateItem operation. I'm aware of things like attribute_not_exists and if_not_exists, but I couldn't find an if_exists condition.
How can I accomplish the above task using the low-level APIs?
Sorry, you cannot do this in a single UpdateItem.
You can fetch the item, modify it on the client, then push a new copy. Use optimistic locking to ensure the item wasn't modified in between.
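A rough boto3 sketch of that read-modify-write with optimistic locking (the "version" attribute, table name, and key are illustrative assumptions, not part of the question):

import boto3
from botocore.exceptions import ClientError

table = boto3.resource("dynamodb").Table("<your-table>")

item = table.get_item(Key={"id": "<your-id>"})["Item"]
current_version = item.get("version", 0)

# Apply the attributeA / attributeB logic on the client side
if "attributeA" not in item:
    item["attributeA"] = "valueX"
else:
    item["attributeB"] = "valueX"
item["version"] = current_version + 1

try:
    # Reject the write if someone else modified the item since we read it
    table.put_item(
        Item=item,
        ConditionExpression="attribute_not_exists(#v) OR #v = :read",
        ExpressionAttributeNames={"#v": "version"},
        ExpressionAttributeValues={":read": current_version},
    )
except ClientError as e:
    if e.response["Error"]["Code"] != "ConditionalCheckFailedException":
        raise
    # Someone else won the race: re-read the item and retry the change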

DynamoDB insert timestamps with trigger vs in a put/post request

I have two small DynamoDB tables with about 10 attributes each, and I want to add "CreatedDate" and "ModifiedDate" attributes to them. I am trying to decide which approach is best practice in terms of cost, performance, and reusability.
First, I was thinking of creating a trigger that adds these attributes whenever there is an update or create operation on the table. I like this approach because it is centralized. However, I am not sure it is the cheapest way, because after a new item is written to the table, the trigger performs another write operation to insert the dates.
Second, just sending these values in the PUT request as new attributes. That way, I only have to do one write operation. The downside is that I will need to update every function that writes an item to these tables.
Which way should I go in this case? Are there better ways to do it, or is there anything I am missing?
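For reference, the second option can be kept to a single write by stamping both dates in the request itself; a hedged boto3 sketch (the "#status" attribute and table name are stand-ins for the question's real attributes):

from datetime import datetime, timezone

import boto3

table = boto3.resource("dynamodb").Table("<your-table>")

def save_item(key, status):
    # One write per request: ModifiedDate is refreshed every time,
    # CreatedDate is only set the first time the item is written.
    now = datetime.now(timezone.utc).isoformat()
    table.update_item(
        Key=key,
        UpdateExpression=(
            "SET #status = :status, "
            "CreatedDate = if_not_exists(CreatedDate, :now), "
            "ModifiedDate = :now"
        ),
        ExpressionAttributeNames={"#status": "status"},
        ExpressionAttributeValues={":status": status, ":now": now},
    )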

DynamoDB streams - write data back into table

Consider the following architecture:
write -> DynamoDB table -> stream -> Lambda -> write metadata item to same table
It could be used for many, many awesome situations, e.g. table- and item-level aggregations. I've seen this architecture promoted in several tech talks by official AWS engineers.
But doesn't writing the metadata item add a new item to the stream and run the Lambda again?
How do you avoid an infinite loop? Is there a way to keep the metadata write from appearing in the stream?
Or is spending two stream/Lambda requests inevitable with this architecture (we're charged per request), i.e. exiting the Lambda function early if it's a metadata item?
As triggering an AWS Lambda function from a DynamoDB stream is a binary option (on/off), it's not possible to trigger the AWS Lambda function only for certain writes to the table. So your AWS Lambda function will be called again for the items it just wrote to the DynamoDB table. The important bit is to have logic in place in your AWS Lambda function to detect that it wrote that data, and not to write it again in that case. Otherwise you'd get the mentioned infinite loop, which would be a really unfortunate situation, especially if it went unnoticed.
Currently DynamoDB does not offer condition-based subscriptions to a stream, so yes, DynamoDB will execute your Lambda function in an infinite loop. At the moment the only solution is to limit when your Lambda function does its work: you can use multiple Lambda functions, with one Lambda function there just to check whether the metadata was already written or not. I'm sharing a cloud architecture diagram of how you can achieve it.
A bit late but hopefully people looking for a more demonstrative answer will find this useful.
Suppose you want to process records where you add to an item up to a certain threshold; you could have an if condition that checks for that and processes or skips the record, e.g.:
This code assumes you have an attribute "Type" on each of your entities / object types (this was recommended to me by Rick Houlihan himself, but you could also check whether an attribute exists, i.e. "<your-attribute>" in record["dynamodb"]["NewImage"]), and that you are designing with PK and SK as generic partition and sort key names.
import os

import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("<your-table>")
threshold = int(os.environ.get("THRESHOLD"))

def get_value(pk):
    # Query the table to read the current value of the tracked attribute
    response = table.query(KeyConditionExpression=Key("PK").eq(pk))
    items = response.get("Items", [])
    return items[0]["<your-attribute>"] if items else 0

def your_aggregation_function():
    # Your aggregation logic here
    # Write back to the table with a put_item call once done
    pass

def lambda_handler(event, context):
    for record in event["Records"]:
        if record["eventName"] != "REMOVE" and record["dynamodb"]["NewImage"]["Type"]["S"] == "<your-entity-type>":
            # Query the table to extract the attribute value
            attribute_value = get_value(record["dynamodb"]["Keys"]["PK"]["S"])
            if attribute_value < threshold:
                # Send to your aggregation function
                your_aggregation_function()
Having the conditions in place in the Lambda handler (or wherever suits your needs) prevents the infinite loop mentioned above.
You may want additional checks in the update expression to make sure two (or more) concurrent Lambdas are not writing the same object. I suggest you use a date (a timestamp defined in the Lambda) and add it to the SK, or, if you can't, have an "EventDate" attribute on your item so that you could add a ConditionExpression or an UpdateExpression SET if_not_exists(#attribute, :date).
The above will guarantee that your Lambda is idempotent.
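A rough sketch of that guard (boto3; "EventDate" as in the suggestion above, the other names are illustrative): only the first concurrent writer for a given key succeeds, later ones fail the condition and can be ignored.

from botocore.exceptions import ClientError

def write_metadata_once(table, pk, sk, event_date, payload):
    try:
        table.put_item(
            Item={"PK": pk, "SK": sk, "EventDate": event_date, **payload},
            # Succeeds only if no metadata item with this key exists yet
            ConditionExpression="attribute_not_exists(EventDate)",
        )
    except ClientError as e:
        # A concurrent Lambda already wrote this metadata item: safe to skip
        if e.response["Error"]["Code"] != "ConditionalCheckFailedException":
            raise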

Best way to retrieve all data in DynamoDB table and subsequently clear the table

I am interested in retrieving all the data in my DynamoDB table and then clearing the table as efficiently as possible.
I have seen how to delete an item and retrieve it concurrently in concurrency - DynamoDB - how to retrieve and delete (pop) an item?. I have also seen batch deletion in database - What is the recommended way to delete a large number of items from DynamoDB?.
Ideally I would like to concurrently clear the table and retrieve the data. Is there a better way to do this?
If you subscribe a Lambda function to the DynamoDB stream for the table, you can process items as they are deleted (technically, the item will show up in the stream shortly after it is deleted).
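For example, a stream-triggered Lambda can pick the deleted items out of the REMOVE records (a sketch; it assumes the stream view type includes old images):

def lambda_handler(event, context):
    deleted_items = [
        record["dynamodb"]["OldImage"]  # the item as it looked before deletion
        for record in event["Records"]
        if record["eventName"] == "REMOVE"
    ]
    # ... archive or otherwise process deleted_items here ...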
You could retrieve all rows from the table, put them in a list, and then send the list to DynamoDB's batch delete.
This works for me:
public async Task DeleteAllReadModelEntitiesInTable()
{
    List<ReadModelEntity> readModels;

    var conditions = new List<ScanCondition>();
    readModels = await _context.ScanAsync<ReadModelEntity>(conditions).GetRemainingAsync();

    var batchWork = _context.CreateBatchWrite<ReadModelEntity>();
    batchWork.AddDeleteItems(readModels);
    await batchWork.ExecuteAsync();
}

Dynamo DB Optimistic Locking Behavior during Save Action

Scenario: We have a Dynamo DB table supporting Optimistic Locking with Version Number. Two concurrent threads are trying to save two different entries with the same primary key value to that Table.
Question: Will ConditionalCheckFailedException be thrown for the latter save action?
Yes, the second thread that tries to insert the same data will get a ConditionalCheckFailedException.
com.amazonaws.services.dynamodbv2.model.ConditionalCheckFailedException
As soon as the item is saved in the database, subsequent updates must use a version matching the value in the DynamoDB table (i.e. the server-side value).
save — For a new item, the DynamoDBMapper assigns an initial version number 1. If you retrieve an item, update one or more of its properties, and attempt to save the changes, the save operation succeeds only if the version number on the client side and the server side match. The DynamoDBMapper increments the version number automatically.
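Roughly, what the mapper's versioned save does for a new item can be sketched as a conditional write (a boto3 sketch with illustrative names; the real DynamoDBMapper manages the version attribute for you):

def save_new_item(table, item):
    item["version"] = 1
    # Write only if no item with this key (and hence no version) exists yet.
    # A second concurrent save of the same key fails here with
    # ConditionalCheckFailedException, because the condition no longer holds.
    table.put_item(
        Item=item,
        ConditionExpression="attribute_not_exists(#v)",
        ExpressionAttributeNames={"#v": "version"},
    )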
We had a similar use case in the past, but in our case multiple threads were first reading from DynamoDB and then trying to update the values.
So the version may change between the time a thread reads and the time it tries to update the document, and if you don't read the latest value from DynamoDB, the intermediate update will be lost (known as the lost update issue; refer to the AWS docs for more info).
I am not sure whether you have this use case or not, but if you simply have 2 threads trying to update the value and one of them sees a different version by the time its request reaches DynamoDB, then you will get a ConditionalCheckFailedException.
More info about this error can be found here http://grepcode.com/file/repo1.maven.org/maven2/com.michelboudreau/alternator/0.10.0/com/amazonaws/services/dynamodb/model/ConditionalCheckFailedException.java