Which is faster, PutItem or UpdateItem, in DynamoDB? - amazon-web-services

I am using a transaction (transact write) that performs the following series of operations:
Updating the primary key, which requires a delete operation.
Adding a new record with the updated primary key. This is effectively a new record, so I am unsure whether to use put or update, since according to the docs both will insert a new record.
Will the performance be the same in this case?
Note: I am not looking for the functional difference between the two, as covered in the question "Difference between DynamoDb PutItem vs UpdateItem?"

Related

clear dynamo DB table without specifying any key

I want to truncate a DynamoDB table that can hold 3 to 4 million records. What is the best way?
Right now I am using a scan, which does not perform well (so far I have only tried deleting a few records: 3):
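import java.util.Iterator;
import com.amazonaws.services.dynamodbv2.document.DynamoDB;
import com.amazonaws.services.dynamodbv2.document.Item;
import com.amazonaws.services.dynamodbv2.document.ItemCollection;
import com.amazonaws.services.dynamodbv2.document.ScanOutcome;
import com.amazonaws.services.dynamodbv2.document.Table;
import com.amazonaws.services.dynamodbv2.document.spec.DeleteItemSpec;

DynamoDB dynamoDB = new DynamoDB(amazonDynamoDBClient);
Table table = dynamoDB.getTable("table-test");
// Scan the whole table and iterate over every item it returns.
ItemCollection<ScanOutcome> resultItems = table.scan();
Iterator<Item> itemsItr = resultItems.iterator();
while (itemsItr.hasNext()) {
    Item item = itemsItr.next();
    String itemPk = item.getString("PK");
    String itemSk = item.getString("SK");
    // One DeleteItem request per item, issued sequentially.
    DeleteItemSpec deleteItemSpec = new DeleteItemSpec().withPrimaryKey("PK", itemPk, "SK", itemSk);
    table.deleteItem(deleteItemSpec);
}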
The best way is to delete your table and create a new one with the same name. This is how clearing all data from a DynamoDB table is usually done.
As Marcin already answered, the best way is to delete your table and create a new one. It is certainly the cheapest way - because any other way would require scanning the entire table and paying for the read capacity units required to do it.
In some cases, however, you might want to delete old items while the table is still actively used. In that case you can use a Scan like you wanted, but you can do it much more efficiently than you did. First, don't run individual DeleteItem requests sequentially, waiting for one delete to complete before issuing the next one. You can send batches of up to 25 deletes in one BatchWriteItem request. You can also send multiple BatchWriteItem requests in parallel. Finally, for even faster deletion, you can parallelize your Scan across multiple threads or even machines; see the parallel scan section of the DynamoDB documentation. Just don't forget that if you delete items while the table is still actively written to, you need a way to distinguish the old items that you want to delete from the new items that you don't, because the scan may start returning these new items as well.
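As a rough illustration of the batching step, here is a minimal sketch that only groups scanned keys into chunks of at most 25, the BatchWriteItem limit mentioned above. The class name DeleteBatcher and the method toBatches are hypothetical helpers, not part of the AWS SDK, and the code does not send any request itself.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of the AWS SDK): groups keys for batched deletes. */
final class DeleteBatcher {
    // BatchWriteItem accepts at most 25 put/delete requests per call.
    static final int MAX_BATCH_SIZE = 25;

    /** Splits the keys into consecutive batches of at most MAX_BATCH_SIZE entries. */
    static <K> List<List<K>> toBatches(List<K> keys) {
        List<List<K>> batches = new ArrayList<>();
        for (int start = 0; start < keys.size(); start += MAX_BATCH_SIZE) {
            int end = Math.min(start + MAX_BATCH_SIZE, keys.size());
            // Copy the slice so each batch is independent of the source list.
            batches.add(new ArrayList<>(keys.subList(start, end)));
        }
        return batches;
    }
}
```

Each resulting batch would then go into a single BatchWriteItem request, and several batches can be submitted in parallel. Any unprocessed items that a request returns still need to be retried.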
Finally, if you find yourself often clearing old data from a table - you should consider whether you can use DynamoDB's TTL feature, where DynamoDB automatically looks for expired items (based on an expiration-time attribute on each item) and deletes them - at no cost to you.
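For TTL, the expiration-time attribute has to be a Number holding a Unix epoch timestamp in seconds. A minimal sketch of computing such a value follows; the class name TtlExample and the method expiresAt are made up for illustration, and which attribute you store the value in is whatever you configure as the table's TTL attribute.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical helper: computes the value to store in an item's TTL attribute. */
final class TtlExample {
    /** Returns the expiry time in epoch seconds, the unit DynamoDB TTL expects. */
    static long expiresAt(Instant createdAt, Duration lifetime) {
        return createdAt.plus(lifetime).getEpochSecond();
    }
}
```

For example, TtlExample.expiresAt(Instant.now(), Duration.ofDays(30)) gives a value that marks the item as expired roughly 30 days after it is written.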

Efficient way to get and store count of items in dynamo db

I have a DynamoDB table with the following structure:
partitionKey - userId+keyName
sortKey - keyName+itemId
itemData - any object
createdAt - long value
updatedAt - long value
In this table I want to save a list of items, say all the unique edible items found in a shop. As per the requirement, I need to find the count of items in a particular shop. I came across three ways to do this:
Use Query to fetch the count, as per this link, without explicitly saving a count value.
Use transactions while saving items and store/update the count explicitly (we want to add/remove multiple items in a single request), and later get the count using the GetItem API.
Use DynamoDB Streams to trigger SNS and eventually store an explicit count in the same table or a different table, and later get the count using the GetItem API.
Note
Latency is important here along with the cost.
You can assume this dynamo db table can have millions of items.
Eventual consistency is fine.
In my view the 3rd option looks more efficient in terms of cost and latency, but I want to know whether my reasoning is correct.
Using DynamoDB Streams to write aggregate data back to DynamoDB is definitely the way to go!
This will of course be eventually consistent by its nature, as updating your item and waiting for the stream to update the aggregate are two different non-atomic operations.
A fourth option would be to keep something like an Elasticsearch index updated (also by using streams), which allows you to run arbitrary ad-hoc queries.
If you need consistency for your aggregates, you have to use transactions for this, with all the limitations imposed.
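To make the stream-based aggregation more concrete, here is a minimal sketch of turning one batch of stream records into per-partition count changes. The Change class is a hypothetical, simplified stand-in for a real stream record that keeps only the event name (INSERT, MODIFY or REMOVE) and the item's partition key; StreamCounter and countDeltas are illustrative names as well.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: derives count changes from a batch of stream records. */
final class StreamCounter {
    /** Simplified stand-in for a stream record (assumption, not the SDK type). */
    static final class Change {
        final String eventName;    // "INSERT", "MODIFY" or "REMOVE"
        final String partitionKey; // partition key of the affected item
        Change(String eventName, String partitionKey) {
            this.eventName = eventName;
            this.partitionKey = partitionKey;
        }
    }

    /** Returns the net change in item count per partition key for this batch. */
    static Map<String, Long> countDeltas(List<Change> batch) {
        Map<String, Long> deltas = new HashMap<>();
        for (Change c : batch) {
            long delta;
            switch (c.eventName) {
                case "INSERT": delta = 1; break;
                case "REMOVE": delta = -1; break;
                default: delta = 0; // MODIFY leaves the item count unchanged
            }
            if (delta != 0) {
                deltas.merge(c.partitionKey, delta, Long::sum);
            }
        }
        return deltas;
    }
}
```

The consumer of the stream would then apply each delta to the stored count item, for instance with an UpdateItem call that uses an ADD expression, so the count can later be read with GetItem.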

DynamoDB Concurrency Issue

I'm building a system in which many DynamoDB (NoSQL) tables all contain data and data in one table accesses data in another table.
Multiple processes are accessing the same item in a table at the same time. I want to ensure that all of the processes have updated data and aren't trying to access that item at the exact same time because they are all updating the item with different data.
I would love some suggestions on this as I am stuck right now and don't know what to do. Thanks in advance!
Optimistic locking is a strategy to ensure that the client-side item that you are updating (or deleting) is the same as the item in Amazon DynamoDB. If you use this strategy, your database writes are protected from being overwritten by the writes of others, and vice versa.
With optimistic locking, each item has an attribute that acts as a version number. If you retrieve an item from a table, the application records the version number of that item. You can update the item, but only if the version number on the server side has not changed. If there is a version mismatch, it means that someone else has modified the item before you did. The update attempt fails, because you have a stale version of the item. If this happens, you simply try again by retrieving the item and then trying to update it. Optimistic locking prevents you from accidentally overwriting changes that were made by others. It also prevents others from accidentally overwriting your changes.
To support optimistic locking, the AWS SDK for Java provides the @DynamoDBVersionAttribute annotation. In the mapping class for your table, you designate one property to store the version number, and mark it using this annotation. When you save an object, the corresponding item in the DynamoDB table will have an attribute that stores the version number. The DynamoDBMapper assigns a version number when you first save the object, and it automatically increments the version number each time you update the item. Your update or delete requests succeed only if the client-side object version matches the corresponding version number of the item in the DynamoDB table.
ConditionalCheckFailedException is thrown if:
You use optimistic locking with @DynamoDBVersionAttribute and the version value on the server is different from the value on the client side.
You specify your own conditional constraints while saving data by using DynamoDBMapper with DynamoDBSaveExpression and these constraints failed.
Note
DynamoDB global tables use a “last writer wins” reconciliation between concurrent updates. If you use global tables, last writer policy wins. So in this case, the locking strategy does not work as expected.
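The version check and the retry-on-mismatch loop described above can be modelled without DynamoDB at all. The following is a minimal in-memory sketch, not the DynamoDBMapper implementation: VersionedStore and all of its methods are hypothetical names, and a false return value plays the role of ConditionalCheckFailedException.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.UnaryOperator;

/** Hypothetical in-memory model of optimistic locking with a version number. */
final class VersionedStore {
    static final class Versioned {
        final String data;
        final long version;
        Versioned(String data, long version) { this.data = data; this.version = version; }
    }

    private final ConcurrentHashMap<String, Versioned> items = new ConcurrentHashMap<>();

    Versioned get(String key) { return items.get(key); }

    /** Saves a new item with version 1; fails if the key already exists. */
    boolean create(String key, String data) {
        return items.putIfAbsent(key, new Versioned(data, 1)) == null;
    }

    /** Conditional write: succeeds only if the stored version equals expectedVersion. */
    boolean update(String key, String newData, long expectedVersion) {
        Versioned current = items.get(key);
        if (current == null || current.version != expectedVersion) {
            return false; // stale version: someone else wrote first
        }
        // Atomic swap; fails if the entry was replaced since we read it.
        return items.replace(key, current, new Versioned(newData, expectedVersion + 1));
    }

    /** Re-reads and retries on a version mismatch; returns the version written. */
    long updateWithRetry(String key, UnaryOperator<String> change, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            Versioned read = items.get(key);
            if (read == null) throw new IllegalStateException("no such item: " + key);
            if (update(key, change.apply(read.data), read.version)) {
                return read.version + 1;
            }
        }
        throw new IllegalStateException("gave up after " + maxAttempts + " attempts");
    }
}
```

In this model, two writers that both read version 1 cannot both succeed: the first update moves the item to version 2, and the second one fails until it re-reads the item and tries again.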

Dynamo DB Optimistic Locking Behavior during Save Action

Scenario: We have a Dynamo DB table supporting Optimistic Locking with Version Number. Two concurrent threads are trying to save two different entries with the same primary key value to that Table.
Question: Will ConditionalCheckFailedException be thrown for the latter save action?
Yes, the second thread which tries to insert the same data would throw ConditionalCheckFailedException.
com.amazonaws.services.dynamodbv2.model.ConditionalCheckFailedException
As soon as the item is saved in database, the subsequent updates should have the version matching with the value on DynamoDB table (i.e. server side value).
save — For a new item, the DynamoDBMapper assigns an initial version
number 1. If you retrieve an item, update one or more of its
properties and attempt to save the changes, the save operation
succeeds only if the version number on the client-side and the
server-side match. The DynamoDBMapper increments the version number
automatically.
We had a similar use case in the past, but in our case multiple threads first read from DynamoDB and then tried to update the values.
So the version may change between the time a thread reads the item and the time it tries to update it, and if you don't read the latest value from DynamoDB, an intermediate update will be lost (this is known as the lost update problem; refer to the AWS docs for more info).
I am not sure whether you have this use case, but if you simply have two threads trying to update the value and one of them carries a different version by the time its request reaches DynamoDB, you will get a ConditionalCheckFailedException.
More info about this error can be found here http://grepcode.com/file/repo1.maven.org/maven2/com.michelboudreau/alternator/0.10.0/com/amazonaws/services/dynamodb/model/ConditionalCheckFailedException.java

DynamoDB query() versus getItem() for single-item retrieval based on the index

If I'm retrieving a single item from my table based on the indexed hash key, is there a performance difference between query() or getItem()?
getItem will be faster
Retrieval with getItem via hash and range key is a 1:1 fit; the time it takes (and hence its performance) is limited only by the hashing and sharding done internally.
Query results in a search on "all" range keys. It adds computational work and is thus considered slower.
In Amazon's DynamoDB, performance is guaranteed whatever the access method (you pay for it).
There may be a difference of a couple of milliseconds on the DynamoDB servers themselves, as suggested by Chen Harel, but this is negligible compared with the HTTP request RTT.
That said, it is good practice to issue a GET instead of a QUERY when you have enough information to do so.
As stated by an AWS employee in one of the discussions, I quote:
The latency of GetItem vs Query with limit=1 will be equivalent.
AWS discussion link
There is no performance difference between the two. The hash calculation in both operations is done one by one.
The latter, i.e. getItem, is just provided as an analogy to the JPA repository / Spring findOne/findById methods, to make wiring in Spring beans and Hibernate configs easier.