DynamoDB Optimistic Locking Behavior during Save Action

Scenario: We have a DynamoDB table that uses optimistic locking with a version number. Two concurrent threads try to save two different entries with the same primary key value to that table.
Question: Will ConditionalCheckFailedException be thrown for the latter save action?

Yes, the second thread that tries to save an item with the same primary key will get a ConditionalCheckFailedException (com.amazonaws.services.dynamodbv2.model.ConditionalCheckFailedException).
Once the item has been saved in the database, every subsequent update must carry a version that matches the value stored in the DynamoDB table (i.e., the server-side value).
save: For a new item, the DynamoDBMapper assigns an initial version number 1. If you retrieve an item, update one or more of its properties and attempt to save the changes, the save operation succeeds only if the version number on the client-side and the server-side match. The DynamoDBMapper increments the version number automatically.
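The versioned save described above roughly corresponds to a conditional put. Here is a minimal Python sketch of the request such a save could produce, using boto3 resource-style `Table.put_item` parameter names instead of the Java mapper. The helper name and the `version` attribute name are assumptions for illustration; this approximates the mapper's behavior rather than reproducing its code.

```python
from typing import Any, Dict


def versioned_put_kwargs(item: Dict[str, Any], version_attr: str = "version") -> Dict[str, Any]:
    """Build kwargs for a boto3 Table.put_item call that mimics a versioned save.

    Hypothetical helper: an approximation of what DynamoDBMapper does for a
    version attribute, not the mapper's real implementation.
    """
    current = item.get(version_attr)
    new_item = dict(item)  # copy so the caller's dict is not mutated
    kwargs: Dict[str, Any] = {"ExpressionAttributeNames": {"#v": version_attr}}
    if current is None:
        # New item: start at version 1; the put fails if an item with this key
        # already exists and carries a version attribute.
        new_item[version_attr] = 1
        kwargs["ConditionExpression"] = "attribute_not_exists(#v)"
    else:
        # Existing item: bump the version, but only if the stored version is
        # still the one this client read.
        new_item[version_attr] = current + 1
        kwargs["ConditionExpression"] = "#v = :expected"
        kwargs["ExpressionAttributeValues"] = {":expected": current}
    kwargs["Item"] = new_item
    return kwargs
```

In the question's scenario both threads start from an object with no version, so both send `attribute_not_exists(#v)`. Whichever put reaches DynamoDB second finds an item that already has a version, so its condition fails with ConditionalCheckFailedException.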

We had a similar use case in the past, except that in our case multiple threads first read from DynamoDB and then tried to update the values.
The version can change between the moment a thread reads the item and the moment it tries to update it, and if you don't re-read the latest value from DynamoDB, the intermediate update will be lost (this is known as the lost update problem; refer to the AWS docs for more info).
I am not sure whether you have this use case, but if you simply have two threads updating the value and one of them holds a stale version by the time its request reaches DynamoDB, that thread will get a ConditionalCheckFailedException.
More info about this error can be found here http://grepcode.com/file/repo1.maven.org/maven2/com.michelboudreau/alternator/0.10.0/com/amazonaws/services/dynamodb/model/ConditionalCheckFailedException.java
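To see why the conditional write matters, here is a toy, single-process Python model of two threads that read the same item and then both write an incremented counter, once with a plain put and once with a version check. It is not the DynamoDB API; `ToyTable`, `ConditionalCheckFailed` and `two_readers_then_write` are hypothetical stand-ins.

```python
from typing import Any, Dict, Tuple


class ConditionalCheckFailed(Exception):
    """Hypothetical stand-in for DynamoDB's ConditionalCheckFailedException."""


class ToyTable:
    """In-memory stand-in for one table keyed by "id" (illustration only, not DynamoDB)."""

    def __init__(self) -> None:
        self.items: Dict[str, Dict[str, Any]] = {}

    def get(self, key: str) -> Dict[str, Any]:
        return dict(self.items[key])  # a client-side copy

    def put(self, item: Dict[str, Any]) -> None:
        self.items[item["id"]] = dict(item)  # plain put: last writer wins

    def put_if_version(self, item: Dict[str, Any], expected: int) -> None:
        stored = self.items.get(item["id"])
        if stored is None or stored["version"] != expected:
            raise ConditionalCheckFailed(item["id"])
        self.items[item["id"]] = dict(item, version=expected + 1)


def two_readers_then_write(table: ToyTable, key: str, conditional: bool) -> Tuple[int, int]:
    """Both "threads" read the same snapshot, then each writes counter + 1.

    Returns (final counter, number of rejected writes).
    """
    snapshots = [table.get(key), table.get(key)]
    rejected = 0
    for snap in snapshots:
        updated = dict(snap, counter=snap["counter"] + 1)
        if conditional:
            try:
                table.put_if_version(updated, expected=snap["version"])
            except ConditionalCheckFailed:
                rejected += 1  # this writer must re-read and retry
        else:
            table.put(updated)
    return table.items[key]["counter"], rejected
```

With the plain put the counter ends at 1 instead of 2 and nobody notices the lost update; with the version check it also ends at 1, but the second writer gets an exception and knows it has to re-read and retry.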

Related

DynamoDB use case handling

I am using DynamoDB for a project. I have a use case where I maintain a timeline for objects, i.e., a start and end time for each object plus the start time of the next object. New objects can be added between two existing objects (o1 and o2), in which case I have to set the next-object start time in o1 to the new object's start time, and set the next-object start time in the new object to the start time of o2. This can cause problems if two new objects are added between the same two objects at the same time, and would probably require transactions. Can someone suggest how this can be handled?
Update: My data model looks like this:
objectId(Hash Key), startTime(Sort Key), endTime, nextStartTime
1, 1, 5, 4
1, 4, 6, 8
1, 8, 10, 9
For example, a new entry with start time 5 may come in. In a transaction, I would then have to update nextStartTime of the second entry to 5 and insert a new entry after the second entry whose nextStartTime is the start time of the third entry (8). Meanwhile, another entry might come in that also starts between the second and third entries (say, 7). I want these two transactions to be isolated from each other. In a traditional SQL database this would work, because the second entry would be locked for the duration of the transaction, but DynamoDB doesn't lock items. So I am wondering whether using transactions would protect data integrity.
DynamoDB supports optimistic locking. This is achieved via conditional writes.
You can do it manually by introducing a version attribute or you can use the one provided (hopefully) by your SDK. Here is a link to AWS docs.
TL;DR:
two objects have to update the same timeline at the same time
one will succeed, and the other will fail with a specific error (ConditionalCheckFailedException)
you will have to retry the failed one
DynamoDB also has transactions. However, a transaction can only touch a limited number of items (25 when this was written; AWS has since raised the limit to 100) and consumes 2x the capacity units. If you can get away with an optimistic lock, go for it.
Hope this was helpful
Update with more info on transactions
From this doc
Error Handling for Writing
Write transactions don't succeed under the following circumstances:
When a condition in one of the condition expressions is not met.
When a transaction validation error occurs because more than one action in the same TransactWriteItems operation targets the same item.
When a TransactWriteItems request conflicts with an ongoing TransactWriteItems operation on one or more items in the TransactWriteItems request. In this case, the request fails with a TransactionCanceledException.
When there is an insufficient provisioned capacity for the transaction to be completed.
When an item size becomes too large (larger than 400 KB), or a local secondary index (LSI) becomes too large, or a similar validation error occurs because of changes made by the transaction.
When there is a user error, such as an invalid data format.
In other words, if two ongoing transactions touch the same item, one of them will fail.
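For the timeline example in the question, the transactional version of "insert startTime 5 between 4 and 8" could look like the Python sketch below, which only builds the `TransactItems` list for boto3's low-level `client.transact_write_items`. The table name, the helper name and the new entry's `endTime` are assumptions, since the question doesn't give them.

```python
from typing import Any, Dict, List


def insert_between_transaction(table: str, object_id: int, prev_start: int, next_start: int,
                               new_start: int, new_end: int) -> List[Dict[str, Any]]:
    """Build TransactItems (low-level attribute format) for inserting a timeline entry.

    Hypothetical helper for the model objectId (hash key) / startTime (sort key).
    """
    def n(value: int) -> Dict[str, str]:
        return {"N": str(value)}  # DynamoDB numbers are sent as strings

    return [
        {
            "Update": {
                "TableName": table,
                "Key": {"objectId": n(object_id), "startTime": n(prev_start)},
                "UpdateExpression": "SET nextStartTime = :new",
                # Succeeds only if nobody inserted another entry after prev_start meanwhile.
                "ConditionExpression": "nextStartTime = :expected",
                "ExpressionAttributeValues": {":new": n(new_start), ":expected": n(next_start)},
            }
        },
        {
            "Put": {
                "TableName": table,
                "Item": {
                    "objectId": n(object_id),
                    "startTime": n(new_start),
                    "endTime": n(new_end),
                    "nextStartTime": n(next_start),
                },
                # Refuse to overwrite an existing entry with the same key.
                "ConditionExpression": "attribute_not_exists(startTime)",
            }
        },
    ]
```

If a second client concurrently builds the same request for startTime 7, both transactions carry the condition `nextStartTime = 8` on the entry at startTime 4. Whichever one loses fails with a TransactionCanceledException (either because the condition no longer holds or because of the conflict case listed above) and has to re-read the neighbors and retry.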
Why store nextStartTime in the item at all? nextStartTime is simply the start time of the next item, right? It seems much easier to fetch the item together with the next item and get the full picture at read time. A single Query can do this in one call, and because a Query rounds the combined size of the returned items up to the next 4 KB, two items of less than 2 KB each don't consume more RCUs than a GetItem would.
Simpler design, no cost for transactional writes, no need to do extensive testing on thread safety.
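A rough Python sketch of that read-time approach, assuming boto3's resource-style `Table.query` and the question's key names (`objectId` hash key, `startTime` sort key). It assumes an entry exists at the given start time; both helper names are hypothetical.

```python
from typing import Any, Dict, List, Optional


def neighbor_query_kwargs(object_id: int, start_time: int) -> Dict[str, Any]:
    """kwargs for a boto3 resource Table.query returning an entry plus the one after it."""
    return {
        "KeyConditionExpression": "objectId = :id AND startTime >= :start",
        "ExpressionAttributeValues": {":id": object_id, ":start": start_time},
        "Limit": 2,  # the entry itself and, if present, its successor (ascending order)
    }


def next_start_time(items: List[Dict[str, Any]]) -> Optional[int]:
    """Derive nextStartTime at read time instead of storing it; None for the last entry."""
    return int(items[1]["startTime"]) if len(items) > 1 else None
```

For the question's sample data, querying objectId 1 from startTime 4 returns the entries starting at 4 and 8, so the derived next start time is 8 without it ever being stored.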

DynamoDB version control using sort keys

Has anyone implemented versioning using sort keys, as described in https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-sort-keys.html?
I'm trying to implement this in TypeScript to build a database that keeps versions of items. Is there a way to do this with updateItem, or is a get + put operation needed?
Any sample to get me started or help is much appreciated!
The concept of versioning using sort key involves the creation of a completely new item that uses same Partition Key and different Sort Key.
DynamoDB offers some operations that update values within an item atomically. This is a perfect fit when you have something like a counter or a quantity that you want to increase or decrease without having to read its value first. Docs here.
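As an illustration of such an atomic update, here is a small Python sketch that builds the arguments for a boto3 resource-style `Table.update_item` call using the `ADD` action. The helper name and the example attribute are hypothetical.

```python
from typing import Any, Dict


def atomic_increment_kwargs(key: Dict[str, Any], attr: str, delta: int = 1) -> Dict[str, Any]:
    """kwargs for a boto3 resource Table.update_item that adjusts a number in place.

    DynamoDB applies the change server-side, so no prior read is needed.
    """
    return {
        "Key": key,
        # ADD on a number: if the attribute is missing, it is created starting from 0.
        "UpdateExpression": "ADD #c :delta",
        "ExpressionAttributeNames": {"#c": attr},
        "ExpressionAttributeValues": {":delta": delta},  # negative delta decrements
        "ReturnValues": "UPDATED_NEW",  # return the counter's new value
    }
```

This works well for counters precisely because no version history is needed; for the versioning pattern below, each change is a new item instead.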
In the case you're trying to achieve, as mentioned, you are essentially creating a new item. DynamoDB by itself has no concept of versioning; this pattern cleverly leverages the relationship between partition key and sort key, and the fact that one PK can have multiple SKs associated with it, to correlate multiple rows of the same table.
To answer your question: if your only source of truth (or data store) is DynamoDB, then yes, your client first has to query the table to find out which version of the item is the latest, and then insert the new version.
If you record this information elsewhere and use DynamoDB only to store the versions, then no, a single put operation is enough, but again, this assumes you can retrieve that information from somewhere else.
In terms of samples, the official documentation of the AWS SDK is always a good start; in your case I assume you'll want the JavaScript one, which you can find here.
At a very high level, you'll have to do the following:
Create an AWS.DynamoDB() client.
Execute a query using the dynamodb.query() method, specifying the PK of the item you want to update.
Go through the items (rows) returned by that query and find the one with the highest version number as SK.
Put a new item using the dynamodb.putItem() method passing an item with the incremented version number as SK and same PK.
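The same steps in Python with boto3 (resource-style `Table.query` / `Table.put_item`) might look like the sketch below. The attribute names `pk` and `version` are assumptions, and two details go slightly beyond the listed steps: the query uses `ScanIndexForward=False` with `Limit=1` so only the newest row comes back, and the put carries an `attribute_not_exists` condition so two clients that computed the same next version cannot silently overwrite each other.

```python
from typing import Any, Dict, List


def latest_version_query_kwargs(pk_value: str) -> Dict[str, Any]:
    """kwargs for a boto3 resource Table.query returning only the newest version row.

    Assumes a numeric sort key, so descending order puts the highest version first.
    """
    return {
        "KeyConditionExpression": "#pk = :pk",
        "ExpressionAttributeNames": {"#pk": "pk"},
        "ExpressionAttributeValues": {":pk": pk_value},
        "ScanIndexForward": False,  # highest sort key first
        "Limit": 1,
    }


def next_version_put_kwargs(pk_value: str, existing: List[Dict[str, Any]],
                            attrs: Dict[str, Any]) -> Dict[str, Any]:
    """kwargs for Table.put_item writing version max(existing) + 1 under the same PK."""
    latest = max((int(row["version"]) for row in existing), default=0)
    item = dict(attrs, pk=pk_value, version=latest + 1)
    return {
        "Item": item,
        # Added safeguard: fail instead of overwriting if another client
        # already wrote this version number.
        "ConditionExpression": "attribute_not_exists(#ver)",
        "ExpressionAttributeNames": {"#ver": "version"},
    }
```

If the put fails its condition, the client can simply re-run the query and try again with the next number.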
You can do the technique described by Amazon with a read and then a write, or more accurately, a read and then two writes (since they want to update both v0 and a new v4!). Often, you need the extra read because you want to build the new version v4 based on data you read from v3 (or equivalently, v0) - but in case you don't need that, the read is not necessary, and two writes are enough:
You first do an UpdateItem to v0 which increments the "Latest" attribute, sets whatever attributes you want to set in the new version, and uses the ReturnValues parameter to ask the update operation to return the new "Latest" attribute.
Then you write the new row for v4 with PutItem (where 4 is the "Latest" value you just got back).
This approach is safe in the sense that if two clients try to create new versions at the same time, each one will get a different "Latest", and both versions will appear in the version history. However, it is not safe in the sense that if the client dies between steps 1 and 2, you'll have a "hole" in the version history. That said, I don't think any implementation of this technique avoids this problem.
Having said this, I want to reiterate what I said in the first paragraph: in most realistic use cases, the new version is based on the old version, so your code needs to read the old version first, decide how to change it, and then write it (twice). You can't avoid the read in these cases. In that case, the first write (to v0) should be a conditional update that only succeeds if the old version is still current ("Latest" is the same value you read); otherwise you'd be basing your modification on a non-current version. This is an example of optimistic locking.
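The two-write technique, plus the conditional variant from the last paragraph, could be sketched in Python like this. It only builds boto3 resource-style `update_item` / `put_item` arguments; the helper names, the key attribute names `pk`/`sk` and the `v0`/`v4` sort key format are assumptions.

```python
from typing import Any, Dict, Optional


def bump_latest_kwargs(pk_value: str, new_attrs: Dict[str, Any],
                       expected_latest: Optional[int] = None) -> Dict[str, Any]:
    """Step 1: kwargs for Table.update_item on the v0 row.

    Increments "Latest", copies new_attrs onto v0 and asks DynamoDB to return
    the new "Latest". new_attrs must not contain the key attributes.
    """
    names: Dict[str, str] = {"#latest": "Latest"}
    values: Dict[str, Any] = {":zero": 0, ":one": 1}
    set_parts = ["#latest = if_not_exists(#latest, :zero) + :one"]
    for i, (name, value) in enumerate(sorted(new_attrs.items())):
        names[f"#a{i}"] = name
        values[f":a{i}"] = value
        set_parts.append(f"#a{i} = :a{i}")
    kwargs: Dict[str, Any] = {
        "Key": {"pk": pk_value, "sk": "v0"},
        "UpdateExpression": "SET " + ", ".join(set_parts),
        "ReturnValues": "UPDATED_NEW",
    }
    if expected_latest is not None:
        # Optimistic lock: only bump if nobody else has bumped since we read v0.
        kwargs["ConditionExpression"] = "#latest = :expected"
        values[":expected"] = expected_latest
    kwargs["ExpressionAttributeNames"] = names
    kwargs["ExpressionAttributeValues"] = values
    return kwargs


def version_row(pk_value: str, latest: int, new_attrs: Dict[str, Any]) -> Dict[str, Any]:
    """Step 2: the Item for Table.put_item, stored under sort key "v<latest>"."""
    return dict(new_attrs, pk=pk_value, sk=f"v{int(latest)}")
```

With boto3 you would call `table.update_item(**bump_latest_kwargs(...))`, read `response["Attributes"]["Latest"]`, and pass it to `version_row` for `table.put_item(Item=...)`. Passing `expected_latest` turns step 1 into the conditional update, so a ConditionalCheckFailedException there means you must re-read v0 and start over.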

Dealing with read eventual consistency by retrying GetItem

I'm building API #1, which creates an item in DynamoDB, and API #2, which retrieves an item using a GSI (the input key may not exist). But GSI reads can only be eventually consistent, and I don't want a scenario where API #1 creates an item but API #2 doesn't get that item.
So I am thinking of this:
API #1 creates item via UpdateItem
API #1 tries to retrieve item using GSI via GetItem. Keeps retrying with exponential backoff until it gets the item. Once this happens, eventual consistency should be over.
API #2 retrieves item using same GSI as above via GetItem. Since API #1 already got the item, this should get the item on first try.
Note: I don't think API #2 can do the GetItem retries instead because its input key may not ever exist.
Would this work? Are there better solutions?
The property you are looking for is known in the literature as monotonic read consistency: it's eventual consistency (after enough time you'll always read the new value), but additionally, once you have read the new value, further reads will not return the older value.
I couldn't find (and I tried to look hard...) any documentation guaranteeing that DynamoDB eventually-consistent reads have monotonic read consistency. Based on presentations I saw on DynamoDB's implementation (I don't have any inside knowledge), I believe that it in fact does not have monotonic read consistency:
From what I understood in those presentations, DynamoDB saves each piece of data on three nodes. One of the three nodes is the "leader" (for this piece of data) and writes go to it - and so do consistent reads. But eventually-consistent reads will go to one of the three nodes at random. So the following scenario is possible:
A write is supposed to update three copies of the GSI on three nodes - X, Y and Z - but at this point only X and Y were updated, Z wasn't yet.
API 1 reads from the GSI and randomly gets to ask node X and gets the new value.
Now API 2 reads from the GSI. It randomly gets node Z, and gets the old value!
So it will be possible that after your application finds the new value, another read will not find it :-(
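The scenario above can be reproduced with a toy in-memory model. This only illustrates the replication scheme as I understood it from those presentations, not DynamoDB's documented internals; `ToyReplicaSet` and the replica names are hypothetical.

```python
from typing import Dict, Iterable


class ToyReplicaSet:
    """Three in-memory copies (X, Y, Z) of one GSI entry; an illustration, not DynamoDB."""

    def __init__(self, initial: str) -> None:
        self.copies: Dict[str, str] = {node: initial for node in ("X", "Y", "Z")}

    def partial_write(self, value: str, reached: Iterable[str]) -> None:
        """Apply a write to only some replicas, as if propagation were still in flight."""
        for node in reached:
            self.copies[node] = value

    def eventually_consistent_read(self, node: str) -> str:
        """A real read lands on a replica the caller doesn't choose; here the
        replica is explicit so the scenario is reproducible."""
        return self.copies[node]


replicas = ToyReplicaSet("old")
replicas.partial_write("new", reached=["X", "Y"])  # Z hasn't been updated yet
api1_sees = replicas.eventually_consistent_read("X")  # API #1's read happens to land on X
api2_sees = replicas.eventually_consistent_read("Z")  # API #2's later read lands on Z
```

API #1's retry loop stops as soon as it happens to hit X or Y, but that says nothing about which replica API #2 will hit next.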
If someone else can find better documentation for this issue than just my "what I understood from presentations" I'd love to read their answer too.

Scanning DynamoDB table while inserting

When we scan a DynamoDB table, we can/should use LastEvaluatedKey to track the progress so that we can resume in case of failures. The documentation says that
LastEvaluatedKey: The primary key of the item where the operation stopped, inclusive of the previous result set. Use this value to start a new operation, excluding this value in the new request.
My question is if I start a scan, pause, insert a few rows and resume the scan from the previous LastEvaluatedKey, will I get those new rows after resuming the scan?
My guess is that I might miss some or all of the new rows, because the new keys will be hashed and the hash values could be smaller than the LastEvaluatedKey.
Is my guess right? Any explanation or documentation links are appreciated.
Scan goes through your data sequentially, and it does not know about items that were added in the process:
Scan operations proceed sequentially; however, for faster performance on a large table or secondary index, applications can request a parallel Scan operation by providing the Segment and TotalSegments parameters.
Not only can it miss some of the items that were added after you started scanning, it can also miss some of the items that were added before the scan started if you are using eventually consistent reads:
Scan uses eventually consistent reads when accessing the data in a table; therefore, the result set might not include the changes to data in the table immediately before the operation began.
If you need to keep track of items that were added after you started a scan, you can use DynamoDB Streams for that.
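A resumable scan loop might look like the Python sketch below. `scan_page` is assumed to behave like boto3's `Table.scan` (accepting `ExclusiveStartKey` and returning `Items`, plus `LastEvaluatedKey` while more data remains); `scan_pages` is a hypothetical helper that yields the key with each page so the caller can checkpoint it.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional, Tuple

Key = Dict[str, Any]


def scan_pages(scan_page: Callable[..., Dict[str, Any]],
               start_key: Optional[Key] = None,
               **scan_kwargs: Any) -> Iterator[Tuple[List[Dict[str, Any]], Optional[Key]]]:
    """Yield (items, last_evaluated_key) for each page of a Scan.

    Pass a previously saved key as start_key to resume; the item with that key
    is not returned again because ExclusiveStartKey is exclusive.
    """
    key = start_key
    while True:
        kwargs = dict(scan_kwargs)
        if key is not None:
            kwargs["ExclusiveStartKey"] = key
        page = scan_page(**kwargs)
        key = page.get("LastEvaluatedKey")  # absent on the final page
        yield page.get("Items", []), key
        if key is None:
            return
```

Checkpointing the key lets you resume after a failure, but it does not change the answer above: rows whose keys fall before the saved key in the table's scan order (based on the hashed partition key, as the question suspects) are not revisited after resuming.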

DynamoDB Concurrency Issue

I'm building a system with many DynamoDB (NoSQL) tables, where data in one table references data in another table.
Multiple processes access the same item in a table at the same time. I want to make sure that all of the processes work with up-to-date data and aren't modifying that item at the exact same time, because each of them updates the item with different data.
I would love some suggestions on this as I am stuck right now and don't know what to do. Thanks in advance!
Optimistic locking is a strategy to ensure that the client-side item that you are updating (or deleting) is the same as the item in Amazon DynamoDB. If you use this strategy, your database writes are protected from being overwritten by the writes of others, and vice versa.
With optimistic locking, each item has an attribute that acts as a version number. If you retrieve an item from a table, the application records the version number of that item. You can update the item, but only if the version number on the server side has not changed. If there is a version mismatch, it means that someone else has modified the item before you did. The update attempt fails, because you have a stale version of the item. If this happens, you simply try again by retrieving the item and then trying to update it. Optimistic locking prevents you from accidentally overwriting changes that were made by others. It also prevents others from accidentally overwriting your changes.
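The read, conditional write and retry cycle described above can be sketched with a toy in-memory table in Python. `ToyVersionedTable`, `ConditionalCheckFailed` and `update_with_retry` are hypothetical names; against real DynamoDB the conditional write would be a put or update with a `ConditionExpression` on the version attribute.

```python
from typing import Any, Callable, Dict


class ConditionalCheckFailed(Exception):
    """Hypothetical stand-in for DynamoDB's ConditionalCheckFailedException."""


class ToyVersionedTable:
    """In-memory table whose writes succeed only if the stored version matches (illustration only)."""

    def __init__(self) -> None:
        self.items: Dict[str, Dict[str, Any]] = {}

    def get(self, key: str) -> Dict[str, Any]:
        return dict(self.items[key])  # a client-side copy

    def put_if_version(self, item: Dict[str, Any], expected: int) -> None:
        stored = self.items.get(item["id"])
        if stored is None or stored["version"] != expected:
            raise ConditionalCheckFailed(item["id"])
        self.items[item["id"]] = dict(item, version=expected + 1)


def update_with_retry(table: ToyVersionedTable, key: str,
                      change: Callable[[Dict[str, Any]], Dict[str, Any]],
                      max_attempts: int = 5) -> int:
    """Read, apply `change`, write conditionally; on conflict re-read and retry.

    Returns the number of attempts used.
    """
    for attempt in range(1, max_attempts + 1):
        current = table.get(key)
        try:
            table.put_if_version(change(current), expected=current["version"])
            return attempt
        except ConditionalCheckFailed:
            continue  # someone else won the race; start over from fresh data
    raise RuntimeError(f"gave up on {key!r} after {max_attempts} attempts")
```

In practice you would cap the retries (as `max_attempts` does here) and usually add a short backoff between attempts.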
To support optimistic locking, the AWS SDK for Java provides the @DynamoDBVersionAttribute annotation. In the mapping class for your table, you designate one property to store the version number, and mark it using this annotation. When you save an object, the corresponding item in the DynamoDB table will have an attribute that stores the version number. The DynamoDBMapper assigns a version number when you first save the object, and it automatically increments the version number each time you update the item. Your update or delete requests succeed only if the client-side object version matches the corresponding version number of the item in the DynamoDB table.
ConditionalCheckFailedException is thrown if:
You use optimistic locking with @DynamoDBVersionAttribute and the version value on the server is different from the value on the client side.
You specify your own conditional constraints while saving data by using DynamoDBMapper with DynamoDBSaveExpression and these constraints failed.
Note: DynamoDB global tables use a “last writer wins” reconciliation between concurrent updates. If you use global tables, last writer policy wins. So in this case, the locking strategy does not work as expected.