DynamoDB Concurrency Issue - amazon-web-services

I'm building a system in which many DynamoDB (NoSQL) tables hold data, and data in one table refers to data in another table.
Multiple processes access the same item in a table at the same time. Because each process updates the item with different data, I want to ensure that every process works with up-to-date data and that no two processes modify the item at the exact same moment.
I would love some suggestions on this as I am stuck right now and don't know what to do. Thanks in advance!

Optimistic locking is a strategy to ensure that the client-side item that you are updating (or deleting) is the same as the item in Amazon DynamoDB. If you use this strategy, your database writes are protected from being overwritten by the writes of others, and vice versa.
With optimistic locking, each item has an attribute that acts as a version number. If you retrieve an item from a table, the application records the version number of that item. You can update the item, but only if the version number on the server side has not changed. If there is a version mismatch, it means that someone else has modified the item before you did. The update attempt fails, because you have a stale version of the item. If this happens, you simply try again by retrieving the item and then trying to update it. Optimistic locking prevents you from accidentally overwriting changes that were made by others. It also prevents others from accidentally overwriting your changes.
To support optimistic locking, the AWS SDK for Java provides the @DynamoDBVersionAttribute annotation. In the mapping class for your table, you designate one property to store the version number, and mark it using this annotation. When you save an object, the corresponding item in the DynamoDB table will have an attribute that stores the version number. The DynamoDBMapper assigns a version number when you first save the object, and it automatically increments the version number each time you update the item. Your update or delete requests succeed only if the client-side object version matches the corresponding version number of the item in the DynamoDB table.
ConditionalCheckFailedException is thrown if:
You use optimistic locking with @DynamoDBVersionAttribute and the version value on the server is different from the value on the client side.
You specify your own conditional constraints while saving data by using DynamoDBMapper with DynamoDBSaveExpression and these constraints fail.
Note
DynamoDB global tables use “last writer wins” reconciliation between concurrent updates. If you use global tables, the last writer's update wins, so the locking strategy does not work as expected.
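Outside the Java mapper, the same version check can be written as a conditional update. The following is a minimal sketch in Python that only builds the keyword arguments for a boto3 client update_item call. The table name, the key shape, the attribute name "version" and the helper name build_versioned_update are assumptions made for illustration; they do not come from the question.

```python
def build_versioned_update(table_name, key, changes, expected_version):
    """Build kwargs for a boto3 DynamoDB client update_item call.

    The write is accepted only if the stored `version` attribute (an assumed
    name) still equals the version the caller read earlier.
    """
    names = {"#version": "version"}
    values = {
        ":expected": {"N": str(expected_version)},
        ":next": {"N": str(expected_version + 1)},
    }
    assignments = ["#version = :next"]
    # Placeholders avoid clashes between attribute names and reserved words.
    for index, (attribute, value) in enumerate(sorted(changes.items())):
        names[f"#a{index}"] = attribute
        values[f":v{index}"] = value
        assignments.append(f"#a{index} = :v{index}")
    return {
        "TableName": table_name,
        "Key": key,
        "UpdateExpression": "SET " + ", ".join(assignments),
        "ConditionExpression": "#version = :expected",
        "ExpressionAttributeNames": names,
        "ExpressionAttributeValues": values,
    }


# Hypothetical table and item, used only to show the shape of the request.
request = build_versioned_update(
    "Orders",
    {"id": {"S": "order-1"}},
    {"status": {"S": "SHIPPED"}},
    expected_version=3,
)
```

Passing the result as client.update_item(**request) should raise client.exceptions.ConditionalCheckFailedException when another writer changed the version first; the caller then re-reads the item and retries.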

Related

DynamoDB version control using sort keys

Has anyone implemented versioning using sort keys as described in https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-sort-keys.html?
I'm trying to implement this in TypeScript to build a database that keeps versions of its items. Is there a way to do this with updateItem, or is a get + put operation needed?
Any sample to get me started, or any other help, is much appreciated!

The concept of versioning using sort keys involves creating a completely new item that uses the same Partition Key and a different Sort Key.
DynamoDB offers some operations that let you update values within an item atomically. This is perfect when you have something like a counter or a quantity and you want to decrease or increase it without having to read its value first (see the docs).
In the case you're trying to achieve, as mentioned, you are essentially creating a new item. DynamoDB by itself doesn't have any concept of versioning; what this pattern does is cleverly leverage the relation between Partition Key and Sort Key, and the fact that one PK can have multiple SKs associated with it, to correlate multiple rows of the same table.
To answer your question: if your only source of truth (or data store) is DynamoDB, then yes, your client will first have to query the table to find the latest version of the item being updated and then insert the new version.
If you are recording this information elsewhere and are using DynamoDB only to store the versions, then no, one put operation will be enough, but again, this assumes you can retrieve the latest version number from somewhere else.
In terms of samples, the official documentation of the AWS SDK is always a good start; in your case I assume you'll want the JavaScript one.
At a very high level, you'll have to do the following:
Create an AWS.DynamoDB() client.
Execute a query using the dynamodb.query() method and specifying the PK of the item you want to update.
Go through the items (rows) returned from the previous query and find the one with the highest version number as SK.
Put a new item using the dynamodb.putItem() method passing an item with the incremented version number as SK and same PK.
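The steps above can be sketched as follows. This is a hedged illustration in Python rather than JavaScript, and it only builds the request parameters for the boto3 client calls query and put_item. The attribute names "pk" and "version", the numeric sort key and the helper names are assumptions. Instead of walking through every returned item, the query asks for descending sort-key order with a limit of one, which is a shortcut on top of the steps listed.

```python
def build_latest_version_query(table_name, pk_value):
    """Kwargs for client.query: fetch only the item with the highest version."""
    return {
        "TableName": table_name,
        "KeyConditionExpression": "#pk = :pk",
        "ExpressionAttributeNames": {"#pk": "pk"},
        "ExpressionAttributeValues": {":pk": {"S": pk_value}},
        "ScanIndexForward": False,  # descending order of the sort key
        "Limit": 1,
        "ConsistentRead": True,
    }


def build_next_version_put(table_name, pk_value, queried_items, attributes):
    """Kwargs for client.put_item: same PK, sort key incremented by one."""
    # With no existing versions the first version number is 1.
    current = max((int(item["version"]["N"]) for item in queried_items), default=0)
    item = dict(attributes)
    item["pk"] = {"S": pk_value}
    item["version"] = {"N": str(current + 1)}
    return {
        "TableName": table_name,
        "Item": item,
        # Fails if an item with this exact PK and version already exists,
        # for example because a concurrent writer picked the same number.
        "ConditionExpression": "attribute_not_exists(#pk)",
        "ExpressionAttributeNames": {"#pk": "pk"},
    }
```

The condition on the put is an addition to the steps: without it, two clients that read the same latest version would silently overwrite each other's new item.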

You can do the technique described by Amazon with a read and then a write, or more accurately, a read and then two writes (since they want to update both v0 and a new v4!). Often, you need the extra read because you want to build the new version v4 based on data you read from v3 (or equivalently, v0) - but in case you don't need that, the read is not necessary, and two writes are enough:
You first do an UpdateItem to v0 which increments the "Latest" attribute, sets whatever attributes you want to set in the new version, and uses the ReturnValues parameter to ask the update operation to return the new "Latest" attribute.
Then you write with PutItem the new row for v4 (where 4 is the "Latest" you just read).
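A rough sketch of these two writes, again as Python helpers that only build boto3 client parameters for update_item and put_item. The key attribute names "pk" and "sk", the "v0"/"vN" sort-key format and the helper names are assumptions; "Latest" is the attribute named above. The optional expected_latest argument adds a condition for the case where the new version is derived from a version read earlier.

```python
def build_bump_latest(table_name, pk_value, new_attributes, expected_latest=None):
    """Kwargs for client.update_item on the v0 item: set attributes, bump Latest."""
    names = {"#latest": "Latest"}
    values = {":one": {"N": "1"}}
    assignments = []
    for index, (attribute, value) in enumerate(sorted(new_attributes.items())):
        names[f"#a{index}"] = attribute
        values[f":v{index}"] = value
        assignments.append(f"#a{index} = :v{index}")
    # ADD increments the number atomically on the server side.
    expression = "ADD #latest :one"
    if assignments:
        expression = "SET " + ", ".join(assignments) + " " + expression
    request = {
        "TableName": table_name,
        "Key": {"pk": {"S": pk_value}, "sk": {"S": "v0"}},
        "UpdateExpression": expression,
        "ExpressionAttributeNames": names,
        "ExpressionAttributeValues": values,
        "ReturnValues": "UPDATED_NEW",  # response carries the new Latest
    }
    if expected_latest is not None:
        # Only proceed if nobody created a newer version since our read.
        request["ConditionExpression"] = "#latest = :expected"
        values[":expected"] = {"N": str(expected_latest)}
    return request


def build_version_put(table_name, pk_value, update_response, new_attributes):
    """Kwargs for client.put_item: write the vN item for the returned Latest."""
    latest = int(update_response["Attributes"]["Latest"]["N"])
    item = dict(new_attributes)
    item["pk"] = {"S": pk_value}
    item["sk"] = {"S": f"v{latest}"}
    return {"TableName": table_name, "Item": item}
```

The response of the first call is passed to the second helper, so the new row is numbered with whatever value the server returned rather than a value computed on the client.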
This approach is safe in the sense that if two clients try to create two new versions at the same time, each one will pick a different "Latest", and both will appear in the version history. It is not safe, however, if the client dies between steps 1 and 2: you'll be left with a "hole" in the version history. I don't think there's any implementation of this technique that doesn't suffer from this problem.
That said, I want to reiterate what I said in the first paragraph: in most realistic use cases, the new version is based on the old version, so your code needs to read the old version first anyway, decide how to change it, and then write it (twice). You can't avoid the read in these cases. In this case the first write (to v0) should be a conditional update that succeeds only if "Latest" is still the value you read; otherwise you'd be basing your modification on a non-current version. This is an example of optimistic locking.

Dynamodb missing updates with concurrent requests?

I'm having trouble updating a single item many times at once. If I try to update an item with new attributes many times like so:
UpdateExpression: 'SET attribute.#uniqueId = :newAttribute'
not all of the updates go through. I tried sending 20 updates with unique ids and this resulted in only 15 new attributes. This also occurs in my local dynamodb instance. I assume that the updates are somehow overwriting each other in a "last update wins" scenario but I'm not sure. How can I solve this?

DynamoDB is eventually consistent on update, so "race conditions" are possible. If you want stricter write semantics, take a look at transactions. From the documentation:
Items are not locked during a transaction. DynamoDB transactions
provide serializable isolation. If an item is modified outside of a
transaction while the transaction is in progress, the transaction is
canceled and an exception is thrown with details about which item or
items caused the exception.

Your observation is very interesting, and contradicts observations made in the past in Are DynamoDB "set" values CDRTs? and Concurrent updates in DynamoDB, are there any guarantees? In those questions people observed that concurrent writes to different set items or to different top-level attributes do not seem to overwrite each other. Neither case is exactly the same as what you tested (nested attributes), though, so it's not definitive proof that something was wrong with your test, but it's still surprising.
Presentations made in the past by the DynamoDB developers suggested that in DynamoDB writes happen on a single node (the designated "leader" of the partition), and that this node can serialize the concurrent writes. This serialization is needed to allow conditional updates, counter increments, etc., to work safely with concurrent writes. Presumably, the same serialization could have also allowed multiple sub-attributes to be modified concurrently safely. If it doesn't, it might mean that this serialization is deliberately disabled for certain updates, perhaps all unconditional updates (without a ConditionExpression). This is very surprising, and should have been documented by Amazon...

How snowflake internally performs updates?

As far as I know, the underlying files (in columnar format) are immutable. My question is: if the files are immutable, how are updates performed? Does Snowflake maintain different versions of the same row and return the latest version based on a key, or does it insert the data into new files behind the scenes and delete the old files? How is performance affected in these scenarios (when querying current data) if time travel is set to 90 days, since Snowflake needs to maintain different versions of the same row? And since Snowflake doesn't enforce keys, how are the different versions even detected? Any insight (documents or videos) into the detailed internals is appreciated.

It's a complex question, but the basic ideas are as follows (quite a bit simplified):
records are stored in immutable micro-partitions on S3
a table is a list of micro-partitions
when a record is modified:
its old micro-partition is marked as inactive (from that moment),
a new micro-partition is created, containing the modified record along with the other records from the old micro-partition,
the new micro-partition is added to the table's list (marked as active from that moment)
inactive micro-partitions are not deleted for some time, allowing time-travel
So Snowflake doesn't need a record key, as each record is stored in only one file active at a given time.
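A toy in-memory model can make this bookkeeping concrete. It is a sketch of the idea only, not Snowflake's implementation; the class names, the integer clock and the scan(at=...) parameter are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MicroPartition:
    rows: tuple                         # contents are never changed after creation
    active_from: int                    # logical time the partition became active
    active_until: Optional[int] = None  # None means still active


class Table:
    def __init__(self):
        self.partitions = []
        self.clock = 0

    def insert(self, rows):
        self.clock += 1
        self.partitions.append(MicroPartition(tuple(rows), self.clock))

    def update(self, predicate, change):
        self.clock += 1
        # Iterate over a copy so newly added partitions are not revisited.
        for part in list(self.partitions):
            if part.active_until is None and any(predicate(r) for r in part.rows):
                part.active_until = self.clock  # retire the old partition
                rewritten = tuple(change(r) if predicate(r) else r for r in part.rows)
                self.partitions.append(MicroPartition(rewritten, self.clock))

    def scan(self, at=None):
        """Return the rows visible at logical time `at` (default: now)."""
        t = self.clock if at is None else at
        return [
            row
            for part in self.partitions
            if part.active_from <= t and (part.active_until is None or t < part.active_until)
            for row in part.rows
        ]


table = Table()
table.insert([{"id": 1, "v": "a"}, {"id": 2, "v": "b"}])
table.update(lambda r: r["id"] == 1, lambda r: {**r, "v": "z"})
current = table.scan()        # rows from the new partition only
as_of_t1 = table.scan(at=1)   # "time travel": rows from the retired partition
```

Each row is visible in exactly one partition at any given time, which is why no row key is needed to tell versions apart in this model.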
The impact of updates on querying is marginal; the only visible impact might be that the new files need to be fetched from S3 and cached on the warehouses.
For more info, I'd suggest going to Snowflake forums and asking there.

DynamoDB ConsistentRead for Global Indexes

I have the following table structure:
ID string `dynamodbav:"id,omitempty"`
Type string `dynamodbav:"type,omitempty"`
Value string `dynamodbav:"value,omitempty"`
Token string `dynamodbav:"token,omitempty"`
Status int `dynamodbav:"status,omitempty"`
ActionID string `dynamodbav:"action_id,omitempty"`
CreatedAt time.Time `dynamodbav:"created_at,omitempty"`
UpdatedAt time.Time `dynamodbav:"updated_at,omitempty"`
ValidationToken string `dynamodbav:"validation_token,omitempty"`
and I have two Global Secondary Indexes, one on the Value field (ValueIndex) and one on the Token field (TokenIndex). Later, in the internal logic, I update this entity and immediately read it back through one of these indexes (ValueIndex or TokenIndex), and I see the expected problem: the data is not ready yet (that is, not yet updated). I can't use ConsistentRead here, because a Global Secondary Index doesn't support that option. As a result, I can't run my load tests over this logic, because the data is not ready when the tests run in 10, 20 or 30 threads. So my question is: is it possible to solve this problem somehow? Or should I reorganize my table, splitting it into 2-3 different tables and moving fields like Value and Token into the HASH key or SORT key?

GSIs are updated asynchronously from the table they are indexing. The updates to a GSI typically occur in well under a second. So, if you're after immediate read of a GSI after insert / update / delete, then there is the potential to get stale data. This is how GSIs work - nothing you can do about that. However, you need to be really mindful of three things:
Make sure you keep your GSI lean - that is, only project the absolute minimum attributes that you need. Less data to write will make it quicker.
Ensure that your GSIs have the correct provisioned throughput. If they don't, they may not be able to keep up with activity in the table, and you'll get long delays before the GSIs are back in sync.
If an update causes the keys in the GSI to be updated, you'll need 2 units of throughput provisioned per update. In essence, DynamoDB will delete the item then insert a new item with the keys updated. So, even though your table has 100 provisioned writes, if every single write causes an update to your GSI key, you'll need to provision 200 write units.
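The arithmetic in the last point can be written down as a small helper. This is a simplified estimate, assuming every table write touches the index and each item is small enough to cost one write unit; the function name and the "fraction" parameter are made up for illustration.

```python
def gsi_write_units(table_writes_per_second, fraction_changing_gsi_key):
    """Estimate the write units a GSI needs per second.

    Writes that change the GSI key cost two units (delete plus insert);
    all other writes are assumed to cost one unit in the index.
    """
    changing = table_writes_per_second * fraction_changing_gsi_key
    unchanged = table_writes_per_second - changing
    return 2 * changing + unchanged


worst_case = gsi_write_units(100, 1.0)   # every write changes the GSI key
mixed_case = gsi_write_units(100, 0.25)  # a quarter of the writes change it
```

With 100 table writes per second, the worst case matches the 200 write units mentioned above, and a workload where a quarter of the writes change the key needs 125.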
Once you've tuned your DynamoDB setup, if you still absolutely cannot handle the brief delay in GSIs, you'll probably need to use a different technology. Even if you decided to split your table into multiple tables, it would have the same (if not worse) impact: you'd update one table and then try to read the data from another table into which you haven't yet inserted the values.
I suspect that once you tune DynamoDB for your situation, you'll get pretty damn close to what you want.

Dynamo DB Optimistic Locking Behavior during Save Action

Scenario: We have a Dynamo DB table supporting Optimistic Locking with Version Number. Two concurrent threads are trying to save two different entries with the same primary key value to that Table.
Question: Will ConditionalCheckFailedException be thrown for the latter save action?

Yes, the second thread that tries to save an item with the same primary key would get a ConditionalCheckFailedException:
com.amazonaws.services.dynamodbv2.model.ConditionalCheckFailedException
As soon as the item is saved in the database, subsequent updates must carry a version that matches the value in the DynamoDB table (i.e. the server-side value).
save — For a new item, the DynamoDBMapper assigns an initial version
number 1. If you retrieve an item, update one or more of its
properties and attempt to save the changes, the save operation
succeeds only if the version number on the client-side and the
server-side match. The DynamoDBMapper increments the version number
automatically.
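The behavior for two first-time saves can be mimicked with a small in-memory stand-in. This sketch does not call DynamoDB or DynamoDBMapper; FakeTable, its lock, and the rule that a brand-new item expects no stored version are assumptions that model the conditional write described above.

```python
import threading


class ConditionalCheckFailed(Exception):
    """Stand-in for DynamoDB's ConditionalCheckFailedException."""


class FakeTable:
    def __init__(self):
        self._items = {}
        # The lock stands in for the server applying writes one at a time.
        self._lock = threading.Lock()

    def save(self, key, attributes, expected_version):
        with self._lock:
            stored = self._items.get(key)
            stored_version = stored["version"] if stored else None
            if stored_version != expected_version:
                raise ConditionalCheckFailed(key)
            new_version = 1 if expected_version is None else expected_version + 1
            self._items[key] = {**attributes, "version": new_version}
            return new_version

    def get(self, key):
        with self._lock:
            return dict(self._items[key])


def race_two_first_saves():
    """Two threads save a new item under the same key; report what happened."""
    table = FakeTable()
    outcomes = []

    def worker(owner):
        try:
            # A new object has no version yet, so it expects none on the server.
            table.save("user-1", {"owner": owner}, expected_version=None)
            outcomes.append("saved")
        except ConditionalCheckFailed:
            outcomes.append("rejected")

    threads = [threading.Thread(target=worker, args=(name,)) for name in ("A", "B")]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return sorted(outcomes), table.get("user-1")
```

Whichever thread reaches the table first creates version 1; the other finds a stored version where it expected none and is rejected, so it has to reload the item before saving again.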

We had a similar use case in the past, but in our case multiple threads first read from DynamoDB and then tried to update the values.
By the time a thread tries to update the document, the version may have changed since it was read, and if you don't read the latest value from DynamoDB, the intermediate update will be lost (this is known as the lost update problem; refer to the AWS docs for more info).
I'm not sure whether you have this use case, but if you simply have two threads trying to update the value, and one of them holds a different version by the time its request reaches DynamoDB, you will get a ConditionalCheckFailedException.
More info about this error can be found here: http://grepcode.com/file/repo1.maven.org/maven2/com.michelboudreau/alternator/0.10.0/com/amazonaws/services/dynamodb/model/ConditionalCheckFailedException.java
