Consistently modifying the same item in a DynamoDB table from multiple machines and multiple threads - amazon-web-services

I have an item (a number) in a DynamoDB table. A service reads this value, increments it, and writes it back to the table. Multiple machines, each with multiple threads, do this simultaneously.
My problem is reading the correct, consistent value and then updating it with the correct value.
I tried doing the increment and update inside a Java synchronized block.
However, I still noticed inconsistencies in the final count. It doesn't seem to be updating in a consistent manner.

"My problem here is to be able to read the correct consistent value, and update with the correct value."
To read/write the correct consistent value:
Read consistency in DynamoDB (you can set it in your query via the ConsistentRead parameter):
There are two types of read.
Eventually consistent read: data read shortly after a change may be stale, and you may need to wait briefly for it to become consistent.
Strongly consistent read: returns the most up-to-date data, so you don't need to worry about staleness.
ConditionExpression (specified in your query):
In your query you can specify that the update should apply only if some condition holds (for example, the current value in the table equals the value you read earlier, meaning nobody updated it in between). Otherwise the request fails with ConditionalCheckFailedException, which you need to handle in your code by retrying, etc.
So, to answer your question: first do a strongly consistent read to get the current counter value. Then your update should look like the following (unnecessary parameters removed), and you should handle ConditionalCheckFailedException in your code:
"TableName": "counters",
"ReturnValues": "UPDATED_NEW",
"ExpressionAttributeValues": {
":a": currentValue,
":bb": newValue
},
"ExpressionAttributeNames": {
"#currentValue": "currentValue"
},
**// current value is what you ve read
// by Strongly Consistent **
"ConditionExpression": "(#currentValue = :a)",
"UpdateExpression": "SET #currentValue = :bb", // new counter value

Store a UUID (a long random string) with every record. Whenever you try to update the record, send the UUID with the update request, and apply the update only if the stored UUID equals the value you read; write a new UUID as part of the update.
A synchronized block will not work if you are writing from multiple machines at once.

Related

Map reduce returns old data

I have the following map:
from doc in docs
select new { Name = doc.Name, Count = 1 }

reduce

from result in results
group result by new { result.Name } into g
select new {
    Name = g.Key.Name,
    Count = Enumerable.Sum(g, x => ((int) x.Count))
}
If I put a lock on the index folder, then save a document, delete it, and re-save it to trigger a reindex, the old document still appears in the index query results, despite the index being reported as up to date. The last-indexed date is also older than the date the document was updated, so the index should not contain any old results.
Any ideas what's going on? This is actually part of a larger problem I've discovered on a production system. I'm not clear why it's happening, but I've been able to reproduce a similar situation by locking the index, so I suspect some process is causing the lock. It means the index results return projections that are old.
How can I get the reduce to filter out old results?
If you disable the index while documents are updated or deleted, you'll get outdated results from the map-reduce index. This can happen even when the index isn't disabled.
The reason is that indexes are eventually consistent. You can read about it here:
https://ravendb.net/docs/article-page/3.5/Csharp/users-issues/understanding-eventual-consistency
You can use WaitForNonStaleResultsAsOfLastWrite:
https://ravendb.net/docs/article-page/2.5/Csharp/client-api/querying/stale-indexes#setting-cut-off-point
What you're describing is stale indexes: you update/create/delete a document and immediately query for it, but the query returns stale results.
The recommended way to fix this is by calling .WaitForIndexesAfterSaveChanges() during your create/update/delete calls:
// Inform Raven you'll wait for indexes when calling .SaveChanges
session.Advanced.WaitForIndexesAfterSaveChanges(
    timeout: TimeSpan.FromSeconds(30),
    throwOnTimeout: false);

// Do your update.
session.Store(new Employee
{
    FirstName = "John",
    LastName = "Doe"
});

// This won't return until affected indexes are updated.
session.SaveChanges();

// Now you can run a query against your index, and it will return the updated data.
...
This way, .SaveChanges will block until the indexes are updated. Run your query immediately after .SaveChanges and you'll see the updated results as expected.

DynamoDB Concurrency Issue

I'm building a system in which many DynamoDB (NoSQL) tables all contain data, and data in one table references data in another table.
Multiple processes access the same item in a table at the same time. I want to ensure that all of the processes have up-to-date data and aren't trying to access the item at exactly the same time, because they are all updating it with different data.
I would love some suggestions on this as I am stuck right now and don't know what to do. Thanks in advance!
Optimistic locking is a strategy to ensure that the client-side item that you are updating (or deleting) is the same as the item in Amazon DynamoDB. If you use this strategy, your database writes are protected from being overwritten by the writes of others, and vice versa.
With optimistic locking, each item has an attribute that acts as a version number. If you retrieve an item from a table, the application records the version number of that item. You can update the item, but only if the version number on the server side has not changed. If there is a version mismatch, it means that someone else has modified the item before you did. The update attempt fails, because you have a stale version of the item. If this happens, you simply try again by retrieving the item and then trying to update it. Optimistic locking prevents you from accidentally overwriting changes that were made by others. It also prevents others from accidentally overwriting your changes.
To support optimistic locking, the AWS SDK for Java provides the @DynamoDBVersionAttribute annotation. In the mapping class for your table, you designate one property to store the version number and mark it with this annotation. When you save an object, the corresponding item in the DynamoDB table will have an attribute that stores the version number. The DynamoDBMapper assigns a version number when you first save the object, and it automatically increments the version number each time you update the item. Your update or delete requests succeed only if the client-side object version matches the corresponding version number of the item in the DynamoDB table.
ConditionalCheckFailedException is thrown if:
You use optimistic locking with @DynamoDBVersionAttribute and the version value on the server is different from the value on the client side.
You specify your own conditional constraints while saving data by using DynamoDBMapper with DynamoDBSaveExpression, and those constraints fail.
Note
DynamoDB global tables use a "last writer wins" reconciliation between concurrent updates, so with global tables the last write always wins and this locking strategy does not work as expected.
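The version-number scheme can be sketched as follows. This is a minimal Python simulation, not the AWS SDK: the in-memory `item` and the `save` helper stand in for what @DynamoDBVersionAttribute and DynamoDBMapper do (the condition check and auto-increment happen server-side in real DynamoDB):

```python
import threading

# In-memory stand-in for a versioned DynamoDB item. A real client sends an
# UpdateItem whose condition compares the version attribute; names here are
# illustrative, not SDK calls.
item = {"value": "initial", "version": 1}
_lock = threading.Lock()  # emulates server-side atomicity of the condition check

class ConditionalCheckFailedException(Exception):
    pass

def save(new_value, read_version):
    """Succeeds only if nobody has bumped the version since it was read."""
    with _lock:
        if item["version"] != read_version:
            raise ConditionalCheckFailedException("stale version")
        item["value"] = new_value
        item["version"] += 1  # the mapper auto-increments on each save

# Two clients read the same version, then both try to save.
v = item["version"]
save("from-client-A", v)        # first writer wins; version becomes 2
try:
    save("from-client-B", v)    # second writer now holds a stale version
    result = "second save accepted"
except ConditionalCheckFailedException:
    result = "second save rejected"
print(item, result)
```

The second client's only recourse is to re-read the item (picking up version 2) and retry its save, which is exactly the retry loop the quoted documentation describes.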

Graceful handling of ConditionCheckFailedException with DynamoDB

When using ConditionExpression in a DynamoDB request, and the condition is not met, the entire request will fail. I am using conditional updates, and the fact that ConditionCheckFailedException doesn't contain any information about which condition failed is giving me a hard time.
For example, consider this scenario. There's an item in a table like this:
{
    state: 'ONGOING',
    foo: 'FOO',
    bar: 'BAR'
}
I then want to update this item, changing both foo and state:
ExpressionAttributeValues: {
    ':STATE_FINISHED': 'FINISHED',
    ':FOO': 'NEW FOO'
},
UpdateExpression: 'SET state = :STATE_FINISHED, foo = :FOO',
However, my application has a logical transition order of states, and to prevent concurrency issues where two requests concurrently modify an item and causing an inconsistent state, I add a condition to make sure only valid transitions of state are accepted:
ExpressionAttributeValues: {
    ':STATE_ONGOING': 'ONGOING'
},
ConditionExpression: 'state = :STATE_ONGOING'
This e.g. prevents two concurrent requests from modifying state into FINISHED and CANCELLED at the same time.
This is all fine when there's only one condition: if the request fails, I know it was because of an invalid state transition, and I can choose whether to fail the request or to make a new request that modifies only foo, whatever makes sense in my application. But if I have multiple conditions in one request, it seems impossible to find out which particular condition failed, which means I need to either fail the entire request or split it into multiple separate requests, updating one conditional value at a time. That, however, can raise new concurrency issues.
Has anyone found a decent solution to a similar problem?
Ideally what I'd want is to be able to make a UpdateExpression that modifies a certain attribute conditionally, otherwise ignoring it, or by using a custom function that returns the new value based on the old value and the suggested updated value, similar to an SQL UPDATE with an embedded SELECT .. CASE .... Is anything like this possible?
Or, is it at least possible to get more information out of a ConditionalCheckFailedException (such as which particular condition failed)?
As you mentioned, DynamoDB doesn't provide a granular error message when multiple fields are present in the ConditionExpression. I am not addressing that part of the question in my answer.
I would like to address the second part, i.e. returning the old/new values.
The ReturnValues parameter can be used to get the desired values based on your requirement. Set it to one of these values to get the values you need:
New value - should already be available
Old value - use either UPDATED_OLD or ALL_OLD
ReturnValues: NONE | ALL_OLD | UPDATED_OLD | ALL_NEW | UPDATED_NEW

ALL_OLD - Returns all of the attributes of the item, as they appeared before the UpdateItem operation.
UPDATED_OLD - Returns only the updated attributes, as they appeared before the UpdateItem operation.
ALL_NEW - Returns all of the attributes of the item, as they appear after the UpdateItem operation.
UPDATED_NEW - Returns only the updated attributes, as they appear after the UpdateItem operation.
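As a sketch, here is how the request parameters might be assembled with ReturnValues set to UPDATED_OLD, so a successful conditional update also reports what the attribute held before. The table, key, and attribute names are hypothetical; in real code you would pass this dict to boto3's table.update_item(**params):

```python
# Build UpdateItem parameters for a conditional update that also returns the
# pre-update attribute values. All names below are illustrative stand-ins.
def build_update_params(current_value, new_value):
    return {
        "Key": {"counter_id": "page-views"},      # hypothetical primary key
        "UpdateExpression": "SET #v = :new",
        "ConditionExpression": "#v = :current",   # optimistic check against the read value
        "ExpressionAttributeNames": {"#v": "currentValue"},
        "ExpressionAttributeValues": {":current": current_value, ":new": new_value},
        "ReturnValues": "UPDATED_OLD",            # report the attribute as it was before
    }

params = build_update_params(41, 42)
print(params["ReturnValues"])  # UPDATED_OLD
```

With UPDATED_OLD, the response's Attributes map would contain only currentValue as it was before the update, without a second read.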

DynamoDB Optimistic Locking Behavior during Save Action

Scenario: We have a DynamoDB table supporting optimistic locking with a version number. Two concurrent threads try to save two different entries with the same primary key value to that table.
Question: Will ConditionalCheckFailedException be thrown for the latter save action?
Yes, the second thread, which tries to insert the same data, would get a ConditionalCheckFailedException:
com.amazonaws.services.dynamodbv2.model.ConditionalCheckFailedException
As soon as the item is saved in the database, subsequent updates must supply a version matching the value in the DynamoDB table (i.e. the server-side value).
save — For a new item, the DynamoDBMapper assigns an initial version number 1. If you retrieve an item, update one or more of its properties, and attempt to save the changes, the save operation succeeds only if the version number on the client side and the server side match. The DynamoDBMapper increments the version number automatically.
We had a similar use case in the past, but in our case multiple threads were first reading from DynamoDB and then trying to update the values.
So by the time a thread reads the value and tries to update the document, the version may already have changed, and if you don't read the latest value from DynamoDB, the intermediate update will be lost (known as the lost-update problem; refer to the AWS docs for more info).
I am not sure if you have this use case, but if you simply have two threads trying to update the value and one of them holds a different version by the time its request reaches DynamoDB, then you will get a ConditionalCheckFailedException.
More info about this error can be found here http://grepcode.com/file/repo1.maven.org/maven2/com.michelboudreau/alternator/0.10.0/com/amazonaws/services/dynamodb/model/ConditionalCheckFailedException.java

How to update multiple items in a DynamoDB table at once

I'm using DynamoDB and I need to update a specific attribute on multiple records. Writing my requirement in pseudo-language I would like to do an update that says "update table persons set relationshipStatus = 'married' where personKey IN (key1, key2, key3, ...)" (assuming that personKey is the KEY in my DynamoDB table).
In other words, I want to do an update with an IN-clause, or I suppose one could call it a batch update. I have found this link that asks explicitly if an operation like a batch update exists and the answer there is that it does not. It does not mention IN-clauses, however. The documentation shows that IN-clauses are supported in ConditionalExpressions (100 values can be supplied at a time). However, I am not sure if such an IN-clause is suitable for my situation because I still need to supply a mandatory KEY attribute (which expects a single value it seems - I might be wrong) and I am worried that it will do a full table scan for each update.
So my question is: how do I achieve an update on multiple DynamoDB records at the same time? At the moment it almost looks like I will have to call an update statement for each Key one-by-one and that just feels really wrong...
As you noted, DynamoDB does not support a batch update operation. You would need to query for, and obtain the keys for all the records you want to update. Then loop through that list, updating each item one at a time.
You can use the TransactWriteItems action to update multiple records in a DynamoDB table.
The official documentation is available here; you can also see a TransactWriteItems JavaScript/Node.js example here.
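As a rough sketch of the shape of such a request (table and attribute names are illustrative, matching the pseudo-SQL in the question), the TransactItems payload for boto3's low-level client might be assembled like this, and then passed to boto3.client("dynamodb").transact_write_items(TransactItems=transact_items):

```python
# Build an all-or-nothing TransactWriteItems payload that sets the same
# attribute on several keys. Names mirror the question's pseudo-SQL:
# update persons set relationshipStatus = 'married' where personKey in (...).
person_keys = ["key1", "key2", "key3"]

transact_items = [
    {
        "Update": {
            "TableName": "persons",
            "Key": {"personKey": {"S": k}},               # low-level typed value
            "UpdateExpression": "SET relationshipStatus = :s",
            "ExpressionAttributeValues": {":s": {"S": "married"}},
        }
    }
    for k in person_keys
]
print(len(transact_items))  # 3
```

Either all three updates commit or none do; note the per-transaction item limit, so a larger key list must be split across multiple transactions.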
I don't know if it has changed since the earlier answers were given, but it's possible now.
See the docs:
https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_BatchWriteItem.html
I have used it like this in JavaScript (mapping the new blocks to an array of objects with the wanted structure):
let params = { RequestItems: {} }; // RequestItems must be initialized before assigning a table key
let tableName = 'Blocks';
params.RequestItems[tableName] = _.map(newBlocks, block => {
    return {
        PutRequest: {
            Item: {
                'org_id': orgId,
                'block_id': block.block_id,
                'block_text': block.block_text
            }
            // Note: BatchWriteItem does not support condition expressions,
            // so a ConditionExpression cannot be used inside a PutRequest.
        }
    };
});
docClient.batchWrite(params, function(err, data) {
    // ... and do stuff with the result
});
You can even mix puts and deletes
And if you're using dynogels, you can't mix puts and deletes due to dynogels' API, but what you can do for updating is use create (behind the scenes it casts to the batchWrite function as puts):
var item1 = {email: 'foo1@example.com', name: 'Foo 1', age: 10};
var item2 = {email: 'foo2@example.com', name: 'Foo 2', age: 20};
var item3 = {email: 'foo3@example.com', name: 'Foo 3', age: 30};

Account.create([item1, item2, item3], function (err, accounts) {
    console.log('created 3 accounts in DynamoDB', accounts);
});
Note this from DynamoDB limitations (from the docs):
The BatchWriteItem operation puts or deletes multiple items in one or more tables. A single call to BatchWriteItem can write up to 16 MB of data, which can comprise as many as 25 put or delete requests. Individual items to be written can be as large as 400 KB.
If I remember correctly, dynogels chunks the requests into chunks of 25 before sending them off, then collects them into one promise and returns (though I'm not 100% certain of this); otherwise a wrapper function would be pretty simple to assemble.
DynamoDB is not a relational database and was not designed to support native transactions. It is better to design the schema to avoid multi-item updates in the first place. Or, if that is not practical in your case, keep in mind that you may improve it when restructuring the design.
The only way to update multiple items at the same time is the TransactWriteItems operation provided by DynamoDB. But it comes with limitations (at most 25 items, for example), so you should probably enforce some limits in your application as well. Despite being costly (the implementation involves a consensus algorithm), it is still much faster than a simple loop, and it gives you the ACID properties, which are probably what you need most. Consider a loop-based approach: if one of the updates fails, how do you handle the failure? Can you roll back all changes without causing a race condition? Are the updates idempotent? It really depends on the nature of your application, of course. Be careful.
Another option is to use a thread pool to do the network I/O, which can definitely save a lot of time, but it has the same failure-and-rollback issue to think about.
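Whichever route you take, both BatchWriteItem and TransactWriteItems cap the number of items per call (25 in the classic BatchWriteItem limit quoted above), so larger workloads must be chunked client-side, which is essentially what the dynogels behavior described earlier amounts to. A minimal, pure chunking helper:

```python
# Split a list of write requests into DynamoDB-sized batches. Pure function,
# so the batching logic can be tested without any network calls.
def chunk(items, size=25):
    """Split items into lists of at most `size` elements, preserving order."""
    return [items[i:i + size] for i in range(0, len(items), size)]

batches = chunk(list(range(60)))
print([len(b) for b in batches])  # [25, 25, 10]
```

Each resulting batch would then be sent as one BatchWriteItem (or TransactWriteItems) call, with the usual handling of UnprocessedItems and retries on top.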