Optimistic locking and re-try - optimistic-locking

I'm not sure about the proper design of an approach.
We use optimistic locking with a long, incremented version stored on every entity. Each update of such an entity is executed via a compare-and-swap operation, which simply succeeds or fails depending on whether some other client has updated the entity in the meantime. Classic optimistic locking, as e.g. Hibernate does.
We also need to adopt a retry approach. We use HTTP-based storage (etcd), and it can happen that an update request simply times out.
And here is the problem: how to combine optimistic locking and retries. Here is the specific issue I'm facing.
Let's say I have an entity with version=1 and I'm trying to update it. The next version is obviously 2. My client then executes a conditional update. It succeeds only when the version in persistence is 1, and the entity is then atomically updated to version=2. So far, so good.
Now, let's say that the response to the update request does not arrive. It's impossible to say at this moment whether it succeeded or not. The only thing I can do is retry the update. The in-memory entity still contains version=1 and intends to update the value to 2.
The real problem arises now. What if the second update fails because the version in persistence is 2 and not 1?
There are two possible reasons:
the first request did cause the update: the operation was successful, but the response got lost or my client timed out, whatever. The response just did not arrive, but the update went through
some other client performed the update concurrently in the background
Now I can't say which is true. Did my client update the entity, or did some other client? Did the operation pass or fail?
The approach we currently use just compares the persisted entity with the entity in main memory, either via Java equals or JSON content equality. If they are equal, the update method is declared successful. I'm not satisfied with this algorithm, as in my view it is not both cheap and reasonable.
Another possible approach is to use a timestamp instead of a long version. Every client generates its own timestamp within the update operation, on the assumption that a concurrent client would, with high probability, generate a different one. The problem for me is the probability, especially when two concurrent updates come from the same machine.
Is there any other solution?

You can fake transactions in etcd by using a two-step protocol.
Algorithm for updating:
First phase: record the update to etcd
add an "update-lock" node with a fairly small TTL. If it exists, wait until it disappears and try again.
add a watchdog to your code. You MUST abort if performing the next steps takes longer than the lock's TTL (or if you fail to refresh it).
add a "update-plan" node with [old,new] values. Its structure is up to you, but you need to ensure that the old values are copied while you hold the lock.
add a "committed-update" node. At this point you have "atomically" updated the data.
Second phase: perform the actual update
read the "update-plan" node and apply the changes it describes.
If a change fails, verify that the new value is present.
If it's not, you have a major problem. Bail out.
delete the committed-update node
delete the update-plan node
delete the update-lock node
If you want to read consistent data:
While there is no committed-update node, your data are OK.
Otherwise, wait for it to get deleted.
Whenever committed-update is present but update-lock is not, initiate recovery.
Transaction recovery, if you find an update-plan node without a lock:
Get the update-lock.
if there is no committed-update node, delete the plan and release the lock.
Otherwise, continue at "Second phase", above.

IMHO, as etcd is built upon HTTP which is inherently an unsecure protocol, it will be very hard to have a bullet proof solution.
Classical SQL databases use connected protocols, transactions and journaling to let users make sure that a transaction as a whole will be either fully committed or fully rolled back, even in the worst case of a power outage in the middle of the operation.
So if two operations depend on each other (a money transfer from one bank account to another), you can make sure that either both succeed or neither does. You can also simply implement a journal of "operations" with their status in the database, so that you can later see whether a particular one went through by consulting the journal, even if you were disconnected in the middle of the commit.
But I simply cannot imagine such a solution for etcd. So unless someone else finds a better way, you are left with two options
use a classical SQL database in the backend, using etcd (or equivalent) as a simple cache
accept the weaknesses of the protocol
BTW, I do not think that a timestamp in lieu of a long version number will strengthen the system, because under high load the probability that two client transactions use the same timestamp increases. Maybe you could try to add a unique id (a client id or just a technical UUID) to your fields, and when the version is n+1, just compare the UUID that increased it: if it is yours, the transaction passed; if not, it did not.
But the really bad problem arises if, at the moment you can read the version, it is not at n+1 but already at n+2. If the UUID is yours, you are sure your transaction passed, but if it is not, nobody can say.
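The unique-id idea could be sketched roughly like this. It is only an in-memory illustration, not etcd client code: FakeStore, compare_and_swap and update_with_retry are hypothetical names, and the store stands in for any backend that offers an atomic conditional update. The lose_next_response flag exists only to simulate a lost reply.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    version: int
    writer_id: str  # unique id of the write that produced this version
    value: str


class FakeStore:
    """Hypothetical in-memory stand-in for a store with atomic compare-and-swap."""

    def __init__(self, initial: Record):
        self.record = initial
        self.lose_next_response = False  # test hook: simulate a lost reply

    def read(self) -> Record:
        return self.record

    def compare_and_swap(self, expected_version: int, new: Record) -> bool:
        if self.record.version != expected_version:
            return False
        self.record = new
        if self.lose_next_response:
            self.lose_next_response = False
            raise TimeoutError("write applied, response lost")
        return True


def update_with_retry(store, current: Record, new_value: str, attempts: int = 3) -> str:
    """Return 'ok', 'conflict' or 'unknown'."""
    write_id = str(uuid.uuid4())  # the same id is reused for every retry
    new = Record(current.version + 1, write_id, new_value)
    for _ in range(attempts):
        try:
            if store.compare_and_swap(current.version, new):
                return "ok"
        except TimeoutError:
            continue  # outcome unknown: retry the same conditional write
        # The conditional write was rejected: check who moved the version.
        seen = store.read()
        if seen.writer_id == write_id:
            return "ok"  # our earlier attempt landed, only the reply was lost
        if seen.version == new.version:
            return "conflict"  # someone else wrote version n+1, so we did not
        return "unknown"  # already past n+1: this record alone cannot tell
    return "unknown"
```

The "unknown" branch is exactly the n+2 case described above. Closing it would need something extra, such as a history of recent writer ids kept with the entity, which this sketch does not attempt.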

Related

DynamoDB use case handling

I am using DynamoDB for a project. I have a use case where I maintain a timeline for objects, i.e. the start and end time of an object plus the start time of the next object. New objects can be added between two existing objects (o1 and o2), in which case I have to update the next-object start time in o1 and set the next-object start time in the new object to the start time of o2. This can cause problems if two new objects are added between the same two objects at once, and would probably require transactions. Can someone suggest how this can be handled?
Update: My data model looks like this:
objectId(Hash Key), startTime(Sort Key), endTime, nextStartTime
1, 1, 5, 4
1, 4, 6, 8
1, 8, 10, 9
So it's possible that a new entry comes in whose start time is 5. In a transaction I would then have to update nextStartTime of the second entry to 5 and insert a new entry after the second entry, with nextStartTime set to the start time of the third entry. Meanwhile, another entry might come in whose start time also falls between the second and third entries (say 7, for example). I want the two transactions to be isolated from each other. In traditional SQL DBs this would be possible, as the second entry would be locked for the duration of the transaction, but Dynamo doesn't lock items. So I am wondering whether, if I use transactions, the two transactions would protect data integrity.
DynamoDB supports optimistic locking. This is achieved via conditional writes.
You can do it manually by introducing a version attribute or you can use the one provided (hopefully) by your SDK. Here is a link to AWS docs.
TLDR
two objects have to update the same timeline at the same time
one will succeed the other will fail with a specific error
you will have to retry the failing one
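A rough sketch of what the manual version-attribute variant could look like. The function only builds the keyword arguments in the shape that the boto3 low-level client's update_item call accepts. The table name, the "version" attribute and the helper name are assumptions for illustration, and the key names come from the data model in the question.

```python
def versioned_update_kwargs(table, object_id, start_time, new_next_start, expected_version):
    """Build arguments for a DynamoDB UpdateItem call guarded by a version check."""
    return {
        "TableName": table,
        "Key": {
            "objectId": {"N": str(object_id)},
            "startTime": {"N": str(start_time)},
        },
        # Change the field and bump the version in one atomic request.
        "UpdateExpression": "SET nextStartTime = :next, #v = :new_v",
        # Reject the write if another writer already changed the version.
        "ConditionExpression": "#v = :expected_v",
        # "version" is an assumed attribute name, aliased to be safe.
        "ExpressionAttributeNames": {"#v": "version"},
        "ExpressionAttributeValues": {
            ":next": {"N": str(new_next_start)},
            ":expected_v": {"N": str(expected_version)},
            ":new_v": {"N": str(expected_version + 1)},
        },
    }


# Hypothetical values: item (1, 4) currently at version 3, new nextStartTime 5.
kwargs = versioned_update_kwargs("Timeline", 1, 4, 5, 3)
```

With a real client this would be passed as client.update_item(**kwargs). A failed condition is reported as ConditionalCheckFailedException, at which point you re-read the item and retry with the fresh version.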
Dynamo also has transactions. However, they were limited to 25 items per request when this was written (AWS has since raised the limit to 100) and consume 2x the capacity units. If you can get away with an optimistic lock, go for it.
Hope this was helpful
Update with more info on transactions
From this doc
Error Handling for Writing
Write transactions don't succeed under the following circumstances:
When a condition in one of the condition expressions is not met.
When a transaction validation error occurs because more than one action in the same TransactWriteItems operation targets the same item.
When a TransactWriteItems request conflicts with an ongoing TransactWriteItems operation on one or more items in the TransactWriteItems request. In this case, the request fails with a TransactionCanceledException.
When there is an insufficient provisioned capacity for the transaction to be completed.
When an item size becomes too large (larger than 400 KB), or a local secondary index (LSI) becomes too large, or a similar validation error occurs because of changes made by the transaction.
When there is a user error, such as an invalid data format.
They claim that if there are two ongoing transactions on the same item, one will fail.
Why store the nextStartTime in the item? The nextStartTime is simply the start time of the next item, right? Seems like it'd be much easier to just pull the item as well as the next item to get the full picture at read-time. With a Query you can do this in one call, and so long as items are less than 2 KB in size it wouldn't even consume more RCUs than a get item would.
Simpler design, no cost for transactional writes, no need to do extensive testing on thread safety.
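To make the read-time idea concrete, here is a small in-memory sketch. It assumes the items of one objectId are already in startTime order, which is how a Query returns them. The slice of two items plays the role of a Query with a key condition of startTime >= :t and Limit 2. The function name is made up for illustration.

```python
from bisect import bisect_left


def item_with_next_start(items, start_time):
    """items: dicts sorted by 'startTime', as a Query on the sort key returns them."""
    keys = [it["startTime"] for it in items]
    i = bisect_left(keys, start_time)
    if i == len(items) or keys[i] != start_time:
        return None  # no item with that start time
    page = items[i:i + 2]  # the item itself plus its successor, if any
    next_start = page[1]["startTime"] if len(page) > 1 else None
    return {**page[0], "nextStartTime": next_start}


timeline = [
    {"startTime": 1, "endTime": 5},
    {"startTime": 4, "endTime": 6},
    {"startTime": 8, "endTime": 10},
]
before = item_with_next_start(timeline, 4)

# Inserting a new entry is now a single write; nothing else has to be updated.
timeline.append({"startTime": 5, "endTime": 7})
timeline.sort(key=lambda it: it["startTime"])
after = item_with_next_start(timeline, 4)
```

Since nextStartTime is derived rather than stored, two concurrent inserts between the same pair of entries no longer compete to update a shared item.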

Are DynamoDB conditional writes transactional?

I'm having trouble wrapping my head around the dichotomy of DDB providing Condition Writes but also being eventually consistent. These two truths seem to be at odds with each other.
In the classic scenario, user Bob updates key A and sets the value to "FOO". User Alice reads from a node that hasn't received the update yet, and so it gets the original value "BAR" for the same key.
If Bob and Alice write to different nodes on the cluster without condition checks, it's possible to have a conflict where Alice and Bob wrote to the same key concurrently and DDB does not know which update should be the latest. This conflict has to be resolved by the client on next read.
But what about when conditional writes are used?
User Bob sends their update for A as "FOO" if the existing value for A is "BAR".
User Alice sends their update for A as "BAZ" if the existing value for A is "BAR".
Locally each node can check to see if their node has the original "BAR" value and go through with the update. But the only way to know the true state of A across the cluster is to make a strongly consistent read across the cluster first. This strongly consistent read must be blocking for either Alice or Bob, or they could both make a strongly consistent read at the same time.
So here is where I'm getting confused about the nature of DDBs condition writes. It seems to me that either:
Condition writes are only evaluated locally. Merge conflicts can still occur.
Condition writes are evaluated cross cluster.
If it is #2, the only way I see that working is if:
A lock is created for the key.
A strongly consistent read is made.
Let's say it's #2. Now where does this leave Bob's update? The update was made to node 2 and sent to node 1 and we have a majority quorum. But to make those updates available to Alice when they do their own conditional write, those updates need to be flushed from WAL. So in a conditional write are the updates always flushed? Are writes always flushed in general?
There have been other questions like this here on SO, but the answers were a repeat of, or a link to, the AWS documentation about this. The AWS documentation doesn't really explain this (or I missed it).
DynamoDB conditional writes are "transactional" writes but how they're done is not public information & is perhaps proprietary intellectual property.
DynamoDB developers are the only ones with this information.
Your issue is that you're looking at this from a node perspective - I have gone through every mention of "node" anywhere in the DynamoDB documentation, and they are all mentions of Node.js or DAX nodes, not database nodes.
While there can be outdated reads - yes, that would indicate some form of node - there are no database nodes per se when doing conditional writes.
User Bob sends their update for A as "FOO" if the existing value for A is "BAR". User Alice sends their update for A as "BAZ" if the existing value for A is "BAR".
Whoever's request gets there first is the one that goes through first.
The next request will just fail, meaning you now need to make a new read request to obtain the latest value to then proceed with the 2nd later write.
The Amazon DynamoDB developer guide shows this very clearly.
Note that there are no nodes, replicas etc. - there is only 1 reference to the DynamoDB table:
Condition writes are probably evaluated cross-cluster & a strongly consistent read is probably made but Amazon has not made this information public.
Ermiya Eskandary is correct that the exact details of DynamoDB's implementation aren't public knowledge, and also subject to change in the future while preserving the documented guarantees of the API. Nevertheless, various documents and especially video presentations that the Amazon developers did in the past, made it relatively clear how this works under the hood - at least in broad strokes, and I'll try to explain my understanding here:
Note that this might not be 100% accurate, and I don't have any inside knowledge about DynamoDB.
DynamoDB keeps three copies of each item.
One of the nodes holding a copy of a specific item is designated the leader for this item (there isn't a single "leader" - it can be a different leader per item). As far as I know, we have no details on which protocol is used to choose this leader (of course, if nodes go down, the leader choice changes).
A write to an item is started on the leader, which serializes writes to the same item. Note how DynamoDB conditional updates can only read and update the same item, so the same node (the leader) can read and write the item with just a local lock. After the leader evaluates the condition and decides to write, it also sends an update to the two other nodes - returning success to the user only after two of the three nodes have successfully written the data (to ensure durability).
As you probably know, DynamoDB reads have two options, consistent and eventually consistent. An eventually consistent read reads from one of the three replicas at random, and might not yet see the result of a successful write (if the write wrote two copies, but not yet the third one). A consistent read reads from the leader, so it is guaranteed to read the previously written data.
Finally you asked about DynamoDB's newer and more expensive "Transaction" support. This is a completely different feature. DynamoDB "Transactions" are all about reads and writes to multiple items in the same request. As I explained above, the older conditional-updates feature only allows a read-modify-write operation to involve a single item at a time, and therefore has a simpler implementation - where a single node (the leader) can serialize concurrent writes and make the decisions without a complex distributed algorithm (however, a complex distributed algorithm is needed to pick the leader).
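As a toy model of the description above (and certainly not DynamoDB's real implementation), the Bob and Alice scenario from the question can be replayed against a single "leader" object that serializes conditional writes with a local lock. The Leader class and its method names are invented for this sketch.

```python
import threading


class Leader:
    """Hypothetical single node that owns an item and serializes writes to it."""

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def conditional_put(self, expected, new):
        # Check and write happen under one lock, so they cannot interleave.
        with self._lock:
            if self._value != expected:
                return False
            self._value = new
            return True

    def consistent_read(self):
        with self._lock:
            return self._value


leader = Leader("BAR")
results = {}


def attempt(user, new_value):
    results[user] = leader.conditional_put("BAR", new_value)


threads = [
    threading.Thread(target=attempt, args=("Bob", "FOO")),
    threading.Thread(target=attempt, args=("Alice", "BAZ")),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Whichever request takes the lock first wins, and the other one sees a value that is no longer "BAR" and fails. No merge conflict is possible, because both condition checks run on the same node.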

Azure WebJobs - how to keep the state?

I have a need to implement some kind of orchestrator by a WebJob. So it needs to keep a state for some kind of internal queue.
Is there any way apart from static field and from using database to keep that state?
In general the idea is simple: I have a calculation job. It gets e.g. a ProductId and does calculations for it. This takes some time, so when another message for the same ProductId comes in, I need to wait until the previous calculation finishes. But at the same time I can pick up a message for another ProductId if there are no running calculations for it.
I haven't found any way to process messages sequentially based on specific conditions. So I ended up with the idea of implementing a stateful orchestrator that will do the trick.
Am I doing it in a wrong way?

Dynamodb missing updates with concurrent requests?

I'm having trouble updating a single item many times at once. If I try to update an item with new attributes many times like so:
UpdateExpression: 'SET attribute.#uniqueId = :newAttribute'
not all of the updates go through. I tried sending 20 updates with unique ids and this resulted in only 15 new attributes. This also occurs in my local dynamodb instance. I assume that the updates are somehow overwriting each other in a "last update wins" scenario but I'm not sure. How can I solve this?
DynamoDB is eventually consistent on update, so "race conditions" are possible. If you want more strict logic in writes, take a look at transactions
Items are not locked during a transaction. DynamoDB transactions provide serializable isolation. If an item is modified outside of a transaction while the transaction is in progress, the transaction is canceled and an exception is thrown with details about which item or items caused the exception.
Your observation is very interesting, and contradicts observations made in the past in Are DynamoDB "set" values CDRTs? and Concurrent updates in DynamoDB, are there any guarantees? - in those issues people observed that concurrent writes to different set items or to different top-level attributes seem to not get overwritten. Neither case is exactly the same as what you tested (nested attributes), though, so it's not a definitive proof there was something wrong with your test, but it's still surprising.
Presentations made in the past by the DynamoDB developers suggested that in DynamoDB writes happen on a single node (the designated "leader" of the partition), and that this node can serialize the concurrent writes. This serialization is needed to allow conditional updates, counter increments, etc., to work safely with concurrent writes. Presumably, the same serialization could have also allowed multiple sub-attributes to be modified concurrently safely. If it doesn't, it might mean that this serialization is deliberately disabled for certain updates, perhaps all unconditional updates (without a ConditionExpression). This is very surprising, and should have been documented by Amazon...

Auto-increment on Azure Table Storage

I am currently developing an application for Azure Table Storage. In that application I have table which will have relatively few inserts (a couple of thousand/day) and the primary key of these entities will be used in another table, which will have billions of rows.
Therefore I am looking for a way to use an auto-incremented integer, instead of GUID, as primary key in the small table (since it will save lots of storage and scalability of the inserts is not really an issue).
There've been some discussions on the topic, e.g. on http://social.msdn.microsoft.com/Forums/en/windowsazure/thread/6b7d1ece-301b-44f1-85ab-eeb274349797.
However, since concurrency problems can be really hard to spot and debug, I am a bit uncomfortable with implementing this on my own. My question is therefore whether there is a well-tested implementation of this.
For everyone who will find it in search, there is a better solution. Minimal time for table lock is 15 seconds - that's awful. Do not use it if you want to create a truly scalable solution. Use Etag!
Create one entity in table for ID (you can even name it as ID or whatever).
1) Read it.
2) Increment.
3) InsertOrUpdate WITH ETag specified (from the read query).
If the last operation (InsertOrUpdate) succeeds, then you have a new, unique, auto-incremented ID. If it fails (an exception with HttpStatusCode == 412), it means that some other client changed the entity in the meantime. So, repeat steps 1, 2 and 3.
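The read / increment / conditional-write loop might look roughly like this. It is an in-memory sketch, not Azure SDK code: FakeTable, PreconditionFailed and next_id are hypothetical names, with PreconditionFailed standing in for the HTTP 412 response.

```python
import threading


class PreconditionFailed(Exception):
    """Stands in for an HTTP 412 response (ETag mismatch)."""


class FakeTable:
    """Hypothetical single-entity table that enforces ETag checks on writes."""

    def __init__(self):
        self._value = 0
        self._etag = 0
        self._lock = threading.Lock()

    def read(self):
        with self._lock:
            return self._value, self._etag

    def replace(self, value, if_match):
        with self._lock:  # check and write are one atomic step
            if if_match != self._etag:
                raise PreconditionFailed()
            self._value = value
            self._etag += 1  # every successful write changes the ETag


def next_id(table, max_attempts=10):
    for _ in range(max_attempts):
        value, etag = table.read()                   # 1) read
        candidate = value + 1                        # 2) increment
        try:
            table.replace(candidate, if_match=etag)  # 3) write with ETag
            return candidate
        except PreconditionFailed:
            continue  # another client won the race: start over
    raise RuntimeError("gave up after too many ETag conflicts")
```

The max_attempts cap is an addition of this sketch so that a heavily contended counter fails loudly instead of looping forever.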
The usual time for Read+InsertOrUpdate is less than 200ms. My test utility with source on github.
See UniqueIdGenerator class by Josh Twist.
I haven't implemented this yet but am working on it ...
You could seed a queue with your next ids to use, then just pick them off the queue when you need them.
You need to keep a table to contain the value of the biggest number added to the queue. If you know you won't be using a ton of the integers, you could have a worker every so often wake up and make sure the queue still has integers in it. You could also have a used int queue the worker could check to keep an eye on usage.
You could also hook that worker up so that if the queue was empty when your code needed an id (by chance), it could interrupt the worker's nap to create more keys asap.
If that call failed, you would need a way to tell the worker you are going to do the work for them (lock), then do the worker's work of getting the next id, and unlock:
lock
get the last key created from the table
increment and save
unlock
then use the new value.
The solution I found that prevents duplicate ids and lets you autoincrement it is to
lock (lease) a blob and let that act as a logical gate.
Then read the value.
Write the incremented value
Release the lease
Use the value in your app/table
Then if your worker role were to crash during that process, then you would only have a missing ID in your store. IMHO that is better than duplicates.
Here is a code sample and more information on this approach from Steve Marx
If you really need to avoid guids, have you considered using something based on date/time and then leveraging partition keys to minimize the concurrency risk.
Your partition key could be by user, year, month, day, hour, etc and the row key could be the rest of the datetime at a small enough timespan to control concurrency.
Of course you have to ask yourself, at the price of data in Azure, whether avoiding a Guid is really worth all of this extra effort (assuming a Guid will just work).