WF service 4.5 correlation: Either create new instance OR retrieve existing - appfabric

Is it possible to define a correlation (in a WF 4.5 service) in such a way that the first operation (a Receive activity) in the WF service EITHER creates a new workflow instance (if the correlation criteria hasn't been received yet) OR retrieves an existing workflow instance (when the same correlation criteria has already been received earlier)?
When would the above be useful?
When there's an "EnqueueItem" operation that groups items based on some properties of each item (the correlation criteria) and must create a NEW group each time a new value combination of those properties is received.

I didn't find a way to define the operation in such a way (as described in the original question), so I implemented it differently.
Instead of having one operation that either starts a new instance or retrieves an existing one, there are two operations: 'EnqueueItem' (can be called many times until a criterion is met) and 'StartGroup' (can be called only once, to start the WF instance).
When the client attempts to enqueue an item, it catches an expected InstanceNotFoundException (if the group has not yet been created) and calls 'StartGroup' in this case.

Related

Dealing with read eventual consistency by retrying GetItem

I'm building API #1 that creates an item in DynamoDB. I'm building another API #2 that retrieves an item using a GSI (the input key may not exist). But GSI reads can only be eventually consistent, and I don't want the scenario where API #1 creates an item but API #2 doesn't get that item.
So I am thinking of this:
API #1 creates item via UpdateItem
API #1 tries to retrieve item using GSI via GetItem. Keeps retrying with exponential backoff until it gets the item. Once this happens, eventual consistency should be over.
API #2 retrieves item using same GSI as above via GetItem. Since API #1 already got the item, this should get the item on first try.
Note: I don't think API #2 can do the GetItem retries instead because its input key may not ever exist.
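For concreteness, step 2 (API #1's polling read) might look roughly like this with the AWS SDK for Java v1. The table, index and attribute names are placeholders, and note that a GSI is read with a Query rather than GetItem, since GetItem only targets the base table key:

    import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
    import com.amazonaws.services.dynamodbv2.model.AttributeValue;
    import com.amazonaws.services.dynamodbv2.model.QueryRequest;
    import com.amazonaws.services.dynamodbv2.model.QueryResult;
    import java.util.Collections;
    import java.util.List;
    import java.util.Map;

    public class ReadAfterWrite {
        // Poll the GSI until the freshly written item becomes visible (step 2 of API #1).
        static List<Map<String, AttributeValue>> waitForItem(AmazonDynamoDB ddb, String externalId)
                throws InterruptedException {
            long backoffMillis = 50;
            for (int attempt = 0; attempt < 8; attempt++) {
                QueryResult result = ddb.query(new QueryRequest()
                        .withTableName("items")                      // placeholder table name
                        .withIndexName("gsi-by-external-id")         // placeholder GSI name
                        .withKeyConditionExpression("externalId = :id")
                        .withExpressionAttributeValues(
                                Collections.singletonMap(":id", new AttributeValue(externalId))));
                if (!result.getItems().isEmpty()) {
                    return result.getItems();                        // the new item is now visible
                }
                Thread.sleep(backoffMillis);                         // exponential backoff between polls
                backoffMillis *= 2;
            }
            return Collections.emptyList();                          // give up after a bounded number of tries
        }
    }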
Would this work? Are there better solutions?
The property you are looking for is known in the literature as monotonic read consistency - it's eventual consistency (after enough time you'll always read the new value), but additionally, once you have read the new value, further reads will not return the older value.
I couldn't find (and I tried to look hard...) any documentation guaranteeing that DynamoDB eventually-consistent reads have monotonic read consistency. Based on presentations I saw on DynamoDB's implementation (I don't have any inside knowledge), I believe that it in fact does not have monotonic read consistency:
From what I understood in those presentations, DynamoDB saves each piece of data on three nodes. One of the three nodes is the "leader" (for this piece of data) and writes go to it - and so do consistent reads. But eventually-consistent reads will go to one of the three nodes at random. So the following scenario is possible:
A write is supposed to update three copies of the GSI on three nodes - X, Y and Z - but at this point only X and Y were updated, Z wasn't yet.
API 1 reads from the GSI, happens to ask node X, and gets the new value.
Now API 2 reads from the GSI. It happens to ask node Z, and gets the old value!
So it will be possible that after your application finds the new value, another read will not find it :-(
If someone else can find better documentation for this issue than just my "what I understood from presentations" I'd love to read their answer too.

DynamoDB Concurrency Issue

I'm building a system in which many DynamoDB (NoSQL) tables all contain data, and data in one table references data in another table.
Multiple processes are accessing the same item in a table at the same time. I want to ensure that all of the processes have updated data and aren't trying to access that item at the exact same time because they are all updating the item with different data.
I would love some suggestions on this as I am stuck right now and don't know what to do. Thanks in advance!
Optimistic locking is a strategy to ensure that the client-side item that you are updating (or deleting) is the same as the item in Amazon DynamoDB. If you use this strategy, your database writes are protected from being overwritten by the writes of others, and vice versa.
With optimistic locking, each item has an attribute that acts as a version number. If you retrieve an item from a table, the application records the version number of that item. You can update the item, but only if the version number on the server side has not changed. If there is a version mismatch, it means that someone else has modified the item before you did. The update attempt fails, because you have a stale version of the item. If this happens, you simply try again by retrieving the item and then trying to update it. Optimistic locking prevents you from accidentally overwriting changes that were made by others. It also prevents others from accidentally overwriting your changes.
To support optimistic locking, the AWS SDK for Java provides the @DynamoDBVersionAttribute annotation. In the mapping class for your table, you designate one property to store the version number, and mark it using this annotation. When you save an object, the corresponding item in the DynamoDB table will have an attribute that stores the version number. The DynamoDBMapper assigns a version number when you first save the object, and it automatically increments the version number each time you update the item. Your update or delete requests succeed only if the client-side object version matches the corresponding version number of the item in the DynamoDB table.
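For illustration, a minimal mapping class might look like this (a sketch; the table and attribute names are made up):

    import com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBAttribute;
    import com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBHashKey;
    import com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBTable;
    import com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBVersionAttribute;

    @DynamoDBTable(tableName = "Items")              // placeholder table name
    public class Item {
        private String id;
        private String payload;
        private Long version;

        @DynamoDBHashKey(attributeName = "id")
        public String getId() { return id; }
        public void setId(String id) { this.id = id; }

        @DynamoDBAttribute(attributeName = "payload")
        public String getPayload() { return payload; }
        public void setPayload(String payload) { this.payload = payload; }

        // The mapper sets this to 1 on the first save and increments it on every update;
        // a mismatch at save/delete time surfaces as a ConditionalCheckFailedException.
        @DynamoDBVersionAttribute
        public Long getVersion() { return version; }
        public void setVersion(Long version) { this.version = version; }
    }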
ConditionalCheckFailedException is thrown if:
You use optimistic locking with @DynamoDBVersionAttribute and the version value on the server is different from the value on the client side.
You specify your own conditional constraints while saving data by using DynamoDBMapper with DynamoDBSaveExpression and these constraints failed.
Note
DynamoDB global tables use a "last writer wins" reconciliation between concurrent updates, so with global tables the optimistic locking strategy does not work as expected.

Dynamo DB Optimistic Locking Behavior during Save Action

Scenario: We have a Dynamo DB table supporting Optimistic Locking with Version Number. Two concurrent threads are trying to save two different entries with the same primary key value to that Table.
Question: Will ConditionalCheckFailedException be thrown for the latter save action?
Yes, the second thread, which tries to save an entry with the same primary key, will get a ConditionalCheckFailedException.
com.amazonaws.services.dynamodbv2.model.ConditionalCheckFailedException
As soon as the item is saved in the database, subsequent updates must carry a version matching the value in the DynamoDB table (i.e. the server-side value).
save — For a new item, the DynamoDBMapper assigns an initial version number 1. If you retrieve an item, update one or more of its properties and attempt to save the changes, the save operation succeeds only if the version number on the client-side and the server-side match. The DynamoDBMapper increments the version number automatically.
We had a similar use case in the past, but in our case multiple threads were first reading from DynamoDB and then trying to update the values.
So by the time a thread reads and then tries to update the document, the version may already have changed, and if you don't re-read the latest value from DynamoDB, the intermediate update will be lost (known as the lost-update problem; see the AWS docs for more info).
I am not sure whether you have this use case, but if you simply have 2 threads trying to update the value and one of them sees a different version by the time its request reaches DynamoDB, then it will get a ConditionalCheckFailedException.
More info about this error can be found here http://grepcode.com/file/repo1.maven.org/maven2/com.michelboudreau/alternator/0.10.0/com/amazonaws/services/dynamodb/model/ConditionalCheckFailedException.java
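For completeness, a rough sketch of that re-read-and-retry pattern, reusing a mapping class Item whose version property is annotated with @DynamoDBVersionAttribute (as sketched in the previous answer); the retry limit is arbitrary:

    import com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBMapper;
    import com.amazonaws.services.dynamodbv2.model.ConditionalCheckFailedException;

    public class OptimisticRetry {
        // Re-read the latest version and re-apply the change whenever the conditional check fails.
        static void updatePayload(DynamoDBMapper mapper, String id, String newPayload) {
            for (int attempt = 0; attempt < 3; attempt++) {
                Item item = mapper.load(Item.class, id);   // always start from the latest server-side state
                item.setPayload(newPayload);
                try {
                    mapper.save(item);                     // succeeds only if the version is still current
                    return;
                } catch (ConditionalCheckFailedException e) {
                    // Someone else updated the item in the meantime; loop and retry with the fresh version.
                }
            }
            throw new IllegalStateException("gave up after repeated version conflicts");
        }
    }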

Amazon DynamoDB Mapper - limits to batch operations

I am trying to write a huge number of records into a DynamoDB table and I would like to know the correct way of doing that. Currently, I am using the DynamoDBMapper to do the job in a single batchWrite operation, but after reading the documentation, I am not sure if this is the correct way (especially regarding the limits on the size and number of written items).
Let's say that I have an ArrayList with 10000 records and I am saving it like this:
mapper.batchWrite(recordsToSave, new ArrayList<BillingRecord>());
The first argument is the list with records to be written and the second one contains items to be deleted (no such items in this case).
Does the mapper split this write into multiple writes and handle the limits or should it be handled explicitly?
I have only found examples with batchWrite done with the AmazonDynamoDB client directly (like THIS one). Is using the client directly for the batch operations the correct way? If so, what is the point of having a mapper?
Does the mapper split your list of objects into multiple batches and then write each batch separately? Yes, it does batching for you and you can see that it splits the items to be written into batches of up to 25 items here. It then tries writing each batch and some of the items in each batch can fail. An example of a failure is given in the mapper documentation:
This method fails to save the batch if the size of an individual object in the batch exceeds 400 KB. For more information on batch restrictions see, http://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_BatchWriteItem.html
The example is talking about the size of one record (one BillingRecord instance in your case) exceeding 400KB, which, at the time of writing this answer, is the maximum size of a record in DynamoDB.
In case a particular batch fails, the mapper moves on to the next batch (sleeping the thread for a bit in case the failure was because of throttling). In the end, all of the failed batches are returned in a List of FailedBatch instances. Each FailedBatch instance contains the unprocessed items that weren't written to DynamoDB.
Is the snippet that you provided the correct way for doing batch writes? I can think of two suggestions. The batchSave method is more appropriate if you have no items to delete. You might also want to think about what you want to do with the failed batches.
Is using the client directly the correct way? If so, what is the point of the mapper? The mapper is simply a wrapper around the client. The mapper provides you an ORM layer to convert your BillingRecord instances into the sort-of nested hash maps that the low-level client works with. There is nothing wrong with using the client directly, and this does tend to happen in some special cases where additional functionality needs to be implemented outside of the mapper.
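To illustrate the two suggestions, here is a minimal sketch that uses batchSave (no deletes) and inspects the returned FailedBatch list; what you do with the failures (retry, log, fail the job) is up to you:

    import com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBMapper;
    import com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBMapper.FailedBatch;
    import java.util.List;

    public class BatchSaveExample {
        // Writes the records in chunks of up to 25 items and reports anything DynamoDB did not process.
        static <T> void saveAll(DynamoDBMapper mapper, List<T> recordsToSave) {
            List<FailedBatch> failures = mapper.batchSave(recordsToSave);
            for (FailedBatch failed : failures) {
                // Unprocessed items are grouped by table name; you could retry or log them here.
                System.err.println("Batch failed: " + failed.getException()
                        + ", unprocessed items: " + failed.getUnprocessedItems());
            }
        }
    }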

Optimistic locking and re-try

I'm not sure about proper design of an approach.
We use optimistic locking with a long, incremented version placed on every entity. Each update of such an entity is executed via a compare-and-swap algorithm, which simply succeeds or fails depending on whether some other client has updated the entity in the meantime. Classic optimistic locking, as e.g. Hibernate does.
We also need to adopt a retry approach. We use HTTP-based storage (etcd), and it can happen that an update request simply times out.
And here is the problem: how to combine optimistic locking and retries. Here is the specific issue I'm facing.
Let's say I have an entity with version=1 and I'm trying to update it. The next version is obviously 2. My client then executes a conditional update. It is successfully executed only when the version in persistence is 1, and the version is atomically updated to 2. So far, so good.
Now, let's say that the response for the update request does not arrive. It's impossible to say at this moment whether it succeeded or not. The only thing I can do now is retry the update. The in-memory entity still contains version=1, intending to update the value to version 2.
The real problem arises now. What if the second update fails because the version in persistence is 2 and not 1?
There are two possible reasons:
the first request did in fact cause the update - the operation was successful but the response got lost or my client timed out; either way, the response never arrived, but the update went through
some other client performed the update concurrently in the background
Now I can't tell which is true. Did my client update the entity, or did some other client? Did the operation pass or fail?
The current approach we use just compares the persisted entity with the entity in main memory, either via Java equals or JSON content equality. If they are equal, the update method is declared successful. I'm not satisfied with this algorithm, as it is neither cheap nor reliable enough for me.
Another possible approach is to not use a long version but a timestamp instead. Every client generates its own timestamp within the update operation, the idea being that a potential concurrent client would, with high probability, generate a different one. The problem for me is that probability, especially when two concurrent updates come from the same machine.
Is there any other solution?
You can fake transactions in etcd by using a two-step protocol.
Algorithm for updating:
First phase: record the update to etcd
add an "update-lock" node with a fairly small TTL. If it exists, wait until it disappears and try again.
add a watchdog to your code. You MUST abort if performing the next steps takes longer than the lock's TTL (or if you fail to refresh it).
add a "update-plan" node with [old,new] values. Its structure is up to you, but you need to ensure that the old values are copied while you hold the lock.
add a "committed-update" node. At this point you have "atomically" updated the data.
Second phase: perform the actual update
read the "planned-update" node and apply the changes it describes.
If a change fails, verify that the new value is present.
If it's not, you have a major problem. Bail out.
delete the committed-update node
delete the update-plan node
delete the update-lock node
If you want to read consistent data:
While there is no committed-update node, your data are OK.
Otherwise, wait for it to get deleted.
Whenever committed-update is present but update-lock is not, initiate recovery.
Transaction recovery, if you find an update-plan node without a lock:
Get the update-lock.
if there is no committed-update node, delete the plan and release the lock.
Otherwise, continue at "Second phase", above.
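A very rough sketch of the writer's side of this protocol, against a hypothetical minimal key-value client (the interface below is made up for illustration and is not a real etcd API; recovery and the consistent-read rules described above are omitted):

    // Hypothetical minimal client; method names are illustrative, not a real etcd library API.
    interface KvStore {
        boolean putIfAbsent(String key, String value, long ttlSeconds); // create-only write with a TTL
        void put(String key, String value);
        void delete(String key);
    }

    public class TwoPhaseUpdate {
        // First phase: record the plan under a lock; second phase: apply it and clean up.
        static void update(KvStore store, String dataKey, String oldValue, String newValue)
                throws InterruptedException {
            // 1. Take the update-lock (small TTL so a crashed writer cannot block others forever).
            while (!store.putIfAbsent("update-lock", "my-client-id", 10)) {
                Thread.sleep(100);
            }
            long deadline = System.currentTimeMillis() + 10_000;  // watchdog matching the lock TTL

            // 2. Record what we are about to do, then mark the update as committed.
            store.put("update-plan", oldValue + " -> " + newValue);
            store.put("committed-update", "1");                   // from here on the update is logically done

            // 3. Apply the actual change, but only while we still hold the lock.
            if (System.currentTimeMillis() > deadline) {
                throw new IllegalStateException("lock may have expired; abort and let recovery finish the plan");
            }
            store.put(dataKey, newValue);

            // 4. Clean up in reverse order.
            store.delete("committed-update");
            store.delete("update-plan");
            store.delete("update-lock");
        }
    }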
IMHO, as etcd is built upon HTTP, a request/response protocol that by itself gives you no guarantee about what happened when a response is lost, it will be very hard to have a bulletproof solution.
Classical SQL databases use connected protocols, transactions and journaling to let users make sure that a transaction as a whole will be either fully committed or fully rolled back, even in the worst case of a power outage in the middle of the operation.
So if 2 operations depend on each other (a money transfer from one bank account to another), you can make sure that either both are applied or neither is, and you can simply implement in the database a journal of "operations" with their status, so that you can later tell whether a particular one went through by consulting the journal, even if you were disconnected in the middle of the commit.
But I simply cannot imagine such a solution for etcd. So unless someone else finds a better way, you are left with two options:
use a classical SQL database in the backend, using etcd (or equivalent) as a simple cache
accept the weaknesses of the protocol
BTW, I do not think that a timestamp in lieu of a long version number will strengthen the system, because under high load the probability that two client transactions use the same timestamp increases. Maybe you could try to add a unique id (a client id or just a technical UUID) to your fields, and when the version is n+1, just compare the UUID that incremented it: if it is yours, the transaction passed; if not, it did not.
But the really hard problem arises if, at the moment you can read the version, it is not at n+1 but already at n+2. If the UUID is yours, you are sure your transaction passed, but if it is not, nobody can say.
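A small sketch of that writer-id idea (the Entity shape and field names are made up; the n+2 case is deliberately left undecidable, as noted above):

    // Hypothetical entity shape: a version counter plus the id of the client that produced it.
    class Entity {
        long version;
        String lastWriterId;   // e.g. UUID.randomUUID().toString() of the client that bumped the version
        String value;
    }

    enum Outcome { APPLIED, NOT_APPLIED, UNKNOWN }

    public class AmbiguousRetry {
        // After a timed-out conditional update from base version n, decide what the re-read state tells us.
        static Outcome didMyUpdateApply(Entity current, long myBaseVersion, String myWriterId) {
            if (current.version == myBaseVersion + 1) {
                // Exactly one writer produced version n+1; the id tells us whether it was us.
                return myWriterId.equals(current.lastWriterId) ? Outcome.APPLIED : Outcome.NOT_APPLIED;
            }
            // Already at n+2 or beyond: the version and last writer id alone cannot tell us.
            return Outcome.UNKNOWN;
        }
    }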